Century-Scale Storage
December 17, 2024
This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century?
Source: Century-Scale Storage
I’ve had the privilege of being involved in two projects at CERN focussed on the early history of the Web.
In 2013, a group of folks were invited to Geneva to work on a history of the line mode browser.
This was not the first but the second web browser, designed for the mainframe and minicomputer terminals that were then widely used, particularly in universities and research institutions, the original focus of the Web.
Such devices were line mode devices–they displayed their output in monochrome, line by line.
When I started university in the mid 1980s, this was how we did our assignments: 120 terminals connected to a single minicomputer (a VAX/VMS, from memory), which had a recommended load of 4 simultaneous users and a maximum of 20.
At CERN someone had got access to an old IBM terminal. It even ran. But as far as I recall, there was no way we could get it connected to a network.
We did manage to access the source code of the line mode browser (then again that may have been a different source) and to load web pages (I have a dim memory of typing HTML into whatever text editor it had and saving and loading that).
That computer and code were around 20 years old then. And yet nearly a dozen intelligent, motivated people struggled even to get it working.
In 2019 most of the same group returned to tackle the very first browser.
Where line mode devices seem from another era, NeXT in a way ushers in modern computing. NeXT was founded by Steve Jobs in 1985, after his ouster from Apple. Its computers were Unix-based systems with a graphical user interface. NeXT brought to prominence a framework, NeXTSTEP, and a programming language, Objective-C, that until relatively recently were still at the heart of programming for macOS and iOS.
We were kindly lent a NeXT device of the kind Tim Berners-Lee would have used. It had Ethernet (though its removable storage was a rather unusual optical drive).
And while it had the same RJ45 type Ethernet physical interface that most folks will be familiar with, it took us a day or two to even get it connected to the CERN network. The early WWW browser running on it sent pre-HTTP/1 headers, and so most servers would not respond even when we got networking and DNS working on it.
In the end Remy Sharp wrote a proxy server that we got the browser to call. It would take the browser's HTTP request, convert it to modern HTTP, send the request onwards, then receive the response and pass it back to the NeXT device in a format it could understand.
So we got a nearly 30-year-old piece of software running on a nearly 30-year-old device. Barely.
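To make the idea concrete, here is a minimal sketch in Python of the two translations such a proxy has to perform. It is not the code we used at CERN. It assumes the old browser sends an HTTP/0.9-style request (a bare `GET` line with no version and no headers) and expects the raw document back with no status line or headers. The function names and the `default_host` parameter are made up for illustration, and the outgoing request uses HTTP/1.0 with a `Host` header simply to avoid having to decode chunked responses.

```python
from urllib.parse import urlsplit


def translate_request(raw: bytes, default_host: str) -> tuple[str, bytes]:
    """Turn a bare 'GET <target>' line into a versioned request with a Host header.

    Assumption: the legacy client sends only 'GET <target>' followed by a
    line ending. default_host is a hypothetical fallback for targets that
    do not include a host name.
    """
    parts = raw.decode("ascii").split()
    if len(parts) != 2 or parts[0] != "GET":
        raise ValueError(f"not a simple GET request: {raw!r}")

    target = urlsplit(parts[1])
    host = target.netloc or default_host
    path = target.path or "/"
    if target.query:
        path += "?" + target.query

    # HTTP/1.0 plus a Host header is accepted by most current servers
    # and keeps the reply free of chunked transfer encoding.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    return host, request


def strip_response(response: bytes) -> bytes:
    """Drop the status line and headers, keeping only the body.

    Assumption: the legacy client expects just the document itself.
    """
    _head, separator, body = response.partition(b"\r\n\r\n")
    return body if separator else response
```

A real proxy would wrap these in a listening socket, forward the translated request to the returned host, and write the stripped body back to the old browser. It would also have to cope with redirects and HTTPS-only sites, which this sketch ignores.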
All this came to mind as I read this great piece by the Harvard Law School Library Innovation Lab. We have fragments of writing (some we've never even deciphered) from several thousand years ago. On clay, stone, papyrus. But so much of the knowledge we have from antiquity is copies of translations of copies. So many of the foundational texts of what you might call 'Western Civilisation', the Greek philosophers and playwrights, come to us from medieval translations into Latin from Arabic sources that were themselves translated from manuscripts lost centuries before.
We have a tiny fraction of even humanity’s most significant writing, and almost nothing of the everyday lived experience of almost anyone.
So you might think that with our lives digitised (photos and videos from birth; daily, even hourly, thoughts posted to friends privately or publicly on social media; movies, music, television and writing all digital), all this will be available 1,000 years hence (provided there are any humans around to read, listen to or view it).
But my experience of those projects at CERN, and far more importantly pieces like this, suggest it is a much, much, much more challenging project.