Thursday, April 8, 2010

Preserving Our Digital Culture

Web 2.0’s defining characteristic is user-generated content. Blog posts are published, Wikipedia pages are edited, and memes are born and die every second of every minute of every day. It goes without saying that this makes for vast quantities of data – a year ago, the amount of data stored on the entire Internet approached 500 billion gigabytes, and it has surely rocketed past that number in the time since.

500,000,000,000 gigabytes – for reference, the largest iPod Apple sells holds a mere 160. This astronomical figure encompasses everything from our emails to our ridiculous captioned pictures of cats, our writing and our movies and our music and our art – it is, in many ways, our generation’s cultural movement.

Here’s the thing, though: unlike many cultural milestones, the Internet in its entirety does not exist in physical form. Sure, there are the rare cases where something becomes a book or makes it onto a DVD, but most of the things that you or I read in a normal day on the interwebs exist only as ones and zeroes on a magnetic platter somewhere, easily stored but also easily lost. Keeping all of this stuff around for posterity quickly becomes problematic.

Forgetting our past

They say that the Internet never forgets, but those people are actually forgetting a fair amount of stuff themselves – as a case in point, recall last year’s shutdown of Geocities, which took a fair amount of orphaned Web 1.0 content with it.

The ever-changing nature of well-maintained Web pages means that old designs and content are often thrown under the bus in favor of the newer-and-better, and you can take our September redesign as a case in point. One of the reasons why the loss of Geocities got the entire Internet all misty-eyed is that so many of those sites had fallen into disrepair, having gone without updates for years. These pages were relics in the truest sense of the word – serving little practical purpose in the here and now, but a very useful tool for gauging how far the Internet has come since its inception.

One could argue that the data lost when sites like those on Geocities go under is inconsequential, a tangled mess of embedded MIDIs and under-construction GIFs. Past is prologue, however, and history is important not just because it shows us where we’ve been but also because it shows us where not to go in the future.

Valiant efforts

That’s not to say that no one’s doing anything to preserve our digital heritage – people like the Archive Team and (where After the Jump is hosted) have made it their job to preserve as much of the Internet’s stray content as possible. The latter site actually takes snapshots of sites for storage on the Internet Wayback Machine – for a real trip to the past, check out what the Google homepage looked like ten years ago.

The problem with these well-meaning efforts is that they’re fighting a steep uphill battle. They work with limited server space and a shoestring budget, and are staffed mostly by volunteers – they can’t possibly expect to save everything, and even the stuff that they “save” is not necessarily safe.

Unstoppable change

This is because “saving” data on the Internet these days mostly involves moving it from one server to another – more a postponement than a true stay of execution. This is a problem that digital archivists are still trying to overcome. Hard drives eventually fail. Optical media like DVDs last longer when stored properly, but they’ll eventually become unreadable anyway. To be preserved, the contents of these servers need to be moved and backed up regularly, and if members of any of these Internet archive projects decide to shut down their sites, the data is in just as much danger of deletion as it was before.

The other battle to fight is that of format. Not only does data need to be moved because technology inevitably fails, but it needs to be moved because inertia is foreign to technology. Today’s must-have, cutting edge storage device is tomorrow’s Iomega Zip disk. A little over a year ago, someone at my day job brought to us a case of 5.25” floppy disks saying that he needed some data recovered from them. It took us months to dig up something that was both old enough to read the disks and also, you know, functional. Information kept safe on obsolete media is just as useless as information deleted forever.

What can we do?

Greater minds than mine have been pondering this problem. The aforementioned archival projects are making what headway they can, and some others are trying to create media that lasts longer – you know, like DVDs that last 1,000 years. Others take matters into their own hands, diligently backing up their personal data and correspondence, though I can say from experience that such responsible people are definitely in the minority.

Still, it’s not likely that we’ll be able to jump down the intertubes with our children or their children and show them exactly what the Internet was like on April 8, 2010, or explain to them why Hitler videos and captioned pictures occupied so much of our time. All we can do is make sure that we save just enough of our digital past to keep animated GIFs off our Web pages forever.