
Lost in space

June 9, 2009

With computer data moving and changing ever more quickly, the phrase "digital preservation" is almost an oxymoron. Modern historians are figuring out how to cope in the digital "Dark Ages."

With so much ephemeral information out there, how much can be saved? Image: Stadtarchiv Köln

It might not seem obvious, but the concept of a "historical record" is a changeable one.

Take the Domesday Book, written on sheepskin back in 1086. That historical record has outlasted today's government records by a very long way; more than 900 years after it was written, we can still go and see it in its original form.

Yet modern digital government records from only 10 years ago can be all but unreadable, because in terms of the computing world, a decade is a very long time.

Outlasting sheepskin?

Decision-makers and average folks alike used to communicate by letter, leaving a paper trail for historians to follow far, far down the road. Now, with e-mails and text messages moving around the globe at lightning speed - and often being archived badly, if at all - some historians are worried that the present will be hard to read for historians of the future.

Simon Tanner, the director of digital consultancy at King's College London, said the problem is "a bit like climate change: we know it's bad but we haven't quite seen the emergency problems coming up yet. And by the time someone gets round to doing something it will be too late and we will have lost a lot of data."

Tanner called this era "the digital Dark Age," and he worries that future generations won't be able to glean much information about the most recent decade "because so much more of the communication of cultural meaning and technical information has been made in electronic form… but then it was ephemeral and was lost and died very quickly because we didn't preserve it."

British Library's archive attempts

In 2003, in one of the British Parliament's most famous speeches, British politician Robin Cook resigned from the government in a gesture of opposition to Britain's entry into the Iraq war. Cook died unexpectedly two years later - and the British Library archived his Web site.

"That Web site no longer exists, if we hadn't done that nobody would have access to the information, the photographs, the interviews on that site,'" explained Stephen Bury, who is in charge of the British Library's Web archiving program.

Legal problems key issue

There are 7.2 million UK domain Web sites, according to Bury. The library is "selectively" collecting around 7,000 of them to archive.

"That's a very small proportion and we're very conscious that we're losing things," he said.

The stumbling blocks are only partly technical. More problematic is the legal aspect. One of the biggest issues is how to obtain permission from the people responsible for each site the library wants to archive, Bury said.

The Cologne Archives collapse raised the question of digital archiving. Image: AP

For one thing, a lot of the intellectual property regulations and legislative frameworks are focused on protecting media publishing and the music industry, experts say.

Laissez-faire attitude

"Sometimes that can be at the expense of a memory organization being able to collect and preserve society's basic information about what people are doing and how they did it and what they were thinking at any given time," Tanner said.

For his part, Bury has complained that some governments - including Britain's - are not giving the issue a high enough priority.

In France, which has a different tradition regarding state intervention, there is a regulation stating that every Web site in the French domain has to be indexed by the French National Library's crawler.

At the British Library, however, the first step is choosing which sites the library wants to archive. If permission to crawl the site is granted, a Web curator tool is used to take occasional "snapshots." The frequency of these snapshots is adjustable.

"We could do it if need be every day. On average we're doing three or four instances a year of a site," Bury said.

Losing information

One problem, then, is that the information in between snapshots gets lost. The constantly changing nature of data on the Internet means that "there's issues of how do you get a very wide coverage and how do you get a very deep coverage," said Bury. "It's likely that for the foreseeable future we will just be getting a partial view."

What's more, the archivists are aware that the technologies and methods they are currently using to hold on to data could at some point be made obsolete by new technological developments.

Which means they are not exactly confident that the systems being developed now will still be valid in a decade or two.

"It's a technological challenge going along and I think in 1996 we wouldn't have envisaged sort of Web 2.0 and that type of approach, blogs and wikis, YouTube, and whatever," said Bury.

Flux is expected

As for the Digital Consultancy's Tanner, he said he believes continuous action will probably be needed to stay abreast of changes.

"I think that we're seeing slightly more stability ... in the way that people are using digital information," he said. Gone are the days when people needed to change their computers every six months in order to get it to work with other computers.

"Now there are people surfing the Internet on computers they bought in the 1990s," he said. In terms of changing file formats or new software versions, the time frame has moved from six months to three- and five-year spans.

"We need to get over the issue that digital is significantly different from paper," he said. "It is a carrier of human knowledge and as such memory organisations should have the ability to store and preserve that for future access."

Author: Naomi Fowler (Jennifer Abramsohn)

Editor: Sean Sinico