The growing problem of accessing old digital file formats is a "ticking time bomb", the chief executive of the UK National Archives has warned.
Natalie Ceeney said society faced the possibility of "losing years of critical knowledge" because modern PCs could not always open old file formats.
She was speaking at the launch of a partnership with Microsoft to ensure the Archives could read old formats.
Digital Data, a 'Ticking Time Bomb' [news.bbc.co.uk]
We all run into these issues every time a program is upgraded, or when the technology itself changes, as with HD DVD and Blu-ray.
What's going to happen to the digital data I have on my DVDs? Who really knows. I'm in the middle of converting analogue to digital right now.
Sure, that takes time and money, but nobody said digital archiving wasn't going to have any of the preservation costs associated with traditional media.
As an example, if I still had anything in WordPerfect, I'd be having it converted to something else now, before it's too late.
The problems we see here are entirely due to archiving projects getting started without the budget required to maintain the archive in the future.
From a historian's point of view, the main disadvantage of digital storage is not just the software issue, but the stability of the media themselves. I can walk into an archive and sit down with a 500 year-old manuscript and, with a bit of training in paleography and a good knowledge of archaic language, read it as effectively as someone could have when it was, say, 10 years old.
But let's say I have a perfectly good computer that's 500 years old. I have all the software and hardware to run it. But the optical media have oxidized and the magnetic media have bled and de-magnetized. The data is gone.
To truly preserve digital data on magnetic and optical storage, there needs to be a constant process of refreshing the media. Forget that this might not happen because of laziness. Over time, governments fall and civilizations collapse. 1000 years hence, good rag paper with pigment-based inks will still be readable as long as the archives don't burn.
Of course, over the long term, there are other challenges, such as the evolution of language, but the pace of linguistic change has slowed down a lot, at least in Europe.
Check out a book called "Clock of the Long Now" by Stewart Brand (the Whole Earth Catalog guy). One of the main themes is the question of how a culture can preserve information.
Or get destroyed by a war, flood, hurricane/tornado, earthquake, volcanic eruption, glaciation, meteorite strike, hungry worms, insects, rodents, etc.
It seems to me that the argument is mis-targeted. The challenge for archivists now and those in the future is determining what is important and should be preserved. If we do a too-thorough job of archiving everything, the challenge in the future will be to sort through it all, and separate the momentous from the trivial...
The perspective of their argument is twisted. We have access today to only a tiny amount of recorded information from the past. For example, it has been said that the sacking of the library at Alexandria set civilization back 1000 years. Old books mildew and rot -- or even worse, get used for palimpsests. Obviously, the archivists feel compelled to preserve absolutely everything and are worried about losing anything. But how much of it will truly be important to future generations?
The one bright spot here is that if they can get the information on-line, then (if allowed) it will replicate across space, time, and generations of hardware and file formats, making loss due to any one event less likely.
Today computers basically work the same way, but to get smaller, cheaper and faster they no longer use discrete circuits that can be fixed, and removing a surface-mounted chip from a modern board will virtually destroy it. They're unmaintainable. So now you can basically only replace whole boards, assuming that particular board still exists.
So forget the media, there won't be a maintainable machine to read the media!
So assuming a civilization collapse of some sort, what many are missing is that 100 years later, even *IF* the media is still intact, the very ability to read a CD with a laser may be lost, assuming people even know what a laser is in the first place.
Maybe you find one CD-ROM drive still operational, what is the data stored on the CD-ROM?
Anyone know how to decode an MP3 or JPEG off the top of their heads?
Of course all the documentation on how to decode MP3s and JPEGs is stored in a PDF file on the dual-sided DVD backup. Assuming you could read a dual-sided DVD or interpret a PDF file which has visual presentations on the process in JPEG format inside the PDF describing how to decode the JPEG in the first place...
Let's face it, if they ever drop the bomb we'll all be reduced to driving the few '60s era cars remaining and using the few Apple IIs still operational, because they're about the only things we can maintain on our own by scavenging for parts.
The bottom line is we'll lose far more than just a few databases as all of our music and movies will also be lost, not to mention family photos.
An entire generation of creation that pales the works lost in the great library of Alexandria: POOF!
Another important aspect is DRM, which adds locks to a file that don't magically expire once the copyright ends, and which uses obfuscation and encryption to deny access.
There is the problem of physical data storage as mentioned above, but the files will remain accessible only if there is a clear, accessible, non-copyrighted, open and fully-documented standard which can be reimplemented from scratch if required. HTML is such a standard; .doc, .pst (for all your emails stored in Outlook) or .docx certainly are not.
Lost in that fire were many of the works of Archimedes and hundreds of other early thinkers, mathematicians, playwrights, and philosophers -- Dozens of generations of information of a much more fundamental kind. It was left to later generations to re-invent much of mathematics and science as a result, just for example, because most of the original copies of the information stored there were dispersed and lost or destroyed over time.
The library was not just a random collection of books; Alexandria had a law that the valuable books (scrolls, actually) aboard any vessel docking there were to be seized, quickly copied, and returned to their owners. It was, at the time, one of the busiest and most important ports in the world. Thus, they amassed a great trove of information, well beyond what we think of as a library today...
To use your example, what was lost was not the equivalent of how to read and decode a CD. Rather, it was the equivalent of foundational electrical/electronic theory.
Very interesting Foo, this one!
P.S. My CP/M box still works. :)
MS have always used proprietary, closed, binary-only formats as lock-in for their clients
And they were alone in this?
Lotus, Ashton-Tate, Novell, IBM and others all did the same thing.
Bet you can't read a Lotus AmiPro document.
Import something from Visicalc or WordStar anyone?
Rather, it was the equivalent of foundational electrical/electronic theory.
What good is that if you have lost the specs for the laser, CD-ROM and MP3?
Trust me, it's a technological equivalent of what Alexandria lost as figuring out simple math again pales by comparison to figuring out the collective engineering from hundreds of thousands of people to get to the stage of electronics we have today.
[edited by: incrediBILL at 7:27 pm (utc) on July 4, 2007]
They aren't important, I know, so they're lost forever, but I'm just curious: does this initiative intend to ignore the copyright on those old files and convert them, or are they planning to design something to read everything (which is against the copyright too, I'd think)? How are they planning to get around the copyrights that forbid deconstructing software to use on other systems? Most software applications had them back then.
[edited by: Kurgano at 7:36 pm (utc) on July 4, 2007]
I've got some ancient VIC-20 programs I wrote in 1983 kicking around somewhere. Hey, at the time they were cutting edge.
If you can get them into a PC you can run them in a VIC-20 emulator.
There are all sorts of emulators to run ancient software such as the ATARI 2600, TRS-80, so on and so forth, and they're all just a Google search away.
Getting it onto a computer would be a chore: the programs are stored on a cassette tape, the kind everyone listened to music on back then. I'm fairly sure they are still in working order, because I can still play music tapes stored in the same box without a problem.
This thread made me dig out old music cassette tapes! ugh. PEEK and POKE commands anyone?
this is like saying a load of music was lost when they phased out vinyl
I think you miss the point that VINYL can be played with a cone of paper and any old needle or straight pin stuck into the end of the cone of paper. Not like the record would last long, but you can play the record with extremely low-tech implements, so it's almost as easy as reading a book.
However, an 8" floppy disk can't be played without an 8" disk drive composed of magnetic read heads, index hole sensors, and all sorts of nonsense.
There are some folks who will, for a fee, help me bring back that synth to operating condition, if I get onto it right now while parts are still somewhat available on the antique market.
however, that is really an interesting field to work in: data preservation. google says they want to organize the world's info, but no word about saving it for future generations. not sure if the market sees a need for that?
The perspective of their argument is twisted. We have access today to only a tiny amount of recorded information from the past. For example, it has been said that the sacking of the library at Alexandria set civilization back 1000 years. Old books mildew and rot
Good point Jim. I got sidetracked, but I meant to mention that. The big advantage of digital storage is that it can be replicated and storage can be distributed.
Case in point. I work with 500 year-old documents. Or used to. As of last year, the collection I work on the most frequently was photographed and made available on DVD. This means that if the archive is destroyed tomorrow in a firestorm, those unique volumes are now no longer truly unique.
Trust me, it's a technological equivalent of what Alexandria lost as figuring out simple math again pales by comparison to figuring out the collective engineering
I've got to side with Jim on this one WRT Alexandria or the fall of the Roman Empire. If we lost all the specs for CD readers and lasers, do you really think it would take us centuries to figure out how to do it again?
We tend to think that we live in the most dramatic and rapidly changing era in history, but often that is based on temporal foreshortening.
Actually, a whole lot of stuff like that from perhaps 70 to 100+ years ago has already been lost.
Some of it has zero surviving copies, even things originally produced in (tens of) thousands of copies.
Hard disks became unusable, floppy disks too, with read errors on every kind of media. I work in digital imaging and publishing, and given the amount of data we manage in the department, we are sick and tired of making backups and "backups of the backups".
On the other hand, I see that what you don't use frequently often gets left aside... so it's like "natural selection" for information.
And the web is becoming a way to preserve different data. So if you are not making your data available in some universal format (HTML, CSV, JPEGs...), then someone else is already doing it. That's why we find lots of info on the web, even for free.
As for original material, lots of companies are taking advantage of this content by putting it on the web and even using AdSense... So the data keeps its life cycle.
I guess collectors are the key to this... maybe they will make a profit by "converting" all those old Aldus PageMaker files to text only and JPEGs for you in the future... :)
I have always kept that in mind... preserving something for the future. Can you imagine yourself showing a CD to your kids without a way to play it? "Inside this disc are my family pictures"... too bad I can't show you... ha ha
Assuming that we can still read the characters used today, then any future generations for the next few billion years would be able to read the diamond. Really stable things, diamonds!
Sure, you would need OCR to digitise it into whatever your modern system is, but if future generations can't manage OCR then I formally disown them and they deserve to have decayed unreadable archives.
this is like saying a load of music was lost when they phased out vinyl. i'm sure they could build a record player again if they really had to. it's not like the technology has been 'lost'. all the important stuff got transferred over to other formats. it's only the lousy stuff that didn't (val doonican... max bygraves... the spice girls...)
i have 50 year old vinyl that plays as well as the day it was made.
no bits or magnetics or substrates - just vibrations pressed in pvc.
the technology is still current.
you can get a turntable to play vinyl today at circuit city (big box electronic retailer) for $100 or so.
or you can spend several thousand on the latest tonearm.
you can play today's turntables through usb or play them through vacuum tube amplifiers.
the spice girls sold way more cds than vinyl.
and last week they announced their reunion tour...
have 50 year old vinyl that plays as well as the day it was made.
Vinyl (just like tape) degrades a little every time you play it because it's a contact medium, and all copies have imperfections, which is the nature of analog media as well. Photography is even worse: duplicate slides and film are always of slightly lesser quality than the original, or they get a fungus or a color shift, or on old movies the sprocket holes get fragile and break. A big mess.
Digital is always an exact carbon copy of the previous copy, no degradation whatsoever unless the medium becomes corrupt or you purposely use lossy compression to make copies. Therefore, as long as we can maintain the technology used to reproduce these digital images, movies and music we'll always have premium copies with every copy, not old scratchy sounding records or spotty looking movies.
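For anyone who wants to check the "exact copy" claim rather than take it on faith, here is a minimal Python sketch. It copies a file and compares SHA-256 digests of the original and the copy. The function names (sha256_of, copy_and_verify) are made up for illustration, and SHA-256 is just one reasonable choice of checksum, not something anyone in this thread specified.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large archive files do not
        # have to fit in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> bool:
    """Copy source to target and report whether the digests match.

    Hypothetical helper: a True result means the copy read back
    with the same digest as the original at the time of the check.
    """
    shutil.copyfile(source, target)
    return sha256_of(source) == sha256_of(target)
```

If you store the digest alongside the copy, you can rerun the same comparison years later to find out whether the medium has gone bad before you need the file.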
The downside is that if we lose the technology, all copies are lost. Unlike a single book or picture that degrades over time, you lose the whole collection in one shot.
FWIW, who cares anyway as we'll all probably be dead and buried long before the fuel crisis hits that causes civilization to collapse and people will be too busy fighting over scraps of food to worry about whether they can play an ancient DVD.
[edited by: incrediBILL at 8:56 am (utc) on July 5, 2007]
Import something from Visicalc or WordStar anyone?
Of course, some document formats don't have this challenge, and it's always nice to be able to quickly read an ancient PostScript or PDF file (Adobe's Reader still has support for very early versions of PDF).
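Part of why old PDFs stay readable is that the file announces its own version in a plain-text header such as %PDF-1.0 near the start. A rough Python sketch of reading that header follows. The function name pdf_version is hypothetical, and scanning the first 1024 bytes is an assumption based on how tolerant readers behave. Later PDF versions can also override the header version inside the document itself, which this sketch ignores.

```python
import re
from typing import Optional

# The header looks like b"%PDF-1.4"; capture the version digits.
_PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF header, or None if absent.

    Only the first 1024 bytes are scanned (an assumption, see above).
    """
    match = _PDF_HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None
```

Usage would be something like pdf_version(open("old.pdf", "rb").read(1024)), where old.pdf is a placeholder file name.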
I am told that this is called "progress"...