This 41 message thread spans 2 pages.
|Digital Data, a 'Ticking Time Bomb'|
|The growing problem of accessing old digital file formats is a "ticking time bomb", the chief executive of the UK National Archives has warned. |
Natalie Ceeney said society faced the possibility of "losing years of critical knowledge" because modern PCs could not always open old file formats.
She was speaking at the launch of a partnership with Microsoft to ensure the Archives could read old formats.
Digital Data, a 'Ticking Time Bomb' [news.bbc.co.uk]
We all experience these issues every time a program is upgraded, or even when a technology changes, as with HD DVD and Blu-ray.
What's going to happen to the digital data I have on my DVDs - who really knows. I'm in the middle of converting analogue to digital right now.
This is not only true for retrieving data, but also for using some hardware. I have a large, commercial-grade printer for which I have to keep a PC with W98, as they never made a driver upgrade.
In almost every case, there are conversion utilities from one format to the next. They just need to be converted at the time the format starts to be less supported.
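For what it's worth, the first step could be as simple as an inventory of what is sitting in the archive. Here is a minimal Python sketch, assuming a hypothetical watch list of extensions (the `AT_RISK` set and the function name are illustrative only; nobody in this thread supplied such a list):

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical watch list: extensions someone has decided are losing support.
# These entries are examples, not an authoritative registry.
AT_RISK = {".wpd", ".wks", ".sam", ".ws"}


def at_risk_counts(paths, at_risk=AT_RISK):
    """Count files whose extension appears on the watch list."""
    counts = Counter()
    for p in paths:
        ext = PurePath(p).suffix.lower()  # compare case-insensitively
        if ext in at_risk:
            counts[ext] += 1
    return counts


example = at_risk_counts(
    ["a/letter.WPD", "b/notes.txt", "c/memo.wpd", "d/budget.wks"]
)
```

The counts only say how much there is to convert; the conversion itself still needs whatever utility handles each format.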
Sure, that takes time and money, but nobody said digital archiving wasn't going to have any of the preservation costs associated with traditional media.
As an example, if I now had anything in Word Perfect, I'd be having it converted to something else now, before it's too late.
The problems we see here are entirely due to archiving projects getting started without the budget required to maintain the archive in the future.
Interesting. This is the essence of why my books are still published with acid-free inks on acid-free paper and archival-grade bindings, even though the works themselves would truly be better-suited to electronic access and consultation.
From a historian's point of view, the main disadvantage of digital storage is not just the software issue, but the stability of the media themselves. I can walk into an archive and sit down with a 500 year-old manuscript and, with a bit of training in paleography and a good knowledge of archaic language, read it as effectively as someone could have when it was, say, 10 years old.
But let's say I have a perfectly good computer that's 500 years old. I have all the software and hardware to run it. But the optical media have oxidized and the magnetic media have bled and de-magnetized. The data is gone.
To truly preserve digital data on magnetic and optical storage, there needs to be a constant process of refreshing the media. Forget that this might not happen because of laziness. Over time, governments fall and civilizations collapse. 1000 years hence, good rag paper with pigment-based inks will still be readable as long as the archives don't burn.
Of course, over the long term, there are other challenges, such as the evolution of language, for example, but the pace of linguistic change has slowed down a lot, at least in Europe.
Check out a book called "Clock of the Long Now" by Stewart Brand (the Whole Earth Catalog guy). One of the main themes is the question of how a culture can preserve information.
> 1000 years hence, good rag paper with pigment-based inks will still be readable as long as the archives don't burn.
Or get destroyed by a war, flood, hurricane/tornado, earthquake, volcanic eruption, glaciation, meteorite strike, hungry worms, insects, rodents, etc.
It seems to me that the argument is mis-targeted; The challenge for archivists now and those in the future is determining what is important and should be preserved. If we do a too-thorough job of archiving everything, the challenge in the future will be to sort through it all, and separate the momentous from the trivial...
The perspective of their argument is twisted; We have access today to only a tiny amount of recorded information from the past. For example, it has been said that the sacking of the library at Alexandria set civilization back 1000 years. Old books mildew and rot -- or even worse, get used for palimpsests. Obviously, the archivists feel compelled to preserve absolutely everything and are worried about losing anything. But how much of it will truly be important to future generations?
The one bright spot here is that if they can get the information on-line, then (if allowed) it will replicate across space, time, and generations of hardware and file formats, making loss due to any one event less likely.
One important point here is the expiry of copyright. As things expire and automatically enter the public domain, the job of sorting and archiving becomes automated by the public at large. If documents are remotely interesting they will be held on webpages and archived in any number of digital formats across the world. Just take a look at the vast libraries of software available for obsolete computers, kept up by hobbyists and those who use emulators.
Part of the problem lies in the complexity of the current technology. 20 years ago I knew how all of it worked from the CPU to the floppy and could make board level repairs to any part of my computer that might break. Back then I used to even know how to read and write tracks directly from a floppy or hard disk to manually fix things when they crashed, or transfer data from CP/M machines to IBM mainframes in EBCDIC format disks, something I don't even have the gear to do today.
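As an aside, the character-set half of that CP/M-to-mainframe job is the easy part today. A minimal Python sketch, assuming the disks used EBCDIC code page 037 (an assumption; the post does not say which EBCDIC variant was involved, and the function name is made up for illustration):

```python
def ebcdic_to_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to a Python string.

    cp037 is one common EBCDIC code page shipped with Python's
    standard codecs; other variants would need a different codec name.
    """
    return raw.decode(codec)


# Round trip: the same five letters look nothing like ASCII in EBCDIC.
sample = "HELLO".encode("cp037")
decoded = ebcdic_to_text(sample)
```

Getting the bytes off the physical disk in the first place is the hard part, and no codec helps with that.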
Today computers basically work the same way, but to get smaller, cheaper and faster they no longer have discrete circuits that can be fixed, and removing a surface-mounted chip from a modern board will virtually destroy it. It's unmaintainable. So now you can basically only replace whole boards, assuming that particular board still exists.
So forget the media, there won't be a maintainable machine to read the media!
What many are missing is that after a civilization collapse of some sort, even *IF* the media is still intact 100 years later, the very ability to read a CD with a laser may be lost, assuming people even know what a laser is in the first place.
Maybe you find one CD-ROM drive still operational, what is the data stored on the CD-ROM?
Anyone know how to decode an MP3 or JPEG off the top of their heads?
Of course all the documentation on how to decode MP3s and JPEGs is stored in a PDF file on the dual-sided DVD backup. Assuming you could read a dual-sided DVD or interpret a PDF file which has visual presentations on the process in JPEG format inside the PDF describing how to decode the JPEG in the first place...
Let's face it, if they ever drop the bomb we'll all be reduced to driving the few '60s era cars remaining and using the few Apple 2's still operational because it's about the only thing we can maintain on our own by scavenging for parts.
The bottom line is we'll lose far more than just a few databases as all of our music and movies will also be lost, not to mention family photos.
An entire generation of creation that pales the works lost in the great library of Alexandria: POOF!
There is the significant irony of the tie-up with Microsoft, who are so helpfully attempting to provide a solution to a problem they have deliberately created. MS have always used proprietary, closed, binary-only formats as lock-in for their clients, changing those proprietary formats with each revision of their program (Word documents are a good example of incompatibility between versions). Their latest "open" XML document format does nothing to define the compatibility with the older proprietary formats it is supposed to encompass.
Another important aspect is DRM - which adds locks on a file which don't magically expire once the copyright ends, and which use obfuscation and encryption to deny access.
There is the problem of physical data storage as mentioned above, but the files will remain accessible only if there is a clear, accessible, non-copyrighted, open and fully-documented standard which can be reimplemented from scratch if required. HTML is such a standard; .doc, .pst (for all your emails stored in Outlook) or .docx certainly are not.
I realised, just a few months back, that I no longer own anything that is capable of reading any type of floppy disk. Hmm. Must copy all that data onto a CD or USB hard drive and clear the shelf of old bulky disks sometime... but will now need to borrow something to do it with.
While I agree with the main thrust of your post, I can't agree with this:
> An entire generation of creation that pales the works lost in the great library of Alexandria: POOF!
Lost in that fire were many of the works of Archimedes and hundreds of other early thinkers, mathematicians, playwrights, and philosophers -- Dozens of generations of information of a much more fundamental kind. It was left to later generations to re-invent much of mathematics and science as a result, just for example, because most of the original copies of the information stored there were dispersed and lost or destroyed over time.
The library was not just a random collection of books; Alexandria had a law that the valuable books (scrolls, actually) aboard any vessel docking there were to be seized, quickly copied, and returned to their owners. It was, at the time, one of the busiest and most important ports in the world. Thus, they amassed a great trove of information, well beyond what we think of as a library today...
To use your example, what was lost was not the equivalent of how to read and decode a CD. Rather, it was the equivalent of foundational electrical/electronic theory.
Very interesting Foo, this one!
P.S. My CP/M box still works. :)
|MS have always used proprietary, closed, binary-only formats as lock-in for their clients |
And they were alone in this?
Lotus, Ashton-Tate, Novell, IBM and others all did the same thing.
Bet you can't read a Lotus AmiPro document.
Import something from Visicalc or WordStar anyone?
|Rather, it was the equivalent of foundational electrical/electronic theory. |
What good is that if you have lost the specs for the laser, CD-ROM and MP3?
Trust me, it's a technological equivalent of what Alexandria lost as figuring out simple math again pales by comparison to figuring out the collective engineering from hundreds of thousands of people to get to the stage of electronics we have today.
[edited by: incrediBILL at 7:27 pm (utc) on July 4, 2007]
I've got some ancient VIC-20 programs I wrote in 1983 kicking around somewhere. Hey, at the time they were cutting edge.
They aren't important, I know, so they're lost forever, but I'm just curious: does this initiative intend to ignore the copyright on those old files and convert them, or are they planning to design something to read everything (which is against the copyright too, I'd think)? How are they planning to get around the copyrights that forbid deconstructing software to use on other systems? Most software applications had them back then.
[edited by: Kurgano at 7:36 pm (utc) on July 4, 2007]
Reading this is quite ironic as I take a break from upgrading my wife's computer to Microsoft Vista.
Software purchased 1 year ago (from Microsoft) isn't compatible and trying a work-around just set me back 2 hours.
|I've got some ancient VIC-20 programs I wrote in 1983 kicking around somewhere. Hey, at the time they were cutting edge. |
If you can get them into a PC you can run them in a VIC-20 emulator.
There are all sorts of emulators to run ancient software such as the ATARI 2600, TRS-80, so on and so forth, and they're all just a Google search away.
incrediBILL, if I bother it would be to have a copy for memory's sake only. The VIC had 5.5k of RAM and 2k of that was used for the system itself. I could re-write the stuff today fairly easily and make it better, I'm sure.
Getting it onto a computer would be a chore: the programs are stored on a cassette tape, the kind everyone listened to music with back then. I'm fairly sure they are still in working order because I can still play music tapes stored in the same box without a problem.
This thread made me dig out old music cassette tapes! ugh. PEEK and POKE commands anyone?
this is like saying a load of music was lost when they phased out vinyl. i'm sure they could build a record player again if they really had to. it's not like the technology has been 'lost'. all the important stuff got transferred over to other formats. it's only the lousy stuff that didn't (val doonican... max bygraves... the spice girls...)
|this is like saying a load of music was lost when they phased out vinyl |
I think you miss the point that VINYL can be played with a cone of paper and any old needle or straight pin stuck into the end of the cone of paper. Not like the record would last long, but you can play the record with extremely low-tech implements, so it's almost as easy as reading a book.
However, an 8" floppy disk can't be played without an 8" disk drive composed of magnetic read heads, index hole sensors, and all sorts of nonsense.
is that actually true? cool! i am off to try that now.
My first keyboard synthesizer used analog cassettes to record digital information - not digital tape, not floppy disks, but analog cassettes interpreted by some proprietary system. I just ran across some of the music and sound patches that I stored in that ancient way, but the old synth is already in grave need of repairs. So I'm stuck, and after only 18 years!
There are some folks who will, for a fee, help me bring back that synth to operating condition. If I get onto it right now, while parts are still somewhat available on the antique market.
there is an implication that sounds like all data is worth storing... i am not sure, if that is the case!
however is that really an interesting field to work in: data preservation - google says they want to organize the worlds info, but no word about saving it for future generations. not sure if the market sees a need for that?
> The perspective of their argument is twisted; We have access today to only a tiny amount of recorded information from the past. For example, it has been said that the sacking of the library at Alexandria set civilization back 1000 years. Old books mildew and rot
Good point Jim. I got sidetracked, but I meant to mention that. The big advantage of digital storage is that it can be replicated and storage can be distributed.
Case in point. I work with 500 year-old documents. Or used to. As of last year, the collection I work on the most frequently was photographed and made available on DVD. This means that if the archive is destroyed tomorrow in a firestorm, those unique volumes are now no longer truly unique.
> Trust me, it's a technological equivalent of what Alexandria lost as figuring out simple math again pales by comparison to figuring out the collective engineering
I've got to side with Jim on this one WRT Alexandria or the fall of the Roman Empire. If we lost all the specs for CD readers and lasers, do you really think it would take us centuries to figure out how to do it again?
We tend to think that we live in the dramatic and rapidly changing era in history, but often that is based on temporal foreshortening.
>> this is like saying a load of music was lost when they phased out vinyl <<
Actually, a whole lot of stuff like that from perhaps 70 to 100+ years ago has already been lost.
Some stuff sees zero surviving copies, even for stuff originally replicated in the (tens of) thousands of copies.
I agree to some extent, as old data can't be read anymore not only because we don't have or use that old software today, but because the CDs are magically unreadable now, even those "double backups": one for use and another CD put inside a plastic bag.
Hard disks became unusable, floppy disks too. Several reading errors on every medium. I work in digital imaging and publishing, and with the amount of data we manage at the department, we are so sick and tired of making backups and "backups of the backups".
On the other hand, I see that what you don't use frequently is often left aside... so it is like "natural selection" for information.
And the web is becoming a way to preserve different data. So if you are not making your data available in some universal format (HTML or CSV, JPGs...) then someone else is already doing it. That's why we find lots of info on the web, even free.
As for original material, lots of companies are taking advantage of this content, putting it on the web and even using AdSense... So the data keeps its life cycle.
I guess collectors are the key to this... maybe they will make a profit by "converting" all those old Aldus PageMaker files to text only and JPGs for you in the future... :)
I have always kept that in mind... preserving something for the future. Can you imagine yourself showing a CD to your kids without a way to play them? "inside this disc are my family pictures"... too bad I can't show you... ha ha
The answer could well come from the oldest technologies matched with some of our newest innovations. For example, some of the oldest surviving documents are cave paintings and engravings. Fast-forward to today, and etch your documents with ultra-high precision lasers onto diamond slices.
Assuming that we can still read the characters used today, then any future generations for the next few billion years would be able to read the diamond. Really stable things, diamonds!
Sure, you would need OCR to digitise it into whatever your modern system is, but if future generations can't manage OCR then I formally disown them and they deserve to have decayed unreadable archives.
|this is like saying a load of music was lost when they phased out vinyl. i'm sure they could build a record player again if they really had to. it's not like the technology has been 'lost'. all the important stuff got transferred over to other formats. it's only the lousy stuff that didn't (val doonican... max bygraves... the spice girls...) |
i have 50 year old vinyl that plays as well as the day it was made.
no bits or magnetics or substrates - just vibrations pressed in pvc.
the technology is still current.
you can get a turntable to play vinyl today at circuit city (big box electronic retailer) for $100 or so.
or you can spend several thousand on the latest tonearm.
you can play today's turntables through usb or play them through vacuum tube amplifiers.
the spice girls sold way more cds than vinyl.
and last week they announced their reunion tour...
|i have 50 year old vinyl that plays as well as the day it was made. |
This begs the question: is the "newer" technology better, or just a marketing ploy to get us to buy things? I think the latter.
|have 50 year old vinyl that plays as well as the day it was made. |
Vinyl (just like tape) degrades every time you play it because it's a contact medium, and all copies have imperfections, which is the nature of analog media as well. Photography is even worse, with duplicate slides and film always of slightly lesser quality than the original, or the film gets a fungus, has a color shift, or on old movies the sprocket holes get fragile and break. A big mess.
Digital is always an exact carbon copy of the previous copy, no degradation whatsoever unless the medium becomes corrupt or you purposely use lossy compression to make copies. Therefore, as long as we can maintain the technology used to reproduce these digital images, movies and music we'll always have premium copies with every copy, not old scratchy sounding records or spotty looking movies.
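The "exact copy" claim is at least something you can check rather than take on faith. A minimal Python sketch, assuming SHA-256 digests are an acceptable way to compare an original with its copy (the function names and sample bytes are made up for illustration):

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a block of bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_copy(original: bytes, copy: bytes) -> bool:
    """True only if the copy has the same digest as the original."""
    return digest(original) == digest(copy)


original = b"family photos, 2007"
good_copy = bytes(original)

# Simulate a corrupt medium by flipping a single bit in the first byte.
damaged = bytearray(original)
damaged[0] ^= 0x01
bad_copy = bytes(damaged)

good_ok = verify_copy(original, good_copy)
bad_ok = verify_copy(original, bad_copy)
```

Storing the digest next to each backup would let a later check spot a corrupt medium before the last good copy is gone, though it does nothing to repair the damage.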
The downside is that if we lose the technology, all copies are lost. Unlike a single book or picture that degrades over time, you lose the whole collection in one shot.
FWIW, who cares anyway as we'll all probably be dead and buried long before the fuel crisis hits that causes civilization to collapse and people will be too busy fighting over scraps of food to worry about whether they can play an ancient DVD.
[edited by: incrediBILL at 8:56 am (utc) on July 5, 2007]
|Import something from Visicalc or WordStar anyone? |
Yes! Specifically, WordStar for DOS 3.3. It was a mess, and the hours spent cleaning and reformatting the automatic conversion results were costly. Just getting the physical media (5.25" disks) read was one hassle; then, finding that modern programs couldn't convert, I ended up running a series of conversions to get to Rich Text Format. So, yes, this problem is real. If you've got a bunch of information of any value still in formats more than 10 years old, there's an exponential price-of-recovery vs. time-since-creation graph to consider. I'd say we're well past the shallow end of the curve for documents over 20 years old (1987). Using the original tools to recover documents of this age is probably a near-impossible challenge and recovery will take reverse-engineering.
Of course, some document formats don't have this challenge, and it's always nice to be able to quickly read an ancient Postscript or PDF file (Adobe's Reader still has support for very early versions of PDF).
My pet peeve is an old Windows 3.1 hard drive I have which used Stacker compression - the data is preserved (I have several clones) but I cannot find any way to access or convert it, despite trying every suggestion I found in several months of intensive Googling.
I am told that this is called "progress"...