Introduction to the Problem
Today, we have access to information and data that 15 years ago would have scarcely seemed possible. It seems that almost everything is being created and used in the digital realm. Documents such as your history report or the spreadsheet that shows last year’s travel budget were likely all generated on your computer. However, though we use computers for so many things, we often don’t give much thought to preserving what we generate until it is too late. Most people can remember at least one horror story of lost data, whether it happened to them or to a friend: the research paper that was lost when the computer crashed, or the scattered and disorganized family photos that were saved to only one hard drive, which eventually crashed! Stories like these illustrate the potential fragility of digital information. There are several reasons why digital objects are so fragile.
Fragility of Digital Objects
One reason that digital information is fragile is that software programs and other technologies can be very quickly superseded by newer ones and fall out of use. This phenomenon is called technological obsolescence. Once newer technologies become accepted as the norm, it can be difficult to use any digital object that exists in an older format. Although there is currently some backward compatibility available for popular programs (for example, OpenOffice is able to open a Microsoft Office 2003 document), this is not necessarily the case for less widely used programs and proprietary formats from small companies. Obsolescence can also occur with the media that digital information is stored on. It is quite difficult now to find a computer with a 3 ½” floppy drive, much less one for 5 ¼” floppies. These obsolete media or formats may contain unique information that may be very difficult or impossible to recover.
Another problem associated with digital preservation is media degradation. The media that digital information is stored on were not always made to last, and can quickly degrade. These media include magnetic tapes, floppy disks, optical discs and more. Take, for example, a movie on DVD. More likely than not, you have experienced a crucial scene in a movie being ruined because of scratches on the DVD you were watching. This is a case of media degradation: the information that was on the DVD can no longer be read because the medium that it was on has itself degraded. Now imagine this has happened not simply with a commercially available CD or DVD, but with a unique item that contained thousands of digitized images. Clearly there are many risks associated with media degradation, especially when you consider how much information has been burned onto CDs and DVDs to serve as a backup.
What is Digital Preservation?
So how do we deal with the problems mentioned above? One way is through active Digital Preservation. Digital Preservation is the management and maintenance of digital objects (the files, or groups of files, that contain information in digital form) so they can be accessed and used by future users. It is important to start thinking about digital preservation early in the life cycle of a digital object. Traditional print objects may last relatively unharmed for decades if left untouched, but this is not the case with digital objects, which have significantly shorter life spans. Therefore, by thinking about preserving the digital object early on, even at the moment it is created, we save a great deal of time and stress later when trying to retrieve the information an object holds before it is too late. In this sense, digital preservation, and especially early digital preservation, is important not only for personal data management but also for large repositories that manage many objects. Though personal horror stories of lost data seem to be scattered and happen only from time to time, for larger repositories that contain many hundreds or thousands of digital objects, lost data can be a much bigger problem. Digital Preservation, after all, is frequently focused on long-term use, which can be quite difficult to achieve considering how fragile digital objects can be. There are several strategies used to help preserve digital objects, such as emulation, migration and data redundancy.
Digital Preservation Strategies
One of the best ways to help preserve digital objects is through data redundancy. This is, simply put, making sure there are multiple copies of important files. If at least one additional copy of an important file is available, it mitigates the disaster of a computer crashing or a disc being lost. However, though this may be helpful in the short term, it may not be enough in the long term, as file formats and media can change rapidly over a short period of time. In this case, two more digital preservation strategies can be helpful: emulation and migration.
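The idea of keeping several copies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended backup tool: the function names `sha256_of` and `make_redundant_copies` and the folder names are hypothetical, and comparing SHA-256 checksums is one common way (among others) to confirm that each copy matches the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_redundant_copies(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify every copy.

    Hypothetical helper: the destinations stand in for separate storage
    locations such as an external drive or a network share.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Demonstration in a temporary folder; real copies should live on
    # physically separate media, which this demo does not attempt.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "history_report.txt"
        original.write_text("My history report")
        folders = [Path(tmp) / "copy_a", Path(tmp) / "copy_b"]
        for copy in make_redundant_copies(original, folders):
            print(copy.name, sha256_of(copy))
```

Copies made this way protect against losing a single disc or drive only if the destination folders are on different devices; they do nothing about format or media obsolescence.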
Emulation involves using a program that imitates the original, obsolete hardware or software to render a digital object. In emulation, the original bit stream (the information that comprises the file) is saved and used. In contrast, in migration, the original bit stream is converted to a new, current file format. Both strategies allow for the use of digital objects that may require outdated software or hardware, but in slightly different ways. When choosing a strategy, it is important to consider how the digital objects are to be used as well as the significant properties of each object. For example, is it a Word document where you only need to read the information contained in it? In this case, migration, which might eliminate some of the formatting, may be acceptable. But what about a computer game, where migrating the data instead of emulating the original environment would cause significant changes to the way the game is played? Although there are merits to both strategies, these types of questions are good to ask before choosing one. A more in-depth comparison of these two strategies can be seen below.
One last way to help preserve digital objects is to make sure that as much information as possible is gathered when they are created. This information is called metadata and can include basic descriptive information about the file as well as information about the file format of the object. The metadata collected about an object helps to place items in context, as well as give specific information. This is essential for making sure that digital objects are authentic. Authenticity means that the file hasn’t been added to or modified in any way: it is the digital object created by the producer, and its content was not modified once it was placed in the digital repository. This is especially important for digital files, which, unlike print media, can be easily changed in ways that may not be readily apparent. In addition, metadata can help to track what was done to preserve the object throughout its life cycle, such as migrating an object from one format to another. This metadata can be linked to the digital object or encapsulated with the digital object itself. Encapsulating the metadata with the object, for example by placing the metadata and the object in the same folder in a zip file, ensures that the information stays with the file no matter where it goes. Linking the metadata and storing it somewhere else (not with the file) ensures that the information about the file can be recovered even if the object itself is lost.
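As a rough illustration of encapsulation, the sketch below places a file and a small metadata record together in one zip file, and records a checksum that can later be used to check whether the object has changed. The function names `encapsulate` and `is_unmodified`, the `metadata.json` file name and the metadata fields are all assumptions made for this example; real repositories typically follow established metadata standards rather than an ad hoc record like this one.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def encapsulate(object_path: Path, metadata: dict, package_path: Path) -> dict:
    """Store a file and its metadata together in a single zip package.

    Assumes the object is not itself named "metadata.json".
    Returns the metadata record that was written.
    """
    data = object_path.read_bytes()
    record = dict(metadata)
    record["filename"] = object_path.name
    record["sha256"] = hashlib.sha256(data).hexdigest()  # checksum of the object
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(object_path.name, data)
        package.writestr("metadata.json", json.dumps(record, indent=2))
    return record


def is_unmodified(package_path: Path) -> bool:
    """Check that the object in the package still matches its recorded checksum."""
    with zipfile.ZipFile(package_path) as package:
        record = json.loads(package.read("metadata.json"))
        data = package.read(record["filename"])
    return hashlib.sha256(data).hexdigest() == record["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        photo = Path(tmp) / "family_photo.txt"
        photo.write_bytes(b"stand-in for image data")
        package = Path(tmp) / "family_photo_package.zip"
        # The metadata fields here are illustrative only.
        record = encapsulate(photo, {"creator": "example producer"}, package)
        print(record["sha256"])
        print(is_unmodified(package))
```

A checksum stored inside the package mainly catches accidental changes, since anyone who alters the object could also alter the record. Keeping a linked copy of the same record somewhere else, as described above, gives an independent value to compare against.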