University of Michigan Policies and Practice for the Long Term Retention of Locally Produced Digital Projects and Materials: A Report Prepared for the Joint RLG/TASK Force on Digital Preservation
Most of the locally produced digital projects and materials at the University of Michigan Library have been produced solely by the Digital Library Initiative or with the DLI as a major partner. DLI practice is documented in publications and in grant applications. Formal policies are now being drafted. Most existing policy has concentrated on ensuring long term access on the premise that continuing attention to access will also ensure the long term viability of the digital materials themselves. Policies regarding long-term archival storage and management of digital materials are currently being discussed in conjunction with the University of Michigan Library Preservation Department.
A note on source documents:
The following document is a synthesis of several existing documents in which the practices and working policies of the University of Michigan Library have been documented and explicated. The Digital Library Production Services draft policies are appended and can also be found online. Particularly relevant sources are cited at the end of each section and a complete list of source documents is at the end of this document.
Projects produced and maintained through the Digital Library Production Services (DLPS)
General principles for DLPS systems
DLPS undertakes open-ended maintenance and development of the collections and access systems it supports. While it is a certainty that the methods and strategies for maintenance will evolve, setting a term on the duration of responsibility for a collection can only contribute to a process of trivializing the collections of the digital library: these digital library collections are, in effect, our perpetual responsibility.
All DLPS systems are constructed so that they can be supported in this open-ended fashion (i.e., as much as it is possible to say this, in perpetuity). Capture formats are all standards-based and high fidelity; in fact, most are suitable for creating replacement copies of original publications. In nearly all cases, the "archival" version of the digital surrogate is also the online version.
Moving the Digital Library from "Project" to "Production" John Price-Wilkin, Head, Digital Library Production Service. Presented at DLW99 in Tsukuba, Japan, March 1999. http://jpw.umdl.umich.edu/pubs/japan-1999.html
Two components of digitally-based preservation:
1. The first component of digitally-based preservation is the question of the adequacy of a particular method of digital capture. Does the digital file capture adequate information to be acknowledged as a surrogate of the original? Some but not all methods of microfilm photography are recognized as appropriate in preservation projects. Similarly, some but not all methods of digital capture should or can be sanctioned in the same way. It is DLPS's understanding that 600dpi bitonal TIFF G4 images are acknowledged by many in the Preservation community--including our own Preservation Department--as being adequate for representation of many types of printed publications. DLPS bases its use of that format on the understanding that this method of capture adequately represents the original printed publication for materials converted in, for example, the Making of America work. Similar formats are chosen and used for other types of Preservation-related conversion projects.
2. The second component of digitally-based preservation is longevity of the original. This involves both the longevity of the medium on which a master file is stored and the longevity of the file's format. By working with standards (see Preservation and the Role of Digital Masters in the Online System), especially in building our online systems, we contribute significantly to the viability of these standards. The issue that remains, and which we have not addressed as an institution, is the longevity of the medium on which the master is stored. Here, we refer to a preservation master and mean a copy of the master file on a special storage medium, designated specifically for long-term retention and management. Discussions on this issue are taking place with Preservation now, and may result in an NEH proposal within the CIC. Here, we would collectively advocate the use of media (e.g., gold CD-ROM) and methods (e.g., off-site cold storage and periodic migration) to ensure that the preservation master is accessible in the future.
DRAFT DLPS POLICY: Preservation of Digital Files http://jpw.umdl.umich.edu/dlps/digital-preservation.html
Role of the "rich master"
In pursuit of effective long-term strategies for digital libraries, the UM DLPS seeks to incorporate digital masters in the creation of its online systems wherever possible. Since digitization resources are scarce and older materials are fragile, returning to the object for second and even third passes of digitization is unlikely. Consequently, we believe that creating digital surrogates with high levels of information content -- high enough levels, for example, to create replacement print copies in some cases -- is a key strategy for digital library conversion activities. We believe that by privileging this "rich" master file in the online system, we ensure a more functional online system as well as a greater likelihood of sustaining the master through generations of technological change.
The rich master file typically affords a wider range of options for display and manipulation than a derivative of the file. For example, through the use of SGML or XML and stylesheets, different types of displays can be provided for different types of uses and users. Among the opportunities this affords is more effective transmission of information to the visually impaired. The richer the page image, the more possible it is to deliver that information in a wide range of formats and at a wide range of resolutions. A rich continuous tone image is a valuable tool for enabling high resolution analysis of art and artifact images.
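The idea of deriving several displays from one encoded text can be illustrated with a minimal sketch. It assumes a tiny hypothetical XML fragment whose element names (div, head, p, hi) are illustrative only and are not taken from any DLPS DTD. The two render functions stand in for stylesheets: one produces a visual HTML display, the other a linear text display of the kind that might be passed to a screen reader.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical encoded text; the element names are illustrative assumptions.
SOURCE = (
    "<div><head>Chapter I</head>"
    "<p>The <hi>first</hi> paragraph.</p>"
    "<p>The second paragraph.</p></div>"
)


def element_text(element):
    """Concatenate all text inside an element, including nested elements."""
    return "".join(element.itertext()).strip()


def render_html(root):
    """Visual display: map each source element to presentational HTML."""
    parts = []
    for child in root:
        text = html.escape(element_text(child))
        if child.tag == "head":
            parts.append("<h1>" + text + "</h1>")
        elif child.tag == "p":
            parts.append("<p>" + text + "</p>")
    return "\n".join(parts)


def render_plain(root):
    """Linear text display, with headings announced explicitly."""
    lines = []
    for child in root:
        text = element_text(child)
        if child.tag == "head":
            lines.append("Heading: " + text)
        elif child.tag == "p":
            lines.append(text)
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_view = render_html(root)
plain_view = render_plain(root)
print(html_view)
print(plain_view)
```

Both displays are generated from the same master encoding, so adding a further display for a new class of user means adding a rendering rule rather than re-converting the source.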
By privileging the rich master file in the online system, we are contributing to the longevity of its format and of the file itself. Systems that make first-hand use of the rich master file ensure that the master file's format is one that current technology can use effectively. That is, we wish to ensure that we have tools to use our master file formats, and thus we build systems (a type of tool) to use those formats. An effective test of the value and viability of these file formats is often whether and how effectively we can build digital library systems that exploit them.
DRAFT DLPS POLICY: Preservation and the Role of Digital Masters in the Online System http://jpw.umdl.umich.edu/dlps/dlps-preservation.html
Standards and best practices
In all phases of creation and maintenance of digital materials, nationally accepted standards and guidelines are followed wherever they exist. Where there is no national standard, best professional practices are followed, and local policies and practices are in force. The University Library has established a nexus of strategies, guidelines, practices and policies that define and support its initiatives and programs aimed at conversion to digital format of printed originals. These local practices are grounded in industry standards where applicable (such as the use of TIFF CCITT Group 4 images), in the deployment of current technologies (hardware and software), in preservation principles (painstaking preparation of text, high-resolution image capture), in strict bibliographic description and cataloging, and in the continuing close involvement of subject specialists and curators. Local practices are the result of the University Library's involvement in such pioneering programs as the Journal Storage Project (JSTOR) and Making of America (MoA), phases 1 and 4. In the pilot phase of JSTOR, which at that time was partnered with the University Library, the Preservation Division was responsible for designing and codifying all pre-scanning preparation and post-scanning quality control guidelines and processes, based on nationally accepted guidelines and standards for microfilm prep and quality control. These processes are currently used by both JSTOR and MoA; they are also the basis for all programmatic digital conversion done as part of the Preservation Division's services to the Library.
Philippine-American Relations at the Turn of Twentieth Century: A Scholarly Digital Repository. A Proposal Submitted to the National Endowment for the Humanities July 1, 1999.
Partnering with Preservation
As part of its mission of full integration within the larger University Library, DLPS relies on the expertise and methods of the Preservation Department to ensure long-term maintenance of these digital masters. We rely on cooperation with the Preservation Department to achieve effective preservation of the digital files, but because the online system relies on the master file whenever possible, it may become unnecessary to retrieve and use preservation media on which a copy of the master is stored. As we move forward on formulating policy and practice related to questions of media and storage, we expect that the Preservation Department will take the lead in coordinating and documenting those discussions.
Preservation staff are also active in conversion decisions. The Preservation staff, with guidance from DLPS, make determinations of the most appropriate means of digital capture, and then prepare the materials for digital capture (occasionally operating equipment for the actual capture).
DRAFT DLPS POLICY: Preservation and the Role of Digital Masters in the Online System http://jpw.umdl.umich.edu/dlps/dlps-preservation.html
Moving the Digital Library from "Project" to "Production" John Price-Wilkin, Head, Digital Library Production Service. Presented at DLW99 in Tsukuba, Japan, March 1999. http://jpw.umdl.umich.edu/pubs/japan-1999.html
General principles of data storage and forward migration
Data is stored, maintained and protected as part of the University of Michigan's computing environment, a comprehensive networked data environment. Text and image files are stored redundantly: three copies exist at all times. The production version is housed on a server at the University of Michigan's Media Union; an identical copy is stored on a back-up server; and files, including system software, are copied to Digital Linear Tape on a frequent basis. In addition, the development version of the system, stored at DLPS, is backed up every week. To ensure that accidental or unauthorized changes or replacements do not occur, a permissions system allows only designated staff members to alter the files. Moreover, once files are passed from production staff to technical staff, they are never passed back; to make further changes, the production staff must give the technical staff new files that supersede the previous files. Storing the data on a variety of media will help ensure further access. Also, by using industry standards for imaging and SGML/XML encoding for text, we anticipate and support migration into future incarnations.
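One way redundant copies of this kind could be checked against each other is by comparing checksums. The sketch below is an illustration only: the source documents do not say that DLPS uses checksums, and the function names, the choice of SHA-256, and the throwaway demonstration files are all assumptions. The three location labels follow the copies described above (production server, back-up server, tape).

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def divergent_copies(copies):
    """Given {location: path}, return the locations that disagree with the
    most common digest. If every copy differs, no majority exists and the
    result only shows that the copies need manual review."""
    digests = {location: file_digest(path) for location, path in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(loc for loc, value in digests.items() if value != majority_digest)


# Demonstration with temporary files standing in for the three copies
# (hypothetical contents; the "tape" copy is deliberately altered).
with tempfile.TemporaryDirectory() as tmp:
    copies = {}
    samples = [
        ("production", b"page image"),
        ("backup", b"page image"),
        ("tape", b"page imagX"),
    ]
    for location, payload in samples:
        path = Path(tmp) / (location + ".tif")
        path.write_bytes(payload)
        copies[location] = path
    divergent = divergent_copies(copies)

print(divergent)
```

With three copies, a single damaged copy is outvoted by the other two, which is one practical argument for keeping more than two.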
Philippine-American Relations at the Turn of Twentieth Century: A Scholarly Digital Repository. A Proposal Submitted to the National Endowment for the Humanities July 1, 1999.
Principles for continuous tone images
Continuous tone images create some exceptions to the general principles outlined above. These images are currently too rich in data for an access system to be built around the master image. Master manuscript and photographic images are written to CD-ROM; an off-line process then creates wavelet-compressed versions of the images. These are mounted on the RAID systems and used to deliver information to the end user. The University Library refreshes the CD-ROM images on a scheduled basis.
In order to facilitate long-term access to these images, DLPS Image Services captures the following metadata:
Simple databases are used to explicitly capture the following:
Scanner settings are also saved per CD, per directory, or per image, depending on the context.
Furthermore, Image Services follows file naming conventions that:
Finally, Image Services adheres to the following storage principles:
When images have long filenames, the Romeo convention is used in conjunction with ISO 9660. (This is problematic because Romeo is not a standard, but we believe this gives the best cross-platform support for filenames.)
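To make the filename constraint concrete, the following sketch tests whether a name fits the strictest ISO 9660 form (Level 1: up to eight characters, an optional dot and up to three more, drawn from A-Z, 0-9 and underscore) and lists the names that would need a long-filename convention instead. It is a simplified check, not a full implementation of the standard, and the function names and sample filenames are hypothetical rather than Image Services conventions.

```python
import re

# Simplified ISO 9660 Level 1 pattern: 1-8 characters, optionally followed
# by a dot and 1-3 characters, using only A-Z, 0-9 and underscore.
LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def fits_iso9660_level1(name):
    """True if the whole filename matches the simplified Level 1 pattern."""
    return LEVEL1_NAME.fullmatch(name) is not None


def names_needing_long_convention(names):
    """Return the filenames that do not fit Level 1, in their original order."""
    return [name for name in names if not fits_iso9660_level1(name)]


# Hypothetical filenames for illustration.
sample = ["PAGE0001.TIF", "page0001.tif", "manuscript_leaf_012_recto.tif"]
flagged = names_needing_long_convention(sample)
print(flagged)
```

A check like this could be run before a disc is written, so that the decision to fall back on a non-standard convention is made deliberately for each batch.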
Principles for bitonal page images
Bitonal page images in DLPS systems are 600 dpi, TIFF G4 files. This supports emerging preservation standards and allows for selective reprinting of high-use brittle volumes on acid free paper. No derivatives (e.g., GIF or JPEG images) are created or stored, except at the time of a viewing request. When a user requests a page, the system generates a GIF or PDF derivative in real time and without any appreciable delay (typically less than one second). Four levels of resolution in GIF are made available to users, taking into account the wide range of displays and network connections; a 600dpi PDF version is also made available, primarily for printing. While the number of pages currently online at the University of Michigan--approximately 3 million by late 2000--is relatively small compared to a typical research library collection, its large size, expected continued growth, and continuing changes in desktop technology (including networking) argue against storing anything but the master images online. Use patterns also suggest that as long as we are able to generate appropriate derivatives in real time, based on user demand, we will significantly reduce the requirements for management.
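The arithmetic behind deriving screen images from a 600 dpi master can be sketched as follows. The 600 dpi master resolution comes from the text; the four target resolutions and the 6 x 9 inch page are assumptions chosen for illustration, since the source does not state the actual GIF levels or page sizes. The sketch computes pixel dimensions only and does not perform any image conversion.

```python
MASTER_DPI = 600  # from the text: masters are 600 dpi bitonal TIFF G4

# Hypothetical target resolutions for the four GIF levels; the source
# document does not give the actual values.
GIF_LEVELS_DPI = (50, 75, 100, 150)


def master_size(width_in, height_in, dpi=MASTER_DPI):
    """Pixel dimensions of a page of the given size in inches."""
    return (round(width_in * dpi), round(height_in * dpi))


def derivative_size(width_px, height_px, target_dpi, master_dpi=MASTER_DPI):
    """Pixel dimensions of a derivative scaled from the master resolution."""
    return (
        round(width_px * target_dpi / master_dpi),
        round(height_px * target_dpi / master_dpi),
    )


# Assumed page size: 6 x 9 inches.
width_px, height_px = master_size(6, 9)

# A bitonal image uses one bit per pixel before compression.
uncompressed_bytes = width_px * height_px // 8

sizes = {dpi: derivative_size(width_px, height_px, dpi) for dpi in GIF_LEVELS_DPI}

print(width_px, height_px, uncompressed_bytes)
for dpi, size in sizes.items():
    print(dpi, size)
```

Because every derivative is a function of the master and a target resolution, changing the set of levels offered to users requires no re-scanning and no stored derivative files to regenerate.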
The University Library requires vendors and in-house units to produce master page images defined by the principles of fidelity and completeness laid out in TIFF CCITT Group 4 image guidelines. There are still concerns about the appropriateness of TIFF G4 as a preservation-quality surrogate for pages, but the University of Michigan Library believes that this format provides a high quality surrogate for most printed materials. The page images are stored on redundant arrays of independent disks (RAID) at level 5. The system is mirrored in a geographically separate area of campus. Additionally, the data on the RAID is written to DLT at least monthly. Finally, all data is written to CD-ROM at the point of final quality control acceptance.
Moving the Digital Library from "Project" to "Production" John Price-Wilkin, Head, Digital Library Production Service. Presented at DLW99 in Tsukuba, Japan, March 1999. http://jpw.umdl.umich.edu/pubs/japan-1999.html
Philippine-American Relations at the Turn of Twentieth Century: A Scholarly Digital Repository. A Proposal Submitted to the National Endowment for the Humanities July 1, 1999.
Other locally produced projects from the University of Michigan Library
The Engineering Library and the Business School Library both digitize a substantial number of course materials such as handouts, exercises, syllabi and problem sets. There is no policy for long term retention and maintenance of these materials.
The representative from the Engineering Library says, "We have considered this stuff ephemera, useful only for the term it's produced for. So, we've done nothing with regard to long term storage. We do back up each previous term, and store that for at least 6 months, which allows us to respond to faculty members who want the scans. But we don't archive it in any formal sense of the word."
Similarly, the librarian who administers this program at the Business School Library says, "Our course reserves documents are treated as mission critical data in the same way the B-School's databases are treated. They are housed on servers with RAID drives and backed up daily, with archival copies of backup tapes sent to an off-site storage facility. These are very ephemeral materials, and our concern with backup is more for reliable access than preservation.
We also maintain a collection of digitized working papers produced by our faculty and various institutes affiliated with the B-School. These are maintained as an archive collection in print, while the digital collection is, again, produced for access. However, I believe this is a collection that we should consider preserving electronically in some type of archival form."
Other locally produced projects, primarily HTML documents, tend to be created by individuals and are maintained, managed and backed up idiosyncratically and without documented policies.
Complete Source Documents
DRAFT DLPS POLICY: Preservation of Digital Files http://jpw.umdl.umich.edu/dlps/digital-preservation.html
DRAFT DLPS POLICY: Preservation and the Role of Digital Masters in the Online System http://jpw.umdl.umich.edu/dlps/dlps-preservation.html
Making of America IV: The American Voice, 1850-1876 A Proposal Submitted to the Andrew W. Mellon Foundation October 1998 http://www.umdl.umich.edu/dlps/moa4/proposal.html
Moving the Digital Library from "Project" to "Production" John Price-Wilkin, Head, Digital Library Production Service. Presented at DLW99 in Tsukuba, Japan, March 1999. http://jpw.umdl.umich.edu/pubs/japan-1999.html
Philippine-American Relations at the Turn of Twentieth Century: A Scholarly Digital Repository. A Proposal Submitted to the National Endowment for the Humanities July 1, 1999.
Prepared by Maria Bonn
Last updated Sept. 9, 1999