Deep into that darkness peering: Our Dark Repository

Some people tend to get all riled up when they hear the phrase “dark archive” or “dark repository.” So, I’m starting with a definition from Digital Preservation Matters of what a dark repository means in our context here at the library.

Dark Archive: An archive that is inaccessible to the public. It is typically used for the preservation of content that is accessible elsewhere.

The need

In our case, the “preservation of content that is accessible elsewhere” line is an important one. Before we created a dark archive, all of our preservation systems were built for access, with many of them creating access copies (or DIPs, for all you OAIS groupies out there) on the fly from the preservation copies (AIPs) in the repository. This approach works fine for a lot of our digital material, but the pressures created by time-based digital media, particularly video, pushed against that model. Uncompressed video files produced through digitization are humongous: too big to serve as access copies and too big to be the source of on-the-fly access copy creation. Because access to this material has to be handled separately anyway, we are free to separate the methods of DIP access from AIP storage. For example, we currently provide onsite access via optical disc (I know, I know) but are transitioning to online access with appropriate access restrictions depending on the content. The transition in access methods has no effect on the AIPs, which can be managed on good old reliable, and in this case dark, preservation storage.

So, we need a place to keep A/V preservation masters for long-term preservation. We also need a kind of digital rest stop (but cleaner than an actual rest stop) for material that is on its way to full preservation. The main example of that is content produced through our born-digital preservation workflows. We need a place to store disk images and transfers for the medium term while we wait on capacity building and additional workflow development aimed at providing user access to this material.

Dark Blue

To meet these demands we created the cleverly named Dark Blue repository. Currently, Dark Blue provides long-term storage for A/V preservation masters and medium-term storage for forensic images/file transfers of born-digital archival accessions. It is possible that it will expand to include things like geospatial data backups, storage for perpetual access copies of licensed content, backups of video games, and local web crawl backups.

Dark Blue is implemented as a Ruby on Rails application. Depositors can upload a BagIt bag with metadata specifying a particular "content type", for example, audio or video. Dark Blue then validates the bag according to the content type and (if it passes validation) moves it to preservation storage. If validation fails, Dark Blue provides the depositor with a report of the problems with the package. Its primary storage is campus-provided MiStorage, with backup to Amazon S3. Dark Blue supports periodic fixity checking of the primary storage copy as well as full or partial audits of the backup storage. It also supports retrieval of bag content via HTTPS for authorized users. It currently does not support much in the way of metadata management or searchability, so we rely on other systems such as the catalog and ArchivesSpace to provide linkages between metadata and packages in Dark Blue.
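For the curious, here is roughly what a validate-then-report step like the one described above could look like. This is a minimal sketch in Python, not Dark Blue's actual Ruby code. It assumes a bag with a SHA-256 payload manifest (manifest-sha256.txt, as laid out in the BagIt specification), and it invents a simple rule that ties each content type to a set of allowed file extensions. The names ALLOWED_EXTENSIONS, sha256_of, read_manifest, and validate_bag are hypothetical, as are the extensions themselves.

```python
import hashlib
from pathlib import Path

# Hypothetical content-type rules: which payload extensions each type allows.
ALLOWED_EXTENSIONS = {
    "audio": {".wav"},
    "video": {".mkv", ".mov"},
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so huge video masters never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(bag_dir):
    """Parse manifest-sha256.txt into {relative_path: expected_checksum}."""
    manifest = {}
    text = (Path(bag_dir) / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<checksum> <path>"; the path itself may contain spaces.
        checksum, rel_path = line.split(maxsplit=1)
        manifest[rel_path] = checksum.lower()
    return manifest


def validate_bag(bag_dir, content_type):
    """Return a list of problems; an empty list means the bag passed."""
    bag_dir = Path(bag_dir)
    problems = []

    if not (bag_dir / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    if not (bag_dir / "manifest-sha256.txt").is_file():
        problems.append("missing manifest-sha256.txt")
        return problems  # nothing else can be checked without a manifest

    manifest = read_manifest(bag_dir)

    # Fixity: every file listed in the manifest must exist and match its checksum.
    for rel_path, expected in sorted(manifest.items()):
        payload = bag_dir / rel_path
        if not payload.is_file():
            problems.append(f"missing payload file: {rel_path}")
        elif sha256_of(payload) != expected:
            problems.append(f"checksum mismatch: {rel_path}")

    # Content-type rule (an assumption of this sketch): check file extensions.
    allowed = ALLOWED_EXTENSIONS.get(content_type)
    if allowed is None:
        problems.append(f"unknown content type: {content_type}")
    else:
        for rel_path in sorted(manifest):
            if Path(rel_path).suffix.lower() not in allowed:
                problems.append(f"unexpected file for {content_type}: {rel_path}")

    return problems
```

Calling validate_bag("/path/to/bag", "audio") returns the problem report as a list of strings, and an empty list would be the cue to move the bag to preservation storage. Running the same checksum comparison again later against the stored copy is, in essence, a periodic fixity check.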

We will continue to evaluate our storage strategy as the diversity and size of our digital collections grow, but right now Dark Blue fills an important void in our preservation strategy.

Merely this and nothing more

Thanks to Aaron Elkiss for help in describing the technical characteristics of Dark Blue.
