Design principles and requirements

Development of mPach is guided by the following design principles and requirements.

Design principles

  1. HathiTrust and mPach will serve different purposes and different perspectives on use. HathiTrust serves the archiver and the library or researcher doing discovery, research, and analysis; it assumes fixed content and demands that the content, and access to it, be preservable. mPach serves the subscriber or reader focusing on a single publication and demands innovation; it also serves the content producer, offering more immediate access to the repository in a way that treats content as relatively fluid.
  2. Preservation without access is meaningless; there can be no access without discovery.
  3. The project to develop mPach is designed to meet the needs of MPublishing at the University of Michigan Library. Because HathiTrust supports the development of a shared infrastructure, the project is encouraged to meet the needs of other HathiTrust members as well, but it should not be constrained by those needs. Design choices will be mindful of the potential needs of other HathiTrust partners, and those needs will be considered where possible, but meeting them is not the focus of the project.
  4. We won't reinvent the wheel when a suitable open-source solution is available.
  5. Any publishing service offered will be tightly coupled to HathiTrust, incentivizing publishers to make more of their content openly accessible. mPach will not support deposit in any repository besides HathiTrust.
  6. The development of a service and the development of the platform inform each other and should be planned in parallel.
  7. mPach will be extensible so that individual publishers/publications can add content structures, system behaviors, or interface features without impacting the rest of the system.
    1. While mPach will require text content to conform to a prescribed schema for archiving, a nonconformant version of the content (such as a publisher PDF or the publisher’s own XML format) may also be archived in the system as a bitstream.
    2. Neither mPach nor any service offered using it will be responsible for verifying that the content of nonconformant versions matches the content of the conformant version, though users will be expected to update all versions when updating any of them.
    3. The service will offer a tool to help publishers map certain file formats to the prescribed schema.
    4. All archived content will be available to the publisher via the Data API, enabling the development of publication-specific access mechanisms and user interfaces (that can leverage the original, nonconformant version of the content).
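
The content model implied by principle 7 can be sketched as follows. This is only an illustrative sketch, not the mPach data model: the names Deposit, Bitstream, and add_alternate are hypothetical, and schema validation of the conformant text is assumed to happen elsewhere. It shows one conformant text stored alongside any number of nonconformant versions, which are kept as opaque bitstreams and never compared against the conformant text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Bitstream:
    """Hypothetical opaque file archived as-is (e.g. a publisher PDF)."""
    filename: str
    data: bytes


@dataclass
class Deposit:
    """Hypothetical archived item: one conformant text plus optional alternates."""
    item_id: str
    conformant_text: bytes  # assumed to be validated against the schema elsewhere
    alternates: List[Bitstream] = field(default_factory=list)

    def add_alternate(self, filename: str, data: bytes) -> Bitstream:
        # Stored without comparing it to conformant_text: keeping the
        # versions in agreement is left to the depositor (principle 7.2).
        bitstream = Bitstream(filename, data)
        self.alternates.append(bitstream)
        return bitstream


deposit = Deposit("article-1", b"<article>...</article>")
deposit.add_alternate("article-1.pdf", b"%PDF-1.4 ...")
```

Keeping the alternates as plain bytes reflects the principle that the system archives them but takes no responsibility for their content.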

Requirements

  1. The version of the content in HathiTrust must always be the single authoritative version. A cached, read-only copy could be generated via the Data API and kept in an outside system to improve performance, but any revisions to the content must be made to the authoritative copy in the HathiTrust repository.
  2. By policy, HathiTrust closes access to content only for legal reasons, not because a rightsholder decides to restrict access. Therefore, mPach will allow deposit of content into HathiTrust only if the rightsholder (often the publisher) licenses HathiTrust to make all the content available perpetually through open access.
  3. A delay for open access is acceptable only as a means to ensure that all articles in an issue are made available simultaneously.
  4. Born-digital journal content in HathiTrust will be discoverable through at least some of the discovery systems for other HathiTrust content.
  5. Journal content presented in the HathiTrust interface will have the usual HathiTrust branding, plus subtle additional branding for the journal (journal metadata and a logo).
  6. Users will be able to follow a link from an item in HathiTrust to an outside system for that particular journal (if one exists) that displays the article. Journal content displayed in outside systems will incorporate subtle HathiTrust branding.
  7. HathiTrust will develop preservation specifications for genres of content found within journal content (text, diagrams, media clips, typeset versions, etc.) that are not already covered by existing preservation specifications. Each specification must prescribe an open, non-proprietary format with sufficiently descriptive, granular, and consistent structure to support future content refreshing for preservation purposes and to support scholarly uses. For any genres of content for which preservation specifications cannot be developed, and for alternative formats of content deposited according to specification, HathiTrust will ingest a bitstream and guarantee fixity but not refreshing to future formats.
  8. The HathiTrust interface will not necessarily be able to render all bitstream content to users. In such cases, the interface will simply allow users to download the bitstream for their own use outside of HathiTrust.
  9. mPach must be able to deposit into HathiTrust not only articles but also contextual information (like the list of editors, submission requirements, etc.) that will change over time.
  10. The source code for mPach will be open source.
  11. Since the online medium allows content to be revised after publication, and since users expect this facility, mPach will need to support revision. However, since scholarship depends on being able to see the exact version of a text that was cited, mPach will also need a way for users to find previously deposited versions. Therefore, HathiTrust will need a way to store or reconstruct these past versions and present them to users.
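
Requirements 1, 7, and 11 together suggest an append-only version history with a checksum recorded at deposit time. The following is a minimal sketch under that assumption, not a description of how HathiTrust stores content: VersionedItem, Version, and fixity_ok are hypothetical names, and SHA-256 is an assumed choice of checksum. Each revision is deposited as a new version of the single authoritative item, earlier versions stay retrievable by number, and fixity is checked by recomputing the digest.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    """Hypothetical immutable record of one deposited version."""
    number: int   # 1-based, in order of deposit
    content: bytes
    sha256: str   # digest recorded at deposit time (assumed checksum choice)


class VersionedItem:
    """Hypothetical append-only history for one item in the repository."""

    def __init__(self) -> None:
        self._versions: List[Version] = []

    def deposit(self, content: bytes) -> Version:
        # A revision never overwrites earlier content; it adds a new version.
        digest = hashlib.sha256(content).hexdigest()
        version = Version(len(self._versions) + 1, content, digest)
        self._versions.append(version)
        return version

    def latest(self) -> Version:
        if not self._versions:
            raise LookupError("nothing has been deposited")
        return self._versions[-1]

    def get(self, number: int) -> Version:
        # Lets a reader retrieve the exact version that was cited.
        if not 1 <= number <= len(self._versions):
            raise LookupError(f"no version {number}")
        return self._versions[number - 1]


def fixity_ok(version: Version) -> bool:
    """Recompute the digest and compare it with the one recorded at deposit."""
    return hashlib.sha256(version.content).hexdigest() == version.sha256


item = VersionedItem()
item.deposit(b"original article text")
item.deposit(b"revised article text")
```

Storing every version in full is the simplest reading of "store or reconstruct"; a system could instead keep differences and rebuild past versions on request.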

Page maintained by Jeremy Morse
Last modified: 03/28/2014