The mPach platform is an end-to-end suite of tools for publishing fully encoded XML journal articles directly into the HathiTrust repository.

What follows is a description of the planned software components that will comprise the system, divided into the following areas:

Editorial Workflow and Peer Review

This area covers the behaviors associated with acquiring content, determining its fitness for publication, and exporting it to Prepper using the Upload API. A number of software solutions could fulfill the needs of different publishers; each would be loosely coupled with the rest of mPach and communicate with it over the Upload API. It is also possible to conduct all of these behaviors offline, or skip them altogether, and interact manually with the Prepper service.

mPach OJS Plugin

A plugin or similar modular modification to Open Journal Systems (OJS) is planned to satisfy the functions of this area.


Content Preparation (Prepper)

This dashboard allows a journal editor to carry out various functions related to publishing a journal, including configuring the journal and preparing manuscripts approved for publication so that they can be ingested into the HathiTrust repository and distributed to the public. The key requirement of this stage is that the content and metadata are captured and normalized in compliance with HathiTrust's specifications for long-term preservation.

Upload API

A RESTful web API that any Editorial Workflow and Peer Review system can use to upload articles into Prepper. (Prepper will also include a simple web client to this API for manual uploading of articles.)
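
As a rough illustration of how an editorial system might call such an API, the sketch below builds (but does not send) an HTTP upload request. The base URL, the `/journals/{id}/articles` path, the bearer-token header, and the function name are all hypothetical; the design of the Upload API is not specified on this page.

```python
import urllib.request


def build_upload_request(base_url, journal_id, filename, payload, token):
    """Build a POST request carrying one article file (hypothetical API shape)."""
    # Hypothetical resource path; the real Upload API may be organized differently.
    url = f"{base_url.rstrip('/')}/journals/{journal_id}/articles"
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Authorization": f"Bearer {token}",  # assumed authentication scheme
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


request = build_upload_request(
    "https://prepper.example.org/api",  # placeholder host
    "demo-journal",
    "article.docx",
    b"example bytes",
    "TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because no real endpoint exists to receive it.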

User Management

Allows an administrator to manage user privileges, such as determining that the user attempting login is authorized to submit content and determining which content the user may view in Prepper.
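
The two checks described above (may this user submit, and what may this user view) could be modeled along the following lines. This is only a sketch: the `User` class, its field names, and the rule that submit permission implies view permission are assumptions, not part of the mPach design.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Journals the user may submit content to (hypothetical model).
    can_submit: set = field(default_factory=set)
    # Journals whose content the user may view in Prepper.
    can_view: set = field(default_factory=set)


def may_submit(user, journal_id):
    """True if the user is authorized to submit content for the journal."""
    return journal_id in user.can_submit


def may_view(user, journal_id):
    """True if the user may view the journal's content in Prepper."""
    # Assumption: permission to submit implies permission to view.
    return journal_id in user.can_view or journal_id in user.can_submit


editor = User("editor", can_submit={"journal-a"}, can_view={"journal-b"})
print(may_submit(editor, "journal-a"), may_view(editor, "journal-b"))
```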

Journal Management

This module is used to configure a journal in mPach, allowing a user to:

  • Enter basic metadata about the journal title required of all journals
    • title
    • this journal has been previously published under this title (yes/no)
    • ISSN for this title, if available
    • title(s) that this journal succeeds and ISSN, if applicable
  • Set up "Aboutware"
    • current editor(s)
    • editorial and/or advisory board
    • submission policy
    • etc.
  • Configure journal-specific features
    • CrossRef deposit of DOIs
    • Metadata feeds
      • ONYX
      • Atom/RSS
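
One way to picture the configuration listed above is as a single record per journal. The sketch below is illustrative only: the class and field names are invented, and the ISSN check is an assumption about validation that mPach might perform. The check-digit rule itself is the standard ISSN one (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10).

```python
from dataclasses import dataclass, field
from typing import List, Optional


def issn_is_valid(issn):
    """Check the form and check digit of an ISSN such as '0378-5955'."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or any(c not in "0123456789" for c in chars[:7]):
        return False
    total = sum(int(c) * weight for c, weight in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


@dataclass
class PrecedingTitle:
    title: str
    issn: Optional[str] = None


@dataclass
class JournalConfig:
    # Basic metadata
    title: str
    previously_published: bool
    issn: Optional[str] = None
    preceding_titles: List[PrecedingTitle] = field(default_factory=list)
    # Aboutware
    editors: List[str] = field(default_factory=list)
    board: List[str] = field(default_factory=list)
    submission_policy: str = ""
    # Journal-specific features
    crossref_deposit: bool = False
    metadata_feeds: List[str] = field(default_factory=list)  # e.g. "ONYX", "Atom/RSS"

    def problems(self):
        """Return human-readable problems with the basic metadata."""
        found = []
        if not self.title.strip():
            found.append("title is required")
        if self.issn is not None and not issn_is_valid(self.issn):
            found.append("ISSN check digit does not match")
        return found


config = JournalConfig(title="Example Journal", previously_published=False,
                       issn="0378-5955", metadata_feeds=["Atom/RSS"])
print(config.problems())
```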


Norm

The heart of Prepper, Norm accepts article content according to one of several prescribed input schemas and normalizes it to a preservation-grade XML output schema suitable for HathiTrust ingest and rendering. While the capacity will exist to expand the supported schemas with further development, the schemas supported at launch will be as follows:

input schema
Microsoft Word DOCX files employing a prescribed set of Word paragraph styles. A template .docx file will be provided. Support for OpenOffice ODT files with prescribed paragraph styles will follow shortly thereafter, and Norm will be extensible to support other document formats. Files that conform to the JATS XML schema will be supported natively.
output schema
Journal Publishing Tag Set of JATS (an application of NISO Z39.96).
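
To illustrate what checking a DOCX file against a prescribed set of paragraph styles might involve, the sketch below reads the style ID of each paragraph from `word/document.xml` inside the DOCX archive. The style names in `PRESCRIBED_STYLES` are placeholders (the real set would come from the mPach template), and the function names are hypothetical; this is not Norm's implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Placeholder style IDs; the actual prescribed set is defined by the template.
PRESCRIBED_STYLES = {"ArticleTitle", "Author", "Abstract", "Heading1", "BodyText"}


def paragraph_styles(docx_bytes):
    """Return the style ID of each paragraph (None where no style is set)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    styles = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        style = paragraph.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        styles.append(style.get(f"{{{W_NS}}}val") if style is not None else None)
    return styles


def unexpected_styles(docx_bytes):
    """Return the sorted style IDs that fall outside the prescribed set."""
    found = paragraph_styles(docx_bytes)
    return sorted({s or "(none)" for s in found if s not in PRESCRIBED_STYLES})
```

A caller would read the uploaded file as bytes and pass it to `unexpected_styles`; a non-empty result would correspond to an input file that does not follow the template.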

Norm will provide various levels of feedback to the user based on various scenarios:

  • invalid input file not conforming to one of the input schemas
  • insufficient information in input file to create a valid file according to the output schema
  • ambiguous information in input file requiring clarification to complete transformation to output schema
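
The three scenarios above could be distinguished roughly as follows. This is a sketch under stated assumptions: the `Feedback` enum, the `REQUIRED_FIELDS` tuple, and the convention that unresolved alternatives appear under a `"candidates"` key are all invented for illustration.

```python
from enum import Enum


class Feedback(Enum):
    INVALID_INPUT = "input does not conform to a supported input schema"
    INSUFFICIENT = "input lacks information required by the output schema"
    AMBIGUOUS = "input needs clarification before transformation can finish"
    OK = "no problems found"


# Hypothetical required fields; the real list would follow from the output schema.
REQUIRED_FIELDS = ("title", "authors")


def assess(extracted):
    """Classify extraction results; `extracted` is None when parsing failed."""
    if extracted is None:
        return Feedback.INVALID_INPUT
    if any(not extracted.get(name) for name in REQUIRED_FIELDS):
        return Feedback.INSUFFICIENT
    # Assumption: unresolved alternatives are collected under "candidates".
    if extracted.get("candidates"):
        return Feedback.AMBIGUOUS
    return Feedback.OK


print(assess({"title": "On Things", "authors": ["A. Author"]}).name)
```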

The supported method of correcting any normalization errors will be for the user to adjust the DOCX file, then re-upload and re-process it. We hope to add tools at a later time that will allow for direct editing of the output XML within Prepper.

Conversion Review

Presents the results of conversion to JATS by Norm, including a review of select metadata fields extracted from the Word document.


Offers a preview of the content that has been validated according to the output schema as it will be rendered in the HathiTrust PageTurner application. This affords the user an opportunity to make further adjustments to the input file for re-processing, such as re-tagging the content correctly to improve the rendering.


PackageBuilder

Aids the user in assembling the Submission Information Package (SIP) to be submitted to HathiTrust. In addition to incorporating the output of Norm, it also assists the user in uploading and describing any media files to be presented within the article text, supplemental media files to be offered for download, or alternate formats of the article itself, such as a PDF. PackageBuilder flags any such files that don't meet HathiTrust preservation benchmarks, allowing the user to upload a valid or higher-quality version of the content if such files exist.
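
The flagging step might look something like the sketch below. The set of file suffixes is a hypothetical stand-in for HathiTrust's preservation benchmarks, which are not listed on this page, and the `PackageFile` class and role names are likewise assumptions.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical stand-in for the preservation benchmarks; not HathiTrust's actual list.
ASSUMED_PRESERVATION_SUFFIXES = {".tif", ".jp2", ".wav", ".pdf"}
ROLES = ("embedded", "supplemental", "alternate")


@dataclass
class PackageFile:
    filename: str
    role: str  # one of ROLES
    description: str = ""


def flag_files(files):
    """Return the files that fall outside the assumed preservation formats."""
    flagged = []
    for item in files:
        if item.role not in ROLES:
            raise ValueError(f"unknown role: {item.role}")
        suffix = PurePath(item.filename).suffix.lower()
        if suffix not in ASSUMED_PRESERVATION_SUFFIXES:
            flagged.append(item)
    return flagged


files = [
    PackageFile("fig1.tif", "embedded", "Figure 1"),
    PackageFile("clip.mov", "embedded", "Video clip"),
    PackageFile("article.PDF", "alternate"),
]
print([item.filename for item in flag_files(files)])
```

A real check would presumably inspect file contents rather than names; suffix matching is used here only to keep the example short.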


Submitter

Submits the assembled SIP to the HathiTrust repository for ingest, and the analytic MARC record of the article to the HathiTrust Catalog, provided the following conditions are met:

  • The SIP, and its contents, are valid according to the HathiTrust validation code.
  • Prepper is authorized by HathiTrust to submit content for this journal.
  • The current user is authorized to submit on behalf of the journal (as determined by the User Management module).

Due to fluctuations in HathiTrust's ingest capacity, and the delay introduced by data replication across the University of Michigan and Indiana University campuses, ingest can currently take up to 48 hours to complete once requested by Prepper's Submitter module. If the article being submitted is the first item of a given volume or issue, Submitter also creates a permanent HathiTrust Collection for the volume or issue.
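
The gating conditions and the first-item collection rule can be sketched as follows. Function names, the dictionary keys, and the return shape are hypothetical, and the actual call to HathiTrust is deliberately omitted.

```python
def unmet_conditions(sip_is_valid, prepper_is_authorized, user_is_authorized):
    """Return a description of each submission condition that is not met."""
    checks = [
        (sip_is_valid, "SIP failed validation"),
        (prepper_is_authorized, "Prepper is not authorized for this journal"),
        (user_is_authorized, "user is not authorized for this journal"),
    ]
    return [message for ok, message in checks if not ok]


def submit(article, collections, **conditions):
    """Decide whether to submit; `collections` is a set of (journal, volume, issue)."""
    problems = unmet_conditions(**conditions)
    if problems:
        return {"submitted": False, "problems": problems}
    key = (article["journal"], article["volume"], article["issue"])
    # The first article of a volume or issue triggers creation of its collection.
    created = key not in collections
    collections.add(key)
    return {"submitted": True, "collection_created": created}


collections = set()
all_met = dict(sip_is_valid=True, prepper_is_authorized=True, user_is_authorized=True)
article = {"journal": "example-journal", "volume": 1, "issue": 2}
print(submit(article, collections, **all_met))
```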

While not expected at launch, a number of Submitter submodules are desirable to trigger certain external services at the time of publication. Such submodules might include:

  • CrossRef deposit
  • PubMed Central deposit (in order to fulfill the NIH Public Access Policy)



Receives confirmation from HathiTrust that the articles in the issue have been ingested, validated, and replicated across repository instances, and that the catalog records for each article have been added to the HathiTrust Catalog. Sends a notification email to the user who submitted the issue.



Articles submitted via mPach will be handled according to the same policies and available for the same functions as other objects in the repository. While no new applications will be added to HathiTrust as part of mPach, several existing applications will be extended to accommodate mPach's requirements.


mPach SIPs will be validated by HathiTrust's ingest mechanisms prior to ingest.


PageTurner will detect when a HathiTrust item contains JATS XML and render it appropriately. It will also render any embedded media files that meet HathiTrust preservation specifications. Supplemental files, and embedded media files which don't meet preservation specifications, will be offered for download.
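
The rule in this paragraph reduces to a small decision, sketched here with hypothetical names: only embedded media that meet the preservation specifications are rendered in place, and everything else is offered for download.

```python
def presentation(role, meets_preservation_spec):
    """Return "render" or "download" for a media file attached to an article."""
    if role not in ("embedded", "supplemental"):
        raise ValueError(f"unknown role: {role}")
    if role == "embedded" and meets_preservation_spec:
        return "render"
    # Supplemental files, and embedded files below specification, are downloads.
    return "download"


for role, ok in [("embedded", True), ("embedded", False), ("supplemental", True)]:
    print(role, ok, presentation(role, ok))
```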


Each journal submitted via mPach will be represented by a permanent collection, which will enable browsing of articles by volume and issue, and provide access to the most current Aboutware for that journal (editorial board, submission policy, etc.).


Analytic MARC records for each article submitted via mPach will be present in the HathiTrust Catalog, with links to the article in PageTurner, any External Publication Presentations (see below) included in the metadata, and the title-level catalog record for the journal.

Data API

Will allow the retrieval of the JATS XML; any embedded, supplemental, or alternate media files included in the SIP; and formats derived from the JATS XML (PDF and EPUB).

Bibliographic API

Will allow the retrieval from the HathiTrust Catalog of analytic records for individual articles and the journal-level record.

External Publication Presentation

A key feature of mPach is the Publisher's ability to build their own platform for the presentation and use of mPach materials. Participation in mPach requires that the preservation copy of the content is always used for access, in order to keep the preservation copy up to date. Any user, however, is free to make use of a read-only cached copy of the content pulled via the HathiTrust Data API, and develop their own fully branded access experience, adding whatever functionality they see fit.

Page maintained by Jeremy Gregg Morse
Last modified: 03/28/2014