Within the online library catalogue and beyond: Islamic Manuscripts at Michigan

by Evyn Kropf

(prepared 24 August 2012)


Since January 2009, we at the University of Michigan Library have been engaged in a grant-funded project to complete the cataloguing of our Islamic Manuscripts Collection.[1] Supported by a "Cataloging Hidden Special Collections and Archives" grant administered by the Council on Library and Information Resources with generous funding from the Andrew W. Mellon Foundation,[2] the project engages established and emerging scholars (at various levels of expertise in Islamic codicology and palæography) in the cataloguing process – training, examination, description, and generation of searchable bibliographic and codicological metadata – for the collection of roughly 1,090 manuscript volumes dating from the 9th to the 20th century CE and containing texts primarily in Arabic, Persian and Turkish.

Our chief goals with the project have been:

  • to enrich scholars' knowledge of codicology and palæography
  • to enhance intellectual access to the collection through creation and dissemination of searchable, web-discoverable descriptions and digital surrogates
  • and to compile a database of bibliographic and codicological data that may serve to further research and scholarship in Near Eastern studies and Islamicate manuscript studies in particular.

Our descriptive approach aims at "full" codicological description with a suite of elements that characterize not only the contents of the codices (text, paratext and ornament) through transcription, notes and headings, but also their form via further notes addressing the script and hand, structure (composition of gatherings, sewing, and cover), dimensions, writing surface, layout and other physical aspects.[3] Particular attention is given to the evidence for the history of the manuscript as attested in manuscript notes and in changes to structure through addition or repair. The data elements representing each manuscript are situated in a searchable, electronic, bibliographic record within our online Library catalogue,[4] and where possible accompanied by a link to a digital surrogate in the HathiTrust Digital Library.[5] This approach to data curation and deployment is motivated by the belief that our online library catalogue should be the chief (but not the only) repository for representation of the items in our Library collections. The intention is for each catalogue record to be as textually rich as possible, with a special emphasis on standardized headings for browsing and faceting, and transcriptions in the Arabic script (whereas English is used for descriptive notes and headings). Structured together within the Library catalogue, these records form a database of valuable bibliographic and codicological data.

Workflow and Technical Infrastructure

Our descriptive approach obviously presents challenges with data creation – namely, the anything-but-trivial task of generating the full manuscript descriptions – and also with data curation and deployment, i.e. formatting the data in such a way that it will be accessible in all of the preferred environments.

Thus, to realize our descriptive approach within the context of our project goals, we devised an iterative, collaborative scheme that leverages the potential of the digital environment to facilitate the cataloguing process in a way that engages as many scholars as possible. Emerging scholars from the University of Michigan are engaged as project staff and receive foundational training in codicology and palæography.[6] Emerging and established scholars from the University and around the world are engaged as contributors through the project website, a digital platform that facilitates remote examination and commentary on descriptive elements.[7] Data posted there can be exposed to search engines and retrieved by researchers browsing the Web. Thus, the efforts of University of Michigan project staff are supplemented by contributions from colleagues around the world and the evolving manuscript data is distributed beyond the library catalogue.

In the initial phase of the approach, existing inventory data in the form of a card-format handlist are converted to preliminary bibliographic records designed for display in the Library's online catalogue.

Figure 1. Preliminary bibliographic record carrying limited descriptive information as it is displayed in the Library's OPAC.

Project staff use a text editor to create the bibliographic records in MARC21 format. The project cataloguer then edits, enhances and inputs the records to the Library's Integrated Library System (ALEPH). The arrangement of descriptive elements is governed by a modified version of the AMREMM standard[8] (developed by the project cataloguer as inspired by Adam Gacek's approach[9]) and Library of Congress Romanization schemes (chiefly for Arabic, Persian and Turkish).

Also during this phase, digital surrogates are created for as many manuscripts as possible and ingested (with companion metadata) for delivery via the HathiTrust Digital Library. Manuscripts that cannot be digitized are examined and described by project staff relying on collaborative tools like Google Docs and Picasa web albums. A project website elaborating on the blogging paradigm is also developed,[10] with the project cataloguer collaborating with the project web developer (Nancy Moussa) to develop several templates of descriptive elements compatible with MARC21 and the MARCXML schema.

In the next phase, the descriptive elements are harvested from the preliminary records created for each digitized manuscript and combined in a representative description that is posted to the project website by the project cataloguer. A custom plugin (Mblog) created by the project web developer is used within the WordPress blogging engine with the CommentPress theme.[11] At the project cataloguer's prompting via an interface integrated with the WordPress admin dashboard, this plugin fetches a manuscript's preliminary record in MARC21 format (using the catalogue API) and converts it to MARCXML,[12] fetches images and links from the HathiTrust Digital Library (using the HathiTrust API),[13] parses the MARCXML data according to a specified template and generates a formatted manuscript description in the form of a blog post that can be commented on element by element.
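The template-driven parsing step can be illustrated with a minimal sketch. This is not the Mblog plugin's own code (which runs within WordPress); it is a Python illustration under stated assumptions. The sample record, the template (its labels, tags and subfield codes), the function names parse_record and render_post, and the "#tag-n" comment anchors are all hypothetical. Only the MARCXML namespace and the general datafield/subfield structure come from the MARCXML schema itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MARCXML ("MARC21 slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">210 leaves</subfield>
    <subfield code="c">25 x 17 cm</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Example note on script and hand.</subfield>
  </datafield>
</record>"""

# Hypothetical template: display label, MARC tag, subfield codes to include.
TEMPLATE = [
    ("Title", "245", "ab"),
    ("Physical description", "300", "ac"),
    ("Note", "500", "a"),
]


def parse_record(marcxml, template):
    """Return descriptive elements in template order, each with its own id."""
    root = ET.fromstring(marcxml)
    elements = []
    for label, tag, codes in template:
        fields = root.findall(f"marc:datafield[@tag='{tag}']", MARC_NS)
        for n, field in enumerate(fields, start=1):
            # Keep only the subfields the template asks for, in record order.
            parts = [
                sf.text.strip()
                for sf in field.findall("marc:subfield", MARC_NS)
                if sf.get("code") in codes and sf.text
            ]
            if parts:
                elements.append(
                    {"id": f"{tag}-{n}", "label": label, "text": " ".join(parts)}
                )
    return elements


def render_post(elements):
    """Format elements as plain text, one per line, each with a comment anchor."""
    return "\n".join(
        f"{el['label']}: {el['text']} [comment: #{el['id']}]" for el in elements
    )


if __name__ == "__main__":
    print(render_post(parse_record(SAMPLE_MARCXML, TEMPLATE)))
```

Giving each descriptive element its own identifier is what allows a comment to be attached to one element (for example the title) rather than to the description as a whole.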

Figure 2. A manuscript description as a series of descriptive elements, each having links for submitting comments.

Two images with associated links to the manuscript's digital surrogate in the HathiTrust Digital Library are also displayed in the posted description.

Figure 3. Following an image link to the digital surrogate in the HathiTrust Digital Library.

The digitized manuscript may then be examined and the existing descriptive information enhanced or corrected. Project staff or contributing colleagues carry on this work and submit their enhancements to the descriptions as comments via the project website.

Figure 4. Comment submitted on a descriptive element (here the chief title) to supply information otherwise lacking.

All contributions are then reviewed by the project cataloguer – who may offer feedback and respond to any questions – before they are refashioned (as necessary) for incorporation into the catalogue records for those manuscripts, which serve as the final "published" descriptions.

In the final phase, the project cataloguer and staff focus on completing the manuscript descriptions. In particular, the physical examination and description for those manuscripts which have been examined only in the digital environment is undertaken, as well as full examination and description for any still uncatalogued manuscripts not slated for digitization. Details of the collection history and provenance are also more thoroughly investigated.


Data that cannot be reached and used cannot be considered valuable. It is important therefore to consider who might use our data and how they would reach it. We imagine our users to be: codicologists and palæographers, students of Arabographic manuscript studies, researchers of texts, authors, and other social and historical phenomena attested in the artifact as document, researchers of style and ornament, and even manuscript enthusiasts. These individuals may be graduate students, faculty, or independent researchers located anywhere in the world. They may hold any or no university or institutional affiliation.

If these users have some interest in the manuscripts specifically held by our Library, they might come to our Library gateway / online catalogue to search for these data. However, they are more likely to conduct their searches more broadly and elsewhere on the web, often through Google. Our approach therefore aims to facilitate search and discovery in a well-supported fashion in the library catalogue and beyond. Though the catalogue should perhaps be the chief repository for our manuscript data, it should not be the only repository.

Despite the present limitations (in terms of display, types of data, data structuring, and even exposure of data), we choose to create, curate and deploy our data from within the environment of the online library catalogue. We believe that our catalogue should be a place via which users can find data on items in our collections (including the Islamic manuscripts), should they wish to. The established infrastructure and functionality of the library catalogue is well-serviced, well-designed, and well-resourced – that is, a great deal is being invested in its maintenance and development at our institution and within the larger library / information services community – and can easily be leveraged. More generally the notion of the "library catalogue" is supported by an established and evolving scholarly framework. The result is a cost-effective and reliable means for long-term preservation, improvements, and potential for discovery beyond the OPAC or even the Library gateway via data-sharing mechanisms like HathiTrust, WorldCat, MARCXML and eventually linked data.[14]

Likewise, despite the present limitations with navigation,[15] we choose to deploy our digital surrogates via HathiTrust because it is an established, well-resourced but evolving entity, capable of long-term preservation and delivery of full-color, high-resolution page images that may be viewed and downloaded individually or together as a complete pdf. In addition, its union catalogue allows us greater reach in terms of data dissemination through exposure of bibliographic record data to search engines.

Our project site, while designed chiefly to facilitate wide scholarly engagement with the manuscripts and the potential for "crowdsourcing" descriptive contributions, also serves to extend the reach of our manuscript data. For the moment, this includes only manuscripts (now digitized) which were uncatalogued at project start, but in future all manuscripts may be included. The site could stand as a longer-term forum for discussion and exchange surrounding scholarly engagement with the manuscripts, primarily via digital surrogates but potentially also via physical artifacts as readers visit the reading room.


Clearly, our project concept situates the scholarship of codicology and palæography in a collaborative environment that attempts to harness the potential of the digital to extend the Library's reach. Further, our approach is unusually iterative, which allows for data and object discovery while the descriptions are still evolving as well as affording the opportunity for many individuals to be involved in the process.

Still, it must be acknowledged that the work with which we are attempting to engage contributors is neither quick nor straightforward, but instead requires a marked level of diverse expertise. The "crowd" or number of "well-informed enthusiasts"[16] we would wish to engage is therefore quite small and widely distributed geographically and linguistically. We can offer little in return beyond acknowledgment on our website and in the "published descriptions" (i.e. the cataloguing records) for the manuscripts. Reaching and compelling the "crowd" with invitations to contribute in a timely fashion has thus been a challenge. We have relied primarily on listserv blasts, word/web-of-mouth, and tailored appeals to researchers on the basis of knowledge of their expertise, though a more distributed model of publicity/outreach may have been more effective. Nevertheless, to date we have received roughly 274 descriptive contributions from 35 scholars in Belgium, Egypt, France, Germany, Iran, Israel, Turkey, the UK and the US via our project site with the vast majority of the (quality) contributions being provided by two or three individuals.[17] Just as significantly, we have also been able to provide training (and powerful incentive) for the emerging scholars serving on our project staff, essentially forming our own, highly productive "local crowd."

The result is that almost all of the manuscripts are now represented by data-rich full or near full descriptions in the online catalogue. All 912 manuscripts initially slated for digitization are now represented by digital surrogates in the HathiTrust Digital Library, and may be downloaded in their entirety. Additional manuscripts will be digitized in the future. Researchers are reaching our manuscripts (images and data) through the project site, HathiTrust, and even external websites for online forums and listservs, and they in turn are pointing their colleagues to our manuscripts. Needless to say, use of and interest in the collection have increased exponentially as the project has progressed and we cannot help but look forward to the future.

