Within the online library catalogue and beyond: Islamic Manuscripts at Michigan

by Evyn Kropf

(prepared 24 August 2012)


Since January 2009, we at the University of Michigan Library have been engaged in a grant-funded project to complete the cataloguing of our Islamic Manuscripts Collection.[1] Supported by a "Cataloging Hidden Special Collections and Archives" grant administered by the Council on Library and Information Resources with generous funding from the Andrew W. Mellon Foundation,[2] the project engages established and emerging scholars (at various levels of expertise in Islamic codicology and palæography) in the cataloguing process – training, examination, description, and generation of searchable bibliographic and codicological metadata – for the collection of roughly 1,090 manuscript volumes dating from the 9th to the 20th century CE and containing texts primarily in Arabic, Persian and Turkish.

Our chief goals with the project have been:

  • to enrich scholars' knowledge of codicology and palæography
  • to enhance intellectual access to the collection through creation and dissemination of searchable, web-discoverable descriptions and digital surrogates
  • and to compile a database of bibliographic and codicological data that may serve to further research and scholarship in Near Eastern studies and Islamicate manuscript studies in particular.

Our descriptive approach aims at "full" codicological description with a suite of elements that characterize not only the contents of the codices (text, paratext and ornament) through transcription, notes and headings, but also their form via further notes addressing the script and hand, structure (composition of gatherings, sewing, and cover), dimensions, writing surface, layout and other physical aspects.[3] Particular attention is given to the evidence for the history of the manuscript as attested in manuscript notes and in changes to structure through addition or repair. The data elements representing each manuscript are situated in a searchable, electronic, bibliographic record within our online Library catalogue,[4] and where possible accompanied by a link to a digital surrogate in the HathiTrust Digital Library.[5] This approach to data curation and deployment is motivated by the belief that our online library catalogue should be the chief (but not the only) repository for representation of the items in our Library collections. The intention is for each catalogue record to be as textually rich as possible, with a special emphasis on standardized headings for browsing and faceting, and transcriptions in the Arabic script (whereas English is used for descriptive notes and headings). Structured together within the Library catalogue, these records form a database of valuable bibliographic and codicological data.

Workflow and Technical Infrastructure

Our descriptive approach obviously presents challenges with data creation – namely, the anything-but-trivial task of generating the full manuscript descriptions – and also with data curation and deployment, i.e. formatting the data in such a way that it will be accessible in all of the preferred environments.

Thus, to realize our descriptive approach within the context of our project goals, we devised an iterative, collaborative scheme that leverages the potential of the digital environment to facilitate the cataloguing process in a way that engages as many scholars as possible. Emerging scholars from the University of Michigan are engaged as project staff and receive foundational training in codicology and palæography.[6] Emerging and established scholars from the University and around the world are engaged as contributors through the project website, a digital platform that facilitates remote examination and commentary on descriptive elements.[7] Data posted there can be exposed to search engines and retrieved by researchers browsing the Web. Thus, the efforts of University of Michigan project staff are supplemented by contributions from colleagues around the world and the evolving manuscript data is distributed beyond the library catalogue.

In the initial phase of the approach, existing inventory data in the form of a card-format handlist are converted to preliminary bibliographic records designed for display in the Library's online catalogue.

Figure 1. Preliminary bibliographic record carrying limited descriptive information as it is displayed in the Library's OPAC.

Project staff use a text editor to create the bibliographic records in MARC21 format. The project cataloguer then edits, enhances and inputs the records to the Library's Integrated Library System (ALEPH). The arrangement of descriptive elements is governed by a modified version of the AMREMM standard[8] (developed by the project cataloguer as inspired by Adam Gacek's approach[9]) and Library of Congress Romanization schemes (chiefly for Arabic, Persian and Turkish).

Also during this phase, digital surrogates are created for as many manuscripts as possible and ingested (with companion metadata) for delivery via the HathiTrust Digital Library. Manuscripts that cannot be digitized are examined and described by project staff relying on collaborative tools like Google Docs and Picasa web albums. A project website elaborating on the blogging paradigm is also developed,[10] with the project cataloguer and the project web developer (Nancy Moussa) collaborating on several templates of descriptive elements compatible with MARC21 and the MARCXML schema.

In the next phase, the descriptive elements are harvested from the preliminary records created for each digitized manuscript and combined in a representative description that is posted to the project website by the project cataloguer. A custom plugin (Mblog) created by the project web developer is used within the WordPress blogging engine with CommentPress theme.[11] At the project cataloguer's prompting via an interface integrated with the WordPress admin dashboard, this plugin fetches a manuscript's preliminary record in MARC21 format (using the catalogue API) and converts it to MARCXML,[12] fetches images and links from the HathiTrust Digital Library (using the HathiTrust API),[13] parses the MARCXML data according to a specified template and generates a formatted manuscript description in the form of a blog post that allows element-by-element commenting.
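The MARC21-to-MARCXML step can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas Mblog runs inside WordPress, so this is not the plugin's code. It assumes the record arrives as a single binary MARC21 (ISO 2709) record encoded in UTF-8; the function names marc21_to_marcxml and build_marc21 and the sample record are hypothetical, and the catalogue API call is left out entirely.

```python
import xml.etree.ElementTree as ET

# ISO 2709 control bytes: field terminator, record terminator, subfield delimiter.
FIELD_END, RECORD_END, SUBFIELD = b"\x1e", b"\x1d", b"\x1f"
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def marc21_to_marcxml(raw):
    """Convert one binary MARC21 record (bytes) to a MARCXML string."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])          # base address of the field data
    directory = raw[24:base - 1]       # 12-byte entries, then a field terminator
    record = ET.Element("record", {"xmlns": MARCXML_NS})
    ET.SubElement(record, "leader").text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length].rstrip(FIELD_END)
        if tag.startswith("00"):       # control fields carry no indicators or subfields
            ET.SubElement(record, "controlfield", {"tag": tag}).text = data.decode("utf-8")
            continue
        field = ET.SubElement(
            record, "datafield",
            {"tag": tag, "ind1": chr(data[0]), "ind2": chr(data[1])},
        )
        # Each subfield is: delimiter byte, one-character code, then the value.
        for chunk in data[2:].split(SUBFIELD)[1:]:
            text = chunk.decode("utf-8")
            ET.SubElement(field, "subfield", {"code": text[0]}).text = text[1:]
    return ET.tostring(record, encoding="unicode")


def build_marc21(fields):
    """Assemble a tiny MARC21 record for demonstration (hypothetical helper).

    `fields` is a list of (tag, body_bytes) pairs; data-field bodies must
    already contain their two indicators and delimited subfields.
    """
    directory, data = b"", b""
    for tag, body in fields:
        body += FIELD_END
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1
    total = base + len(data) + 1
    leader = f"{total:05d}ntm a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + data + RECORD_END


# Placeholder record, not taken from the catalogue.
raw = build_marc21([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD + b"aExample title :" + SUBFIELD + b"ba placeholder record."),
])
print(marc21_to_marcxml(raw))
```

In practice an established MARC library would be a safer choice than hand-parsing the directory; the sketch only makes the structure of the conversion visible.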

Figure 2. A manuscript description as a series of descriptive elements, each having links for submitting comments.
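The element-by-element layout shown in Figure 2 can be approximated by a template that maps display labels to MARC field tags. The sketch below is an illustration in Python, not the Mblog plugin's code: the TEMPLATE mapping, the describe function, the anchor format and the sample record are all hypothetical, chosen only to show how each descriptive element could receive its own identifier to which comments attach.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical template: display label -> MARC field tag, in display order.
TEMPLATE = [
    ("Title", "245"),
    ("Author", "100"),
    ("Physical description", "300"),
    ("General note", "500"),
]

# Placeholder MARCXML record, not taken from the catalogue.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Example Author,</subfield>
    <subfield code="d">d. 1500.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">a placeholder record.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">First placeholder note.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Second placeholder note.</subfield>
  </datafield>
</record>"""


def describe(marcxml, template=TEMPLATE):
    """Return the descriptive elements of one record in template order.

    Each element gets an anchor such as "500-2" (tag plus occurrence number)
    so that a comment can point at exactly one element.
    """
    record = ET.fromstring(marcxml)
    elements = []
    for label, tag in template:
        fields = record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS)
        for n, field in enumerate(fields, start=1):
            parts = [(sf.text or "").strip()
                     for sf in field.findall("marc:subfield", MARC_NS)]
            elements.append(
                {"anchor": f"{tag}-{n}", "label": label, "text": " ".join(parts)}
            )
    return elements


for element in describe(SAMPLE_MARCXML):
    print(f"[{element['anchor']}] {element['label']}: {element['text']}")
```

Fields absent from the record (here, 300) are simply skipped, and repeated fields (here, 500) yield one element each, so a commenter can address a single note rather than the whole description.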

Two images with associated links to the manuscript's digital surrogate in the HathiTrust Digital Library are also displayed in the posted description.

Figure 3. Following an image link to the digital surrogate in the HathiTrust Digital Library.

The digitized manuscript may then be examined and the existing descriptive information enhanced or corrected. Project staff or contributing colleagues carry on this work and submit their enhancements to the descriptions as comments via the project website.

Figure 4. Comment submitted on a descriptive element (here the chief title) to supply information otherwise lacking.

All contributions are then reviewed by the project cataloguer – who may offer feedback and respond to any questions – before they are refashioned (as necessary) for incorporation into the catalogue records for those manuscripts, which serve as the final "published" descriptions.

In the final phase, the project cataloguer and staff focus on completing the manuscript descriptions. In particular, they undertake physical examination and description of those manuscripts that have so far been examined only in the digital environment, as well as full examination and description of any still-uncatalogued manuscripts not slated for digitization. Details of the collection history and provenance are also more thoroughly investigated.


Data that cannot be reached and used cannot be considered valuable. It is important therefore to consider who might use our data and how they would reach it. We imagine our users to be: codicologists and palæographers, students of Arabographic manuscript studies, researchers of texts, authors, and other social and historical phenomena attested in the artifact as document, researchers of style and ornament, and even manuscript enthusiasts. These individuals may be graduate students, faculty, or independent researchers located anywhere in the world. They may hold any or no university or institutional affiliation.

If these users have some interest in the manuscripts specifically held by our Library, they might come to our Library gateway / online catalogue to search for these data. However, they are more likely to conduct their searches more broadly and elsewhere on the web, often through Google. Our approach therefore aims to facilitate search and discovery in a well-supported fashion in the library catalogue and beyond. Though the catalogue should perhaps be the chief repository for our manuscript data, it should not be the only repository.

Despite the present limitations (in terms of display, types of data, data structuring, and even exposure of data), we choose to create, curate and deploy our data from within the environment of the online library catalogue. We believe that our catalogue should be a place via which users can find data on items in our collections (including the Islamic manuscripts), should they wish to. The established infrastructure and functionality of the library catalogue are well-serviced, well-designed, and well-resourced – that is, a great deal is being invested in their maintenance and development at our institution and within the larger library / information services community – and can easily be leveraged. More generally, the notion of the "library catalogue" is supported by an established and evolving scholarly framework. The result is a cost-effective and reliable means of long-term preservation and improvement, with potential for discovery beyond the OPAC or even the Library gateway via data-sharing mechanisms like HathiTrust, WorldCat, MARCXML and eventually linked data.[14]

Likewise, despite the present limitations with navigation,[15] we choose to deploy our digital surrogates via HathiTrust because it is an established, well-resourced but evolving entity, capable of long-term preservation and delivery of full-color, high-resolution page images that may be viewed and downloaded individually or together as a complete pdf. In addition, its union catalogue allows us greater reach in terms of data dissemination through exposure of bibliographic record data to search engines.

Our project site, while designed chiefly to facilitate wide scholarly engagement with the manuscripts and the potential for "crowdsourcing" descriptive contributions, also serves to extend the reach of our manuscript data. For the moment, this includes only manuscripts (now digitized) which were uncatalogued at project start, but in future all manuscripts may be included. The site could stand as a longer-term forum for discussion and exchange surrounding scholarly engagement with the manuscripts, primarily via digital surrogates but potentially also via physical artifacts as readers visit the reading room.


Clearly, our project concept situates the scholarship of codicology and palæography in a collaborative environment that attempts to harness the potential of the digital to extend the Library's reach. Further, our approach is unusually iterative, which allows for data and object discovery while the descriptions are still evolving as well as affording the opportunity for many individuals to be involved in the process.

Still, it must be acknowledged that the work with which we are attempting to engage contributors is neither quick nor straightforward, but instead requires a marked level of diverse expertise. The "crowd" or number of "well-informed enthusiasts"[16] we would wish to engage is therefore quite small and widely distributed geographically and linguistically. We can offer little in return beyond acknowledgment on our website and in the "published descriptions" (i.e. the cataloguing records) for the manuscripts. Reaching and compelling the "crowd" with invitations to contribute in a timely fashion has thus been a challenge. We have relied primarily on listserv blasts, word/web-of-mouth, and tailored appeals to researchers on the basis of knowledge of their expertise, though a more distributed model of publicity/outreach may have been more effective. Nevertheless, to date we have received roughly 274 descriptive contributions from 35 scholars in Belgium, Egypt, France, Germany, Iran, Israel, Turkey, the UK and the US via our project site with the vast majority of the (quality) contributions being provided by two or three individuals.[17] Just as significantly, we have also been able to provide training (and powerful incentive) for the emerging scholars serving on our project staff, essentially forming our own, highly productive "local crowd."

The result is that almost all of the manuscripts are now represented by data-rich full or near-full descriptions in the online catalogue. All 912 manuscripts initially slated for digitization are now represented by digital surrogates in the HathiTrust Digital Library and may be downloaded in their entirety. Additional manuscripts will be digitized in the future. Researchers are reaching our manuscripts (images and data) through the project site, HathiTrust, and even external websites for online forums and listservs, and they in turn are pointing their colleagues to our manuscripts. Needless to say, use of and interest in the collection have increased markedly as the project has progressed, and we cannot help but look forward to the future.

[2] More on the grant program, selection criteria, funded projects, etc. may be found at "Cataloging Hidden Special Collections and Archives" http://www.clir.org/hiddencollections/index.html.

[3] See Jan Just Witkam, "Aims and Methods of Cataloguing Manuscripts of the Middle East," In Les manuscrits du Moyen-Orient: essais de codicologie et paléographie. Actes du Colloque d'Istanbul (Istanbul, 26-29 mai 1986). Ed. François Déroche, (Paris: Institut français d’études anatoliennes et Bibliothèque Nationale, 1989): 1–5; Annie Berthier and Marie Geneviève Guesdon, "Codicology and the History of Collections," In François Déroche, et al. Islamic Codicology: an Introduction to the Study of Manuscripts in Arabic script. (London: Al-Furqān Islamic Heritage Foundation, 2006): 345–360; and Adam Gacek, "Appendix V. Describing the Manuscript," In Adam Gacek, Arabic manuscripts: a vademecum for readers, (Leiden: Brill, 2009): 333–338.

[4] Mirlyn (http://mirlyn.lib.umich.edu/) accessible through the MLibrary gateway at http://www.lib.umich.edu/.

[6] This foundational training has included participation in a week-long workshop addressing Arabic manuscript studies (palæography and codicology) conducted by Adam Gacek in May 2009 and a multi-session workshop addressing Islamic bindings, book structures, materials and condition conducted by Julia Miller in February 2010. Training continues with extensive further readings and hands-on experience under the guidance of the project cataloguer at each stage of the project.

[7] Inspired by the "crowdsourcing" spirit beginning to take off in cultural heritage communities at the time of project proposal. "Crowdsourcing" projects are now rampant in library circles with mixed results that depend a great deal on interpretation in light of project aims. See Rose Holley, "Crowdsourcing: how and why should libraries do it?" D-Lib Magazine, 16 (3/4) 2010; Christine Madsen, "Will 2012 be the year of crowdsourcing in libraries?" Posted 30 December 2011, http://christinemadsen.com/2011/will-2012-be-the-year-of-crowdsourcing-in-libraries/; Johan Oomen and Lora Aroyo, "Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges," In Proceedings of the 5th International Conference on Communities and Technologies, Brisbane, QLD, Australia, June 29 - July 02 (New York: ACM, 2011): 138–149; Ben Brumfield's "Lessons from Small Crowdsourcing Projects" talk at IMLS WebWise 2012 (transcribed in a blog post dated 17 April 2012, "Crowdsourcing at IMLS WebWise 2012," http://manuscripttranscription.blogspot.com/2012/03/crowdsourcing-at-imls-webwise-2012.html) and a series of recent blog posts by Trevor Owens (http://www.trevorowens.org/tag/crowdsourcing/).

[8] That is, Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts, created as a supplement to the second edition of the Anglo-American Cataloguing Rules (AACR2). See Gregory Pass, Descriptive cataloging of ancient, medieval, Renaissance, and early modern manuscripts, Chicago: Association of College and Research Libraries, 2003.

[9] Again, see Appendix V of his Vademecum.

[14] For more on this latest wave in library data curation and dissemination, see Karen Coyle, "Library Data in the Web World," Library Technology Reports 46, 2 (2010): 5-11 and Karen Coyle, Linked Data Tools: Connecting on the Web. Library Technology Reports 48, 4 (2012).

[15] Right-to-left languages still present some challenges and it is currently not possible to link directly to or collate images of elements of interest, such as opening, illuminated headpiece, manuscript note, etc. In the future, aggregated image page tags might do the trick.

[16] To use Rachel Stone's term (see "What Can the Vulgus Do?" posted 17 August 2010, http://magistraetmater.blog.co.uk/2010/08/17/what-can-the-vulgus-do-crowd-sourcing-for-medievalists-9195007/).

[17] According to the typical power distribution as Ben Brumfield has also reported; again, see his IMLS WebWise 2012 talk, "Lessons from Small Crowdsourcing Projects."