Descriptive Metadata: The TEI Header, MARC, and AACR2

Randall Barry, Library of Congress


  • Introduction
    • My brief talk will treat the relationship between the TEI Header, MARC, and AACR2.
    • We've decided to give all of these an umbrella under which to sit called "descriptive metadata."
    • I hope this will be just the beginning of discussions of the issues and solutions relating to handling this descriptive metadata for TEI documents.
    • I'll assume everyone here has heard of MARC, AACR2, and the TEI header. Since your level of familiarity with each may be different, let me take just a minute to define each of them for the purposes of my talk and for future reference when our discussions continue in the breakout sessions:
      1. MARC is a standard structure for encoding Machine Readable Cataloging data (most often bibliographic and authority, but, as of late, several other kinds of data);
      2. AACR2 is the set of rules used for collecting bibliographic data relating to library materials and for formulating access points (for authors, titles, subjects, related works, etc.);
      3. The TEI header consists of those elements within the TEI DTD designed to contain bibliographic data about the TEI instance.
    • You may be asking: "Why do we care about the relationship of these three things?"
      1. The answer to that question is that, without going too deep into any of them, it becomes pretty apparent that all three are talking about some of the same data.
    • Cataloging rules have been around the longest. They were developed to guide librarians in the creation of bibliographic entries in catalogs (catalog entries being surrogates for the actual items stored elsewhere).
    • MARC provided a way to put that cataloging information into machine-readable format and provided some additional information used by computers to enhance access (coded data, content designation).
    • Until the advent of electronic documents, the creation of bibliographic data, either on paper or in MARC records, had to be done from scratch.
    • Electronic documents with electronic title pages led naturally to the idea of "harvesting" bibliographic data into a MARC record automatically (or with little human intervention).
    • With only a minimal understanding of SGML markup, it's easy to see that there might be a relationship between the bibliographic elements (tags) defined for the TEI Header and the MARC data elements used to encode cataloging data collected following AACR2.
    • Before going any further, I'd like to mention one other standard (or, actually, a group of standards) that influenced the development of AACR, MARC, and the TEI header.
  • International Standard Bibliographic Description (ISBD)
    • The ISBDs for various forms of material were developed by IFLA to establish some agreed-upon groups of data which could be explicitly identified in bibliographic products (cataloging records).
    • The intellectual effort to create a taxonomy of bibliographic data resulted first in the ISBD(M), and later in ISBDs for serials, maps, computer files, etc.
    • The young AACR were revised in 1974 to embrace ISBD; the MARC formats likewise made special accommodations for ISBD in certain parts of the bibliographic record, particularly in the descriptive fields (2XX-4XX).
    • The TEI header was also designed around functional groups of information identified in the ISBDs (with some differences; let's look at them!).
    • The ISBDs divide bibliographic data into manageable categories, including:
      1. Title and statement of responsibility area
      2. Edition area
      3. Material specific details area
      4. Publication, distribution, etc. area
      5. Physical description area
      6. Series area
      7. Notes area
      8. Standard number and terms of availability area
    • ISBD calls these "areas"; the TEI categories are called "statements."
    • TEI places area 4 above into its Publication Statement element.
    • TEI also includes ISBD area 8 in its Publication Statement element.
    • TEI adds a special statement covering the source of the electronic document (the Source Description element), which relates loosely to the linking entries in MARC but does not figure in the eight basic areas of ISBD.
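The correspondences just described can be sketched as a small lookup table in Python. Only the placement of areas 4 and 8 in the Publication Statement comes from the talk itself. The other TEI element names are the children of the file description as named in the TEI Guidelines. The MARC tags are typical homes for each area, added here as an assumption, not as a complete mapping. The names `AreaMapping`, `CROSSWALK`, and `areas_for` are invented for this illustration.

```python
from typing import NamedTuple, Optional


class AreaMapping(NamedTuple):
    isbd_area: str
    tei_element: Optional[str]  # child of <fileDesc>; None if no direct counterpart
    marc_tags: tuple            # typical MARC bibliographic tags (illustrative only)


# ISBD area number -> where that data usually lands in TEI and MARC.
CROSSWALK = {
    1: AreaMapping("Title and statement of responsibility", "titleStmt", ("245",)),
    2: AreaMapping("Edition", "editionStmt", ("250",)),
    3: AreaMapping("Material specific details", None, ("254", "255", "256")),
    4: AreaMapping("Publication, distribution, etc.", "publicationStmt", ("260",)),
    5: AreaMapping("Physical description", "extent", ("300",)),
    6: AreaMapping("Series", "seriesStmt", ("490",)),
    7: AreaMapping("Notes", "notesStmt", ("500",)),
    8: AreaMapping("Standard number and terms of availability",
                   "publicationStmt", ("020", "022")),
}


def areas_for(tei_element):
    """Return the ISBD area numbers whose data lands in the given TEI element."""
    return [number for number, mapping in sorted(CROSSWALK.items())
            if mapping.tei_element == tei_element]


# Areas 4 and 8 share a single TEI statement.
print(areas_for("publicationStmt"))
```

The Source Description is deliberately absent from the table, since it has no counterpart among the eight ISBD areas.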
  • The TEI Header
    • Since some of you may not be very familiar with the TEI header, let me take a moment to describe it.
    • Its purpose is to encode information that describes the electronic document.
      1. The TEI header has four components (tags): the file description, encoding description, text profile, and revision history.
      2. Together the components function as "electronic preliminaries" (title page, verso, etc.) for an electronic text.
      3. The descriptive metadata is contained in the first element of the header, the file description. It is the minimum required by TEI. Some users of TEI will include one or all of the other three components of the Header.
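To make the four components concrete, here is a minimal Python sketch that parses a much simplified header and reports which components are present. The sample header is invented for illustration and is not validated against the TEI DTD. The element names `fileDesc`, `encodingDesc`, `profileDesc`, and `revisionDesc` are the TEI names for the file description, encoding description, text profile, and revision history. The function name `header_components` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical header: only the file description is mandatory.
SAMPLE_HEADER = """\
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>An example text: electronic edition</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example University Library</publisher>
      <pubPlace>Anytown</pubPlace>
      <date>1998</date>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a hypothetical 1898 print edition.</p>
    </sourceDesc>
  </fileDesc>
  <revisionDesc>
    <change>1998: header created.</change>
  </revisionDesc>
</teiHeader>
"""

COMPONENTS = ("fileDesc", "encodingDesc", "profileDesc", "revisionDesc")


def header_components(header_xml):
    """Map each of the four header components to True if it is present."""
    root = ET.fromstring(header_xml)
    present = {name: root.find(name) is not None for name in COMPONENTS}
    if not present["fileDesc"]:
        # The file description is the minimum a header must contain.
        raise ValueError("a TEI header must contain a file description")
    return present


print(header_components(SAMPLE_HEADER))
```

A header with only a file description would pass this check; one without it would be rejected.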
  • Chicken or the Egg?
    • If AACR2, MARC, and the TEI header all share this common ISBD foundation, it's pretty easy to come to the conclusion that it should be possible to get information for one from one of the others.
    • It has been suggested that you could create MARC records from the TEI header; but data could move in the opposite direction as well.
    • Is going in one direction rather than the other better for bibliographic control?
    • There are a couple of points to consider (and these will hopefully form the basis of our further discussions):
      1. Bibliographic data, although based on information from a bibliographic item, incorporates data from other sources:
        • Authority files for access points
        • Cataloger-supplied data when data is missing or incorrect (a missing date or a typo in the date are typical examples)
      2. Cataloging rules often prescribe massaging of bibliographic data for consistency:
        • Capitalization
        • Abbreviation/expansion (addition of qualifiers)
        • Reworking of order (place before publisher before date)
        • Addition/suppression of data (terms of address, etc.)
        • Transliteration/spelling modifications
    • Cataloging has always been a reactive, creative task, not a proactive, controlling one (despite efforts by libraries to influence publishers and writers to make better title pages!).
    • Is it feasible to think of MARC records being generated from information in the TEI instance (either from title page information, TEI header information, or both) with no cataloger intervention?
    • The desire to do data manipulations with MARC information has recently led to the development of the MARC DTDs and conversion tools.
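The harvesting question can be made concrete with a rough Python sketch. It pulls a few elements from a simplified file description, reorders the imprint into the ISBD pattern (place, then publisher, then date), and lists what a cataloger would still have to check. The sample data, the function names `text_of` and `harvest`, and the flat tag-to-string record are all assumptions made for illustration; this is not a real MARC record structure or an existing conversion tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical file description; note that publisher precedes place here.
FILE_DESC = """\
<fileDesc>
  <titleStmt>
    <title>An Example Text</title>
    <author>Jane Doe</author>
  </titleStmt>
  <publicationStmt>
    <publisher>Example University Library</publisher>
    <pubPlace>Anytown</pubPlace>
    <date>1998</date>
  </publicationStmt>
</fileDesc>
"""


def text_of(parent, path):
    """Return the stripped text of the first match for path, or None."""
    node = parent.find(path)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


def harvest(file_desc_xml):
    """Build a draft record (MARC tag -> string) plus notes for a cataloger."""
    fd = ET.fromstring(file_desc_xml)
    record, review = {}, []

    author = text_of(fd, "titleStmt/author")
    if author:
        record["100"] = author
        review.append("100: name not checked against an authority file")

    title = text_of(fd, "titleStmt/title")
    if title:
        record["245"] = title
        review.append("245: capitalization not normalized")
    else:
        review.append("245: title missing")

    # Reorder into place : publisher, date, with bracketed fillers for gaps.
    place = text_of(fd, "publicationStmt/pubPlace") or "[S.l.]"
    publisher = text_of(fd, "publicationStmt/publisher") or "[s.n.]"
    date = text_of(fd, "publicationStmt/date")
    if date is None:
        review.append("260: date missing, cataloger must supply one")
    imprint = f"{place} : {publisher}"
    record["260"] = f"{imprint}, {date}." if date else f"{imprint}."
    return record, review


draft, todo = harvest(FILE_DESC)
print(draft["260"])
print(todo)
```

Even with a complete header, the review list is not empty, which is the point: the mechanical part is easy, and the authority work and normalization remain.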
  • The Real World
    • It's probably not feasible to create a perfectly acceptable (full) MARC record from data in a TEI document. Why?
      1. Data is lacking: authority controlled access points beyond authors' names (subjects, added titles, etc.).
      2. Most authors don't understand and wouldn't want to be bothered by the rules of cataloging when creating a document.
      3. Realistically, getting authors to understand the intended content of many of the TEI header elements may be a challenge.
        • A study of the feasibility of author-supplied header data is probably needed.
        • Tools to help authors create useful headers could help matters, but there are not a lot of them.
    • Some other points that need to be considered are:
      1. The actual content of MARC records varies from library to library.
      2. Although many libraries may have the same item, their bibliographic records often differ.
      3. Different classification systems can be used:
        • Dewey Decimal Classification
        • LC classification
        • Universal Decimal Classification
      4. Different subject thesauri are applied to content analysis.
      5. Some libraries have technical limitations to the amount of information they can process and store (records are abbreviated).
      6. Lastly, the cataloging profession would be resistant to the self-cataloging of materials, regardless of the quality (this takes into account the human sensibilities involved).
    • Differences in granularity:
      1. Although I've just finished suggesting all the things the TEI header, MARC, and AACR2 have in common, they are separate standards with different reasons for being (they are specialized standards!).
      2. The pieces of information they deal with exist at different levels of granularity.
      3. The three cannot be perfectly aligned with each other on an element-by-element basis (and you wouldn't want them to be!).
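The granularity mismatch shows up as soon as one tries to write an element-level mapping. The Python sketch below is an assumed, partial mapping from TEI element paths to MARC tag and subfield pairs; the names `ELEMENT_MAP` and `tags_by_statement` are invented for this illustration. Grouping the targets by TEI statement shows that one statement feeds several MARC fields, so the alignment is not one-to-one.

```python
from collections import defaultdict

# TEI path under <fileDesc> -> list of (MARC tag, subfield code) targets.
# The idno target applies only when the identifier is an ISBN.
ELEMENT_MAP = {
    "titleStmt/title": [("245", "a")],
    "titleStmt/author": [("100", "a"), ("245", "c")],
    "publicationStmt/pubPlace": [("260", "a")],
    "publicationStmt/publisher": [("260", "b")],
    "publicationStmt/date": [("260", "c")],
    "publicationStmt/idno": [("020", "a")],
}


def tags_by_statement(element_map):
    """Group the MARC tags fed by each top-level TEI statement."""
    grouped = defaultdict(set)
    for path, targets in element_map.items():
        statement = path.split("/")[0]
        for tag, _subfield in targets:
            grouped[statement].add(tag)
    return {statement: sorted(tags) for statement, tags in grouped.items()}


print(tags_by_statement(ELEMENT_MAP))
```

In this sketch the author element alone has two targets, and one of them (the 100 heading) would still need authority work before it could be used as an access point.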
  • So, what to do?
    • Users of AACR2, MARC, and TEI (that is, catalogers and writers) are usually intelligent and reasonable.
    • Most catalogers have years of experience creating quality records under increasing pressure for productivity.
    • Writers, likewise, are constantly striving to improve their craft and make their writings more accessible to "their public" (well, most writers, I assume: Are there any writers who don't want to be read?)
    • TEI has certainly embraced the "value-added" that cataloging brings to library materials and has done so in a way that allows some of that value to be encoded in the document by the creator (that is, the writer).
    • It is believed that the creation of TEI documents would benefit from cataloger involvement in the encoding of the header itself (or perhaps in the harvesting of header data).
    • We need to explore the following possibilities:
      1. Should catalogers create the TEI header element content?
      2. Should catalogers be involved in the design of input tools to help writers to create more useful TEI headers (that meet the needs of cataloging standards)?
      3. Should cataloging rules and MARC be revised to accommodate information specific to electronic texts? (Some of this work has already been done.)
        • CCDA has already developed guidelines for describing electronic resources.
        • MARC data elements have been added to the formats to link and provide access to electronic resources (field 856).
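As an illustration of field 856, the Python sketch below renders one in a common display form. It assumes an HTTP-accessible resource, so it uses first indicator 4 (HTTP) and second indicator 0 (the resource itself), with the URI in subfield $u and an optional public note in subfield $z. The function name `format_856` and the example URL are hypothetical.

```python
def format_856(uri, public_note=None):
    """Render an 856 field as 'tag indicators subfields' for display."""
    if not uri.startswith(("http://", "https://")):
        # Other access methods use different indicator values.
        raise ValueError("this sketch only handles HTTP locations")
    subfields = [("u", uri)]
    if public_note:
        subfields.append(("z", public_note))
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"856 40 {body}"


# Hypothetical location of a TEI-encoded text.
print(format_856("http://www.example.org/texts/example.sgml",
                 public_note="TEI-encoded full text"))
```

A record harvested from a TEI header could carry such a field to point back to the electronic text it describes.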
  • Pilot Projects
    • Over the next two days we will hear about work that is going on to bring the TEI Header, AACR2, and MARC even closer together (better synergy).
    • The University of Virginia has already done some experimentation with the creation of cataloging records from TEI-encoded texts, and Jackie Shieh will be talking to us about that.
    • Wider experimentation is needed involving other (new!) TEI users:
      1. LC has used TEI and TEI-like DTDs but, up to now, has not harvested cataloging data (electronically) to create MARC records.
      2. Other TEI users should be encouraged to experiment and share their good and bad experiences with other TEI users and potential users.
      3. The new MARC SGML DTDs and conversion utilities could help move data between TEI and MARC records.
  • Conclusions
    • There is wide agreement on the value of a DTD like TEI for textual documents.
    • The TEI header has great appeal to librarians and document creators because of its rich level of granularity and bibliographic usefulness.
    • The TEI header concept has been copied by other DTDs (EAD most recently), and could become a de facto standard implemented in other DTDs in the future.
    • The TEI header was developed with MARC and AACR2 (and the ISBDs) in mind. Now those standards may need to reciprocate a bit and consider the impact TEI (and other implementations of SGML) might have on library catalogs in the future.
    • The dream of automatic cataloging being derived from electronic texts is probably just that (a dream!), but out of dreams sometimes even more fantastic realities come to life. I hope the sharing of ideas at this meeting helps make some dreams come true!
    • Thank you for your attention. Now, do you have any questions or comments?