Project Description / Goals & Strategies
Project Goals:
The Universities of Michigan and Oxford, with the financial support of
over seventy libraries worldwide,
are creating accurately keyboarded and SGML/XML encoded text editions
for a significant portion of the EEBO corpus. Known as the EEBO-Text
Creation Partnership (EEBO-TCP), this cooperative academic initiative
is producing legible and searchable encoded texts that link to corresponding
page images from ProQuest's EEBO product. For students and scholars,
this allows immediate search access to the content of thousands
of historically significant works, while retaining the cultural
context of the original print representation of the material. The
EEBO-Text Creation Partnership offers a number of important benefits
to the library community:
- Entrusts conversion of important but difficult works to the university community, supporting appropriate scholarly review and intervention;
- Draws upon community expertise to develop the scope and standards underlying such projects;
- Carries forward the work in a cost-effective manner by distributing the costs across many academic institutions, as well as encouraging substantial contributions from commercial partners;
- Ensures that Partner libraries co-own the resulting textfile with robust rights to manage, re-use, and distribute the file as they see fit, including the right to distribute texts beyond their own campus or community of authenticated users.
Creating the Textfile:
Works selected for conversion are identified
monthly by staff at the University of Michigan in accord with criteria
established by a Task Force convened in 2000. ProQuest digital
facsimile pages are then made available to keyboarding vendors
directed by staff at the University of Michigan and at Oxford
University. When the keyboarding and SGML/XML tagging are completed
by the vendor, the file is transferred back to Michigan or Oxford
for proofing and SGML tag review, to ensure that the work conforms
to the established standard of 99.995% character accuracy. Works
that are accepted are then matched to catalog records to create
bibliographic headers and added to the existing online collection.
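To make the 99.995% figure concrete: it allows at most one wrong character in every 20,000. The sketch below works through that arithmetic. The helper names are hypothetical, and how the project actually samples and counts errors during proofing is not described here, so this illustrates only the threshold itself.

```python
from fractions import Fraction

# Accuracy standard quoted in the text: 99.995% of characters correct.
# Fractions keep the arithmetic exact (no floating-point rounding).
REQUIRED_ACCURACY = Fraction(99995, 100000)


def max_errors(total_chars: int) -> int:
    """Largest error count that still meets the standard (hypothetical helper)."""
    # int() truncates, which equals floor for non-negative values.
    return int(total_chars * (1 - REQUIRED_ACCURACY))


def meets_standard(total_chars: int, errors: int) -> bool:
    """True if the observed character accuracy is at least the required one."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return Fraction(total_chars - errors, total_chars) >= REQUIRED_ACCURACY


print(max_errors(20_000))         # errors tolerated in 20,000 characters
print(meets_standard(20_000, 2))  # does a file with 2 errors pass?
```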
Currently, between two and three hundred titles per month are
added to the textfile. We are often asked whether EEBO texts could
be made searchable through optical character recognition (OCR).
Our belief, backed by some early testing (see the Raleigh title
page example), is that OCR would not produce an acceptable or
cost-effective result. Keyboarding and tagging also provide the
following benefits, which are particularly well suited to early
texts in a large corpus like EEBO:
- Because the text is accurate, it can be displayed (unlike in most OCR-based projects) and hence provides a legible reading copy of EEBO texts, whose early fonts and printing can make the originals difficult for novice readers to decipher.
- Word and phrase searching is not only more accurate; results are also displayed in the context of the surrounding text, which helps the reader sort through a large number of returns.
- Tagging allows for more precise searching such as limiting searches to titles, headings, notes, stage directions, captions, acts, verses, etc.
- Tagging also gives any text a browsable structure, analogous to a table of contents, by producing a hierarchy of titles, sections, chapter headings and subheadings.
- Because the accurately keyboarded texts can be displayed, the reader can access a list of all words in the corpus (or in a designated work) that can serve as an index, a concordance, or a check on variant spellings and word forms.
- Standard tagging of the texts allows the corpus to be combined with other corpora tagged to the same standard, hence allowing the reader to search across multiple collections.
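The benefits listed above (searches limited to particular elements, a browsable outline, and a word index) can be illustrated with a minimal sketch using Python's standard XML library. The sample text and the element names (TEXT, DIV, HEAD, STAGE, P) are invented for illustration and are not taken from the EEBO-TCP tagging standard; the functions are hypothetical helpers, not the project's search software.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; element names are illustrative assumptions only.
SAMPLE = """
<TEXT>
  <DIV TYPE="act">
    <HEAD>Actus Primus</HEAD>
    <STAGE>Enter the King</STAGE>
    <P>The king doth wake to night</P>
  </DIV>
  <DIV TYPE="act">
    <HEAD>Actus Secundus</HEAD>
    <P>Long live the King</P>
  </DIV>
</TEXT>
""".strip()


def search_in_tag(root, tag, word):
    """Return the text of every <tag> element containing `word` (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for el in root.iter(tag):
        text = "".join(el.itertext()).strip()
        if pattern.search(text):
            hits.append(text)
    return hits


def outline(root):
    """List headings in document order, a rough table of contents."""
    return ["".join(h.itertext()).strip() for h in root.iter("HEAD")]


def word_index(root):
    """Count every word form in the text, lower-cased."""
    words = re.findall(r"[A-Za-z]+", "".join(root.itertext()).lower())
    return Counter(words)


root = ET.fromstring(SAMPLE)
print(search_in_tag(root, "STAGE", "king"))  # search limited to stage directions
print(outline(root))                         # headings as a browsable structure
print(word_index(root).most_common(3))       # most frequent word forms
```

Restricting the search to one element name is what distinguishes a tagged text from plain keyboarded text: the same word in a paragraph and in a stage direction can be retrieved separately.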
EEBO Content
The EEBO corpus consists of the works represented in the English Short Title Catalog I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts. Together these trace the history of English thought from the first book printed in English in 1475 through 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have always found strong proponents in English, linguistics, and history, but the collections are all-encompassing in coverage, including core texts in religious studies, art, women's studies, history of science, law, and music.
The following are but a small sampling of the seminal authors whose works are included:
Erasmus, Shakespeare, King James I, Marlowe, Galileo, Caxton, Chaucer, Malory, Boyle, Newton, Locke, More, Milton, Spenser, Bacon, Donne, Hobbes, Purcell, Behn, and Defoe.
Licensing and Access
The EEBO-TCP project is notable for creating quality electronic
editions of culturally significant content of enduring value.
It is also notable for doing so under terms that foster scholarly
use and widespread access. Partner institutions are co-owners
of the textfile and are entitled to copies of that file for local
loading and management. As owners of the file, having funded its
creation, partners can treat the file as if locally created and
can distribute the texts to an audience beyond strictly authorized
users if they choose to open their servers in this way. Likewise,
the license allows scholars to use texts in their entirety to
reproduce or create new editions. Our cooperative agreement with
ProQuest is intended to protect their investment in the EEBO project
while maintaining the principle of public domain access to early
texts.
For partner institutions not yet prepared to support a local implementation of searchable EEBO-TCP texts, access is presently provided without charge by libraries at the Universities of Michigan and Oxford. Users can search TCP editions at these sites and retrieve both relevant text portions and corresponding page images. ProQuest is also working to develop an interface for searchable text and will soon be in a position to provide access to the subset of keyboarded and tagged texts, along with page images of the entire collection.
Benefits for Scholarly Researchers:
Word and phrase searching of the EEBO corpus provides a new research dimension never before available through print, microfilm or digital page facsimiles. Scholars are now able to pinpoint references to subjects, people or places that would not be indicated in a brief bibliographic citation. The search interface also allows scholars to uncover word patterns and other literary or linguistic forms across texts. Whether a user is seeking contemporaneous references to people or events, tracing citations to classical authors like Aristotle, or quickly finding known quotes, these thousands of searchable texts open up an array of research possibilities that was unthinkable when the texts were only accessible by author, title and broad subject.
Benefits for Teachers and Students:
The ease of access that EEBO and EEBO-TCP offer to texts once confined to rare originals and microfilm has already made the corpus a significant part of classroom teaching on a number of campuses. The EEBO in Education pages offer compelling examples of how teachers at both the undergraduate and graduate levels have used the corpus to introduce students to texts, both canonical and lesser known, as they appeared to their first readers. As these EEBO works become searchable, they become even easier to use in the classroom. Students can readily find references to the "great fire" of London, or remedies for common diseases, benefiting from clearly legible text with instant access to original illustrations and typefaces.