
KEYBOARDING

Why Do EEBO Texts Need To Be Keyboarded?
The page images that make up Early English Books Online are displayed in a form legible to human users, but computers themselves see the words on EEBO's pages as simple squiggles that cannot be identified as letters. As a result, the search tools that users have come to associate with word processing and computer-based indices cannot sort the information on EEBO pages.

In many cases, Optical Character Recognition (OCR) software can transform image files into text files. When a text's lettering is not in a recognizable font, however, or when a page is obscured by ink smudges or wormholes, OCR software often fails to render the image accurately as a usable, searchable text (see this sample of how OCR reads an EEBO text). While it is conceivable that OCR software could be modified to "read" EEBO texts, the many variations in early modern typefaces make this an unrealistic option. Keyboarding, done by a person trained to identify the features of early modern texts, proves more cost-effective.

SGML

Why SGML Encoding?
SGML encoding marks the structure and parts of a text, which enables easy and sophisticated searching. While simply being able to pick out keywords from an Early English text is a remarkable step forward, SGML encoding allows users to focus their queries more specifically. The tags added during the encoding process can, for example, permit users to look for the occurrence of a word only in the marginal notes of EEBO texts, for non-English terms as they appear in stage directions, or for proper names appearing in epigraphs.
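
As a rough illustration of the kind of query such tagging makes possible, the sketch below searches for a word only inside marginal notes. The element and attribute names (NOTE, PLACE="marg", P, DIV1), the sample text, and the function name are all assumptions made up for this example, not taken from the EEBO DTD. The sample is written as well-formed XML so that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded fragment; tag names and text are illustrative only.
SAMPLE = """
<TEXT>
  <DIV1 TYPE="chapter">
    <P>Of the vertue of patience in aduersitie.
      <NOTE PLACE="marg">Patience commended by the Apostle.</NOTE>
    </P>
    <P>The Apostle speaketh of charitie.
      <NOTE PLACE="marg">Charitie the chiefest grace.</NOTE>
    </P>
  </DIV1>
</TEXT>
"""

def find_in_marginal_notes(xml_text, word):
    """Return the text of every marginal note that contains `word`."""
    root = ET.fromstring(xml_text)
    hits = []
    # Only NOTE elements marked PLACE="marg" are examined; body text is ignored.
    for note in root.iter("NOTE"):
        if note.get("PLACE") != "marg":
            continue
        # Collapse line breaks and indentation into single spaces.
        note_text = " ".join("".join(note.itertext()).split())
        if word.lower() in note_text.lower():
            hits.append(note_text)
    return hits

marginal_hits = find_in_marginal_notes(SAMPLE, "apostle")
print(marginal_hits)
```

Here the word "Apostle" also occurs in the body of the second paragraph, but that occurrence is not reported, because the search is confined to the tagged marginal notes.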

Document Type Definition (DTD)
A DTD, or document type definition, provides a guide to the various tags that may be used in encoding an XML/SGML text, showing when and how these tags may be used. While DTDs generally follow a standard form, they can be modified to fit the demands of an individual project.
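
The idea of a DTD as a rule book can be sketched in a few lines. The example below is a deliberately simplified stand-in, not a real DTD or DTD parser: it records, for a handful of hypothetical tags, which child tags each may contain, and reports any nesting that breaks those rules. A real DTD also constrains order, repetition, and attributes, all of which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: parent tag -> set of child tags it may contain.
ALLOWED_CHILDREN = {
    "TEXT": {"DIV1"},
    "DIV1": {"HEAD", "P"},
    "HEAD": set(),
    "P": {"NOTE"},
    "NOTE": set(),
}

def check_nesting(xml_text):
    """Return a list of messages for tags the rules do not permit."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"<{parent.tag}> is not a known tag")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems

valid_doc = "<TEXT><DIV1><HEAD>Title</HEAD><P>Body <NOTE>gloss</NOTE></P></DIV1></TEXT>"
invalid_doc = "<TEXT><DIV1><NOTE>stray gloss</NOTE></DIV1></TEXT>"

print(check_nesting(valid_doc))
print(check_nesting(invalid_doc))
```

Changing the entries in the hypothetical ALLOWED_CHILDREN table is the analogue of modifying a DTD to fit an individual project: the same checking code then accepts or rejects different structures.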

Because EEBO contains so many different types of text, and because the corpus contains so many page images, the Text Creation Partnership DTD Working Group determined that the DTD should reflect a low and fairly generic level of tagging. This practice, the Group decided, would move texts through the keyboarding process more quickly while also allowing for additional tagging in the future.