KEYBOARDING
Why Do EEBO Texts Need To Be Keyboarded?
The page images that make up Early English Books Online are displayed in a form legible to human users, but computers
themselves see the words on EEBO's pages as simple squiggles that cannot be identified as letters. As a result, the
search tools that users have come to associate with word processing and computer-based indices cannot sort the
information on EEBO pages.
In many cases, Optical Character Recognition (OCR) software can transform image files into text files; when a
text's lettering is not in a recognizable font, however, or when a page is otherwise obscured with ink smudges or
wormholes, OCR software often fails to accurately render its image into a usable, searchable text (see this sample of
how OCR reads an EEBO text). While it is conceivable that OCR software could be modified to "read" EEBO texts, the many
variations in early modern typefaces make this an unrealistic option. Keyboarding, done by a person trained to identify the
features of early modern texts, actually proves more cost-effective.
SGML
Why SGML Encoding?
SGML encoding marks the structure and parts of a text, which enables easy and sophisticated searching. While simply being able to pick out keywords from an Early English text is a remarkable step forward, SGML encoding allows users to focus their queries more specifically. The tags added during the encoding process can, for example, permit users to look for the occurrence of a word only in the marginal notes of EEBO texts, for non-English terms as they appear in stage directions, or for proper names appearing in epigraphs.
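As a rough illustration of how structural tags narrow a query, the sketch below searches a tiny hand-made sample with Python's standard library. The sample text, the helper name find_word, and the element and attribute names (NOTE with PLACE="marg", STAGE, FOREIGN) are assumptions chosen for the example, and the sample is written as well-formed XML so that a stock parser can read it; it is not taken from an actual EEBO file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoded fragment; tag and attribute names are illustrative only.
SAMPLE = """
<TEXT>
  <DIV1 TYPE="play">
    <P>The king doth keep his revels here to night.</P>
    <NOTE PLACE="marg">Of revels and their abuse.</NOTE>
    <STAGE>Exeunt <FOREIGN LANG="lat">omnes</FOREIGN>.</STAGE>
  </DIV1>
</TEXT>
"""


def find_word(root, word, path):
    """Return the text of each element matching `path` that contains `word`."""
    hits = []
    for elem in root.iterfind(path):
        # Join the element's own text with the text of any nested elements.
        text = "".join(elem.itertext())
        words = re.findall(r"[a-z]+", text.lower())
        if word.lower() in words:
            hits.append(text.strip())
    return hits


root = ET.fromstring(SAMPLE)

# The word "revels" occurs in a paragraph and in a marginal note;
# the path restricts the search to marginal notes only.
marginal = find_word(root, "revels", ".//NOTE[@PLACE='marg']")

# Non-English terms that appear inside stage directions.
foreign_in_stage = [f.text for f in root.iterfind(".//STAGE/FOREIGN")]

print(marginal)
print(foreign_in_stage)
```

A plain keyword search would return both the paragraph and the marginal note here; the tagged structure is what lets the query ignore the paragraph.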
Document Type Definition (DTD)
A DTD, or document type definition, provides a guide to the various tags that may be used in encoding an XML/SGML text, showing when and how these tags may be used. While DTDs generally follow a standard form, they can be modified to fit the demands of an individual project.
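The idea of a DTD as a rulebook for tag usage can be sketched with a toy content model in Python. This is a deliberate simplification and not a DTD parser: a real DTD also constrains the order and number of child elements and declares attributes. The element names, the ALLOWED_CHILDREN table, and the violations helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each declared element maps to the set of
# child elements it may contain. Not the actual project DTD.
ALLOWED_CHILDREN = {
    "TEXT": {"DIV1"},
    "DIV1": {"HEAD", "P", "NOTE"},
    "HEAD": set(),
    "P": {"NOTE"},
    "NOTE": {"P"},
}


def violations(elem):
    """Return a list of messages for tags used where the model forbids them."""
    if elem.tag not in ALLOWED_CHILDREN:
        return [f"<{elem.tag}> is not declared"]
    problems = []
    for child in elem:
        declared = child.tag in ALLOWED_CHILDREN
        if declared and child.tag not in ALLOWED_CHILDREN[elem.tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        # Recurse so that nested elements are checked as well.
        problems.extend(violations(child))
    return problems


good = ET.fromstring(
    "<TEXT><DIV1><HEAD>Title</HEAD><P>Body <NOTE><P>gloss</P></NOTE></P></DIV1></TEXT>"
)
bad = ET.fromstring(
    "<TEXT><HEAD>Title</HEAD><DIV1><STAGE>Exit</STAGE></DIV1></TEXT>"
)

print(violations(good))
print(violations(bad))
```

Adapting such a model to a particular project amounts to editing the table, which loosely mirrors how a standard DTD can be modified to suit an individual corpus.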
Because EEBO contains so many different types of text, and because the corpus contains so many page images, the
Text Creation Partnership DTD Working Group determined that the DTD should reflect a low and fairly generic level of tagging. This practice, the Group decided, would move texts through the keyboarding process more quickly while also allowing for additional tagging in the future.