How to Review EEBO Materials

Some of this may be more easily done using a parsing editor with display options (e.g. XMetaL).

  1. VALIDATION. Validate the file at the command line (using the "v" or "vmore" batch file). Those who plan to edit the file with a text editor (e.g. TextPad) will need to save and revalidate frequently to check their edits. Those who plan to edit primarily with XMetaL should still validate first in order to check for invalid code that will confuse XMetaL (a confused XMetaL may do some very odd things, such as truncating the file or inserting </EEBO> tags in the middle). For the latter purpose, errors involving attributes and character entities may be ignored, but errors involving structure ("document type does not allow element 'P' here") should be repaired in TextPad before loading the file into XMetaL for the first time. The file should also be validated when you are done with it. No files with parsing errors should be passed on as 'done', with the possible exception of files with explicitly described character-entity problems (e.g.: "text uses Greek digamma, but no character entity available; used '&digamma;' for the time being; please replace with appropriate character").

  2. HEADER and DOCTYPE. Add a TEMPHEAD: copy and paste a complete temphead template, in so doing replacing the existing doctype declaration with the one attached to the temphead.

    The existing templates include a good deal of boilerplate. Pieces of it that do not apply can be deleted. This area is used to record anything distinctive done to the text, or anything left undone, e.g. "blackletter text should have been tagged as HI throughout, but wasn't." Feel free to edit the templates if you find that they do not accurately reflect the most common tasks that you find yourself performing in the books. Many reviewers use the template as a quick checklist of things to do and look for. If you do not use it this way, you may prefer to use a very shortened form of template.

  3. TITLE PAGES. Proof title page(s): remove excess P tags (or, more rarely, insert additional P tags) and HI tags that record typeface changes used only for decorative effect. Generally, Ps should demarcate title/author and publisher/date. Title pages with lengthy sub-titles, epigraphs, etc. will usually require more Ps.

  4. STRUCTURE. Page through the original book in order to get a sense of its structure. Pay particular attention to the usual cues: table(s) of contents and summaries, if any; "heads" and "feet" (and marginal text that serves the same purpose) that indicate the beginning and end of something; and numerical or sequential clues ("Firstly, Word"; "Secondly, Sacrament").

    Compare the structure applied by the vendor and correct to match the book. Detailed multilevel structural hierarchies can often be left unmarked if it proves too much trouble to capture them, but this decision should be made only after you determine to your satisfaction what the real structure is and how much would be sacrificed.

    Typical vendor problems:
    • missing the lower levels in a multi-level hierarchy
    • treating whole poems as if they were merely stanzas (LGs)
    • missing signs of subordination (i.e., putting two sections at the same level instead of making one subordinate to the other)
    • using or abusing DIVs when what is really needed is <Q><TEXT><BODY> ... </BODY></TEXT></Q>.

  5. HEADS and FEET. Divisions usually have something at their head or foot, or both, to mark them off. We have various tags for these things (TRAILER, HEAD, ARGUMENT, EPIGRAPH, SIGNED, OPENER, CLOSER, etc.). It is usually convenient to check these at the same time that you are considering the structure.

  6. VALIDATION. Add TYPEs to DIVs using your knowledge of the overall structure as well as any local designations in the text.

    In TextPad, using Find in Files to search for <DIV[^>]*> (with binary, all matching lines, and regular expression checked) will provide a list of DIVs along with their TYPEs. In XMetaL, the style sheet editor can be used to force display in the text of selected attribute values. Display of TYPE and of the N and REF attributes of the PB tag is recommended.

    Lack of TYPEs should be the primary (often the only) reason that a file fails to validate. Pursue invalid bits one by one until the file validates.
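
    For those who prefer a script to the Find in Files listing, the search for DIVs lacking a TYPE can be sketched in Python. This is only an illustrative sketch, not part of the project's tooling: the function name divs_missing_type is hypothetical, and the pattern assumes that DIV start tags are written as <DIV>, <DIV1>, <DIV2>, etc., with TYPE given as an ordinary attribute.

```python
import re

# Opening DIV tags such as <DIV1>, <DIV2 TYPE="chapter">, or <DIV N="1">.
# End tags (</DIV1>) are not matched because "<" must be followed directly by "DIV".
DIV_TAG = re.compile(r"<DIV\d*\b[^>]*>", re.IGNORECASE)
TYPE_ATTR = re.compile(r"\bTYPE\s*=", re.IGNORECASE)


def divs_missing_type(text):
    """Return (line_number, tag) pairs for DIV start tags with no TYPE attribute.

    Hypothetical helper: the line number is that of the "<" opening the tag.
    """
    missing = []
    for match in DIV_TAG.finditer(text):
        tag = match.group(0)
        if not TYPE_ATTR.search(tag):
            line_number = text.count("\n", 0, match.start()) + 1
            missing.append((line_number, tag))
    return missing


if __name__ == "__main__":
    sample = '<DIV1 TYPE="chapter">\n<HEAD>Chap. I</HEAD>\n<DIV2>\n<P>Text</P>\n</DIV2>\n</DIV1>'
    for line_number, tag in divs_missing_type(sample):
        print(line_number, tag)
```

    The result is only a worklist; deciding which TYPE each DIV should carry still depends on your reading of the book's structure.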

  7. PAGE BREAKS. Check placement of PBs: move PBs inside DIVs, Ps, Qs, LGs, Ls, ROWs, and ITEMs (but not inside HIs) if the beginning of the content-bearing element is coincident with the beginning of the page. That is, if the page begins with something that is tagged, tuck the PB inside the beginning of it; if it begins with two tags (say, DIV1 and HEAD), put the PB only inside the first. If a blank page precedes, put in BOTH PBs, one right after the other.

    Check completeness of PBs. In Find in Files, find <PB[^>]*> with Regular expression and Binary files checked. The resulting list should show a PB for every page in the file, including blank pages at beginning and end.

    If the image set includes two (or more!) copies of the same page, choose one to capture and omit the other(s); mark the uncaptured page with a <GAP DESC="duplicate" EXTENT="1 page"> tag and a <PB> tag.

    Note: Sometimes the last image in the set includes the FIRST page of a book that was bound with the one that you are working on. Omit this material and treat the page as a blank flyleaf. Similarly, the first image in a set will sometimes include the last page of some other book. Treat this also as a blank flyleaf. Include a PB tag but no text.

    Typical vendor problem: Tech especially, when it finds duplicate pages, omits the text (as it should) and marks the spot with a GAP tag (as it should), but then forgets to include a PB tag and instead numbers the other PB REF values straight through, with the result that the REF numbers get out of synch with the correct numbers.
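
    The PB listing and the duplicate-page check described above can also be sketched as a short Python script. It is a rough aid under stated assumptions, not an established tool: the function names are hypothetical, and the script assumes that the PB accompanying a duplicate-page GAP sits immediately before or after the GAP, separated by nothing but whitespace. It does not judge whether REF values are correct; that still requires comparison against the image set.

```python
import re

PB_TAG = re.compile(r"<PB\b[^>]*>", re.IGNORECASE)
DUPLICATE_GAP = re.compile(r'<GAP\b[^>]*\bDESC\s*=\s*"duplicate"[^>]*>', re.IGNORECASE)


def list_page_breaks(text):
    """Return (line_number, tag) for every PB tag, for comparison with the page images."""
    return [(text.count("\n", 0, m.start()) + 1, m.group(0)) for m in PB_TAG.finditer(text)]


def duplicate_gaps_without_pb(text):
    """Return line numbers of duplicate-page GAPs that have no PB right beside them.

    Assumption (not from the guidelines): "beside" means immediately before or
    after the GAP tag, ignoring whitespace.
    """
    lonely = []
    for m in DUPLICATE_GAP.finditer(text):
        # Isolate whatever starts at the last "<" before the GAP.
        before = text[:m.start()].rstrip()
        last_open = before.rfind("<")
        previous_tag = before[last_open:] if last_open != -1 else ""
        pb_before = PB_TAG.fullmatch(previous_tag) is not None
        # Check whether the next thing after the GAP is a PB tag.
        pb_after = PB_TAG.match(text[m.end():].lstrip()) is not None
        if not (pb_before or pb_after):
            lonely.append(text.count("\n", 0, m.start()) + 1)
    return lonely


if __name__ == "__main__":
    sample = '<P>a</P>\n<GAP DESC="duplicate" EXTENT="1 page">\n<P>b</P>\n<PB REF="4">'
    print(list_page_breaks(sample))
    print(duplicate_gaps_without_pb(sample))
```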

  8. REVIEW PROOFSHEETS. Check proofsheets and correct. The proofsheets will usually have a few character-errors, excusable or not, that need correcting, along with spacing problems. Correct these. Use the proofsheets as indicators of possible other "global" problems: if a U is captured as a V on a proofsheet, it is worth checking others in the file; if numbered Ps are captured as ITEMs on a proofsheet, it is worth checking ITEMs throughout. If a note is seriously misplaced (or notes are inconsistently placed) on the proofsheet, the same is probably true throughout.

  9. COMMON MINOR CHECKS.
  10. GAPS and ILLEGIBILITIES.

    A brief sample will show whether the MUSIC, MATH, and FOREIGN gaps are correctly used. Check spacing around <GAP DESC="foreign">. Early files often need spaces added on each side.

    Illegibilities are harder. You may find individual letters marked as $, groups of letters marked as strings of $s (e.g. Lo$$on for "London"), illegible words marked as $word$ or $$word$$, and pages, lines, and spans of text marked as $page$ (or $$page$$), $line$ (or $$line$$), and $span$ (or $$span$$).

    The notes file should already contain a count of illegibilities of the most common types. Searching for (regular expression, binary, file count only) \$[^ ]*\$? should confirm the overall count, which is the most important one: if there are fewer than 100 $-groups in the file, correct by examining each. If there are more than 100 $-groups, do not normally correct them; instead replace them globally with <GAP DESC="illegible" RESP="tech"> (or RESP="apex" etc.), with EXTENT set appropriately. An unqualified number means number of characters ("3" means "3 characters"); a word is indicated by "1 word", "3 words", etc.; a line by "1 line"; etc.
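
    The overall count can also be obtained with a few lines of Python, shown here as a sketch only. The pattern is the search expression given above, adapted so that a match cannot run across a line break (TextPad searches line by line; Python does not). The names count_dollar_groups and review_individually are hypothetical, and because the guideline does not say what to do with exactly 100 groups, the sketch arbitrarily treats that case as "replace globally".

```python
import re

# The search expression from the text, with "\n" excluded so that a group
# never extends past the end of a line.
DOLLAR_GROUP = re.compile(r"\$[^ \n]*\$?")

# Threshold taken from the guideline: fewer than 100 groups are examined one by one.
THRESHOLD = 100


def count_dollar_groups(text):
    """Count $-groups; a marker such as Lo$$on or $word$ counts as one group."""
    return len(DOLLAR_GROUP.findall(text))


def review_individually(text):
    """Return True when the file has few enough $-groups to examine each one."""
    return count_dollar_groups(text) < THRESHOLD


if __name__ == "__main__":
    sample = "Printed at Lo$$on by $word$ for T. $."
    print(count_dollar_groups(sample), review_individually(sample))
```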

    The global replacement of illegibility markers ($ etc.) with <GAP> tags is most easily done by running the batch file "skint.bat" either at the command line or from within TextPad (via the tools/run menu). At the command line, this requires typing (e.g.) C:\pfs>skint S1234.apex.sgm (i.e., skint followed by the filename). This batch file edits the sgm file 'in place' and saves the unmodified version in the same directory with the extension .bak.

    If you need to replace $s globally by hand, it is best to work down from the largest units to the smallest, so that shorter patterns do not break up longer ones; e.g. replace

    regular expression:

    \$+word\$+ with <GAP DESC="illegible" EXTENT="1 word" RESP="[vendor]">

    \$+line\$+ with <GAP DESC="illegible" EXTENT="1 line" RESP="[vendor]">

    \$+para\$+ with <GAP DESC="illegible" EXTENT="1 paragraph" RESP="[vendor]">

    \$+page\$+ with <GAP DESC="illegible" EXTENT="1 page" RESP="[vendor]">

    \$+span\$+ with <GAP DESC="illegible" EXTENT="1 span" RESP="[vendor]">

    normal replace:

    $$$$$$ with <GAP DESC="illegible" EXTENT="6" RESP="[vendor]">
    $$$$$ with <GAP DESC="illegible" EXTENT="5" RESP="[vendor]">
    $$$$ with <GAP DESC="illegible" EXTENT="4" RESP="[vendor]">
    $$$ with <GAP DESC="illegible" EXTENT="3" RESP="[vendor]">
    $$ with <GAP DESC="illegible" EXTENT="2" RESP="[vendor]">
    $ with <GAP DESC="illegible" EXTENT="1" RESP="[vendor]">

    [These regexps assume TextPad; XMetaL also has a regular expression language, slightly different; see manual]
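
    The same work-down sequence can be sketched in Python for anyone replacing markers outside TextPad. This is not the implementation of skint.bat, whose internals are not described here; it is an illustrative sketch with a hypothetical function name (replace_illegibles). It differs from the manual list in one respect: a run of bare $s of any length is converted to a single GAP whose EXTENT is the length of the run, rather than being handled in steps of six down to one.

```python
import re

# Keyword markers and the EXTENT value each one receives, as in the list above.
UNITS = [
    ("word", "1 word"),
    ("line", "1 line"),
    ("para", "1 paragraph"),
    ("page", "1 page"),
    ("span", "1 span"),
]


def replace_illegibles(text, resp):
    """Replace $ markers with GAP tags; resp is the vendor name for RESP (e.g. "apex")."""

    def gap(extent):
        return f'<GAP DESC="illegible" EXTENT="{extent}" RESP="{resp}">'

    # Keyword markers first ($word$, $$line$$, ...), so that their $s are not
    # mistaken for illegible single characters in the final pass.
    for keyword, extent in UNITS:
        pattern = r"\$+" + keyword + r"\$+"
        text = re.sub(pattern, lambda m, e=extent: gap(e), text)

    # Remaining runs of bare $: one GAP per run, EXTENT = number of characters.
    text = re.sub(r"\$+", lambda m: gap(len(m.group(0))), text)
    return text


if __name__ == "__main__":
    print(replace_illegibles("Lo$$on $$word$$ $", "apex"))
```

    As with skint.bat, keep a copy of the unmodified file before running anything of this kind on real data.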

    If you're resolving illegibilities individually, you'll find that many can be read (given contextual information) with at least 95% certainty. Feel free to insert the correct character in such cases based on context, so long as the physical form remaining does not contradict your conclusions as to the correct character. Do not attempt to supply a character when there is nothing in the original at all, no matter how correct or inevitable it might be. Those that cannot be resolved should be replaced by <GAP DESC="illegible" EXTENT="1"> (or whatever extent applies). Optionally, you may add the reason for the illegibility if that is ascertainable. Possible values for the REASON attribute include: over-inked, under-inked, blotted, faint, in gutter, page cropped, page torn, missing, broken type, bleedthrough, overwritten, scratched out, damaged, and left blank.

    Other problems with illegibility may require creative solutions, and they are too various to be listed here.

  11. Ensure that any problems noted in comments at the head of the file (occasionally supplied by vendors) have been resolved.

  12. Ensure that all notable and peculiar features of the text (as observed while doing routine proofing and review) have been adequately captured. Refer to the guidelines and to the online tips, emails, etc., or ask advice from colleagues and supervisors. Common problematic features include tables, lists, indexes, figures, extended quotations, dialogues, dramatic features, stanzaic and verse structure (long lines or short? carry-over lines?), epigraphs, headings, and arguments, figures within figures, genealogies, etc.