Proofing EEBO materials

  1. Pick an available file from the inbox (F:\markup\eebo\work\todo\[latest month's folder]). Move it (i.e., do not simply copy it) into your in-progress box in F:\markup\eebo\work\ (mona, Judith, Amanda, etc.) using Windows Explorer. You may use this as your working directory, or (recommended) make an additional working copy on C: and use the copy in your in-progress box on F: as a secure backup, to be updated after each day's work. If you make any radical changes, it is usually a good idea to make several backups, so as to allow you to go back to an earlier version if you make a hash of things.
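
If you would rather script step 1 than drag files around in Windows Explorer, the move-then-copy routine might look something like the sketch below. The function names (claim_file, refresh_backup) and their arguments are made up for illustration; nothing like them exists in the project scripts, and you would supply your own folder paths.

```python
import shutil
from pathlib import Path


def claim_file(inbox_file: Path, in_progress_dir: Path, working_dir: Path) -> Path:
    """Move a file out of the inbox and make a separate working copy.

    Hypothetical helper: the file is MOVED (not copied) into the
    in-progress box, then copied from there into the working directory.
    """
    in_progress_dir.mkdir(parents=True, exist_ok=True)
    working_dir.mkdir(parents=True, exist_ok=True)

    # Move, so that nobody else can pick the same file from the inbox.
    backup = Path(shutil.move(str(inbox_file), str(in_progress_dir / inbox_file.name)))

    # The copy in the in-progress box stays behind as the secure backup.
    working_copy = working_dir / backup.name
    shutil.copy2(backup, working_copy)
    return working_copy


def refresh_backup(working_copy: Path, in_progress_dir: Path) -> Path:
    """Overwrite the backup in the in-progress box with the current working copy."""
    target = in_progress_dir / working_copy.name
    shutil.copy2(working_copy, target)
    return target
```

You would call claim_file once when you pick up a book and refresh_backup at the end of each day's work. Note that refresh_backup overwrites the previous backup, so before any radical change you should still make extra dated copies by hand.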

  2. At the Windows command prompt,
    1. CD to the working directory (on C: if the advice above was followed)
    2. Type 'dir' to remind yourself of the filename
    3. Type 'sample' followed by a space and the filename of the .sgm file, e.g. C:\pfs>sample S1234.apex.sgm
    In most cases this will accomplish several things: it will extract a sample from the sgm file; convert the sample to HTML to allow you to print it; create a NOTES file with most of the vital numbers already filled in; and download the page-images from the ProQuest site in pdf form.

    If the pdf download fails, proceed to step 3, otherwise skip to 4.

  3. Download the complete image set for the book in .pdf format.
    1. Look at the head of the .sgm file or the appropriate line in the NOTES file, find the number tagged as <VID> (e.g. <VID>12345</VID>), and copy it to your clipboard (Ctrl-C).
    2. Direct a browser of your choice to F:\markup\code\batch\getpdf.html and follow the directions there. [Oxford, this is at P:\code\batch\getpdf.html; Toronto, at ??] (You will be asked to type in the VID: just paste it in.) This should start the download process. (I reserve a copy of Netscape 4 for this process, since it is easy to set Netscape 4 to break the association of pdf files with Acrobat and instead simply download all pdf files.) Save the file in your working directory.
    3. Note: if this procedure reaches the wrong book (or no book at all), then either the numbers have been wrongly assigned or there is some database linkup problem at ProQuest. No further work should be done on the file till the problem is resolved.
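
If you prefer not to hunt for the VID by eye, a few lines of Python can pull it out of the text. This is only a sketch: it assumes the VID always appears as plain digits between <VID> and </VID>, as in the example above, and the function name find_vid is invented for illustration.

```python
import re
from typing import Optional

# Assumption: the VID is a run of digits wrapped in <VID>...</VID>.
VID_PATTERN = re.compile(r"<VID>\s*(\d+)\s*</VID>", re.IGNORECASE)


def find_vid(sgm_text: str) -> Optional[str]:
    """Return the first VID found in the text, or None if there is none."""
    match = VID_PATTERN.search(sgm_text)
    return match.group(1) if match else None


# The surrounding text here is a stand-in, not a real file header.
print(find_vid("... <VID>12345</VID> ..."))  # → 12345
```

The VID is returned as a string so that it can be pasted into the download form exactly as it appears in the file. If the function returns None, check the head of the .sgm file by hand.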

  4. Open the test.html file in a browser and print it. (In some cases it will not display satisfactorily. You may need to hand-edit the html. If you can identify the problem, and figure out how to prevent it next time, you may want to edit the perl script eebo2htm.pl.)

  5. Open the notes file in TextPad. At this point, many people print the notes file, attach it as a coversheet to the sample, and use it to write notes on. Others prefer to enter all their comments electronically and do not print out the notes file and attach it till the end of the review process.

  6. Open the pdf in Adobe Acrobat. Proof the printed-out sample file against it, being careful to distinguish:

    1. mistranscriptions that should have been avoided (using the online documents that supply the principles and examples of "inexcusable" errors)

    2. mistranscriptions that could not reasonably have been avoided (same principles).

    3. completely legible letters or words which are illegitimately flagged as illegible. Adopt a generous and forgiving spirit in compiling this count.

    4. letters (or words) that are more or less completely missing in the original, but which the vendor supplied anyway. This does not happen much any more.
    5. spacing errors, e.g. "spa cingerr or"

    NOTE: Of these five categories, only errors in categories 1 and 3 (avoidable mistranscriptions, and legible text illegitimately flagged as illegible) should be counted against the vendor's allowable error rate.

    NOTE: when a book is very bad, you may be able to stop proofing early. E.g. if the stripped sample size is 20,000 bytes, the vendor is allowed only one error. If the number of clearly inexcusable errors counted exceeds this (especially if it exceeds it by more than one), you may stop proofing, since the file has already failed. If this happens you should add a note to the notes file of this sort: "stopped proofing after 5 pages because of excessive errors." Some judgment is called for in doing this; we prefer not to reject books without good evidence, and rarely reject on the basis of (say) a couple of missing full stops, or an s/S case error.

    NOTE: We usually "give" the book a free error. That is, we do not reject a small book for just one error, regardless of its size; and even for larger books, we discount one error in deciding whether to reject, accept, or pardon.

    NOTE: We will usually pardon very small books (5-10 pages), and many slightly larger books (10-25 pages), regardless of how many errors they have, since we will have effectively proofed the entire book anyway.

  7. When proofing of the sample is complete, enter the appropriate number in each category in the notes/coversheet file, and decide on that basis (as well as on more tenuous grounds, if appropriate) whether to ACCEPT, REJECT, or PARDON the book. If you ACCEPT or PARDON the book, move on to the review stage; if you REJECT it, go directly to the record-keeping ('end') stage.
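
The arithmetic behind steps 6 and 7 can be sketched in a few lines of Python. Treat this as an illustration and not as the rule itself. It assumes that the allowance scales linearly from the example above (one error per 20,000 bytes of stripped sample, rounded down) and that books of 10 pages or fewer are pardoned outright; both figures are assumptions you may need to change, and all names (SampleCounts, suggest_verdict, and so on) are invented here. The result is a suggestion only: the reviewer's judgment still decides.

```python
from dataclasses import dataclass

# Assumption: allowance scales linearly from "20,000 bytes = one error".
BYTES_PER_ALLOWED_ERROR = 20_000


@dataclass
class SampleCounts:
    avoidable_mistranscriptions: int    # category 1 (counts against vendor)
    unavoidable_mistranscriptions: int  # category 2
    false_illegibles: int               # category 3 (counts against vendor)
    supplied_missing_text: int          # category 4
    spacing_errors: int                 # category 5


def countable_errors(counts: SampleCounts) -> int:
    """Only categories 1 and 3 count against the allowable error rate."""
    return counts.avoidable_mistranscriptions + counts.false_illegibles


def allowed_errors(stripped_sample_bytes: int) -> int:
    """Errors allowed for a sample of this size, rounded down."""
    return stripped_sample_bytes // BYTES_PER_ALLOWED_ERROR


def suggest_verdict(counts: SampleCounts, stripped_sample_bytes: int,
                    book_pages: int, pardon_page_limit: int = 10) -> str:
    """Suggest ACCEPT, REJECT, or PARDON; pardon_page_limit is an assumed cutoff."""
    if book_pages <= pardon_page_limit:
        return "PARDON"  # very small books are effectively proofed in full
    # Discount the one "free" error before comparing with the allowance.
    effective = max(countable_errors(counts) - 1, 0)
    if effective <= allowed_errors(stripped_sample_bytes):
        return "ACCEPT"
    return "REJECT"


print(suggest_verdict(SampleCounts(2, 3, 1, 0, 4), 20_000, 120))  # → REJECT
print(suggest_verdict(SampleCounts(1, 3, 1, 0, 4), 20_000, 120))  # → ACCEPT
print(suggest_verdict(SampleCounts(6, 0, 2, 0, 0), 8_000, 8))     # → PARDON
```

In the first example, three countable errors less the free one leaves two against an allowance of one, so the sketch suggests REJECT. The sketch makes no attempt to model the softer considerations, such as not rejecting on the basis of a couple of missing full stops.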