ENTANGLEMENT:
transcription and markup in the hermeneutic circle

Example: what is this character?
Contextual characters and ambiguous glyphs.








Windows XP

"Whilst suffering in ffraunce, he thought always of the proverb 'Pan darffo treiglo pob tre / Da yw edrych tuag adre.'"








Example: what is this character?
False friends and diachrony.

1. Character splits (U/V/VV, I/J)

2. Character merger/replacement (The disappearance of THORN into Y)





3. Character specialization of "DEGREE"





4. Character mergers (The various affiliations of ZED)

Appearance | Meaning | Grapheme | Capture
zodiac | zodiac | LATIN LETTER Z | zodiac
omnez | omnem | LATIN LETTER M | omnem
page z | page 3 | ARABIC NUMERAL 3 | page 3
zet | [y]et | LATIN LETTER YOGH | &yogh;et
wasz | was | LATIN LETTER S | wasz
z xvi | xvi [drams] | SYMBOL DRAM | &dram; xvi
15 sz. | 15 shillings | {abbr mark; rare collocation} | 15 <ABBR EXPAN="shillings">s.</ABBR>
ii sz. | ii 1/2 | {abbr mark; rare collocation} | ii <ABBR EXPAN="semis">s.</ABBR>
qz | quart | {abbr mark; rare collocation} | <ABBR EXPAN="quart">q</ABBR>
oz. | ounces | {abbr mark; common collocation; modern survival} | oz.
viz. | videlicet | {abbr mark; common collocation; modern survival} | viz.
qz | que | {abbr mark, or alloglyph of que symbol} | &abque;
qz | quod | {abbr mark, or alloglyph of que symbol substituted for quod symbol--by mistake?} | &abquod;
omnibz | omnibus | {abbr mark; common collocation; ?amounts to symbol} | omni&abbus;
hz | habet | {abbr mark} | <ABBR EXPAN="habet">h</ABBR>
pz | patet | {abbr mark} | <ABBR EXPAN="patet">p</ABBR>
sz | sed | {abbr mark} | <ABBR EXPAN="sed">s</ABBR>
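
The rows above can be read as a lookup: the same appearance yields a different capture once the transcriber has settled on a meaning. The following is only a minimal sketch of that idea, using a handful of rows copied from the table. The names CAPTURES, capture, and readings are hypothetical and do not come from any project's actual tooling; "z" stands in for the z-shaped glyph, as it does in the table.

```python
# Hypothetical lookup built from a few rows of the table above.
# Key: (appearance, with "z" standing for the z-shaped glyph; decided meaning).
# Value: the capture given in the table's last column.
CAPTURES = {
    ("zodiac", "zodiac"): "zodiac",
    ("omnez", "omnem"): "omnem",
    ("zet", "[y]et"): "&yogh;et",
    ("qz", "quart"): '<ABBR EXPAN="quart">q</ABBR>',
    ("qz", "que"): "&abque;",
    ("qz", "quod"): "&abquod;",
    ("omnibz", "omnibus"): "omni&abbus;",
    ("sz", "sed"): '<ABBR EXPAN="sed">s</ABBR>',
}


def capture(appearance, meaning):
    """Return the capture for a glyph sequence once its meaning is decided."""
    try:
        return CAPTURES[(appearance, meaning)]
    except KeyError:
        raise KeyError(
            f"no capture recorded for {appearance!r} read as {meaning!r}"
        ) from None


def readings(appearance):
    """List every meaning recorded for one appearance, in table order."""
    return [m for (a, m) in CAPTURES if a == appearance]
```

Asking for readings("qz") returns three meanings (quart, que, quod), each with its own capture: the appearance alone does not determine the transcription, so interpretation has to come first.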







Example: what is this character?
Character, glyph, and homoglyph
Unicode speaks with forked tongue--but you decide

An open triangle in five writing systems

  1. Alchemical



  2. Astronomical



  3. Greek alphabet





  4. Without specific meaning

  5. Borrowed into an invented writing system
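
One way to see the "forked tongue" is to ask Unicode itself what it calls several triangle-shaped characters. The sketch below uses Python's standard unicodedata module. The four code points are an assumption chosen for illustration, not necessarily the ones shown on the original slides, and the names CANDIDATES and describe are hypothetical.

```python
import unicodedata

# Assumed candidates for "an open triangle"; the slides do not name code points.
CANDIDATES = [
    "\u0394",      # Greek alphabet
    "\u2206",      # mathematical operator
    "\u25B3",      # geometric shape, no specific meaning
    "\U0001F702",  # alchemical symbol
]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(*describe(ch), sep="  ")
```

The glyphs look nearly identical, yet each carries a different name and, in some cases, a different general category. Which one to record is an encoding decision that the shape on the page cannot make for you.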








Example: what is a character, really?
Substitutions and authorial intent

1 Pseudogreek

2 Pseudo-scruples

3 ÿ for ij













4 ¶ for parentheses




5 inverted semicolon for query mark




6 RECIPE or RESPONSUS sign for 'rotolo'




7 RECIPE for 'ratio' in math text




8 RECIPE or RESPONSUS sign for '-rum' abbrev.




9 JUPITER sign for RECIPE sign




10 History of ou ligature

a In early Greek printing



b In modern Greek graffiti



c 8 substituted in Algonquin












Example: what is a word? "normalized" spacing








Example: what is a line?

Criteria for the verse line






Example: what is a drama?








C H O I C E S

You can encode (mark up) almost anything.

You can privilege one depth-of-field setting, one use, one theory.
But you can't encode everything.
So what should you encode?

And to what depth?

These are the decisions that make text encoding fun.

Your choices will be dictated by ...

  • the nature of the material
  • the character, variety, and 'depth' of your incoming data
  • the amount of your funding
  • the patience of your funders
  • (how much time is left till your retirement)
  • the scale of the project (how many items)
  • the scope and variety of the project
  • your whim and mood of the moment
  • the purpose of the project
    • desired functionality
    • expected (or guessed-at) audience
    • potential for repurposing
    • potential for sharing/reuse
  • your own knowledge or ignorance
  • the existence of standards
    • why to avoid them. They are:
      • complex, hard to use
      • not tailored to material
      • not supported by local expertise or compatible with local systems
      • not as good as what you can come up with yourself
      • etc.?
    • why to use them. You can:
      • leverage community expertise
      • share data
      • share tools
      • entertain at least a faint hope of "sustainability"
      • provide a fairly consistent perspective on the data

Our own projects as examples:

These differ in

  • their adherence to standards
  • their labor-intensity
  • their longevity (the price of success?)
  • their scale and scope

But share a common rationale:
  • intelligible display
  • intelligent navigation
  • contextually useful search restrictions
  • restriction, by method and cost, to very generic tagging
  • susceptibility to incremental improvement