
Add short-form encoding alternative for highly predictable and regular ruby #4

@duncdrum


Imported from the main repo.

Add a lightweight alternative means of encoding ruby for cases where the full complexity is not needed; a standoff approach could save roughly 50% of the markup.

See HZ04-004-01.pdf: 800+ pages, ruby on every character, no irregularities or special cases. Based on the original proposal:

<ab> 
  <r:ruby>
    <r:rb>This<anchor/>be<anchor/>text</r:rb>
    <r:rt place="left">Zis<anchor/><anchor/>tekst</r:rt>
    <r:rt place="right">Kindly<anchor/>regard this<anchor/>letter</r:rt>
  </r:ruby> 
  for you. 
</ab>
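For comparison, the positional logic of this anchor-based encoding can be sketched in Python: each <anchor/> closes one segment, and the n-th segment of rb is paired with the n-th segment of every rt. This is a minimal sketch, not a reference implementation. It assumes the r: prefix is bound to a placeholder namespace (urn:example:ruby, since the example does not declare one), that <anchor/> is unnamespaced as written, and it simply flattens any other inline children into the current segment.

```python
import xml.etree.ElementTree as ET

# Placeholder (hypothetical) namespace URI for the r: prefix; the example
# in the issue does not declare one.
R = "urn:example:ruby"

SAMPLE = f"""<ab xmlns:r="{R}">
  <r:ruby>
    <r:rb>This<anchor/>be<anchor/>text</r:rb>
    <r:rt place="left">Zis<anchor/><anchor/>tekst</r:rt>
    <r:rt place="right">Kindly<anchor/>regard this<anchor/>letter</r:rt>
  </r:ruby>
  for you.
</ab>"""


def anchor_segments(elem):
    """Split an element's text into segments at each <anchor/> child."""
    segments = [elem.text or ""]
    for child in elem:
        if child.tag == "anchor":
            # Each anchor closes the current segment and opens a new one.
            segments.append(child.tail or "")
        else:
            # Simplification: other inline content stays in the current segment.
            segments[-1] += "".join(child.itertext()) + (child.tail or "")
    return segments


def align_anchored(ruby):
    """Pair rb segments with each rt's segments by position."""
    base = anchor_segments(ruby.find(f"{{{R}}}rb"))
    result = {}
    for rt in ruby.findall(f"{{{R}}}rt"):
        glosses = anchor_segments(rt)
        if len(glosses) != len(base):
            raise ValueError("rb and rt have different numbers of segments")
        result[rt.get("place")] = list(zip(base, glosses))
    return result


ruby = ET.fromstring(SAMPLE).find(f"{{{R}}}ruby")
for place, pairs in align_anchored(ruby).items():
    print(place, pairs)
```

The empty segment between the two anchors in the left rt is what pairs "be" with no annotation; every segment boundary costs one <anchor/> in rb and in each rt, which is where the markup load comes from.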

This inverts the markup logic of the original proposal: instead of specifying anchors and segments in the ruby base, it assumes by default a 1:1 relationship between the sequence of units in rb and the sequence in each rt, and only adds markup (<supplied>, <group>) where this is not the case.

This would greatly reduce the markup load on long, regular documents. It is not intended to replace the full-fledged nested encoding, but to serve as an alternative where lighter-weight markup is desirable, so as not to interfere with other markup.

<ab> 
  <r:ruby>
    <r:rb>This be text</r:rb>
    <r:rt place="left">Zis <supplied reason="symmetry">bee</supplied> tekst</r:rt>
    <r:rt place="right">Kindly <group type="ruby">regard this</group> letter</r:rt>
  </r:ruby> 
  for you. 
</ab>
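
The default 1:1 rule can likewise be sketched: split rb and each rt into units, count every <supplied> or <group> as a single unit, and pair units by position. Again a minimal Python sketch under stated assumptions: the placeholder namespace urn:example:ruby for r:, units separated by whitespace as in the example above, and <supplied>/<group> as the only inline elements that count as units (any other child contributes only its tail text).

```python
import xml.etree.ElementTree as ET

# Placeholder (hypothetical) namespace URI for the r: prefix.
R = "urn:example:ruby"

SAMPLE = f"""<ab xmlns:r="{R}">
  <r:ruby>
    <r:rb>This be text</r:rb>
    <r:rt place="left">Zis <supplied reason="symmetry">bee</supplied> tekst</r:rt>
    <r:rt place="right">Kindly <group type="ruby">regard this</group> letter</r:rt>
  </r:ruby>
  for you.
</ab>"""


def units(elem):
    """Split an rb/rt into alignment units.

    Plain text is split on whitespace; each <supplied> or <group> child
    counts as exactly one unit, however many words it contains. Other
    children are skipped, but their tail text is still split into units.
    """
    result = (elem.text or "").split()
    for child in elem:
        if child.tag in ("supplied", "group"):
            # Normalise internal whitespace so the group stays one unit.
            result.append(" ".join("".join(child.itertext()).split()))
        result.extend((child.tail or "").split())
    return result


def align_short_form(ruby):
    """Pair rb units with each rt's units 1:1 by position."""
    base = units(ruby.find(f"{{{R}}}rb"))
    result = {}
    for rt in ruby.findall(f"{{{R}}}rt"):
        glosses = units(rt)
        if len(glosses) != len(base):
            # A mismatch signals a missing <supplied> or <group>.
            raise ValueError(
                f"{rt.get('place')}: {len(base)} base units, {len(glosses)} ruby units"
            )
        result[rt.get("place")] = list(zip(base, glosses))
    return result


ruby = ET.fromstring(SAMPLE).find(f"{{{R}}}ruby")
for place, pairs in align_short_form(ruby).items():
    print(place, pairs)
```

Because alignment is purely positional, a single missing <supplied> or <group> shows up as a count mismatch rather than as a silently shifted annotation, which keeps the reduced markup checkable.
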
  • Question 1: Is there anything speaking against this, given that there is a full-fledged means of dealing with tricky cases, e.g. nested <ruby> elements?

  • Question 2: Chunk length. Technically, the whole 900-page PDF could be captured by something like this:

<body>
  <r:ruby>
    <r:rb>Here be 900 pages of the pdf</r:rb>
    <r:rt place="left">and their ruby annotations</r:rt>
  </r:ruby> 
</body>

We can leave it up to encoders to decide on acceptable chunk lengths, make a recommendation, or even limit the maximum length of <rb> via the schema (not my preferred option, but it exists).

  • Question 3: Where should <pb/>, <lb/>, etc. go? Can or should they go inside rb? Do we want to exclude that possibility?
  • Question 4 (aka Martin's nightmare): Should we point to the possibility of using &ZeroWidthSpace; (U+200B) to introduce otherwise invisible word separation?
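
On Question 4: if units are split on whitespace, a zero width space is not picked up automatically. U+200B is a format character (category Cf), not whitespace, so Python's str.split() and the regex class \s both pass over it; a tokenizer has to list it explicitly. Note also that &ZeroWidthSpace; is an HTML named character reference and is not one of XML's predefined entities, so in a TEI file it would need to be written as &#x200B; (or declared as an entity). A minimal sketch, with hypothetical sample text and a hypothetical split_units helper:

```python
import re

# U+200B ZERO WIDTH SPACE is not whitespace to Python: str.split() and the
# regex class \s both ignore it, so the separator class names it explicitly.
ZWSP = "\u200b"
SEPARATORS = re.compile(r"[\s\u200b]+")


def split_units(text):
    """Split text into ruby units on whitespace or U+200B, dropping empties."""
    return [unit for unit in SEPARATORS.split(text) if unit]


# Hypothetical CJK base text: two words with an invisible break between them.
base = "漢字" + ZWSP + "仮名"
print(base.split())       # the zero width space is not treated as a separator
print(split_units(base))  # the zero width space now splits the two words
```
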

See ab9b7d1.
