Invalid ReadAlong XML when translations are added #409

@sergeleger

Description

The translation sentence (<s />) elements have the same id value as the original sentence; see id="t0b0d0p0s0" in the following example. This violates the XML requirement that ID-type attribute values be unique within a single document.

<p id="t0b0d0p0">
   <s id="t0b0d0p0s0"><w id="t0b0d0p0s0w0" ARPABET="T HH IY S" time="0.72" dur="0.25">This</w> <w id="t0b0d0p0s0w1" ARPABET="IY S" time="0.97" dur="0.14">is</w> <w id="t0b0d0p0s0w2" ARPABET="AA" time="1.11" dur="0.05">a</w> <w id="t0b0d0p0s0w3" ARPABET="T EY S T" time="1.16" dur="0.58">test</w>.</s>
   <s do-not-align="true" id="t0b0d0p0s0" sentence-id="t0b0d0p0s0" class="sentence__translation editable__translation" xml:lang="eng">Ceci est un test.</s>
</p>
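
One way to spot the problem in an existing file is to count id values, as in the minimal sketch below. It uses only Python's standard library; the `duplicate_ids` helper and the trimmed `SAMPLE` string are illustrative assumptions, not part of the ReadAlongs code base.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the example above (timing and ARPABET attributes dropped).
SAMPLE = """<p id="t0b0d0p0">
   <s id="t0b0d0p0s0"><w id="t0b0d0p0s0w0">This</w> <w id="t0b0d0p0s0w1">is</w></s>
   <s do-not-align="true" id="t0b0d0p0s0" sentence-id="t0b0d0p0s0" xml:lang="eng">Ceci est un test.</s>
</p>"""


def duplicate_ids(xml_text):
    """Return the sorted list of id values used by more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(SAMPLE))
```

For the sample, the only id reported is the one shared by the sentence and its translation.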

There was an attempt to fix this issue, but some functionality now depends on the broken implementation. Any corrective action must therefore continue to support the "broken" implementation, since older ReadAlong XML files will not be fixed.

Recommendations

  • append the suffix trN to the original sentence's id to generate t0b0d0p0s0tr0. Current read alongs have a single translation; the trN suffix would support additional translations.
  • use the sentence-id attribute to identify a sentence's translation
  • maintain current implementations to support older read along files.
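
The recommendations above could be sketched roughly as follows. This is a hedged illustration in Python using only the standard library; the helper names (`translation_id`, `original_sentence`, `translations_for`) are hypothetical and do not come from the ReadAlongs code base. The lookup relies only on the sentence-id attribute, so it behaves the same whether the translation carries the old duplicated id or the proposed trN id.

```python
import xml.etree.ElementTree as ET


def translation_id(sentence_id, n=0):
    """Build the proposed id for the n-th translation of a sentence."""
    return f"{sentence_id}tr{n}"


def original_sentence(root, sentence_id):
    """Find the original sentence: matching id and no sentence-id attribute."""
    for el in root.iter("s"):
        if el.get("id") == sentence_id and el.get("sentence-id") is None:
            return el
    return None


def translations_for(root, sentence_id):
    """Find translations through sentence-id, ignoring their own id value."""
    return [el for el in root.iter("s") if el.get("sentence-id") == sentence_id]


# Old files: the translation repeats the original sentence's id.
LEGACY = ET.fromstring(
    '<p id="t0b0d0p0">'
    '<s id="t0b0d0p0s0"><w id="t0b0d0p0s0w0">This</w></s>'
    '<s do-not-align="true" id="t0b0d0p0s0" sentence-id="t0b0d0p0s0">Ceci est un test.</s>'
    "</p>"
)

# Proposed files: the translation id carries the trN suffix.
FIXED = ET.fromstring(
    '<p id="t0b0d0p0">'
    '<s id="t0b0d0p0s0"><w id="t0b0d0p0s0w0">This</w></s>'
    f'<s do-not-align="true" id="{translation_id("t0b0d0p0s0")}" '
    'sentence-id="t0b0d0p0s0">Ceci est un test.</s>'
    "</p>"
)

for doc in (LEGACY, FIXED):
    found = translations_for(doc, "t0b0d0p0s0")
    print([el.get("id") for el in found], [el.text for el in found])
```

Because the original sentence is distinguished by the absence of sentence-id rather than by a unique id, the same code path can read both old and new files without rewriting the old ones.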
