Open
Description
It looks like the METS parser does not support structures like this one:
                  <div ID="DIVL5" TYPE="TITLE_OF_WORK">
                     <fptr>
                        <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
                     </fptr>
If I call mm2tei with this kind of METS, I get an exception:
Traceback (most recent call last):
  File "/home/calamariadmin/tei_venv_3.7/bin/mm2tei", line 8, in <module>
    sys.exit(cli())
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 56, in cli
    tei.fill_from_mets(mets, ocr)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/api/tei.py", line 175, in fill_from_mets
    self.add_div_structure(div)
  File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/api/tei.py", line 831, in add_div_structure
    div = div.get_div()[0]
IndexError: list index out of range
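The last frame suggests that `get_div()` returned an empty list, presumably because the `<div>` contains only an `<fptr>` and no nested `<div>`. A minimal sketch of a guard for that case, using only the standard library rather than the mets_mods2tei API (the helper name `first_child_div` and the sample document are assumptions made for illustration):

```python
import xml.etree.ElementTree as ET

# METS namespace; the sample below declares it as the default namespace.
NS = {"m": "http://www.loc.gov/METS/"}

# Hypothetical sample, modelled on the <div> from this report.
SAMPLE = """<div xmlns="http://www.loc.gov/METS/" ID="DIVL5" TYPE="TITLE_OF_WORK">
  <fptr>
    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
  </fptr>
</div>"""


def first_child_div(div):
    """Return the first nested <div>, or None if the <div> has no child <div>."""
    children = div.findall("m:div", NS)
    return children[0] if children else None


leaf = ET.fromstring(SAMPLE)
if first_child_div(leaf) is None:
    # A leaf <div> that carries only <fptr>: skip it instead of indexing [0].
    print("skipping", leaf.get("ID"))
```

Checking for an empty list before indexing avoids the `IndexError`; whether such a `<div>` should be skipped or handled differently is up to the maintainers.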
As a starting point, it would be good if an <fptr><area> inside a <div> were simply ignored.
Even better would be to take the OCR text from the ALTO location that the <area> references.
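A rough sketch of how such a reference could be followed, again with the standard library only and not with the mets_mods2tei API. The function names and both sample documents are hypothetical. Mapping `FILEID` to an actual ALTO file through the METS file section is left out here.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}

# Hypothetical samples: the <div> from this report and a tiny ALTO page.
METS_DIV = """<div xmlns="http://www.loc.gov/METS/" ID="DIVL5" TYPE="TITLE_OF_WORK">
  <fptr>
    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
  </fptr>
</div>"""

ALTO_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="P1_TB00002">
      <TextLine>
        <String CONTENT="Title"/><SP/><String CONTENT="of"/><SP/><String CONTENT="Work"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def area_references(div):
    """Collect (FILEID, BEGIN) pairs from the IDREF <area> elements of a <div>."""
    return [
        (area.get("FILEID"), area.get("BEGIN"))
        for area in div.findall("m:fptr/m:area", NS)
        if area.get("BETYPE") == "IDREF"
    ]


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_block_text(alto_root, block_id):
    """Return the text of the ALTO element with this ID, one line per TextLine."""
    for elem in alto_root.iter():
        if elem.get("ID") == block_id:
            lines = []
            for line in elem.iter():
                if local_name(line.tag) == "TextLine":
                    words = [s.get("CONTENT", "") for s in line.iter()
                             if local_name(s.tag) == "String"]
                    lines.append(" ".join(words))
            return "\n".join(lines)
    return None  # no element with that ID


div = ET.fromstring(METS_DIV)
alto = ET.fromstring(ALTO_PAGE)
for file_id, begin in area_references(div):
    print(file_id, begin, repr(alto_block_text(alto, begin)))
```

The ALTO lookup matches on local element names only, so it does not depend on a particular ALTO namespace version.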