Mets cache key error when removing a page from the cache #1297

@MehmedGIT

The following block of code still produces errors when METS caching is enabled and METS files are being merged. The issue does not occur when the METS server is used, but I would prefer to have both options working and available:
https://github.com/OCR-D/core/blob/master/src/ocrd_models/ocrd_mets.py#L564-L567

I remember reporting this internally, and at the time I simply disabled METS caching to avoid the issue, but that workaround is getting in the way now. Could we silently ignore that error, please? I don't see any problem if the key to be removed is not in the cache when removing a page attribute (it was potentially removed earlier). Are there any? This also happens only on specific workspaces, and I am not sure what the cause is. Here is an example ocrd zip: https://easyupload.io/gexwxu (expires in 29 days, reuploaded on 20.01.2025). The processor used is ocrd-cis-ocropy-binarize without any extra parameters specified.

  apptainer exec --bind /mnt/lustre-emmy-hdd/projects/project_pwieder_ocr_nhr/operandi_test_local/slurm_workspaces/test_wf_job_20241203_102643621751/test_ws_20241203_102643621751:/ws_data --bind /local/3673970/ocrd_models/ocrd-resources:/usr/local/share/ocrd-resources --env OCRD_METS_CACHING=true /local/3673970/ocrd_processor_sifs/ocrd_all_maximum_image.sif ocrd workspace -d /ws_data merge --force --no-copy-files /ws_data/mets_1.xml --page-id PHYS_0005,PHYS_0006,PHYS_0007,PHYS_0008
  apptainer exec --bind /mnt/lustre-emmy-hdd/projects/project_pwieder_ocr_nhr/operandi_test_local/slurm_workspaces/test_wf_job_20241203_102643621751/test_ws_20241203_102643621751:/ws_data --bind /local/3673970/ocrd_models/ocrd-resources:/usr/local/share/ocrd-resources --env OCRD_METS_CACHING=true /local/3673970/ocrd_processor_sifs/ocrd_all_maximum_image.sif rm /ws_data/mets_1.xml

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/usr/local/bin/ocrd", line 8, in <module>
      sys.exit(cli())
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 92, in new_func
      return ctx.invoke(f, obj, *args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/build/core/src/ocrd/cli/workspace.py", line 815, in merge
      workspace.merge(
    File "/build/core/src/ocrd_utils/deprecate.py", line 15, in wrapper
      return f(*args, **kwargs)
    File "/build/core/src/ocrd_utils/deprecate.py", line 15, in wrapper
      return f(*args, **kwargs)
    File "/build/core/src/ocrd_utils/deprecate.py", line 15, in wrapper
      return f(*args, **kwargs)
    [Previous line repeated 1 more time]
    File "/build/core/src/ocrd/workspace.py", line 173, in merge
      self.mets.merge(other_workspace.mets, after_add_cb=after_add_cb, **kwargs)
    File "/build/core/src/ocrd_models/ocrd_mets.py", line 919, in merge
      f_dest = self.add_file(
    File "/build/core/src/ocrd_models/ocrd_mets.py", line 485, in add_file
      self.remove_file(ID=ID, fileGrp=fileGrp)
    File "/build/core/src/ocrd_models/ocrd_mets.py", line 511, in remove_file
      self.remove_one_file(f)
    File "/build/core/src/ocrd_models/ocrd_mets.py", line 567, in remove_one_file
      del self._page_cache[attr][page_div.attrib[attr.name]]
  KeyError: ' - '
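
To illustrate what "silently ignore" could look like, here is a minimal sketch. It does not use the actual `ocrd_models` code: the nested dict, the function name `drop_cached_page`, and the `ORDERLABEL` key are hypothetical stand-ins for a cache shaped like `self._page_cache[attr][value]` in the traceback. The idea is to replace the unconditional `del` with a tolerant `pop`, so removing an entry that is already gone becomes a no-op:

```python
# Hypothetical stand-in for the page cache: attribute name -> {attribute value -> page div}.
# These names are illustrative assumptions, not the real ocrd_models internals.
from typing import Any, Dict


def drop_cached_page(page_cache: Dict[str, Dict[str, Any]],
                     attr_name: str, attr_value: str) -> bool:
    """Remove a cache entry if it exists; return True only if something was removed."""
    sentinel = object()
    # dict.pop with a default never raises KeyError, unlike `del d[key]`.
    removed = page_cache.get(attr_name, {}).pop(attr_value, sentinel)
    return removed is not sentinel


cache = {"ORDERLABEL": {" - ": "div_a", "1": "div_b"}}
first = drop_cached_page(cache, "ORDERLABEL", " - ")   # entry exists and is removed
second = drop_cached_page(cache, "ORDERLABEL", " - ")  # entry already gone, no error
```

Returning a boolean instead of discarding the result would still allow logging a debug message when the key was missing, in case the double removal hides a real inconsistency in the cache.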
