Only one OCR'd PDF is created when running the OCR workflow on an item with multiple PDFs #1456

@andrewjbtw

When we started working on the OCR workflow, the viewer did not yet support displaying multiple PDFs, and unfortunately we did not test that case. I recently tested it on a sample item in QA and learned that we have a few problems handling multiple PDFs:

  • the auto-generated filename pattern for the ABBYY-created PDF is too generic, making it impossible to store more than one OCR'd PDF
  • the resource label that appears in the viewer is also generic and obscures which of the multiple "original" PDFs corresponds to the OCR'd PDF

In that sample item, the OCR'd PDF is the last resource in the list:

  • label: PDF (with automated OCR)
  • filename: zj915gz4357-generated.pdf

We need new strategies for both the filename and the resource label that can accommodate multiple PDFs. We also need to determine how to set the order in the file list. One option would be to alternate between the original file and the OCR'd PDF in the sequence, so that each OCR'd PDF directly follows its original, rather than putting all the OCR'd PDFs below all the original PDFs.
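
To make the alternating option concrete, here is a rough Python sketch of one possible approach. It is only an illustration, not the workflow's actual code: the function names, the `<original stem>-generated.pdf` pattern, and the `<original label> (with automated OCR)` label format are all assumptions made up for discussion.

```python
from pathlib import PurePosixPath


def ocr_filename(original: str) -> str:
    """Hypothetical pattern: derive the OCR'd name from the original file's
    own name rather than from the item identifier, so each original gets a
    distinct OCR'd counterpart (e.g. report.pdf -> report-generated.pdf)."""
    path = PurePosixPath(original)
    return str(path.with_name(f"{path.stem}-generated{path.suffix}"))


def ocr_label(original_label: str) -> str:
    """Hypothetical label: reuse the original's label so the viewer shows
    which original each OCR'd PDF corresponds to."""
    return f"{original_label} (with automated OCR)"


def interleave(originals: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Take (label, filename) pairs for the original PDFs and return the
    full resource list with each OCR'd PDF placed right after its original."""
    ordered = []
    for label, filename in originals:
        ordered.append((label, filename))
        ordered.append((ocr_label(label), ocr_filename(filename)))
    return ordered


if __name__ == "__main__":
    # Made-up labels and filenames, purely for illustration.
    for label, filename in interleave(
        [("Volume 1", "volume1.pdf"), ("Volume 2", "volume2.pdf")]
    ):
        print(f"{label}: {filename}")
```

One open question with a pattern like this is what to do if an item already contains a file whose name matches the derived OCR'd name; the sketch does not handle that case.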
