To retry 'ocr-create' from Argo, we need to retry 'fetch-files' and 'xml-ticket-create' first #1512

@andrewjbtw

Description

Because running OCR via ABBYY is a multi-step process in the ocrWF, retrying ocr-create with the Argo retry/rerun/reset feature will essentially never succeed if the retry only re-runs that single step. When ocr-create runs, ABBYY moves the files out of the INPUT folder that we use to start the OCR job, so on a retry there are no files left for ABBYY to OCR.

The OCR process relies on:

  • fetch-files: get the files for ABBYY to process
  • xml-ticket-create: create the XML file containing instructions for ABBYY on how to run OCR
  • ocr-create: actually generate the OCR

That effectively means that for a retry of ocr-create to have any hope of success, the following steps have to take place:

  • files are put into the appropriate ABBYY INPUT folder
  • the xml ticket is created again and put into the appropriate folder
  • ABBYY processes the files again
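
The ordering above can be sketched as a small dependency lookup. This is only an illustration: the step names come from this issue, but `OCR_STEP_ORDER`, `RETRY_PREREQUISITES`, and `steps_to_rerun` are hypothetical names, not part of Argo or the ocrWF code.

```python
# Hypothetical sketch: step names are from the issue; the tables and the
# function below are illustrative and do not exist in Argo or ocrWF.
OCR_STEP_ORDER = ["fetch-files", "xml-ticket-create", "ocr-create"]

# Steps that can only be retried if earlier steps are re-run first.
RETRY_PREREQUISITES = {
    "ocr-create": ["fetch-files", "xml-ticket-create"],
}


def steps_to_rerun(step):
    """Return the steps to reset, in workflow order, when retrying `step`."""
    wanted = set(RETRY_PREREQUISITES.get(step, [])) | {step}
    return [s for s in OCR_STEP_ORDER if s in wanted]


if __name__ == "__main__":
    # Retrying ocr-create expands to all three steps, in order.
    print(steps_to_rerun("ocr-create"))
```

With a lookup like this, a retry request for ocr-create would reset fetch-files and xml-ticket-create as well, while a retry of fetch-files alone would stay a single-step retry.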

ABBYY puts failed OCR runs into an EXCEPTIONS folder, so it may be possible to get the files from there and re-route them into INPUT, but for logging purposes it's better to leave those files in EXCEPTIONS. That way, if a retry succeeds, we can still look at the files in EXCEPTIONS and see what the original error was. It's likely that a druid that fails once will fail again, but in testing I did find one case where a manually set up retry worked on the second attempt after the item had failed initially.

Additional background

Our current process for retrying is completely manual. Someone goes to the accessioning/ABBYY shared filesystem and copies files (both images and the XML ticket) into the ABBYY INPUT folder. Once ABBYY is done (if the retry succeeds), accessioning will then pick up the OCR output.
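
For reference, the manual copy step could be scripted roughly as follows. This is a minimal sketch under assumptions: `stage_for_retry` and its parameters are hypothetical, the real folder locations on the shared filesystem are not given here, and copying the ticket last is a choice made for this sketch (the issue does not say whether order matters to ABBYY). Files are copied rather than moved, so the originals stay where they are.

```python
import shutil
from pathlib import Path


def stage_for_retry(source_dir, ticket_path, input_dir):
    """Copy image files and the XML ticket into the INPUT folder.

    All three arguments are hypothetical paths supplied by the caller.
    Returns the list of paths created in input_dir.
    """
    source_dir = Path(source_dir)
    ticket_path = Path(ticket_path)
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Copy (not move) every regular file from the source folder.
    for image in sorted(p for p in source_dir.iterdir() if p.is_file()):
        copied.append(Path(shutil.copy2(image, input_dir / image.name)))
    # Copy the XML ticket last (an assumption of this sketch).
    copied.append(Path(shutil.copy2(ticket_path, input_dir / ticket_path.name)))
    return copied
```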
