When we started working on the OCR workflow, the viewer did not yet support displaying multiple PDFs, and unfortunately we did not test that case. I recently tested it on a sample item in QA and learned that we have a few problems handling multiple PDFs:
- the auto-generated filename pattern for the ABBYY-created PDF is too generic, making it impossible to store more than one OCR'd PDF
- the resource label that appears in the viewer is also generic and obscures which of the multiple "original" PDFs corresponds to the OCR'd PDF
In that example, the OCR'd PDF is the last resource in the list:
- label: PDF (with automated OCR)
- filename: zj915gz4357-generated.pdf
We need new strategies for both the filename and the resource label that can accommodate multiple PDFs. We also need to determine how to set the order in the file list. One option would be to alternate between each original file and its OCR'd PDF in the sequence, rather than putting all the OCR'd PDFs below all the original PDFs.
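
To make the alternating option concrete, here is a rough sketch of one possible approach, not a decided design. It assumes the OCR'd filename and label are derived from each original PDF's filename; the `-generated` suffix is carried over from the current pattern, and the function names (`ocr_filename`, `ocr_label`, `interleave`) are hypothetical and do not exist in our codebase.

```python
from pathlib import PurePosixPath

# Assumption: reuse the current "-generated" suffix, but attach it to the
# original PDF's own name instead of the druid so each OCR'd file is unique.
OCR_SUFFIX = "-generated"


def ocr_filename(original: str) -> str:
    """Derive a per-original OCR filename, e.g. 'vol1.pdf' -> 'vol1-generated.pdf'."""
    path = PurePosixPath(original)
    return str(path.with_name(f"{path.stem}{OCR_SUFFIX}{path.suffix}"))


def ocr_label(original: str) -> str:
    """Build a label that names the original PDF the OCR'd copy belongs to."""
    return f"{PurePosixPath(original).name} (with automated OCR)"


def interleave(originals: list[str]) -> list[str]:
    """Order the file list so each original is followed by its OCR'd PDF."""
    ordered = []
    for original in originals:
        ordered.append(original)
        ordered.append(ocr_filename(original))
    return ordered
```

With two originals named `vol1.pdf` and `vol2.pdf`, `interleave` would place each OCR'd PDF directly after its source, and the label for the first OCR'd file would read `vol1.pdf (with automated OCR)`. Whether the label should use the filename or the original resource's own label is still an open question.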