Mark unsuccessfully processed documents / Hand off to cloud OCR service upon bad results #373
christianlouis
started this conversation in
Ideas
Replies: 0 comments
Across the library of documents I've amassed over the last 10+ years, OCR quality varies significantly.
What I'd love to see is an option to re-process documents that paperless-ai wasn't able to process, either because the document itself lacks content (maybe it is an image that ended up among the documents) or because the content was so ambiguous that the LLM could not extract proper information from it.
Ultimately, I'd love to have the ability to:
a) identify these poorly processed documents (for example, by adding a custom tag such as 'processing-failed'), or, even better,
b) automatically trigger better document pre-processing and content extraction, e.g. re-send the document to Azure for document and content recognition.
I've seen good results with Azure OCR on my scans, even in more challenging cases, and they were superior to Paperless's own OCR and ABBYY HotFolders. Still, I am reluctant to "fix something that isn't broken" by re-processing my entire set of documents.
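To make the idea a bit more concrete, here is a rough Python sketch of how the decision in a) and b) could work. It is not how paperless-ai works today. The function name `assess`, the thresholds, and the action labels are all made up for illustration; only the 'processing-failed' tag name comes from the suggestion above. The sketch only decides what should happen and does not call Paperless or Azure.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FAILED_TAG = "processing-failed"  # tag name suggested in this post

# Hypothetical thresholds; they would need tuning against real documents.
MIN_CHARS = 50         # below this, treat the OCR text as (nearly) empty
MIN_ALNUM_RATIO = 0.5  # below this, treat the OCR text as garbled


@dataclass
class Verdict:
    ok: bool
    reasons: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)


def assess(ocr_text: str, extracted: dict[str, str | None]) -> Verdict:
    """Decide whether a document counts as poorly processed.

    `extracted` maps field names (e.g. title, correspondent) to whatever
    the LLM returned; None or an empty string means "nothing extracted".
    """
    reasons: list[str] = []
    ocr_problem = False

    text = ocr_text.strip()
    if len(text) < MIN_CHARS:
        reasons.append("too little text")
        ocr_problem = True
    else:
        non_space = [c for c in text if not c.isspace()]
        alnum = sum(c.isalnum() for c in non_space)
        if alnum / len(non_space) < MIN_ALNUM_RATIO:
            reasons.append("text looks garbled")
            ocr_problem = True

    # The LLM returned nothing usable for any requested field.
    if extracted and not any(extracted.values()):
        reasons.append("no fields extracted")

    if not reasons:
        return Verdict(ok=True)

    # a) always mark the document; b) hand off only when the text is the problem.
    actions = [f"tag:{FAILED_TAG}"]
    if ocr_problem:
        actions.append("handoff:cloud-ocr")
    return Verdict(ok=False, reasons=reasons, actions=actions)
```

With this split, an image-only scan would be tagged and queued for cloud OCR, while a readable document the LLM could not make sense of would only be tagged for manual review. That keeps re-processing limited to the documents that need it rather than the whole archive.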