Mark unsuccessfully processed documents / Hand off to cloud OCR service upon bad results #373

christianlouis · 2025-02-21T10:41:58Z

The OCR quality across the library of documents I've amassed over the last 10+ years varies significantly.
What I'd love to see is an option to re-process documents that paperless-ai wasn't able to process, either because the document itself lacks content (maybe it was an image that got stuck among the documents) or because the content was so ambiguous that the LLM couldn't extract any proper information.
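To make the first failure case concrete, here is a rough sketch of how "lack of content" could be detected from the OCR text alone. The function name, the thresholds and the word-likeness rule are all my own assumptions for illustration; none of this is existing paperless-ai behavior.

```python
import re

# Both thresholds are illustrative guesses and would need tuning on real scans.
MIN_CHARS = 50         # shorter text is treated as "no usable content"
MIN_WORD_RATIO = 0.5   # minimum share of tokens that must look like words


def ocr_text_looks_unusable(text: str) -> bool:
    """Return True when OCR text is too short or too noisy to hand to an LLM."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return True
    tokens = stripped.split()
    # A token counts as word-like if it is two or more letters,
    # optionally followed by one punctuation mark.
    wordlike = [t for t in tokens if re.fullmatch(r"[^\W\d_]{2,}[.,;:!?]?", t)]
    return len(wordlike) / len(tokens) < MIN_WORD_RATIO
```

A check like this only covers empty or garbled text. Content that is readable but too ambiguous for the LLM would have to be detected from the LLM's response instead.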

Ultimately, I'd love to have the ability to:

a) identify these poorly-processed documents (maybe by adding a custom tag to them, marking them as 'processing-failed') or - even better -

b) automatically trigger some kind of better document pre-processing and content extraction, e.g. by re-sending the document to Azure for document and content recognition.
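Options a) and b) could be combined roughly as in the sketch below: try a better OCR pass if one is configured, otherwise fall back to tagging. The `Document` class and `handle_failed_document` are hypothetical stand-ins rather than the paperless-ai or Paperless data model, and `reocr` is a placeholder for whatever would actually call a cloud OCR service such as Azure. No real API calls are made here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set

FAILED_TAG = "processing-failed"  # tag name as suggested in option a)


@dataclass
class Document:
    """Hypothetical minimal document record, for illustration only."""
    id: int
    content: str
    tags: Set[str] = field(default_factory=set)


def handle_failed_document(
    doc: Document,
    reocr: Optional[Callable[[Document], str]] = None,
) -> str:
    """Escalate a failed document to a better OCR step, or tag it as failed.

    Returns "reprocessed" if the fallback OCR produced text, else "tagged".
    """
    if reocr is not None:
        try:
            new_text = reocr(doc)
        except Exception:
            # Treat any error from the fallback service as "no text".
            new_text = ""
        if new_text.strip():
            doc.content = new_text
            doc.tags.discard(FAILED_TAG)
            return "reprocessed"
    # Option a): no fallback available, or it returned nothing.
    doc.tags.add(FAILED_TAG)
    return "tagged"
```

Because only documents that already failed ever reach this function, the rest of the library stays untouched.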

I've had good results with Azure OCR on my scans, even in more challenging conditions, and they were superior to both Paperless's own OCR and ABBYY HotFolders. Still, I'm reluctant to "fix something that isn't broken" by re-processing my entire set of documents.

Replies: 0 comments
