I am surprised to see the following in our current recommendations:
- Ocropy nlbin instead of one of the Olena algorithms
- slow `skimage` binarize/denoise processors instead of Olena/Ocropy
- only Tesseract deskewing (no Ocropy)
- region clipping after Ocropy page segmentation (not necessary)
- Ocropy line segmentation after Ocropy page segmentation (redundant)
- line clipping after Ocropy line segmentation (not necessary)
- Tesseract line segmentation without resegmentation or line clipping (to remove bbox overlaps)
- Tesseract vs Calamari recognition (should be exchangeable regardless of workflow up to that point)
EDIT (thanks @jbarth-ubhd for reminding me): also
- region clipping after region deskewing (impossible!)
Do these choices have some empirical grounding (measuring quality and/or performance on GT)?
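
To make concrete what kind of measurement is meant, here is a minimal sketch of a character error rate (CER) comparison between a GT line and an OCR result. The function names `levenshtein` and `cer` are hypothetical and not taken from any OCR-D tool. A real evaluation would probably also need text normalization and aggregation over whole pages, which this sketch leaves out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def cer(gt: str, ocr: str) -> float:
    """Character error rate: edit distance divided by the GT length."""
    if not gt:
        # Assumption: an empty GT line counts as fully wrong unless OCR is empty too.
        return 0.0 if not ocr else 1.0
    return levenshtein(gt, ocr) / len(gt)
```

Running the same GT pages through two workflow variants and comparing the resulting CER (plus wall-clock time) would be one way to back each recommendation.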