fix/discuss recommended workflows #172

@bertsky

I am surprised to see the following in our current recommendations:

  • Ocropy nlbin instead of one of the Olena algorithms
  • slow skimage binarize/denoise processors instead of Olena/Ocropy
  • only Tesseract deskewing (no Ocropy)
  • region clipping after Ocropy page segmentation (not necessary)
  • Ocropy line segmentation after Ocropy page segmentation (redundant)
  • line clipping after Ocropy line segmentation (not necessary)
  • Tesseract line segmentation without subsequent resegmentation or line clipping (which would be needed to remove bbox overlaps)
  • Tesseract vs Calamari recognition (should be exchangeable regardless of workflow up to that point)

EDIT (thanks @jbarth-ubhd for reminding me): also

  • region clipping after region deskewing (impossible!)

Do these choices have some empirical grounding (measuring quality and/or performance on GT)?
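
To make concrete what I mean by measuring quality on GT, here is a minimal sketch. It assumes character error rate (CER) as the metric, which is my choice for illustration and not something the current recommendations specify. The function names and the sample strings are hypothetical; a real evaluation would read the text from the GT and OCR output files of each workflow.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(gt: str, ocr: str) -> float:
    """Character error rate: edit distance normalised by GT length."""
    if not gt:
        # Convention assumed here: empty GT counts as fully wrong
        # unless the OCR output is empty as well.
        return 0.0 if not ocr else 1.0
    return levenshtein(gt, ocr) / len(gt)


# Hypothetical example: one GT line and the outputs of two workflows.
gt_line = "Ocropy line segmentation"
outputs = {
    "workflow_a": "Ocropy line segmentation",
    "workflow_b": "Ocropy 1ine segmentatlon",
}
scores = {name: cer(gt_line, text) for name, text in outputs.items()}
best = min(scores, key=scores.get)
```

Averaging such scores over a GT corpus per workflow variant (and timing each run) would give the kind of evidence I am asking about.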
