segmentation: tesseract5.3.0 vs ocrd/all:2022-08-15 #346

I'm wondering about the different recognition results I get from standalone tesseract5.3.0 and from OCR-D with `ocrd-olena-binarize && ocrd-tesserocr-segment`.

Original TIF: https://digi.ub.uni-heidelberg.de/diglitData/v/heidelberg1592_-_04manual.tif

Result using tesseract5.3.0 `-l Fraktur_GT4Hist...` (right column = ground truth):
![image](https://user-images.githubusercontent.com/30653779/222190836-3f927713-22d0-4dab-9e73-a619f175be1e.png)

and using tesserocr-segment and calamari-recognize (`fraktur_historical1.0`) with OCR-D:
![image](https://user-images.githubusercontent.com/30653779/222191458-1835482b-b1c2-4bb2-82a7-aa57ef1f4a3c.png)

and using tesserocr-segment and tesserocr-recognize (`Fraktur_GT4Hist...`) with OCR-D:
![image](https://user-images.githubusercontent.com/30653779/222191742-710bf0f8-b894-4a31-b18c-e22d414d60ce.png)

It seems that the OCR-D tesserocr segmentation is somewhat different from the standalone tesseract segmentation (perhaps because of olena-binarize?), but I can't find any major change to line/region segmentation in the tesseract changelog from the last year.
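
To put a number on the differences instead of comparing screenshots by eye, a character error rate (CER) against the ground truth could be computed for each of the three outputs. This is only a minimal sketch, assuming each result and the ground truth are available as plain-text strings; the function names `levenshtein` and `cer` are hypothetical helpers, not part of tesseract or OCR-D.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def cer(ocr_text: str, ground_truth: str) -> float:
    """Character error rate: edit distance divided by ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)
```

Calling `cer(ocr_output, ground_truth)` once per workflow would give three comparable values. A high CER caused by missing or reordered lines would point to segmentation rather than to the recognition model.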

