
Member

kba commented Mar 22, 2023

If we can, we should reduce the size of OCR-D/assets.

This resizes the MAX image in dfki-testdata. It does not break ocrd_anybaseocr's tests, but I am keeping this as a draft until we are sure the image is not used elsewhere.

Contributor

bertsky commented Mar 22, 2023

Note: the ocrd-anybaseocr tests cannot break, because they are merely smoke tests. In that case you have to compare the results before and after the change, and I guess that at least the (broken) dewarper will be quite sensitive to image quality.
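
A minimal sketch of such a before/after comparison, assuming the processor output has been written to two separate directories (one run on the original image, one on the resized image). The function name `compare_trees` and the directory layout are hypothetical and not part of ocrd_anybaseocr. It only checks for byte-identical files, so any output that embeds timestamps would always be reported as changed and would need a closer look by hand.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def list_files(root: Path) -> Dict[str, Path]:
    """Map each file's path relative to root (POSIX style) to its full path."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def compare_trees(before: Path, after: Path) -> Dict[str, List[str]]:
    """Compare two output directories file by file (hypothetical helper).

    Reports files whose bytes differ and files present on only one side.
    """
    before_files = list_files(before)
    after_files = list_files(after)
    common = before_files.keys() & after_files.keys()
    return {
        "changed": sorted(
            name for name in common
            if file_digest(before_files[name]) != file_digest(after_files[name])
        ),
        "only_before": sorted(before_files.keys() - after_files.keys()),
        "only_after": sorted(after_files.keys() - before_files.keys()),
    }
```

Anything listed under `changed` is only a starting point: the differing files would still have to be inspected to judge whether the dewarper output actually got worse.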

Contributor

bertsky commented Mar 22, 2023

I wonder why we have all 20 pages on data/kant_aufklaerung_1784-page-region, but only 2 on the others.

Also, couldn't we somehow share the images between the different Kant Aufklärung variants?

Member Author

kba commented Mar 22, 2023

> I wonder why we have all 20 pages on data/kant_aufklaerung_1784-page-region, but only 2 on the others.

Agreed. We need to make sure no one is using anything except *17 and *20, and then remove the rest.

> Also, couldn't we somehow share the images between the different Kant Aufklärung variants?

We did that (with symlinks) before, but then decided that every folder in assets should be a valid OCRD-ZIP, i.e. BagIt, which does not allow symlinks beyond the data dir. We should probably just accept that these folders are not OCRD-ZIP and reintroduce the symlinks.
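
To see how much sharing would actually save, one could first list which image files are byte-identical across the variant folders. The following is a rough sketch under assumptions: the function name `find_shared_files` is hypothetical, and the `*.tif` pattern is a guess at the image extension, not something confirmed in this thread. Existing symlinks are skipped so that only real copies are counted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_shared_files(root: Path, pattern: str = "*.tif") -> Dict[str, List[str]]:
    """Group regular files under root by content hash (hypothetical helper).

    Returns only the groups with more than one member, i.e. files that are
    stored several times and could be replaced by links to a single copy.
    The default pattern "*.tif" is an assumption about the image format.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        # Skip symlinks so that already shared files are not counted as copies.
        if path.is_symlink() or not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path.relative_to(root).as_posix())
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}
```

Each returned group is a set of relative paths with identical content. Whether to replace the extra copies with symlinks is a separate decision, given the BagIt restriction mentioned above.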


2 participants