post-process ALTO→PAGE #21

@bertsky

The ALTO→PAGE transform does not set /PcGts/Page/@imageFilename if the input has no /alto/description/sourceImageInformation/@fileName. This cannot be fixed afterwards with OCR-D means (not even with ocrd workspace).

It would be very helpful if this processor had some fix-up capability for this important case (and probably others).

My suggestion would be to find the "correct" image file by looking up the physical pageId of the ALTO file and then, among the image-only fileGrps, taking the first entry for that page (or the largest, or one chosen via a parameter).
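
To make the suggestion concrete, here is a minimal sketch of the lookup in plain Python. It does not use the OCR-D API: FileEntry, image_only_groups, pick_image, the strategy and file_grp parameters, and the size field are all hypothetical names standing in for whatever the processor would read from the METS.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one METS file entry (not the OCR-D API)."""
    file_grp: str
    page_id: str
    mimetype: str
    url: str
    size: int = 0  # assumed size measure (e.g. bytes or pixels), if known


def image_only_groups(files: Sequence[FileEntry]) -> List[str]:
    """Names of fileGrps whose entries are all image/*, in first-seen order."""
    all_image = {}
    for f in files:
        is_image = f.mimetype.startswith("image/")
        all_image[f.file_grp] = all_image.get(f.file_grp, True) and is_image
    return [grp for grp, ok in all_image.items() if ok]


def pick_image(
    files: Sequence[FileEntry],
    alto_file: FileEntry,
    strategy: str = "first",
    file_grp: Optional[str] = None,
) -> Optional[FileEntry]:
    """Choose an image for the physical page that alto_file belongs to.

    strategy: "first" (default) or "largest" (by the assumed size field).
    file_grp: if given, restrict candidates to this image-only fileGrp.
    Returns None when no image-only fileGrp has an entry for the page.
    """
    groups = image_only_groups(files)
    candidates = [
        f for f in files
        if f.page_id == alto_file.page_id and f.file_grp in groups
    ]
    if file_grp is not None:
        candidates = [f for f in candidates if f.file_grp == file_grp]
    if not candidates:
        return None
    if strategy == "largest":
        return max(candidates, key=lambda f: f.size)
    return candidates[0]
```

The url of the returned entry would then be written to /PcGts/Page/@imageFilename, but only when the transform left that attribute unset.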
