Donating training data #321

@haydn-jones

I'm planning to generate some training data (hopefully on the order of tens of thousands of pages, depending on cost) using olmocr/data/buildsilver.py (at least, I assume this is how the data was generated). I've been running this on a lot of medicinal chemistry-esque papers, and it's been struggling with them.

If you are open to me donating the data, I can use open-access papers exclusively for this; otherwise, I'll just throw what I have into a private mix.
