DEFI-COLaF/suryAlto

suryALTO

A simple script to produce ALTO data based on Surya OCR

Install

pip install -r requirements.txt

Run

You need to set two environment variables according to your GPU. Unfortunately, the batch sizes cannot currently be set directly from the command line interface:

  • RECOGNITION_BATCH_SIZE sets the batch size for the OCR part; I recommend 64 for 24 GB of GPU RAM
  • DETECTOR_BATCH_SIZE sets the batch size for the segmentation part; I recommend 16 for 24 GB of GPU RAM

Then you can run the script as:

RECOGNITION_BATCH_SIZE=64 DETECTOR_BATCH_SIZE=16 python to-alto.py aPDF.pdf_OR_multiple_images --destination output --lang la --format pdf/image

See the supported languages on the Surya repository.
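
If you would rather launch the script from Python than from a shell, the command above can be sketched as follows. This is only a minimal illustration, not part of the repository: `build_invocation` and `run` are hypothetical helper names, and it assumes that `--format` takes either `pdf` or `image` and that `to-alto.py` is in the current working directory.

```python
import os
import subprocess
import sys


def build_invocation(inputs, destination="output", lang="la", fmt="pdf",
                     recognition_batch=64, detector_batch=16):
    """Build (argv, env) for a to-alto.py run without executing it.

    Hypothetical helper: the flag names come from the command shown above;
    treating `fmt` as either "pdf" or "image" is an assumption.
    """
    # Copy the current environment and add the two batch-size variables,
    # which must be strings.
    env = dict(os.environ)
    env["RECOGNITION_BATCH_SIZE"] = str(recognition_batch)
    env["DETECTOR_BATCH_SIZE"] = str(detector_batch)

    # Same argument order as the shell command: inputs first, then options.
    argv = [sys.executable, "to-alto.py", *inputs,
            "--destination", destination,
            "--lang", lang,
            "--format", fmt]
    return argv, env


def run(inputs, **options):
    """Run to-alto.py and raise CalledProcessError if it exits with an error."""
    argv, env = build_invocation(inputs, **options)
    subprocess.run(argv, env=env, check=True)
```

Calling `run(["aPDF.pdf"], recognition_batch=32, detector_batch=8)` would then start the script with smaller batches, which may help on GPUs with less than 24 GB of RAM.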
