Release Text extraction from PDF files · sambitdash/PDFIO.jl

This release provides the following functionality:

Adds a text extraction API, pdPageExtractText(page).
Supports extracting Unicode code points from the font encoding as well as from the Unicode CMap. It does not read the internal encoding embedded in the font file.
Supports Adobe’s encoding for Latin fonts (AdobeGlyphList). Symbol and ZapfDingbats encodings are supported as well.
Does not handle tagged PDFs specially, but tagged PDFs may extract better because the creation order and reading order of their document objects are similar.
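
As a rough illustration of how the extraction API might be used, the sketch below writes the text of every page in a PDF to a plain-text file. It rests on a few assumptions: the helper name `extract_text` is hypothetical, and the extraction call is written in the two-argument form `pdPageExtractText(io, page)` used in PDFIO's documentation, whereas this release note abbreviates it as `pdPageExtractText(page)`. Check the signature against the version you have installed.

```julia
using PDFIO

# Hypothetical helper, not part of PDFIO: write the text of every page
# of the PDF at `src` into the plain-text file `out`.
function extract_text(src::AbstractString, out::AbstractString)
    doc = pdDocOpen(src)
    try
        open(out, "w") do io
            for i in 1:pdDocGetPageCount(doc)
                page = pdDocGetPage(doc, i)
                # Assumption: the two-argument form, which writes the
                # extracted page text to `io`.
                pdPageExtractText(io, page)
            end
        end
    finally
        # Close the document even if extraction throws.
        pdDocClose(doc)
    end
    return out
end
```

A call such as `extract_text("input.pdf", "output.txt")` (both file names are placeholders) would then produce the extracted text. Because tagged PDFs get no special handling, the output follows the order of the objects in each page's content stream.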
