Releases: sambitdash/PDFIO.jl

New pdPageExtractText Method

02 Oct 08:14

Changes this release:

  1. Introduces a new pdPageExtractText method that produces cleaner text output for complex PDFs, including non-tagged PDFs.
  2. Bug fixes

Text conversion has been run on more than 25,000 files.

Text extraction from PDF files

09 Sep 03:39

This release provides the following functionality:

  1. Adds a text extraction API: pdPageExtractText(page).
  2. Supports extraction of Unicode code points from the font encoding as well as from the Unicode CMap. It does not read the internal encoding embedded in the font file.
  3. Supports Adobe’s encoding for Latin fonts (AdobeGlyphList). Symbol and ZapfDingbats encodings are supported as well.
  4. Does no special handling for tagged PDFs, but tagged PDFs may behave better because the creation order and reading order of their document objects are similar.

PDFIO v0.0.6

24 Aug 23:06
  1. Implementation of PDF Common Data types
    - Text Strings
    - Date
    - Name Tree
    - Number Tree
  2. Page Labels
  3. File attachments and annotations supported as custom scripts
  4. Cleaner implementation of show and print methods of PDF Objects
  5. Inline API documentation in REPL