Releases: dhdaines/playa

PLAYA-PDF 0.5.0: Breaking all the APIs again

14 May 15:59
1c2a73a

Various APIs had accumulated a lot of rot and bugs, especially the text- and font-related ones. Since this is ZeroVer (and for Reasons), it seemed like a good idea to get rid of that nonsense.

Changes from CHANGELOG.md

  • Remove use of object in type annotations
  • Add support for role map and standard structure types
  • Refactor page.py as it was getting really unwieldy
  • Add missing ctm to content objects in metadata API
  • Somewhat improve untagged text extraction where the CTM is exotic
  • Correct character and word spacing to apply after all glyphs
  • Correct vertical writing to fully support glyph-specific position
    vectors, even totally absurd ones
  • Correct horizontal scaling to apply to vertical writing, including
    the position vector
  • Add bbox and contents to structure elements
  • Add origin and displacement to glyphs
  • Add size to glyphs and texts to get effective font size (still not
    entirely accurate when there is rotation or skewing)
  • Support PDF 2.0 Length attribute on inline images
  • Add font property to documents and pages
  • BREAKING: find and find_all in structure search by standard
    structure types (roles)
  • BREAKING: parent_tree moved to playa.structure.Tree
  • BREAKING: Point, Rect, Matrix and PDFObject moved to
    playa.pdftypes
  • BREAKING: PathObject no longer contains "subpaths", it is safe to
    recursively descend it now
  • BREAKING: Content objects moved to playa.content and interpreter
    to playa.interp
  • BREAKING: Text state no longer exists in the public API, text
    objects have immutable line matrix and glyph offset now, and
    everything else is in the graphic state
  • BREAKING: text_space_ properties are removed since what they
    returned was not actually text space (and maybe not useful either)
  • BREAKING: glyph_offset is removed from glyphs and made private in
    text objects, as it is not in a well defined space.
  • BREAKING: Glyph bbox now has a precise definition, which isn't
    exactly the glyph bounding box but is a lot closer. This means
    notably that adjacent glyphs may overlap or may not touch, which is
    why you should never use the bbox to detect word boundaries.
    Use origin and displacement instead, please!
  • BREAKING: cid2unicode attribute of fonts is removed as it doesn't
    make any sense for Type3 or CID fonts.
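
The advice above about word boundaries can be sketched without relying on any particular API. The `Glyph` dataclass below is a hypothetical stand-in (not PLAYA's actual class) that only assumes each glyph exposes an `origin` and a `displacement` as `(x, y)` pairs in the same space. The `tolerance` value is also an assumption and would need tuning relative to the font size. The idea is to compare where the previous glyph says the next one should start (origin plus displacement) with where the next glyph actually starts, rather than comparing bounding boxes.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Glyph:
    """Hypothetical stand-in for a glyph; not PLAYA's actual class."""

    text: str
    origin: Point        # where the glyph is placed
    displacement: Point  # how far the pen advances after the glyph


def split_words(glyphs: Iterable[Glyph], tolerance: float = 1.0) -> List[str]:
    """Group glyphs into words using origin + displacement, not bboxes."""
    words: List[str] = []
    current = ""
    expected: Optional[Point] = None  # predicted origin of the next glyph
    for g in glyphs:
        if expected is not None:
            # Distance between the predicted and the actual origin.
            gap = hypot(g.origin[0] - expected[0], g.origin[1] - expected[1])
            if gap > tolerance and current:
                words.append(current)
                current = ""
        if g.text.isspace():
            # An explicit space always ends the current word.
            if current:
                words.append(current)
                current = ""
        else:
            current += g.text
        expected = (
            g.origin[0] + g.displacement[0],
            g.origin[1] + g.displacement[1],
        )
    if current:
        words.append(current)
    return words


glyphs = [
    Glyph("H", (0.0, 0.0), (7.0, 0.0)),
    Glyph("i", (7.0, 0.0), (3.0, 0.0)),
    Glyph("y", (14.0, 0.0), (5.0, 0.0)),  # starts 4 units past the prediction
    Glyph("o", (19.0, 0.0), (5.0, 0.0)),
]
print(split_words(glyphs))
```

Because the comparison uses the full displacement vector, the same sketch should also work for vertical writing, where the displacement points down the page rather than across it.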

What's Changed

  • fix!: make type annotations much stricter by @dhdaines in #95
  • feat!: Add support for role map and standard structure types by @dhdaines in #98
  • Don't split PathObject into subpaths by @lambdalemon in #85
  • XObjects inherit graphic state from surrounding by @lambdalemon in #96
  • fix: correct ascent/descent for Type3 fonts by @dhdaines in #99
  • refactor!: split playa.page into three modules by @dhdaines in #100
  • refactor!: most of text state is just graphics state by @dhdaines in #101
  • refactor!: drown text state in the bathtub by @dhdaines in #102
  • Correct documentation and metadata for font, text, and glyph objects by @dhdaines in #105
  • Fix text rendering matrix for GlyphObject by @lambdalemon in #107
  • Correct glyph and text bboxes in vertical writing mode by @dhdaines in #110 (thanks @lambdalemon for a different version of this PR)
  • Make benchmarks more useful by @dhdaines in #111
  • feat!: Improve text extraction and add useful glyph and text properties by @dhdaines in #112
  • Correct the handling of character and word spacing parameters by @dhdaines in #113
  • feat: support PDF 2.0 inline images by @dhdaines in #115

Full Changelog: v0.4.3...v0.5.0

PLAYA-PDF 0.4.3: More bug fixes

09 May 20:07

  • Correct ascent, descent, and glyph boxes for Type3 fonts
  • Use ascent and descent (and not a single solitary text space unit, floating in a man's hat) to calculate glyph/text bbox height (thanks to @lambdalemon)
  • XObjects inherit graphics state from surrounding content (by @lambdalemon)

Full Changelog: v0.4.2...v0.4.3

PLAYA-PDF 0.4.2: Bug fixes

27 Apr 01:21

What's Changed

  • Correct fontsize and scaling in text state
  • Correct ValueError on incorrect stream lengths for ASCII85 data
  • Correct implicit font encodings for Type1 fonts
  • Tolerate all sorts of illegal structure trees
  • Allow accessing annotations and XObjects from structure tree
  • Better encoding for SimpleFont by @lambdalemon in #82
  • Improve error handling in font initialization by @dhdaines in #84
  • Extra robustness for ascii85 and inline images by @dhdaines in #89
  • fix: do not follow circular xobject references by @dhdaines in #90
  • Fix a few annoyances in logical structure trees by @dhdaines in #74
  • Fix bug in CFFFontProgram when using predefined encodings by @lambdalemon in #91
  • Remove padding in AES encrypted strings by @dhdaines in #92
  • Add the ability to access underlying objects in structure content objects by @dhdaines in #93
  • Correct asobj for structure elements by @dhdaines in #94
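
The AES padding fix in the list above concerns the pad bytes that block-cipher encryption appends to the plaintext. As a minimal sketch, assuming PKCS#7-style padding and a 16-byte block size, stripping it from already-decrypted bytes could look like the following. `strip_aes_padding` is a hypothetical helper written for illustration, not PLAYA's code.

```python
def strip_aes_padding(data: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#7-style padding from decrypted bytes (hypothetical helper)."""
    if not data:
        return data
    pad = data[-1]  # the last byte gives the number of pad bytes
    if 1 <= pad <= block_size and data.endswith(bytes([pad]) * pad):
        return data[:-pad]
    # Malformed padding: return the data unchanged rather than failing.
    return data


print(strip_aes_padding(b"hello" + b"\x0b" * 11))
```

Returning the input unchanged when the padding is malformed is a design choice made here for tolerance of broken files; a stricter caller might prefer to raise an error instead.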

Full Changelog: v0.4.1...v0.4.2

PLAYA-PDF 0.4.1: Minor but important cleanups

20 Mar 19:15

What's Changed

  • Correct outlines in CLI
  • Accept UTF-16LE in strings with BOM
  • Speed up fallback xrefs in pathological PDFs
  • Detect two PDFs in a trenchcoat
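
For context on the UTF-16LE item above: PDF text strings are normally either PDFDocEncoding or UTF-16BE with a leading byte order mark, but files in the wild sometimes carry a little-endian BOM instead. A minimal sketch of BOM-based decoding follows. `decode_text_string` is a hypothetical helper, not PLAYA's implementation, and the Latin-1 fallback is only an approximation of PDFDocEncoding.

```python
import codecs


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string by inspecting its BOM (hypothetical helper)."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # The standard case: UTF-16BE with a BOM.
        return raw[2:].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Not standard, but tolerated when the BOM says so.
        return raw[2:].decode("utf-16-le")
    # Assumption: Latin-1 as a rough stand-in for PDFDocEncoding.
    return raw.decode("latin-1")


print(decode_text_string(b"\xff\xfeH\x00i\x00"))
```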

Full Changelog: v0.4.0...v0.4.1

PLAYA-PDF 0.4.0: More robustness and expanded CLI

19 Mar 18:17
5024a16

What's Changed

  • Export structured/typed metadata for use in CLI and clients by @dhdaines in #68
  • Remove deprecated APIs for 0.4.0 (or maybe 1.0.0?) release by @dhdaines in #69
  • Be extra robust to really broken PDFs in parsing

Full Changelog: v0.3.2...v0.4.0

PLAYA-PDF 0.3.2: Improved stability and bug fixes

19 Mar 04:07

What's Changed

  • Run all the pdf.js tests and fix as many problems as possible by @dhdaines in #67
  • fix: Decrypt all the things by @dhdaines in #70
  • Remove at least one footgun from TextObject by @dhdaines in #71

Full Changelog: v0.3.1...v0.3.2

PLAYA-PDF 0.3.1: Supporting some users

28 Feb 13:32
b7e20fb

What's Changed

  • feat: accept bytes as input (for async applications) by @dhdaines in #65
  • Fix CTM in Form XObjects (and support pdfannots) by @dhdaines in #66

Full Changelog: v0.3.0...v0.3.1

PLAYA-PDF 0.3.0: Break all (well most of) the APIs!

21 Feb 03:41

What's Changed

  • Remove deprecated APIs for upcoming PLAYA-PDF 0.3 series by @dhdaines in #52
  • fix: accept empty name objects by @dhdaines in #54
  • feat: extract text objects not text badly by @dhdaines in #53
  • feat: support text extraction and make a benchmark by @dhdaines in #55
  • feat!: make mcstack immutable to avoid surprises by @dhdaines in #57
  • feat: Add backreferences to content objects by @dhdaines in #58
  • Import and re-export a lot of types at top level by @dhdaines in #60
  • Deprecate more APIs by @dhdaines in #59
  • Lazy interface to logical structure tree by @dhdaines in #61
  • feat: new APIs, flatten, extract_text, is_tagged by @dhdaines in #62
  • New document outline and destination APIs by @dhdaines in #64

Full Changelog: v0.2.8...v0.3.0

PLAYA-PDF 0.2.10: Nope, more bugs to fix.

19 Feb 04:07

PLAYA 0.2.10: 2025-02-18

  • Fix serious bug in rare ' and " text operators
  • Fix robustness issues in structtree API

Full Changelog: v0.2.9...v0.2.10

PLAYA-PDF 0.2.9: Final (really) 0.2 release

12 Feb 13:40

What's Changed

  • fix: Support the all-important empty name object
  • feat!: Break the CLI again (ZeroVer YOLO) to better support page ranges
  • feat: Support some limited and lossy text extraction in the CLI
  • feat: Add necessary .doc property to page list
  • fix: Correct type annotations for page list

Full Changelog: v0.2.8...v0.2.9