Releases: dhdaines/playa

PLAYA-PDF v0.7.1: Python 3.8 compatibility and bug fixes

17 Aug 04:06

PLAYA 0.7.1: 2025-08-16

  • Tolerate non-integer values for page rotation
  • Restore Python 3.8 compatibility (oops!)
  • Restore robustness to broken structure elements
  • Correct handling of byte alignment in CCITT decoding, fixing an endless loop
  • Be more robust when extracting images
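
The first item above concerns page rotation values that are not integers. As a rough illustration only (this is not PLAYA's code, and `normalize_rotation` is a hypothetical helper), one way to tolerate such values is to coerce whatever is found to a float, snap it to the nearest quarter turn, and fall back to a default when the value is unusable:

```python
import math


def normalize_rotation(value, default=0):
    """Coerce a page rotation value to one of 0, 90, 180 or 270.

    Hypothetical helper for illustration; it makes no claim about how
    PLAYA itself treats non-integer rotation values.
    """
    try:
        degrees = float(value)
    except (TypeError, ValueError):
        # Not a number at all (e.g. None or a stray name object).
        return default
    if not math.isfinite(degrees):
        return default
    # Snap to the nearest multiple of 90, then wrap into [0, 360).
    return (round(degrees / 90) * 90) % 360
```

With this sketch, `normalize_rotation(90.0)` yields `90`, `normalize_rotation(-90)` yields `270`, and a non-numeric value such as `"abc"` yields the default.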

Full Changelog: v0.7.0...v0.7.1

PLAYA-PDF v0.7.0: Vastly improved structure and marked content

05 Aug 00:03

PLAYA 0.7.0: 2025-08-04

  • Remove long-deprecated functions
  • Add and document finalize method on ContentObjects
  • Make PageList work more or less like a Sequence
  • Support iteration over playa.structure.ContentItem
  • Greatly increase test coverage
  • Greatly optimize marked content section access
  • Add find and find_all methods to page.structure
  • Extract CMYK images (except JPEG/JPEG2000) as TIFF

Full Changelog: v0.6.6...v0.7.0

PLAYA-PDF v0.6.6: Fix two awful and long-standing bugs

01 Aug 23:29

PLAYA 0.6.6: 2025-08-01

  • Correct and test rotation behaviour, which was quite incorrect, and
    also allow users to update rotation and space on an existing page
  • Fix a very long-standing and stupid bug in normalize_rect
  • Never crash on invalid UTF-16 (we mean it this time)
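
The changelog does not say what the normalize_rect bug was. For readers unfamiliar with the idea, the sketch below shows the usual contract of such a function: whatever order the corners arrive in, the result has x0 <= x1 and y0 <= y1. The name `normalize_rect_sketch` and the `Rect` alias are assumptions made for this example, not PLAYA's definitions.

```python
from typing import Tuple

# Illustrative alias: (x0, y0, x1, y1)
Rect = Tuple[float, float, float, float]


def normalize_rect_sketch(rect: Rect) -> Rect:
    """Return the same rectangle with its corners in ascending order."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

A rectangle given as `(10, 20, 0, 5)` comes back as `(0, 5, 10, 20)`, and an already ordered rectangle is returned unchanged.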

Full Changelog: v0.6.5...v0.6.6

PLAYA-PDF v0.6.5: Clean up another hasty release

01 Aug 16:35

PLAYA 0.6.5: 2025-08-01

  • Fix terrible error in xref detection and parsing
  • Support 1D and mixed CCITT fax decoding

What's Changed

  • Support one-dimensional and mixed CCITT image compression by @dhdaines in #162
  • feat: Add finalize() method to ContentObject by @dhdaines in #164

Full Changelog: v0.6.4...v0.6.5

PLAYA-PDF 0.6.4: Clean up an overly hasty release

26 Jul 16:09
f62f1e6

0.6.3 was released a bit too soon, with a big problem and an unfixed bug.

From CHANGELOG.md

  • Fix terrible error in fallback indirect object parsing
  • Simplify and robustify xref detection
  • Stop stream parsing on endobj as well as endstream
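
The last item above can be pictured with a small sketch. When a stream's declared length cannot be trusted, a fallback parser may scan forward for a terminating keyword, and accepting `endobj` as well as `endstream` keeps a stream with a missing `endstream` from swallowing the rest of the file. The helper below is hypothetical and much simpler than a real parser; in particular, it ignores the chance that binary stream data contains these byte sequences.

```python
import re

# Either keyword ends the scan; the earliest match in the data wins.
END_MARKER = re.compile(rb"endstream|endobj")


def find_stream_end(data: bytes, start: int = 0) -> int:
    """Return the offset where stream data ends.

    Hypothetical helper: scans from `start` for the first `endstream`
    or `endobj` keyword and returns len(data) if neither is present.
    """
    match = END_MARKER.search(data, start)
    return match.start() if match else len(data)
```

For a buffer such as `b"stream\nABC\nendobj\n"`, scanning from offset 7 stops at offset 11, so the recovered stream data is `b"ABC\n"` even though `endstream` is missing.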

What's Changed

  • Be robust to multiply broken PDFs (xrefs, streams, indirect objects) by @dhdaines in #160

Full Changelog: v0.6.3...v0.6.4

PLAYA-PDF v0.6.3: Various bugfixes

26 Jul 13:37
d659a1e

From CHANGELOG.md

  • Correct and slightly optimize PNG predictor
  • Accept all standard number syntaxes (oops)
  • Fail fast on incorrect or damaged xref pointers
  • Accept fontsize of 0
  • Don't throw an exception on malformed text strings
  • Extract images with any colorspace
  • Correct ASCIIHexDecode for all odd-length strings (not just some)
  • Remove sketchy characters from image and font filenames
  • Track streamid in ObjectParser (this will become useful with time)
  • Cache inline images in ObjectParser
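
For context on the ASCIIHexDecode item above: the PDF specification says that whitespace in the encoded data is ignored, `>` marks the end of data, and an odd number of hex digits is treated as if a `0` followed the last digit. The sketch below follows those rules. It is an independent illustration rather than PLAYA's implementation, and `ascii_hex_decode` is a name chosen for this example.

```python
import binascii

# The six PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data, padding an odd final digit with 0."""
    # Keep only what precedes the '>' end-of-data marker, if present.
    body = data.split(b">", 1)[0]
    # Drop whitespace between digits.
    digits = bytes(c for c in body if c not in PDF_WHITESPACE)
    # An odd number of digits behaves as if a trailing 0 were present.
    if len(digits) % 2:
        digits += b"0"
    # Raises binascii.Error if non-hex characters remain.
    return binascii.unhexlify(digits)
```

Under these rules `b"48 65 6C 6C 6F>"` decodes to `b"Hello"`, and the single digit `b"7>"` decodes to the byte `0x70`.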

Full Changelog: v0.6.2...v0.6.3

PLAYA-PDF 0.6.2: Bug fixes and fonts

21 Jul 15:04

Full Changelog: v0.6.1...v0.6.2

PLAYA-PDF 0.6.1: Regression, refactoring, image extraction

18 Jun 01:19
7915d1c

Full Changelog: v0.6.0...v0.6.1

PLAYA-PDF 0.6.0: Structure and text improvements

13 Jun 16:52

What's Changed

  • Iterate over Form XObjects inside Form XObjects with .xobjects by @dhdaines in #126
  • Correct bbox on non-diagonal Type3 FontMatrix by @dhdaines in #127
  • Fixes and improvements to text extraction, marked content and logical structure by @dhdaines in #124
  • Add displacement property to text objects by @dhdaines in #129
  • Allow iteration over Type3 font programs by @dhdaines in #130
  • Extract images as PNMs if possible by @dhdaines in #131
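
As background for the PNM item above: PNM is the Netpbm family of formats (PBM, PGM, PPM), each a short text header followed by raw samples. That makes it a convenient container for already decoded image data. The sketch below wraps 8-bit RGB samples in a binary PPM (P6) file. The function name is hypothetical, and nothing here reflects which images PLAYA considers eligible or how it writes them.

```python
def ppm_bytes(width: int, height: int, rgb: bytes) -> bytes:
    """Wrap raw 8-bit RGB samples in a binary PPM (P6) container."""
    expected = width * height * 3
    if len(rgb) != expected:
        raise ValueError(f"expected {expected} bytes of RGB data, got {len(rgb)}")
    # Header: magic number, dimensions, maximum sample value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + rgb
```

A 2x1 image with one red and one blue pixel would be `ppm_bytes(2, 1, bytes([255, 0, 0, 0, 0, 255]))`, and the result can be written to a `.ppm` file as is.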

Notes from CHANGELOG.md

  • Add structure to Page to access structure elements indexed by
    marked content IDs (convenience wrapper over the parent tree)
  • Add structure to XObjectObject for the same reason
  • Add parent to all ContentObject to access parent structure
    element (if any) via the parent tree
  • Descend into Form XObjects in Page.xobjects
  • Improve text extraction for rotated pages
  • Improve text extraction for tagged PDFs
  • Correct displacement and bbox for Type3 fonts with non-diagonal
    FontMatrix
  • Add displacement property to TextObject
  • Add functioning __iter__ to GlyphObject in the case of
    Type3 fonts, which works like XObjectObject
  • Extract non-JPEG images as PNM
  • BREAKING: Fix __len__ on PathObject which incorrectly returned
    non-zero even though iteration is not possible
  • BREAKING: Remove misleading char_width, get_descent, and
    get_ascent methods and hscale and vscale properties from font
    objects
  • BREAKING: Do not guess basename for Type3 fonts (generally it
    isn't different from fontname for other subset fonts)
  • BREAKING: Element.contents contains both structure.ContentItem
    and structure.ContentObject

Full Changelog: v0.5.1...v0.6.0

PLAYA-PDF 0.5.1: Bug fixes and a few little features

26 May 20:52

What's Changed

  • Detect and correct missing unicode mappings for Type3 fonts by @dhdaines in #121
  • Tolerate bogus line endings and blank lines in xref tables by @dhdaines in #122
  • feat: support bbox on Annotation (oops) by @dhdaines in #123
  • Implement do_gs by @lambdalemon in #118
  • Correct ParentTree and RoleMap in logical structure

Full Changelog: v0.5.0...v0.5.1