Releases · dhdaines/playa · GitHub

17 Aug 04:06

dhdaines

PLAYA-PDF v0.7.1: Python 3.8 compatibility and bug fixes

Latest

PLAYA 0.7.1: 2025-08-16

Tolerate non-integer values for page rotation
Restore Python 3.8 compatibility (oops!)
Restore robustness to broken structure elements
Correct handling of byte alignment in CCITT decoding, fixing an endless loop
Be more robust when extracting images
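
To illustrate the first item: the PDF specification requires a page's /Rotate entry to be a multiple of 90, but real files sometimes carry values such as 90.0 or outright garbage. The block below is a minimal sketch of one tolerant approach, not PLAYA's actual code. The name normalize_rotation, the fallback default, and the choice to round to the nearest quarter turn are all assumptions made for this example.

```python
import math


def normalize_rotation(value, default=0):
    """Coerce a /Rotate-like value to one of 0, 90, 180 or 270 (hypothetical helper)."""
    try:
        degrees = float(value)
    except (TypeError, ValueError):
        # Not a number at all: fall back rather than crash.
        return default
    if not math.isfinite(degrees):
        return default
    # Round to the nearest multiple of 90, then wrap into [0, 360).
    return (int(round(degrees / 90.0)) * 90) % 360
```

With this sketch, normalize_rotation(90.0) and normalize_rotation(450) both yield 90, and a non-numeric value falls back to the default instead of raising.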

What's Changed

fix: restore python 3.8 compatibility by @dhdaines in #172
fix: accept non-integer values for rotate by @dhdaines in #173

Full Changelog: v0.7.0...v0.7.1

Contributors

dhdaines

05 Aug 00:03

dhdaines

PLAYA-PDF v0.7.0: Vastly improved structure and marked content

PLAYA 0.7.0: 2025-08-04

Remove long-deprecated functions
Add and document finalize method on ContentObjects
Make PageList work more or less like a Sequence
Support iteration over playa.structure.ContentItem
Greatly increase test coverage
Greatly optimize marked content section access
Add find and find_all methods to page.structure
Extract CMYK images (except JPEG/JPEG2000) as TIFF

What's Changed

fix: correct and test rotation behaviour which was very broken by @dhdaines in #167
fix: use decode_text almost everywhere for utf-16 by @dhdaines in #168
Improve coverage and fix bugs by @dhdaines in #161
refactor: image extraction part 2 by @lambdalemon in #163
feat: extract cmyk images as tiff by @lambdalemon in #170
Greatly accelerate and improve logical structure tasks by @dhdaines in #165

Full Changelog: v0.6.6...v0.7.0

Contributors

dhdaines and lambdalemon

01 Aug 23:29

dhdaines

PLAYA-PDF v0.6.6: Fix two awful and long-standing bugs

PLAYA 0.6.6: 2025-08-01

Correct and test rotation behaviour, which was quite incorrect, and also allow users to update rotation and space on an existing page
Fix a very long-standing and stupid bug in normalize_rect
Never crash on invalid UTF-16 (we mean it this time)
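
The notes above do not say what the normalize_rect bug was, so the following is only a sketch of what such helpers conventionally do, not PLAYA's implementation. In PDF, a rectangle may be given by any two diagonally opposite corners, and text strings starting with the byte order mark FE FF are UTF-16BE. The names normalized_corners and decode_text_lenient are hypothetical, and Latin-1 is used here as a rough stand-in for PDFDocEncoding, which differs from it at some code points.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]


def normalized_corners(rect: Rect) -> Rect:
    """Return (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1 (hypothetical helper)."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def decode_text_lenient(data: bytes) -> str:
    """Decode a PDF-style text string without ever raising (hypothetical helper)."""
    if data.startswith(b"\xfe\xff"):
        # errors="replace" turns truncated or invalid code units into U+FFFD.
        return data[2:].decode("utf-16-be", errors="replace")
    # Assumption: Latin-1 approximates PDFDocEncoding for this sketch.
    return data.decode("latin-1")
```

The point of errors="replace" is that a damaged string degrades to replacement characters rather than aborting extraction of the whole page.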

Full Changelog: v0.6.5...v0.6.6

01 Aug 16:35

dhdaines

PLAYA-PDF v0.6.5: Clean up another hasty release

PLAYA 0.6.5: 2025-08-01

Fix terrible error in xref detection and parsing
Support 1D and mixed CCITT fax decoding

What's Changed

Support one-dimensional and mixed CCITT image compression by @dhdaines in #162
feat: Add finalize() method to ContentObject by @dhdaines in #164

Full Changelog: v0.6.4...v0.6.5

Contributors

dhdaines

26 Jul 16:09

dhdaines

PLAYA-PDF 0.6.4: Clean up an overly hasty release

0.6.3 was released a bit too soon and had a big problem, and an unfixed bug.

From CHANGELOG.md

Fix terrible error in fallback indirect object parsing
Simplify and robustify xref detection
Stop stream parsing on endobj as well as endstream
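
The last item can be sketched as follows. This is an illustration of the general idea, not PLAYA's parser: when a stream's declared length cannot be trusted, scan forward and stop at whichever of endstream or endobj appears first. The function name read_stream_data and its return shape are assumptions for this example, and a real parser must also cope with these keywords occurring inside binary stream data.

```python
import re
from typing import Optional, Tuple

# Either keyword ends the scan; the earliest match in the buffer wins.
END_MARKER = re.compile(rb"endstream|endobj")


def read_stream_data(buf: bytes, start: int = 0) -> Tuple[bytes, Optional[bytes]]:
    """Return (data, marker); marker is None if no end keyword was found."""
    match = END_MARKER.search(buf, start)
    if match is None:
        return buf[start:], None
    data = buf[start:match.start()]
    # Drop the single end-of-line sequence that precedes the keyword, if any.
    if data.endswith(b"\r\n"):
        data = data[:-2]
    elif data.endswith((b"\n", b"\r")):
        data = data[:-1]
    return data, match.group()
```

Stopping at endobj means a stream that is missing its endstream keyword no longer swallows the objects that follow it.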

What's Changed

Be robust to multiply broken PDFs (xrefs, streams, indirect objects) by @dhdaines in #160

Full Changelog: v0.6.3...v0.6.4

Contributors

dhdaines

26 Jul 13:37

dhdaines

PLAYA-PDF v0.6.3: Various bugfixes

From CHANGELOG.md

Correct and slightly optimize PNG predictor
Accept all standard number syntaxes (oops)
Fail fast on incorrect or damaged xref pointers
Accept fontsize of 0
Don't throw an exception on malformed text strings
Extract images with any colorspace
Correct ASCIIHexDecode for all odd-length strings (not just some)
Remove sketchy characters from image and font filenames
Track streamid in ObjectParser (this will become useful with time)
Cache inline images in ObjectParser
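
For context on the ASCIIHexDecode item: the PDF specification says that if the end-of-data marker > arrives after an odd number of hex digits, the filter behaves as if a 0 followed the last digit. The block below is a minimal standalone sketch of that rule, not PLAYA's code; the function name ascii_hex_decode and the choice to raise ValueError on non-hex bytes are assumptions for this example.

```python
WHITESPACE = b"\x00\t\n\x0c\r "       # PDF white-space characters
HEX_DIGITS = b"0123456789abcdefABCDEF"


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data, padding an odd trailing digit with 0."""
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):          # end-of-data marker
            break
        if byte in WHITESPACE:        # white space between digits is ignored
            continue
        if byte not in HEX_DIGITS:
            raise ValueError(f"invalid hex digit: {bytes([byte])!r}")
        digits.append(byte)
    if len(digits) % 2:
        # Odd number of digits: behave as if a 0 followed the last one.
        digits.append(ord("0"))
    return bytes.fromhex(digits.decode("ascii"))
```

Padding is applied once, after all digits are collected, so it works for any odd length rather than only for a lone trailing digit.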

What's Changed

fix: always decode text (fixes: #153) by @dhdaines in #155
fix: accept text hidden by setting fontsize to 0 by @dhdaines in #157
feat: fail fast on incorrect or damaged xref pointers by @dhdaines in #156
fix: accept all standard number syntaxes (oops) by @dhdaines in #158
Correct and accelerate PNG predictor for multi-channel images by @dhdaines in #159
extract images with cie based colorspace by @lambdalemon in #152
refactor: cache inline image by @lambdalemon in #149

Full Changelog: v0.6.2...v0.6.3

Contributors

dhdaines and lambdalemon

21 Jul 15:04

dhdaines

PLAYA-PDF 0.6.2: Bug fixes and fonts

What's Changed

fix: look in ICC profile for N if missing (fixes: #140) by @dhdaines in #141
Fix: 1/2/4 bpc images by @lambdalemon in #144
refactor: image extraction by @lambdalemon in #145
feat: fontfile extraction by @lambdalemon in #147
feat: cid2gid for CIDFont by @lambdalemon in #148

Full Changelog: v0.6.1...v0.6.2

Contributors

dhdaines and lambdalemon

18 Jun 01:19

dhdaines

PLAYA-PDF 0.6.1: Regression, refactoring, image extraction

What's Changed

fix: correct bogus font descriptors with zero metrics by @dhdaines in #134
feat: extract masks, softmasks, and alternates by @dhdaines in #135
Good enough JBIG2 support by @dhdaines in #136
Save Indexed images properly by @dhdaines in #137 (contribution by @lambdalemon)
fix: avoid writing empty image files by @dhdaines in #139

Full Changelog: v0.6.0...v0.6.1

Contributors

dhdaines and lambdalemon

13 Jun 16:52

dhdaines

PLAYA-PDF 0.6.0: Structure and text improvements

What's Changed

Iterate over Form XObjects inside Form XObjects with .xobjects by @dhdaines in #126
Correct bbox on non-diagonal Type3 FontMatrix by @dhdaines in #127
Fixes and improvements to text extraction, marked content and logical structure by @dhdaines in #124
Add displacement property to text objects by @dhdaines in #129
Allow iteration over Type3 font programs by @dhdaines in #130
Extract images as PNMs if possible by @dhdaines in #131

Notes from CHANGELOG.md

Add structure attribute to Page to access structure elements indexed by marked content IDs (convenience wrapper over the parent tree)
Add structure attribute to XObjectObject for the same reason
Add parent attribute to every ContentObject to access its parent structure element (if any) via the parent tree
Descend into Form XObjects in Page.xobjects
Improve text extraction for rotated pages
Improve text extraction for tagged PDFs
Correct displacement and bbox for Type3 fonts with non-diagonal FontMatrix
Add displacement property to TextObject
Add functioning __iter__ to GlyphObject in the case of Type3 fonts, which works like XObjectObject
Extract non-JPEG images as PNM
BREAKING: Fix __len__ on PathObject, which incorrectly returned non-zero even though iteration is not possible
BREAKING: Remove misleading char_width, get_descent, and get_ascent methods and hscale and vscale properties from font objects
BREAKING: Do not guess basename for Type3 fonts (generally it isn't different from fontname for other subset fonts)
BREAKING: Element.contents contains both structure.ContentItem and structure.ContentObject

Full Changelog: v0.5.1...v0.6.0

Contributors

dhdaines

26 May 20:52

dhdaines

PLAYA-PDF 0.5.1: Bug fixes and a few little features

What's Changed

Detect and correct missing unicode mappings for Type3 fonts by @dhdaines in #121
Tolerate bogus line endings and blank lines in xref tables by @dhdaines in #122
feat: support bbox on Annotation (oops) by @dhdaines in #123
Implement do_gs by @lambdalemon in #118
Correct ParentTree and RoleMap in logical structure

Full Changelog: v0.5.0...v0.5.1

Contributors

dhdaines and lambdalemon
