Releases: dhdaines/playa
PLAYA-PDF v0.7.1: Python 3.8 compatibility and bug fixes
PLAYA 0.7.1: 2025-08-16
- Tolerate non-integer values for page rotation
- Restore Python 3.8 compatibility (oops!)
- Restore robustness to broken structure elements
- Correct handling of byte alignment in CCITT decoding, fixing an endless loop
- Be more robust when extracting images
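To illustrate the first item: a page's `/Rotate` entry is meant to be an integer multiple of 90, but real files sometimes carry values such as `90.0`. The following is a minimal sketch of one tolerant way to normalize such values. `normalize_rotation` is a hypothetical helper, not PLAYA's own function, and falling back to 0 for unusable values is an assumption.

```python
import math


def normalize_rotation(value):
    """Coerce a /Rotate value to 0, 90, 180 or 270 (hypothetical helper)."""
    try:
        degrees = float(value)  # accepts ints, floats and numeric strings
    except (TypeError, ValueError):
        return 0  # assumption: treat unusable values as "no rotation"
    if not math.isfinite(degrees):
        return 0  # NaN and infinities cannot be rounded to an angle
    # Round to the nearest multiple of 90, then wrap into [0, 360).
    return (round(degrees / 90) * 90) % 360
```

Rounding before wrapping means negative and oversized angles are handled by the same expression.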
What's Changed
- fix: restore python 3.8 compatibility by @dhdaines in #172
- fix: accept non-integer values for rotate by @dhdaines in #173
Full Changelog: v0.7.0...v0.7.1
PLAYA-PDF v0.7.0: Vastly improved structure and marked content
PLAYA 0.7.0: 2025-08-04
- Remove long-deprecated functions
- Add and document `finalize` method on ContentObjects
- Make `PageList` work more or less like a `Sequence`
- Support iteration over `playa.structure.ContentItem`
- Greatly increase test coverage
- Greatly optimize marked content section access
- Add `find` and `find_all` methods to `page.structure`
- Extract CMYK images (except JPEG/JPEG2000) as TIFF
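As a rough idea of what "working like a `Sequence`" involves, the sketch below builds a tiny container on `collections.abc.Sequence`. `PageListSketch` is a hypothetical stand-in holding plain strings. It is not PLAYA's `PageList`, whose actual behaviour may differ.

```python
from collections.abc import Sequence


class PageListSketch(Sequence):
    """Hypothetical page container; not PLAYA's actual PageList."""

    def __init__(self, pages):
        self._pages = list(pages)

    def __len__(self):
        return len(self._pages)

    def __getitem__(self, key):
        # Slices return a new container, integers return a single page.
        if isinstance(key, slice):
            return PageListSketch(self._pages[key])
        return self._pages[key]


pages = PageListSketch(["page-1", "page-2", "page-3"])
last = pages[-1]        # negative indexing
tail = list(pages[1:])  # slicing, then iteration
```

Only `__len__` and `__getitem__` are written by hand. The `Sequence` base class supplies iteration, `in`, `reversed()`, `index()` and `count()`.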
What's Changed
- fix: correct and test rotation behaviour which was very broken by @dhdaines in #167
- fix: use decode_text almost everywhere for utf-16 by @dhdaines in #168
- Improve coverage and fix bugs by @dhdaines in #161
- refactor: image extraction part 2 by @lambdalemon in #163
- feat: extract cmyk images as tiff by @lambdalemon in #170
- Greatly accelerate and improve logical structure tasks by @dhdaines in #165
Full Changelog: v0.6.6...v0.7.0
PLAYA-PDF v0.6.6: Fix two awful and long-standing bugs
PLAYA 0.6.6: 2025-08-01
- Correct and test rotation behaviour which was quite incorrect, and also allow users to update rotation and space on an existing page
- Fix a very long-standing and stupid bug in `normalize_rect`
- Never crash on invalid UTF-16 (we mean it this time)
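One common way to avoid crashing on invalid UTF-16 is to decode with a lenient error handler. The sketch below shows that idea only. `decode_text_sketch` is a hypothetical name, Latin-1 is used as a simplified stand-in for PDFDocEncoding, and none of this is taken from PLAYA's source.

```python
import codecs


def decode_text_sketch(data: bytes) -> str:
    """Decode a PDF text string without raising (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # errors="replace" turns bad or truncated code units into U+FFFD.
        return data[2:].decode("utf-16-be", errors="replace")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Assumption: tolerate a little-endian BOM even though PDF text
        # strings are expected to be big-endian.
        return data[2:].decode("utf-16-le", errors="replace")
    # Assumption: Latin-1 stands in for PDFDocEncoding; it never fails.
    return data.decode("latin-1")
```

Because every branch either uses `errors="replace"` or a codec that maps all 256 byte values, the function returns a string for any input.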
Full Changelog: v0.6.5...v0.6.6
PLAYA-PDF v0.6.5: Clean up another hasty release
PLAYA 0.6.5: 2025-08-01
- Fix terrible error in xref detection and parsing
- Support 1D and mixed CCITT fax decoding
What's Changed
- Support one-dimensional and mixed CCITT image compression by @dhdaines in #162
- feat: Add finalize() method to ContentObject by @dhdaines in #164
Full Changelog: v0.6.4...v0.6.5
PLAYA-PDF 0.6.4: Clean up an overly hasty release
0.6.3 was released a bit too soon and had a big problem as well as an unfixed bug.
From CHANGELOG.md
- Fix terrible error in fallback indirect object parsing
- Simplify and robustify xref detection
- Stop stream parsing on endobj as well as endstream
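The last item can be pictured as follows: when a stream's declared length cannot be trusted, a fallback scan looks for whichever terminating keyword comes first. This is a minimal sketch with a hypothetical `find_stream_end` helper. It ignores the possibility that the keywords occur inside binary data and is not PLAYA's parser.

```python
def find_stream_end(buf: bytes, start: int = 0) -> int:
    """Return the offset of the first end keyword at or after start, or -1."""
    candidates = (
        buf.find(b"endstream", start),
        buf.find(b"endobj", start),  # covers files missing "endstream"
    )
    hits = [pos for pos in candidates if pos != -1]
    return min(hits) if hits else -1
```

With a broken object such as `b"stream\nDATA\nendobj\n"`, the scan stops at `endobj` instead of running on into the following objects.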
What's Changed
Full Changelog: v0.6.3...v0.6.4
PLAYA-PDF v0.6.3: Various bugfixes
From CHANGELOG.md
- Correct and slightly optimize PNG predictor
- Accept all standard number syntaxes (oops)
- Fail fast on incorrect or damaged xref pointers
- Accept fontsize of 0
- Don't throw an exception on malformed text strings
- Extract images with any colorspace
- Correct ASCIIHexDecode for all odd-length strings (not just some)
- Remove sketchy characters from image and font filenames
- Track streamid in ObjectParser (this will become useful with time)
- Cache inline images in ObjectParser
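For the ASCIIHexDecode item, the PDF specification says that when the data holds an odd number of hex digits, the final digit is treated as if it were followed by `0`. A minimal sketch of a decoder following that rule is shown below. `ascii_hex_decode` is a hypothetical name, and silently skipping non-hex characters is an assumption.

```python
import re


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data, padding an odd final digit with 0."""
    end = data.find(b">")  # ">" is the end-of-data marker
    if end != -1:
        data = data[:end]
    # Assumption: whitespace and any other non-hex bytes are ignored.
    digits = re.sub(rb"[^0-9A-Fa-f]", b"", data)
    if len(digits) % 2:
        digits += b"0"  # a lone final digit "7" is read as "70"
    return bytes.fromhex(digits.decode("ascii"))
```

Stripping everything that is not a hex digit first means the parity check applies to the digits alone, regardless of how whitespace is distributed.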
What's Changed
- fix: always decode text (fixes: #153) by @dhdaines in #155
- fix: accept text hidden by setting fontsize to 0 by @dhdaines in #157
- feat: fail fast on incorrect or damaged xref pointers by @dhdaines in #156
- fix: accept all standard number syntaxes (oops) by @dhdaines in #158
- Correct and accelerate PNG predictor for multi-channel images by @dhdaines in #159
- extract images with cie based colorspace by @lambdalemon in #152
- refactor: cache inline image by @lambdalemon in #149
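Regarding the number-syntax fix: PDF numeric objects allow forms such as `+17`, `4.` and `-.002`, but not exponents. The sketch below is one way to recognize those forms. `parse_number` and the regular expression are illustrative assumptions rather than PLAYA's lexer.

```python
import re

# Optional sign, then "digits.digits", ".digits" or plain digits.
NUMBER = re.compile(rb"[+-]?(?:\d+\.\d*|\.\d+|\d+)")


def parse_number(token: bytes):
    """Parse a PDF numeric token into int or float (hypothetical helper)."""
    if NUMBER.fullmatch(token) is None:
        raise ValueError(f"not a PDF number: {token!r}")
    text = token.decode("ascii")
    # A decimal point anywhere makes the token a real number.
    return float(text) if "." in text else int(text)
```

Tokens such as `1e5` or a bare `.` are rejected with `ValueError`, since neither is a valid PDF number.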
Full Changelog: v0.6.2...v0.6.3
PLAYA-PDF 0.6.2: Bug fixes and fonts
What's Changed
- fix: look in ICC profile for N if missing (fixes: #140) by @dhdaines in #141
- Fix: 1/2/4 bpc images by @lambdalemon in #144
- refactor: image extraction by @lambdalemon in #145
- feat: fontfile extraction by @lambdalemon in #147
- feat: cid2gid for CIDFont by @lambdalemon in #148
Full Changelog: v0.6.1...v0.6.2
PLAYA-PDF 0.6.1: Regression, refactoring, image extraction
What's Changed
- fix: correct bogus font descriptors with zero metrics by @dhdaines in #134
- feat: extract masks, softmasks, and alternates by @dhdaines in #135
- Good enough JBIG2 support by @dhdaines in #136
- Save Indexed images properly by @dhdaines in #137 (contribution by @lambdalemon)
- fix: avoid writing empty image files by @dhdaines in #139
Full Changelog: v0.6.0...v0.6.1
PLAYA-PDF 0.6.0: Structure and text improvements
What's Changed
- Iterate over Form XObjects inside Form XObjects with `.xobjects` by @dhdaines in #126
- Correct bbox on non-diagonal Type3 `FontMatrix` by @dhdaines in #127
- Fixes and improvements to text extraction, marked content and logical structure by @dhdaines in #124
- Add displacement property to text objects by @dhdaines in #129
- Allow iteration over Type3 font programs by @dhdaines in #130
- Extract images as PNMs if possible by @dhdaines in #131
Notes from CHANGELOG.md
- Add `structure` to `Page` to access structure elements indexed by marked content IDs (convenience wrapper over the parent tree)
- Add `structure` to `XObjectObject` for the same reason
- Add `parent` to all `ContentObject` to access parent structure element (if any) via the parent tree
- Descend into Form XObjects in `Page.xobjects`
- Improve text extraction for rotated pages
- Improve text extraction for tagged PDFs
- Correct displacement and bbox for Type3 fonts with non-diagonal `FontMatrix`
- Add `displacement` property to `TextObject`
- Add functioning `__iter__` to `GlyphObject` in the case of Type3 fonts, which works like `XObjectObject`
- Extract non-JPEG images as PNM
- BREAKING: Fix `__len__` on `PathObject` which incorrectly returned non-zero even though iteration is not possible
- BREAKING: Remove misleading `char_width`, `get_descent`, and `get_ascent` methods and `hscale` and `vscale` properties from font objects
- BREAKING: Do not guess `basename` for Type3 fonts (generally it isn't different from `fontname` for other subset fonts)
- BREAKING: `Element.contents` contains both `structure.ContentItem` and `structure.ContentObject`
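The `FontMatrix` items above concern matrices that rotate or skew as well as scale. For such a matrix, transforming only two opposite corners of a box is not enough: all four corners have to be mapped and the extremes taken. The sketch below shows that general technique with a hypothetical `transform_bbox` helper, using the usual six-element PDF matrix layout `(a, b, c, d, e, f)`. It is not PLAYA's code.

```python
def transform_bbox(matrix, bbox):
    """Axis-aligned bounding box of bbox after applying a PDF matrix."""
    a, b, c, d, e, f = matrix
    x0, y0, x1, y1 = bbox
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    # PDF convention: x' = a*x + c*y + e and y' = b*x + d*y + f.
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# A 90-degree rotation turns a 2x1 box into a 1x2 box.
rotated = transform_bbox((0, 1, -1, 0, 0, 0), (0, 0, 2, 1))
```

For a purely diagonal matrix (`b == c == 0`) this reduces to scaling the two corners independently, which is why the shortcut only breaks for rotated or skewed glyph spaces.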
Full Changelog: v0.5.1...v0.6.0
PLAYA-PDF 0.5.1: Bug fixes and a few little features
What's Changed
- Detect and correct missing unicode mappings for Type3 fonts by @dhdaines in #121
- Tolerate bogus line endings and blank lines in xref tables by @dhdaines in #122
- feat: support bbox on Annotation (oops) by @dhdaines in #123
- Implement do_gs by @lambdalemon in #118
- Correct `ParentTree` and `RoleMap` in logical structure
Full Changelog: v0.5.0...v0.5.1