Support reading compressed blocks #37

@embray

Support reading blocks compressed with bzip2 or zlib/deflate, the only two compression schemes currently mentioned in the standard.
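As a rough illustration of the dispatch this implies, here is a minimal sketch in Python (used only for brevity). It assumes the 4-byte `compression` field of the block header holds `zlib` or `bzp2`, with four zero bytes meaning no compression, which is how I read the standard. `decompress_block` is a hypothetical helper name, not an existing API.

```python
import bz2
import zlib

# Assumed header value for an uncompressed block: four NUL bytes.
NO_COMPRESSION = b"\0\0\0\0"

# Assumed 4-byte codes for the two schemes named in the standard.
_DECOMPRESSORS = {
    b"zlib": zlib.decompress,
    b"bzp2": bz2.decompress,
}


def decompress_block(compression: bytes, payload: bytes, data_size: int) -> bytes:
    """Return the decompressed payload of one block (hypothetical helper)."""
    if compression == NO_COMPRESSION:
        data = payload
    else:
        try:
            decompress = _DECOMPRESSORS[compression]
        except KeyError:
            raise ValueError(f"unsupported compression: {compression!r}") from None
        data = decompress(payload)
    # data_size from the header should match what actually came out.
    if len(data) != data_size:
        raise ValueError(f"expected {data_size} bytes, got {len(data)}")
    return data
```

Anything outside the two known codes is rejected rather than guessed at, which matches keeping compression extensions out of scope for now.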

The Python asdf library now also supports arbitrary compression extensions (https://www.asdf-format.org/projects/asdf/en/latest/asdf/extending/compressors.html). This came out of the discussion in asdf-format/asdf-standard#408, but it does not seem to be formally mentioned in the standard yet. So I guess for now compression extensions remain outside the scope of this issue.

The supported compression methods unfortunately do not support random/tile-based access. Decompression can still be performed lazily, as needed when the data is accessed, though the raw block data access methods (see #29) would need a way to handle this. For a compressed block, those methods should provide an mmap'd region of memory large enough to hold the decompressed data (data_size). We can then implement lazy paging:

  • Traditional, slightly kludgier approach: install a SIGSEGV handler that handles faults on that region of memory and decompresses some or all of the data, maybe taking into account any active madvise flags.

  • On supported Linux systems we can use the userfaultfd interface which is a bit nicer.
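Neither mechanism is shown here, but the logic a fault handler would run can be emulated in user space. The sketch below (Python, for brevity) keeps a streaming zlib decompressor alongside an anonymous mapping of `data_size` bytes. Because the stream has no random access, a "fault" at some offset has to decompress everything up to the end of that page. The class name `LazyZlibRegion`, its `_fault` method, and the 64 KiB input chunk size are all assumptions made for illustration.

```python
import mmap
import zlib

PAGE = mmap.PAGESIZE


class LazyZlibRegion:
    """Anonymous mapping of data_size bytes, filled on demand (sketch)."""

    def __init__(self, payload: bytes, data_size: int):
        self._payload = payload
        self._data_size = data_size
        self._decomp = zlib.decompressobj()
        self._consumed = 0  # compressed bytes handed to the decompressor
        self.filled = 0     # decompressed bytes already in the mapping
        self._map = mmap.mmap(-1, max(data_size, 1))

    def _fault(self, offset: int) -> None:
        """Emulates what a fault handler would do for an access at offset."""
        # Fill up to the end of the page containing offset.
        target = min((offset // PAGE + 1) * PAGE, self._data_size)
        while self.filled < target:
            # Input left over from a bounded call must be fed back in first.
            chunk = self._decomp.unconsumed_tail
            if not chunk:
                chunk = self._payload[self._consumed:self._consumed + 65536]
                self._consumed += len(chunk)
            out = self._decomp.decompress(chunk, target - self.filled)
            if not out and not chunk:
                raise ValueError("compressed stream ended before data_size")
            self._map[self.filled:self.filled + len(out)] = out
            self.filled += len(out)

    def read(self, start: int, stop: int) -> bytes:
        """Read [start, stop), decompressing only as far as needed."""
        if stop > self.filled:
            self._fault(stop - 1)
        return self._map[start:stop]
```

Reading the first few bytes fills only the first page, while reading the tail forces the whole stream to be decompressed. A real handler would probably fill more than one page per fault, and could use madvise hints to decide how far ahead to go.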

There should be a user-configurable option for how the decompressed data is backed: either entirely in RAM via an anonymous mmap, or by a temp file. There could also be a default threshold, determined from the user's available system memory, for choosing between the two.
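One possible shape for that option, sketched in Python for brevity. The names `allocate_backing`, `backing`, and `ram_fraction` are hypothetical, as is the policy of using RAM when the block fits in half of the free physical memory. The `SC_AVPHYS_PAGES` query is not available on every platform, so the sketch falls back to RAM when the amount is unknown.

```python
import mmap
import os
import tempfile
from typing import Optional


def available_memory() -> Optional[int]:
    """Best-effort free physical memory in bytes, or None if unknown."""
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages < 0 or page_size < 0:
        return None
    return pages * page_size


def allocate_backing(data_size: int, backing: str = "auto",
                     ram_fraction: float = 0.5) -> mmap.mmap:
    """Map data_size writable bytes in RAM or a temp file (hypothetical)."""
    if backing not in ("auto", "ram", "file"):
        raise ValueError(f"unknown backing: {backing!r}")
    if backing == "auto":
        avail = available_memory()
        # Assumed policy: prefer RAM unless the block exceeds the threshold.
        fits = avail is None or data_size <= avail * ram_fraction
        backing = "ram" if fits else "file"
    size = max(data_size, 1)  # mmap cannot create an empty mapping
    if backing == "ram":
        return mmap.mmap(-1, size)
    with tempfile.TemporaryFile() as f:
        f.truncate(size)
        # The mapping holds its own duplicate of the descriptor, so it
        # stays valid after the temp file object is closed.
        return mmap.mmap(f.fileno(), size)
```

Either way the caller gets the same kind of object back, so the lazy decompression code does not need to know which backing was chosen.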

This is a bit tricky to get right, though it should be fun. It is also very low-priority right now (as I understand it, STScI is not habitually using compression in ASDF files).

Labels: enhancement (new feature or request), standard-compliance (missing feature required for full compliance with the ASDF standard)