Description
Support reading blocks compressed with bzip2 or zlib/deflate, the only two compression schemes currently mentioned in the standard.
The Python asdf library now also supports arbitrary compression extensions: https://www.asdf-format.org/projects/asdf/en/latest/asdf/extending/compressors.html This came out of the discussion in asdf-format/asdf-standard#408, but it does not seem to be formally mentioned in the standard yet. So I guess for now compression extensions remain outside the scope of this issue.
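For reference, the two in-scope schemes can be sketched as a dispatch on the block header's 4-byte compression field. This is only an illustration in Python, assuming the codes `zlib` and `bzp2` from the standard and four NUL bytes for an uncompressed block; `decompress_block` and `DECOMPRESSORS` are hypothetical names, not part of any existing API.

```python
import bz2
import zlib

# Assumed mapping from the 4-byte compression code in the block header
# to a one-shot decompression function (names here are hypothetical).
DECOMPRESSORS = {
    b"zlib": zlib.decompress,
    b"bzp2": bz2.decompress,
}

NO_COMPRESSION = b"\0\0\0\0"


def decompress_block(code: bytes, payload: bytes, data_size: int) -> bytes:
    """Return the decompressed block data, checking it against data_size."""
    if code == NO_COMPRESSION:
        data = payload
    else:
        try:
            decompress = DECOMPRESSORS[code]
        except KeyError:
            # Compression extensions are out of scope, so reject unknown codes.
            raise ValueError(f"unsupported compression code: {code!r}") from None
        data = decompress(payload)
    if len(data) != data_size:
        raise ValueError(
            f"expected {data_size} bytes after decompression, got {len(data)}"
        )
    return data
```

Rejecting unknown codes outright matches the scope above: anything other than the two schemes in the standard is treated as an error rather than passed through.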
The supported compression methods unfortunately do not support random/tile-based access. Decompression can still be performed lazily as needed when the data is accessed, though the raw block data access methods (see #29) would need a way to handle this. In this case the access method should provide an mmap'd region of memory large enough to hold the decompressed data (`data_size`). We can then implement lazy paging:
- Traditional, slightly kludgier approach: install a `SIGSEGV` handler that handles page faults on that region of memory and decompresses some or all of the data, maybe taking into account any active `madvise` flags.
- On supported Linux systems we can use the `userfaultfd` interface, which is a bit nicer.
There should be a user-configurable option for how the decompressed data should be backed--either all RAM in an anonymous mmap, or backed by a temp file. There could also be a default threshold determined from the user's available system memory.
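A rough Python sketch of that backing option, for illustration only. It does not implement page-fault-driven paging. It just shows choosing between an anonymous mapping and a temp-file-backed one, then streaming a zlib payload into it in chunks. All names (`default_threshold`, `choose_backing`, `allocate_region`, `decompress_into`) and the 25% fraction are assumptions, and the memory query relies on `os.sysconf` names that are not available on every platform.

```python
import mmap
import os
import tempfile
import zlib


def default_threshold(fraction: float = 0.25) -> int:
    """Hypothetical default: a fraction of physical memory, else 1 GiB."""
    try:
        total = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        total = 4 << 30  # assumed fallback when sysconf is unavailable
    return max(1, int(total * fraction))


def choose_backing(data_size: int, threshold: int) -> str:
    """Use RAM for small blocks and a temp file for large ones."""
    return "ram" if data_size <= threshold else "file"


def allocate_region(data_size: int, backing: str) -> mmap.mmap:
    """Map data_size bytes (must be > 0) with the requested backing."""
    if backing == "ram":
        return mmap.mmap(-1, data_size)  # anonymous mapping
    if backing == "file":
        with tempfile.TemporaryFile() as tmp:
            tmp.truncate(data_size)
            # The mapping stays valid after the file object is closed.
            return mmap.mmap(tmp.fileno(), data_size)
    raise ValueError(f"unknown backing: {backing!r}")


def decompress_into(region: mmap.mmap, payload: bytes,
                    chunk_size: int = 1 << 16) -> None:
    """Stream a zlib payload into the region, at most chunk_size bytes at a time."""
    decomp = zlib.decompressobj()
    region.seek(0)
    data = payload
    while not decomp.eof:
        out = decomp.decompress(data, chunk_size)
        if not out and not decomp.eof:
            raise ValueError("compressed stream is truncated")
        region.write(out)  # raises ValueError if the region is too small
        data = decomp.unconsumed_tail
    if region.tell() != len(region):
        raise ValueError("decompressed size does not match the region size")
```

Usage would look something like `region = allocate_region(data_size, choose_backing(data_size, default_threshold()))` followed by `decompress_into(region, payload)`. Writing in chunks keeps the peak memory use of the temp-file case low, which is the main reason to offer that option.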
This is a bit tricky to get right but should be fun. It is also very low-priority right now (as I understand it, STScI is not habitually using compression in ASDF files).