Feature Request: Block-based Compression Early Abort for Incompressible Data in gensquashfs #114

@wychen

Description

gensquashfs currently retains the original data if the compressed output is larger than the source. However, running expensive compression on incompressible data only to discard the result is wasteful. I propose adding a command line option to gensquashfs that enables a quick entropy measurement before compression. If a block is deemed incompressible, gensquashfs can simply keep the original data without spending computational resources on compressing it.
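To illustrate the proposed flow, here is a minimal sketch in Python rather than the C used by gensquashfs. It is not the prototype mentioned in this issue. It uses only the standard library, so `zlib` level 1 stands in for a fast probe compressor and `lzma` preset 6 stands in for xz level 6. The names `looks_incompressible`, `pack_block`, and `PROBE_THRESHOLD`, as well as the 0.95 cutoff, are hypothetical and chosen for illustration only.

```python
import lzma
import os
import zlib
from typing import Tuple

BLOCK_SIZE = 128 * 1024   # assumed block size for this sketch
PROBE_THRESHOLD = 0.95    # hypothetical cutoff: probe output >= 95% of input


def looks_incompressible(block: bytes, threshold: float = PROBE_THRESHOLD) -> bool:
    """Cheap probe. zlib level 1 is a stand-in for a fast compressor."""
    if not block:
        return False
    probe = zlib.compress(block, 1)
    return len(probe) >= threshold * len(block)


def pack_block(block: bytes) -> Tuple[bytes, bool]:
    """Return (payload, is_compressed) for a single block."""
    if looks_incompressible(block):
        # Early abort: skip the expensive pass and store the block as-is.
        return block, False
    packed = lzma.compress(block, preset=6)
    if len(packed) >= len(block):
        # Existing behaviour: keep the original if compression did not help.
        return block, False
    return packed, True


if __name__ == "__main__":
    samples = {
        "text": b"squashfs block probe " * 4096,
        "random": os.urandom(BLOCK_SIZE),
    }
    for name, data in samples.items():
        payload, compressed = pack_block(data)
        print(name, len(data), len(payload), compressed)
```

The threshold trades probe accuracy against lost compression: a lower value skips more blocks but risks storing blocks that the slow compressor could still have shrunk.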

A fast compressor, such as zstd level 1, could serve as the entropy probe. With the default xz level 6, the zstd level 1 pass adds less than 2% computational overhead. Because the probe runs on every block but saves a full xz pass on each block it rejects, this approach yields a net gain once more than roughly 2% of the blocks are incompressible, which is not an unreasonable scenario.
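The break-even reasoning can be written out as a small cost model. This is a sketch under simplifying assumptions that are not stated in the issue: every block costs the same to compress with the slow compressor, the probe classifies blocks perfectly, and all costs are normalised so that the slow pass on all blocks costs 1.0. The function names are hypothetical.

```python
def relative_cost(probe_overhead: float, incompressible_fraction: float) -> float:
    """Total cost with the probe enabled, relative to a baseline of 1.0.

    probe_overhead: probe cost as a fraction of the slow compressor's cost.
    incompressible_fraction: share of blocks the probe rejects.
    """
    # Every block pays for the probe; only accepted blocks pay for the slow pass.
    return probe_overhead + (1.0 - incompressible_fraction)


def break_even_fraction(probe_overhead: float) -> float:
    """Smallest incompressible share at which the probe pays for itself."""
    # Solve probe_overhead + (1 - f) = 1 for f.
    return probe_overhead


if __name__ == "__main__":
    for fraction in (0.00, 0.02, 0.10, 0.50):
        print(f"{fraction:.2f} -> {relative_cost(0.02, fraction):.2f}")
```

Under this model, a 2% probe overhead costs 2% extra when nothing is incompressible, breaks even at 2% incompressible blocks, and saves about 8% of compression time at 10%.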

Alternative methods, such as file-based skipping based on filename matching or file type detection, may be less accurate. In particular, files that mix compressible and incompressible content would benefit from a more granular, block-based approach. Examples include PDFs with both text (compressible) and JPEG images (not compressible), as well as uncompressed tar files or VM images containing various file types.

This idea is inspired by the ZFS LZ4 early abort mechanism, although the requirements and trade-offs in our context may be different. For reference, I have filed a similar issue on the squashfs-tools repository at plougher/squashfs-tools#240.

I'm happy to refine my local prototype and send a PR, but I'd like to ensure that this feature aligns with the project's direction first. Thank you for your time and consideration. I'm looking forward to hearing your thoughts on this proposal and the potential advantages it could bring to squashfs-tools-ng and the community.
