[Performance] Add memory compression and decompression pathways #301
Conversation
`map_module_to_scheme`
Is this ready for review?

Still draft
src/compressed_tensors/compressors/model_compressors/model_compressor.py
Could we add a test to compress a model with sparsity + quantization?

LGTM pending conflict resolution, good work!
src/compressed_tensors/compressors/model_compressors/model_compressor.py
Looks good pending verification that sparse-only models can be compressed using these changes!
Why do we need to use `CompressedLinear` for compression? What if we're compressing something that isn't a linear layer?
Force-pushed the …-compression-memory branch from 0e9544d to b2cad7e
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Can you share sparse + fp8 model recipes where we have non-uniform sparsity and/or quantization cases? cc @kylesayrs
It's beautiful, Kyle 🥇. Love the detailed summary and charts showing the improvement.
src/compressed_tensors/compressors/model_compressors/model_compressor.py
LGTM!
LGTM!
Purpose
Memory Visualization

[Charts not shown: "Compression Memory Improvement" (stacked) and "Model Compression and Decompression" (stacked) memory-usage plots]
Demonstration Script
Prerequisites
Changes

- Add `compress_model` and `decompress_model`, which both act on a model in memory rather than on a state dict or a model on disk
- `compress_model` compresses each module independently
- Implement `show_progress` on `compress` methods to squelch tqdm prints for each module compression
- Implement `decompress_from_state_dict` for sparsity compressors
- Update `get_nested_mappings_from_state_dict` to support returning unmatched params, similar to `get_nested_weight_mappings`
- Fix a bug in `decompress_from_state_dict` where the scheme was retrieved instead of the weight args, and `weight_name` referred to a module path, not a weight name
- Remove the `remove_suffix` util, which can be replaced with `str.removesuffix` as of Python 3.9+ (the minimum version we support; double check with @dsikka @rahul-tuli)
- Use `get_execution_device` when initializing params for `CompressedLinear`
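The `remove_suffix` removal relies on `str.removesuffix`, which is part of the Python standard library since 3.9. A quick illustration of the behavior (the example path below is illustrative, not from the codebase):

```python
# str.removesuffix strips the suffix at most once, and only as a whole
# string -- unlike str.rstrip, which strips any of the given characters.
path = "model.layers.0.self_attn.q_proj.weight"
module_path = path.removesuffix(".weight")
print(module_path)  # model.layers.0.self_attn.q_proj

# Strings that do not end with the suffix are returned unchanged:
print("model.bias".removesuffix(".weight"))  # model.bias

# Only one trailing occurrence is removed:
print("weight.weight".removesuffix(".weight"))  # weight
```

This is why the custom util is safe to drop once Python 3.9 is the floor.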
Testing

- Add `test_compress_model`, which tests that in-memory compression is equivalent to dict compression
- Add `test_decompress_model`, which tests that HFQuantizer decompression (from disk) is equivalent to decompression from memory
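The equivalence property that `test_compress_model` exercises can be sketched in miniature. Everything below is a hypothetical stand-in, not the real compressed-tensors API: `compress_entry`, `compress_state_dict`, and `compress_model` are toy functions that only mimic the shape of the check (per-module compression vs. whole-state-dict compression should agree).

```python
# Hypothetical sketch of the equivalence check: compressing a model
# module by module in memory should yield the same tensors as
# compressing the full state dict at once.

def compress_entry(name, weight):
    # Toy "compressor": real compressors pack quantized/sparse weights.
    return name, tuple(w // 2 for w in weight)

def compress_state_dict(state_dict):
    # Dict-based pathway: compress everything from a flat state dict.
    return dict(compress_entry(k, v) for k, v in state_dict.items())

def compress_model(modules):
    # Memory-based pathway: visit each module independently,
    # mirroring how compress_model walks the in-memory model.
    compressed = {}
    for name, weight in modules.items():
        key, value = compress_entry(name, weight)
        compressed[key] = value
    return compressed

state_dict = {"layer1.weight": [2, 4, 6], "layer2.weight": [8, 10]}
assert compress_model(state_dict) == compress_state_dict(state_dict)
```

Compressing per module keeps only one module's buffers live at a time, which is where the memory improvement in the charts comes from.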