[Performance] Add memory compression and decompression pathways #301
Conversation
Review comment on map_module_to_scheme:
Is this ready for review? Still a draft.
Resolved review thread (outdated): src/compressed_tensors/compressors/model_compressors/model_compressor.py
Could we add a test that compresses a model with sparsity + quantization?
LGTM pending conflict resolution, good work!
Resolved review thread (outdated): src/compressed_tensors/compressors/model_compressors/model_compressor.py
Looks good pending verification that sparse-only models can be compressed using these changes!
Why do we need to use CompressedLinear for compression? What if we're compressing something that isn't a linear layer?
Force-pushed the …-compression-memory branch from 0e9544d to b2cad7e.
Can you share sparse + fp8 model recipes where we have non-uniform sparsity and/or quantization? cc @kylesayrs
It's beautiful, Kyle 🥇. Love the detailed summary and the charts showing the improvement.
Resolved review thread: src/compressed_tensors/compressors/model_compressors/model_compressor.py
LGTM!
LGTM!
Resolved review thread: src/compressed_tensors/compressors/model_compressors/model_compressor.py
…-project#301)
* Implement memory compression and decompression
* Perform ops on CPU, move back to module device
* Add mixed tests
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
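The second commit bullet above describes a device-management pattern: do the memory-heavy compression work on CPU, then move the result back to the owning module's device. A minimal illustrative sketch follows; the fp16 cast is a stand-in for a real compression op and `compress_param_on_cpu` is a hypothetical helper, not part of compressed-tensors.

```python
# Illustrative sketch of "perform ops on CPU, move back to module device".
# The fp16 cast stands in for a real compression op.
import torch


def compress_param_on_cpu(module: torch.nn.Module, name: str = "weight") -> torch.Tensor:
    param = getattr(module, name)
    original_device = param.device

    # Do the memory-heavy work on CPU so device memory does not spike
    compressed = param.data.to("cpu").to(torch.float16)  # stand-in op

    # Move the result back to the module's device before re-registering it
    return compressed.to(original_device)


if __name__ == "__main__":
    layer = torch.nn.Linear(8, 8)
    out = compress_param_on_cpu(layer)
    print(out.dtype, out.device)
```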
Purpose
Add in-memory compression and decompression pathways that act on a model directly, rather than on a state dict or a model on disk, reducing peak memory usage during compression and decompression.
Memory Visualization
Compression Memory Improvement (stacked memory-usage chart)
Model Compression and Decompression (stacked memory-usage chart)
Demonstration Script
Prerequisites
Changes
- Add `compress_model` and `decompress_model`, which both act on a model in memory rather than a state dict or model on disk; `compress_model` compresses each module independently (a usage sketch follows this list)
- Implement `show_progress` on `compress` methods to squelch tqdm prints for each module compression
- Implement `decompress_from_state_dict` for sparsity compressors
- Extend `get_nested_mappings_from_state_dict` to support returning unmatched params, similar to `get_nested_weight_mappings`
- Fix `decompress_from_state_dict`, where the scheme was fetched instead of the weight args
- Fix `weight_name`, which was referring to a module path, not a weight name
- Remove the `remove_suffix` util, which can be replaced with `str.removesuffix` as of Python 3.9+ (which is the minimum we support; double check with @dsikka @rahul-tuli)
- Use `get_execution_device` when initializing params for `CompressedLinear`
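As a rough usage sketch of the new pathways (not the PR's demonstration script; the `ModelCompressor.from_pretrained` loader call, the import path, and the model path are assumptions that may need adjusting to your version of compressed-tensors):

```python
# Sketch: round-trip a model through the new in-memory pathways.
# MODEL_PATH is a hypothetical checkpoint whose config carries a
# compression/quantization config; loader arguments are assumptions.
import torch
from transformers import AutoModelForCausalLM
from compressed_tensors.compressors import ModelCompressor

MODEL_PATH = "path/to/quantized-or-sparse-model"

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)
compressor = ModelCompressor.from_pretrained(MODEL_PATH)

# Compress each module independently, in place, without materializing a
# full state dict or writing to disk
compressor.compress_model(model)

# Restore dense weights, again module by module and in memory
compressor.decompress_model(model)
```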
Testing
- Add `test_compress_model`, which tests that in-memory compression is equivalent to dict compression (a sketch follows this list)
- Add `test_decompress_model`, which tests that HfQuantizer decompression (from disk) is equivalent to decompression from memory
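A minimal sketch of the equivalence check that `test_compress_model` describes (the `model` and `compressor` fixtures and the `compress(model, state_dict=...)` call are assumptions, not the PR's actual test code):

```python
# Sketch of the memory-vs-dict equivalence check; `model` and `compressor`
# are assumed pytest fixtures, and the compress(...) signature is an
# assumption about the existing state-dict pathway.
import torch


def test_compress_model(model, compressor):
    # Reference: compress via the existing state-dict pathway
    reference = compressor.compress(model, state_dict=model.state_dict())

    # New pathway: compress the model in memory, module by module
    compressor.compress_model(model)
    compressed = model.state_dict()

    # The in-memory result should match the dict-compressed tensors
    assert compressed.keys() == reference.keys()
    for key in reference:
        assert torch.equal(compressed[key].cpu(), reference[key].cpu())
```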