[Performance] Reduce compression memory requirements via structure change #301
base: main
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Is this ready for review?

Still draft
```diff
@@ -362,6 +364,29 @@ def get_unexpected_file_keys(self, model: Module) -> List[str]:
         return list(unexpected_keys)

     def apply_compression_status(self, model: Module) -> Module:
```
Where does this get applied?
After this lands, we'll add `model_compressor.apply_compression_status(model)` within the overridden `save_pretrained` function. In the future, this will be used/folded into `apply_compression_status`.
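As a hedged sketch of that plan: the `ModelCompressor` wiring, the `save_pretrained` override, and the `DummyModule` stand-in below are illustrative assumptions, not the PR's actual code.

```python
# Illustrative sketch only: names mirror the discussion above but are
# assumptions, not the actual compressed-tensors implementation.
class DummyModule:
    """Stand-in for torch.nn.Module with a recursive modules() iterator."""

    def __init__(self, *children):
        self.children_ = list(children)
        self.quantization_status = None

    def modules(self):
        yield self
        for child in self.children_:
            yield from child.modules()


class ModelCompressor:
    def apply_compression_status(self, model):
        # Mark every submodule as compressed in place, instead of
        # allocating a separate compressed state dict.
        for module in model.modules():
            module.quantization_status = "compressed"
        return model


def save_pretrained(model, compressor, save_directory):
    # Compress the model structure in place before serialization...
    compressor.apply_compression_status(model)
    # ...then delegate to the original save path (omitted in this sketch).
    return model


model = DummyModule(DummyModule(), DummyModule(DummyModule()))
save_pretrained(model, ModelCompressor(), "/tmp/out")
print(all(m.quantization_status == "compressed" for m in model.modules()))  # True
```

The point of the in-place approach is that peak memory stays near one copy of the model, rather than model plus a full compressed state dict.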
Could we add a test to compress a model with sparsity+quantization?
LGTM pending conflict, good work!
```python
def apply_compression_status(self, model: Module):
    if self.quantization_config is None:
        for module in model.modules():
            module.quantization_status = QuantizationStatus.COMPRESSED
```
Could we run compression here too, for cases when we only have sparsity!
Looks good pending verification that sparse only models can be compressed using these changes!
```python
def unwrap_module_forward_quantized(module: Module):
    delattr(module, "forward")  # revert to class implementation
```
Does this then expose the forward method on the `Linear` or `CompressedLinear` class? Why do we want to delete the attr?
Yes, this exposes the forward method of the class implementation.
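A minimal standalone illustration of why `delattr` works here (plain Python, no torch): assigning `forward` on the instance shadows the class method, and deleting the instance attribute reverts attribute lookup to the class implementation.

```python
class Linear:
    def forward(self, x):
        return "class forward"


layer = Linear()

# Wrapping: an instance attribute shadows the class-level method.
layer.forward = lambda x: "quantized forward"
print(layer.forward(0))  # quantized forward

# Unwrapping: deleting the instance attribute re-exposes the class method.
delattr(layer, "forward")
print(layer.forward(0))  # class forward
```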
Why do we need to use CompressedLinear for compression? What about if we’re compressing something that isn’t a linear layer?
Purpose
- Allocating a separate compressed state dict (current)
- Replacing the model structure (new)

Prerequisites

Changes
- Added `apply_compression_status`, which converts the `Linear` layers of a model to `CompressedLinear`s
- Added `module_replace_dfs`, which is likely more performant than replacing modules using the existing `replace_module` utility
- `CompressedLinear` parameters are allocated on the execution device

Testing
- Added `test_apply_compression_status`
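The body of `module_replace_dfs` is not shown in this excerpt; the following is a hedged sketch of what a DFS-based in-place replacement could look like, using lightweight stand-ins for `torch.nn.Module`, `Linear`, and `CompressedLinear` (all names here are illustrative).

```python
class Module:
    """Minimal stand-in for torch.nn.Module."""

    def named_children(self):
        return [(name, value) for name, value in vars(self).items()
                if isinstance(value, Module)]


class Linear(Module):
    pass


class CompressedLinear(Module):
    @classmethod
    def from_linear(cls, linear):
        # Real code would compress and carry over the weights here.
        return cls()


def module_replace_dfs(module, func):
    # Post-order DFS: rewrite children first, then the module itself.
    # Replacing in place avoids allocating a second copy of the model.
    for name, child in list(module.named_children()):
        setattr(module, name, module_replace_dfs(child, func))
    return func(module)


def compress(module):
    return CompressedLinear.from_linear(module) if isinstance(module, Linear) else module


model = Module()
model.fc1 = Linear()
model.block = Module()
model.block.fc2 = Linear()

model = module_replace_dfs(model, compress)
print(type(model.fc1).__name__, type(model.block.fc2).__name__)
# CompressedLinear CompressedLinear
```

A single post-order traversal touches each submodule once, which is why it can beat repeated lookups through a generic `replace_module`-style utility.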