[Transform] QuIP Modifier #1648
base: bdellabe/transform-modifier
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes
Hello @kylesayrs, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new QuIP transformation modifier and significantly enhances the existing SpinQuant modifier, providing more advanced techniques for preparing large language models for quantization. It includes foundational utilities for model layer manipulation and improves the handling of word embeddings, all aimed at reducing accuracy degradation during model compression. New examples and tests are also included to demonstrate and validate these capabilities.
Highlights
- **New QuIP Modifier**: I've introduced a new `QuIPModifier` under `llmcompressor.modifiers.transform`. This modifier enables the application of QuIP (Quantization with Incoherence Processing) transforms to large language models. These transforms, based on recent research, aim to reduce accuracy loss during quantization by rotating weights and activations into a more quantization-friendly space. It supports the `"hadamard"`, `"random-hadamard"`, and `"random-matrix"` transform types; a short usage sketch follows this list.
- **Enhanced SpinQuant Modifier**: The existing `SpinQuantModifier` has been significantly enhanced. It now leverages new utility functions for normalizing embeddings and fusing norm layers into linear layers, which are crucial pre-processing steps for effective transforms. I've also added more robust mapping capabilities to precisely target layers for R1 and R2 rotations.
- **Model Pre-processing Utilities**: I've added new core utilities in `src/llmcompressor/modeling/fuse.py` for `normalize_embedding` and `fuse_norm_linears`. These functions prepare model layers for transforms by normalizing embedding weights and fusing normalization layers into subsequent linear layers, ensuring transform invariance and improving quantization results.
- **Improved Word Embedding Untying**: The `patch_tied_tensors_bug` function has been renamed to `untie_word_embeddings` and its implementation has been refined. This ensures that word embeddings are correctly untied even when model parameters are offloaded, addressing a known issue in Hugging Face Transformers and improving compatibility with various model loading and saving scenarios.
- **Pipeline Integration and Examples**: I've updated the pipeline inference logic so that `SpinQuantModifier` (and, implicitly, other transform modifiers) correctly uses the "datafree" calibration pipeline, as these transforms do not require a calibration dataset. New example scripts (`compress_model.py`, `llama3_example.py`) have been added to demonstrate the usage of these new transform capabilities.
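For orientation, here is a minimal usage sketch, not taken from the PR: the `QuIPModifier` import path and `transform_type` options come from the highlights above, while the model name and the top-level `oneshot` call follow the usual llm-compressor example pattern and should be checked against the `llama3_example.py` script added in this PR.

```python
# Minimal sketch, assuming the import path introduced in this PR and the
# top-level `oneshot` entrypoint from llm-compressor; the model is illustrative.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.transform import QuIPModifier

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)

# Apply QuIP rotations; per the pipeline integration above, transform-only
# recipes run through the "datafree" pipeline, so no calibration data is passed.
recipe = QuIPModifier(transform_type="random-hadamard")
oneshot(model=model, recipe=recipe)
```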
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The code changes introduce the `QuIPModifier` for applying QuIP transforms. There are a few issues related to documentation, a critical bug in a Pydantic validator, type hint violations, and a potential issue with handling the `ignore` parameter. The proposed suggestions aim to fix these issues and improve code correctness and maintainability.
```python
def validate_not_implemented(cls, value, info: ValidationInfo):
    raise NotImplementedError(f"{info.field_name} is not supported right now")
```
The `validate_not_implemented` validator has two critical issues:

- It raises `NotImplementedError` even if `randomize` or `learnable` are explicitly set to `False` in a recipe. The check should only trigger if they are set to `True`.
- Pydantic validators must return the validated value. This implementation implicitly returns `None`, which will incorrectly overwrite the field's value.

These issues make the modifier unusable if these fields are present in the recipe.
```diff
 def validate_not_implemented(cls, value, info: ValidationInfo):
-    raise NotImplementedError(f"{info.field_name} is not supported right now")
+    if value:
+        raise NotImplementedError(f"{info.field_name} is not supported right now")
+    return value
```
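To make the second point concrete, here is a self-contained sketch (the field name mirrors the PR, but the class is illustrative) of a Pydantic v2 validator that rejects `True` but forgets to return the value, silently turning an explicit `False` into `None`:

```python
from pydantic import BaseModel, field_validator


class Example(BaseModel):
    randomize: bool = False

    @field_validator("randomize")
    @classmethod
    def validate_not_implemented(cls, value):
        if value:  # only reject an explicit True
            raise NotImplementedError("randomize is not supported right now")
        # BUG: no `return value` here, so Pydantic stores the implicit None


print(Example(randomize=False).randomize)  # prints None, not False
```

Adding `return value` at the end, as in the suggestion above, preserves the field's value.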
```python
def on_initialize(self, state: State, **kwargs) -> bool:
    if self.transform_config is not None:
        return True

    self.transform_config = self._create_config()
    return True
```
The `ignore` field is a `Union[str, List[str]]`, but the `TransformArgs` it's passed to likely expects a list of strings. If a single string (like the default `"lm_head"`) is passed, it could be iterated over character by character, leading to incorrect behavior. It's safer to ensure `self.ignore` is always a list. This can be handled at the beginning of `on_initialize`.
```diff
 def on_initialize(self, state: State, **kwargs) -> bool:
+    if isinstance(self.ignore, str):
+        self.ignore = [self.ignore]
+
     if self.transform_config is not None:
         return True

     self.transform_config = self._create_config()
     return True
```
```python
    self.transform_config = self._create_config()
    return True

def on_start(self, state: State, event: Event, **kwargs):
```
The `on_start` method is called with `None` for the `event` parameter from within `on_event`, but its signature `event: Event` does not allow `None`. This violates the type hint and could cause issues with static analysis tools. The signature should be updated to `event: Optional[Event] = None` to reflect that the event is optional.
```diff
-def on_start(self, state: State, event: Event, **kwargs):
+def on_start(self, state: State, event: Optional[Event] = None, **kwargs):
```
```python
    if not self.ended_:
        self.on_end(state, None)

def on_end(self, state: State, event: Event, **kwargs):
```
Similar to `on_start`, the `on_end` method is called with `None` for the `event` parameter, but its signature `event: Event` does not allow for `None`. This should be corrected to `event: Optional[Event] = None` to align with its usage.
```diff
-def on_end(self, state: State, event: Event, **kwargs):
+def on_end(self, state: State, event: Optional[Event] = None, **kwargs):
```
```diff
@@ -0,0 +1,3 @@
+# flake8: noqa
+
+from .base import *
```
According to PEP 8, wildcard imports (`from ... import *`) should be avoided, as they make it unclear which names are present in the namespace. While `base.py` defines `__all__`, making the import safer, it's more explicit and maintainable to import the required names directly.
```diff
-from .base import *
+from .base import QuIPModifier
```
```
[QuIP: 2-Bit Quantization of Large Language Models With Guarantees](https://arxiv.org/abs/2307.13304)  # noqa: E501

Transforms (rotations) are extra layers added to a model which reduce the accuracy
loss induced by quantization. This is achieved through "rotating" weights and
```
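The claim in this docstring rests on transform invariance: fusing a rotation into a weight and applying its inverse to the incoming activations leaves the layer's output unchanged. A small numeric check of that identity, independent of this PR's code:

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(8, 4, bias=False)
x = torch.randn(2, 8)

# Build a random orthogonal rotation R and fuse it into the weight as W @ R.
r, _ = torch.linalg.qr(torch.randn(8, 8))
fused = torch.nn.Linear(8, 4, bias=False)
fused.weight.data = linear.weight.data @ r

# (W R)(R^T x) == W x because R R^T = I, so rotating activations before
# the fused layer reproduces the original output.
assert torch.allclose(fused(x @ r), linear(x), atol=1e-5)
```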
```
`"random-matrix"` has more performance cost, but supports a much larger set of
sizes.
`"random-matrix"` has the greatest performance cost, but supports any size
```
The documentation for `transform_type` appears to have a copy-paste error. It lists `"random-matrix"` twice and omits `"random-hadamard"`, which is a valid option for `transform_type`. This can be confusing for users. The docstring should be updated to correctly describe all available transform types.
```diff
-`"random-matrix"` has more performance cost, but supports a much larger set of
-sizes.
-`"random-matrix"` has the greatest performance cost, but supports any size
+`"random-hadamard"` has more performance cost, but supports a much larger set of
+sizes.
+`"random-matrix"` has the greatest performance cost, but supports any size
```
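As background for the size trade-off described here: deterministic Hadamard constructions only exist for certain dimensions (Sylvester's construction gives powers of two), while a random orthogonal matrix can be built for any dimension at the cost of a dense matmul. A sketch of both constructions, not taken from this PR's implementation:

```python
import torch


def sylvester_hadamard(n: int) -> torch.Tensor:
    """Orthonormal Hadamard matrix; this construction needs a power-of-two n."""
    assert n > 0 and n & (n - 1) == 0, "Sylvester construction needs a power of two"
    h = torch.ones(1, 1)
    while h.shape[0] < n:
        h = torch.cat([torch.cat([h, h], dim=1), torch.cat([h, -h], dim=1)], dim=0)
    return h / n**0.5  # scale so rows are orthonormal


def random_orthogonal(n: int) -> torch.Tensor:
    """Random rotation for any n, via QR decomposition of a Gaussian matrix."""
    q, r = torch.linalg.qr(torch.randn(n, n))
    return q * torch.sign(torch.diagonal(r))  # sign fix for a uniform rotation


for make in (sylvester_hadamard, random_orthogonal):
    m = make(64)
    assert torch.allclose(m @ m.T, torch.eye(64), atol=1e-4)
```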
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>