[Transform] QuIP Modifier #1648


Draft: kylesayrs wants to merge 2 commits into base branch bdellabe/transform-modifier

Conversation

kylesayrs (Collaborator)

No description provided.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

kylesayrs changed the base branch from main to bdellabe/transform-modifier on July 15, 2025 at 18:57

@gemini-code-assist bot left a comment


Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new QuIP transformation modifier and significantly enhances the existing SpinQuant modifier, providing more advanced techniques for preparing large language models for quantization. It includes foundational utilities for model layer manipulation and improves the handling of word embeddings, all aimed at reducing accuracy degradation during model compression. New examples and tests are also included to demonstrate and validate these capabilities.

Highlights

  • New QuIP Modifier: I've introduced a new QuIPModifier under llmcompressor.modifiers.transform. This modifier enables the application of QuIP (Quantization with Incoherence Processing) transforms to large language models. These transforms, based on recent research, aim to reduce accuracy loss during quantization by rotating weights and activations into a more quantization-friendly space. It supports 'hadamard', 'random-hadamard', and 'random-matrix' transform types.
  • Enhanced SpinQuant Modifier: The existing SpinQuantModifier has been significantly enhanced. It now leverages new utility functions for normalizing embeddings and fusing norm layers into linear layers, which are crucial pre-processing steps for effective transformations. I've also added more robust mapping capabilities to precisely target layers for R1 and R2 rotations.
  • Model Pre-processing Utilities: I've added new core utilities in src/llmcompressor/modeling/fuse.py for normalize_embedding and fuse_norm_linears. These functions prepare model layers for transformations by normalizing embedding weights and fusing normalization layers into subsequent linear layers, ensuring transform invariance and improving quantization results.
  • Improved Word Embedding Untying: The patch_tied_tensors_bug function has been renamed to untie_word_embeddings and its implementation has been refined. This ensures that word embeddings are correctly untied, even when model parameters are offloaded, addressing a known issue in Hugging Face Transformers and improving compatibility with various model loading and saving scenarios.
  • Pipeline Integration and Examples: I've updated the pipeline inference logic to ensure that SpinQuantModifier (and implicitly, other transform modifiers) correctly utilize the 'datafree' calibration pipeline, as these transformations do not require a full calibration dataset. New example scripts (compress_model.py, llama3_example.py) have been added to demonstrate the usage of these new transform capabilities.
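The invariance these rotation-based transforms rely on can be shown with a tiny self-contained sketch (plain Python with hand-rolled matrix helpers, not llm-compressor code): because a normalized Hadamard matrix H is orthogonal, rewriting a linear layer's weight as W·H and its input as Hᵀ·x leaves the output unchanged in exact arithmetic, so the rotation reshapes the weight distribution for quantization without changing the model's function.

```python
# Illustrative sketch only (plain Python, not llm-compressor code).
# A QuIP-style rotation pair is function-preserving: for an orthogonal
# matrix H, (W @ H) @ (H.T @ x) == W @ x, so rotating weights into a
# more quantization-friendly basis cannot change the model's output
# in exact arithmetic.
import math

def matmul(a, b):
    # naive matrix product for small lists-of-lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]          # 2x2 normalized Hadamard matrix (orthogonal)
W = [[1.0, 2.0], [3.0, 4.0]]   # toy weight matrix
x = [[0.5], [-1.5]]            # toy activation, as a column vector

direct = matmul(W, x)                                    # W @ x
rotated = matmul(matmul(W, H), matmul(transpose(H), x))  # (W H)(H^T x)

for row_d, row_r in zip(direct, rotated):
    assert all(abs(a - b) < 1e-9 for a, b in zip(row_d, row_r))
```

In the real modifiers the rotations are fused into adjacent layers rather than materialized like this; the point here is only the algebraic identity that makes the transforms safe to insert.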
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

The code changes introduce the QuIPModifier for applying QuIP transforms. There are a few issues related to documentation, a critical bug in a Pydantic validator, type hint violations, and a potential issue with handling the ignore parameter. The proposed suggestions aim to fix these issues and improve code correctness and maintainability.

Comment on lines +55 to +56
    def validate_not_implemented(cls, value, info: ValidationInfo):
        raise NotImplementedError(f"{info.field_name} is not supported right now")

critical

The validate_not_implemented validator has two critical issues:

  1. It raises NotImplementedError even if randomize or learnable are explicitly set to False in a recipe. The check should only trigger if they are set to True.
  2. Pydantic validators must return the validated value. This implementation implicitly returns None, which will incorrectly overwrite the field's value.

These issues make the modifier unusable if these fields are present in the recipe.

Suggested change:

    # before
    def validate_not_implemented(cls, value, info: ValidationInfo):
        raise NotImplementedError(f"{info.field_name} is not supported right now")

    # after
    def validate_not_implemented(cls, value, info: ValidationInfo):
        if value:
            raise NotImplementedError(f"{info.field_name} is not supported right now")
        return value
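The reviewer's second point, that a validator must return the validated value, can be demonstrated without Pydantic at all. The sketch below is a hypothetical stand-in for how a validation framework assigns a validator's return value back to the field; the names buggy_validator, fixed_validator, and apply_validator are invented for illustration.

```python
# Minimal stand-in (no Pydantic dependency) for how field validators are
# applied: the framework assigns whatever the validator RETURNS back to
# the field, so a validator that falls off the end returns None and
# silently clobbers the value.

def buggy_validator(value):
    if value:
        raise NotImplementedError("field is not supported right now")
    # no return -> implicitly returns None, losing the value

def fixed_validator(value):
    if value:
        raise NotImplementedError("field is not supported right now")
    return value  # validators must hand the (validated) value back

def apply_validator(validator, value):
    return validator(value)  # mimics the framework assigning the result

assert apply_validator(buggy_validator, False) is None   # value lost
assert apply_validator(fixed_validator, False) is False  # value preserved
```

The same reasoning applies to the real Pydantic validator in this PR: with the guard and the return in place, a recipe that explicitly sets the field to False passes through unchanged, while True still raises.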

Comment on lines +58 to +63
    def on_initialize(self, state: State, **kwargs) -> bool:
        if self.transform_config is not None:
            return True

        self.transform_config = self._create_config()
        return True

high

The ignore field is a Union[str, List[str]], but the TransformArgs it's passed to likely expects a list of strings. If a single string (like the default "lm_head") is passed, it could be iterated over character by character, leading to incorrect behavior. It's safer to ensure self.ignore is always a list. This can be handled at the beginning of on_initialize.

Suggested change:

    # before
    def on_initialize(self, state: State, **kwargs) -> bool:
        if self.transform_config is not None:
            return True
        self.transform_config = self._create_config()
        return True

    # after
    def on_initialize(self, state: State, **kwargs) -> bool:
        if isinstance(self.ignore, str):
            self.ignore = [self.ignore]
        if self.transform_config is not None:
            return True
        self.transform_config = self._create_config()
        return True
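The character-iteration hazard the reviewer describes is plain Python behavior and easy to verify; the snippet below uses the default value "lm_head" mentioned in the review comment.

```python
# Why a bare string for `ignore` is dangerous: iterating a str yields
# characters, so downstream code that loops over "modules to ignore"
# would see "l", "m", "_", ... instead of the module name "lm_head".

ignore = "lm_head"
assert list(ignore) == ["l", "m", "_", "h", "e", "a", "d"]

# The suggested normalization makes the field's shape predictable:
if isinstance(ignore, str):
    ignore = [ignore]
assert list(ignore) == ["lm_head"]
```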

        self.transform_config = self._create_config()
        return True

    def on_start(self, state: State, event: Event, **kwargs):

high

The on_start method is called with None for the event parameter from within on_event, but its signature event: Event does not allow None. This violates the type hint and could cause issues with static analysis tools. The signature should be updated to event: Optional[Event] = None to reflect that the event is optional.

Suggested change:

    # before
    def on_start(self, state: State, event: Event, **kwargs):

    # after
    def on_start(self, state: State, event: Optional[Event] = None, **kwargs):

        if not self.ended_:
            self.on_end(state, None)

    def on_end(self, state: State, event: Event, **kwargs):

high

Similar to on_start, the on_end method is called with None for the event parameter, but its signature event: Event does not allow for None. This should be corrected to event: Optional[Event] = None to align with its usage.

Suggested change:

    # before
    def on_end(self, state: State, event: Event, **kwargs):

    # after
    def on_end(self, state: State, event: Optional[Event] = None, **kwargs):
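For completeness, the Optional pattern suggested for both hooks can be sketched in isolation. Event below is a minimal stand-in class, not llmcompressor's real Event type, and on_end is a simplified free function rather than the modifier method.

```python
# A parameter that is sometimes passed None must be typed Optional and
# given a default, otherwise call sites passing None (as on_event does
# for these hooks) violate the annotation.
from typing import Optional

class Event:  # stand-in for llmcompressor's Event type
    pass

calls = []

def on_end(state: str, event: Optional[Event] = None, **kwargs) -> None:
    calls.append((state, event))

on_end("finalizing")           # event defaults to None
on_end("finalizing", Event())  # an explicit event is still accepted

assert calls[0][1] is None
assert isinstance(calls[1][1], Event)
```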

    @@ -0,0 +1,3 @@
    # flake8: noqa

    from .base import *

medium

According to PEP 8, wildcard imports (from ... import *) should be avoided as they make it unclear which names are present in the namespace. While base.py defines __all__, making the import safer, it's more explicit and maintainable to import the required names directly.

Suggested change:

    # before
    from .base import *

    # after
    from .base import QuIPModifier

[QuIP: 2-Bit Quantization of Large Language Models With Guarantees](https://arxiv.org/abs/2307.13304) # noqa: E501

    Transforms (rotations) are extra layers added to a model which reduce the accuracy
    loss induced by quantization. This is achived through "rotating" weights and

medium

There is a typo in the docstring. "achived" should be "achieved".

Suggested change:

    # before
    loss induced by quantization. This is achived through "rotating" weights and

    # after
    loss induced by quantization. This is achieved through "rotating" weights and

Comment on lines +34 to +36
    `"random-matrix"` has more performance cost, but supports a much larger set of
    sizes.
    `"random-matrix"` has the greatest performance cost, but supports any size

medium

The documentation for transform_type appears to have a copy-paste error. It lists "random-matrix" twice and omits "random-hadamard", which is a valid option for transform_type. This can be confusing for users. The docstring should be updated to correctly describe all available transform types.

Suggested change:

    # before
    `"random-matrix"` has more performance cost, but supports a much larger set of
    sizes.
    `"random-matrix"` has the greatest performance cost, but supports any size

    # after
    `"random-hadamard"` has more performance cost, but supports a much larger set of
    sizes.
    `"random-matrix"` has the greatest performance cost, but supports any size

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>