[Transform] QuIP Modifier #1648
base: bdellabe/transform-modifier
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes
Hello @kylesayrs, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new QuIP transformation modifier and significantly enhances the existing SpinQuant modifier, providing more advanced techniques for preparing large language models for quantization. It includes foundational utilities for model layer manipulation and improves the handling of word embeddings, all aimed at reducing accuracy degradation during model compression. New examples and tests are also included to demonstrate and validate these capabilities.
Highlights
- **New QuIP Modifier**: I've introduced a new `QuIPModifier` under `llmcompressor.modifiers.transform`. This modifier enables the application of QuIP (Quantization with Incoherence Processing) transforms to large language models. These transforms, based on recent research, aim to reduce accuracy loss during quantization by rotating weights and activations into a more quantization-friendly space. It supports the `"hadamard"`, `"random-hadamard"`, and `"random-matrix"` transform types; a short usage sketch follows this list.
- **Enhanced SpinQuant Modifier**: The existing `SpinQuantModifier` has been significantly enhanced. It now leverages new utility functions for normalizing embeddings and fusing norm layers into linear layers, which are crucial pre-processing steps for effective transforms. I've also added more robust mapping capabilities to precisely target layers for R1 and R2 rotations.
- **Model Pre-processing Utilities**: I've added new core utilities in `src/llmcompressor/modeling/fuse.py` for `normalize_embedding` and `fuse_norm_linears`. These functions prepare model layers for transforms by normalizing embedding weights and fusing normalization layers into subsequent linear layers, ensuring transform invariance and improving quantization results.
- **Improved Word Embedding Untying**: The `patch_tied_tensors_bug` function has been renamed to `untie_word_embeddings` and its implementation has been refined. This ensures that word embeddings are correctly untied even when model parameters are offloaded, addressing a known issue in Hugging Face Transformers and improving compatibility with various model loading and saving scenarios.
- **Pipeline Integration and Examples**: I've updated the pipeline inference logic so that `SpinQuantModifier` (and, implicitly, other transform modifiers) correctly uses the "datafree" calibration pipeline, as these transforms do not require a calibration dataset. New example scripts (`compress_model.py`, `llama3_example.py`) have been added to demonstrate the usage of these new transform capabilities.
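For orientation, here is a minimal usage sketch, not taken from the PR: the `QuIPModifier` import path and `transform_type` options come from the highlights above, while the model name and the top-level `oneshot` call follow the usual llm-compressor example pattern and should be checked against the `llama3_example.py` script added in this PR.

```python
# Minimal sketch, assuming the import path introduced in this PR and the
# top-level `oneshot` entrypoint from llm-compressor; the model is illustrative.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.transform import QuIPModifier

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)

# Apply QuIP rotations; per the pipeline integration above, transform-only
# recipes run through the "datafree" pipeline, so no calibration data is passed.
recipe = QuIPModifier(transform_type="random-hadamard")
oneshot(model=model, recipe=recipe)
```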
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The code changes introduce the `QuIPModifier` for applying QuIP transforms. There are a few issues related to documentation, a critical bug in a Pydantic validator, type hint violations, and a potential issue with handling the `ignore` parameter. The proposed suggestions aim to fix these issues and improve code correctness and maintainability.
```python
def validate_not_implemented(cls, value, info: ValidationInfo):
    raise NotImplementedError(f"{info.field_name} is not supported right now")
```
The `validate_not_implemented` validator has two critical issues:

- It raises `NotImplementedError` even if `randomize` or `learnable` are explicitly set to `False` in a recipe. The check should only trigger if they are set to `True`.
- Pydantic validators must return the validated value. This implementation implicitly returns `None`, which will incorrectly overwrite the field's value.

These issues make the modifier unusable if these fields are present in the recipe.
```diff
 def validate_not_implemented(cls, value, info: ValidationInfo):
-    raise NotImplementedError(f"{info.field_name} is not supported right now")
+    if value:
+        raise NotImplementedError(f"{info.field_name} is not supported right now")
+    return value
```
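To make the second point concrete, here is a self-contained sketch (the field name mirrors the PR, but the class is illustrative) of a Pydantic v2 validator that rejects `True` but forgets to return the value, silently turning an explicit `False` into `None`:

```python
from pydantic import BaseModel, field_validator


class Example(BaseModel):
    randomize: bool = False

    @field_validator("randomize")
    @classmethod
    def validate_not_implemented(cls, value):
        if value:  # only reject an explicit True
            raise NotImplementedError("randomize is not supported right now")
        # BUG: no `return value` here, so Pydantic stores the implicit None


print(Example(randomize=False).randomize)  # prints None, not False
```

Adding `return value` at the end, as in the suggestion above, preserves the field's value.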
```python
def on_initialize(self, state: State, **kwargs) -> bool:
    if self.transform_config is not None:
        return True

    self.transform_config = self._create_config()
    return True
```
The `ignore` field is a `Union[str, List[str]]`, but the `TransformArgs` it's passed to likely expects a list of strings. If a single string (like the default `"lm_head"`) is passed, it could be iterated over character by character, leading to incorrect behavior. It's safer to ensure `self.ignore` is always a list. This can be handled at the beginning of `on_initialize`.
```diff
 def on_initialize(self, state: State, **kwargs) -> bool:
+    if isinstance(self.ignore, str):
+        self.ignore = [self.ignore]
+
     if self.transform_config is not None:
         return True

     self.transform_config = self._create_config()
     return True
```
```python
    self.transform_config = self._create_config()
    return True

def on_start(self, state: State, event: Event, **kwargs):
```
The `on_start` method is called with `None` for the `event` parameter from within `on_event`, but its signature `event: Event` does not allow `None`. This violates the type hint and could cause issues with static analysis tools. The signature should be updated to `event: Optional[Event] = None` to reflect that the event is optional.
```diff
-def on_start(self, state: State, event: Event, **kwargs):
+def on_start(self, state: State, event: Optional[Event] = None, **kwargs):
```
```python
    if not self.ended_:
        self.on_end(state, None)

def on_end(self, state: State, event: Event, **kwargs):
```
Similar to `on_start`, the `on_end` method is called with `None` for the `event` parameter, but its signature `event: Event` does not allow for `None`. This should be corrected to `event: Optional[Event] = None` to align with its usage.
```diff
-def on_end(self, state: State, event: Event, **kwargs):
+def on_end(self, state: State, event: Optional[Event] = None, **kwargs):
```
```diff
@@ -0,0 +1,3 @@
+# flake8: noqa
+
+from .base import *
```
According to PEP 8, wildcard imports (`from ... import *`) should be avoided, as they make it unclear which names are present in the namespace. While `base.py` defines `__all__`, making the import safer, it's more explicit and maintainable to import the required names directly.
```diff
-from .base import *
+from .base import QuIPModifier
```
```
[QuIP: 2-Bit Quantization of Large Language Models With Guarantees](https://arxiv.org/abs/2307.13304)  # noqa: E501

Transforms (rotations) are extra layers added to a model which reduce the accuracy
loss induced by quantization. This is achieved through "rotating" weights and
```
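The claim in this docstring rests on transform invariance: fusing a rotation into a weight and applying its inverse to the incoming activations leaves the layer's output unchanged. A small numeric check of that identity, independent of this PR's code:

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(8, 4, bias=False)
x = torch.randn(2, 8)

# Build a random orthogonal rotation R and fuse it into the weight as W @ R.
r, _ = torch.linalg.qr(torch.randn(8, 8))
fused = torch.nn.Linear(8, 4, bias=False)
fused.weight.data = linear.weight.data @ r

# (W R)(R^T x) == W x because R R^T = I, so rotating activations before
# the fused layer reproduces the original output.
assert torch.allclose(fused(x @ r), linear(x), atol=1e-5)
```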
```
`"random-matrix"` has more performance cost, but supports a much larger set of
sizes.
`"random-matrix"` has the greatest performance cost, but supports any size
```
The documentation for `transform_type` appears to have a copy-paste error. It lists `"random-matrix"` twice and omits `"random-hadamard"`, which is a valid option for `transform_type`. This can be confusing for users. The docstring should be updated to correctly describe all available transform types.
```diff
-`"random-matrix"` has more performance cost, but supports a much larger set of
-sizes.
-`"random-matrix"` has the greatest performance cost, but supports any size
+`"random-hadamard"` has more performance cost, but supports a much larger set of
+sizes.
+`"random-matrix"` has the greatest performance cost, but supports any size
```
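As background for the size trade-off described here: deterministic Hadamard constructions only exist for certain dimensions (Sylvester's construction gives powers of two), while a random orthogonal matrix can be built for any dimension at the cost of a dense matmul. A sketch of both constructions, not taken from this PR's implementation:

```python
import torch


def sylvester_hadamard(n: int) -> torch.Tensor:
    """Orthonormal Hadamard matrix; this construction needs a power-of-two n."""
    assert n > 0 and n & (n - 1) == 0, "Sylvester construction needs a power of two"
    h = torch.ones(1, 1)
    while h.shape[0] < n:
        h = torch.cat([torch.cat([h, h], dim=1), torch.cat([h, -h], dim=1)], dim=0)
    return h / n**0.5  # scale so rows are orthonormal


def random_orthogonal(n: int) -> torch.Tensor:
    """Random rotation for any n, via QR decomposition of a Gaussian matrix."""
    q, r = torch.linalg.qr(torch.randn(n, n))
    return q * torch.sign(torch.diagonal(r))  # sign fix for a uniform rotation


for make in (sylvester_hadamard, random_orthogonal):
    m = make(64)
    assert torch.allclose(m @ m.T, torch.eye(64), atol=1e-4)
```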
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>