
[Tests] Spinquant dummy model tests #1647


Draft
wants to merge 30 commits into base: main

Conversation

kylesayrs
Collaborator

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

kylesayrs and others added 30 commits June 23, 2025 19:34
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> (×23)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> (×5)

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the SpinQuant transformation technique into the llmcompressor library, enhancing its capabilities for post-training quantization. It introduces a new modifier to apply these transformations, along with essential model preprocessing utilities for embedding normalization and norm-linear fusion. Furthermore, it refines the handling of tied word embeddings during model loading and saving, improving overall robustness and compatibility. The changes are validated through new examples and a dedicated test for a dummy model, confirming the successful integration and functionality of SpinQuant within a data-free compression pipeline.
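
As a minimal, hedged sketch of the data-free flow described above (the import path for SpinQuantModifier, the rotations argument, and the model stub are illustrative assumptions, not the PR's exact API):

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.transform import SpinQuantModifier  # assumed import path

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # example stub only

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# R1/R2 rotations act only on weights, so no dataset is passed and the
# pipeline should resolve to its data-free path for this recipe
recipe = SpinQuantModifier(rotations=["R1", "R2"])  # argument name is assumed
oneshot(model=model, recipe=recipe)

model.save_pretrained("Llama-3.2-1B-Instruct-spinquant")
tokenizer.save_pretrained("Llama-3.2-1B-Instruct-spinquant")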

Highlights

  • New SpinQuant Modifier: Introduced the SpinQuantModifier to apply rotation-based transformations (R1, R2) to LLM weights, making them more amenable to post-training quantization. This modifier orchestrates the necessary preprocessing steps.
  • Model Preprocessing Utilities: Added normalize_embedding and fuse_norm_linears functions within the modeling package. These utilities preprocess model layers (embeddings and norm-linear pairs) by centering embedding weights and fusing norm weights into subsequent linear layers, which is crucial for ensuring transform invariance in SpinQuant (a sketch of this fusion follows the list).
  • Robust Tied Embedding Handling: Refactored and improved the untie_word_embeddings utility to correctly handle and untie word embeddings, even when offloaded. This resolves previous issues and ensures broader compatibility with various model architectures and compression techniques.
  • Data-Free Pipeline Integration: Configured the oneshot pipeline to recognize SpinQuantModifier as a data-free operation. This allows SpinQuant to be applied without requiring a calibration dataset, streamlining the compression process.
  • Dummy Model Testing: Introduced a dedicated test with a simplified DummyModel that mimics an LLM architecture. This test validates the core SpinQuant transformation logic, ensuring its correctness and consistency on a minimal setup.
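
A hedged sketch of the norm-linear fusion idea above (the function name and signature are assumptions that mirror the fuse_norm_linears utility described in this PR, shown for an RMSNorm-style norm with no bias):

import torch

@torch.no_grad()
def fuse_norm_into_linears(norm: torch.nn.Module, linears) -> None:
    # y = W @ (g * x) == (W * diag(g)) @ x, so fold the norm scale g into the
    # input columns of each following linear, then reset the scale to ones
    for linear in linears:
        linear.weight.mul_(norm.weight)  # broadcasts over input features
    norm.weight.fill_(1.0)

Once the scale is the identity, an RMSNorm commutes with an orthogonal rotation (the RMS statistic is rotation invariant), which is the transform-invariance property the SpinQuant rotations rely on.
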
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces the SpinQuantModifier and related functionalities, including model fusion logic and utility scripts. I've found a couple of high-severity issues in the core modifier logic that could lead to incorrect behavior, specifically in the initialization and norm fusion methods. I've also suggested improvements to a new test file and a utility script to make them more robust. Addressing these points will significantly improve the quality and reliability of this new feature.

Comment on lines +57 to +61
if self.transform_config is not None:
    if self.mappings is not None:
        raise ValueError()

    return True
Contributor


high

There's a logic issue here. self.mappings is assigned on line 54, so the condition if self.mappings is not None: on line 58 will always be true when self.transform_config is provided. This causes a ValueError to be raised unconditionally and prevents a user-provided transform_config from ever being used.

The intention seems to be that if transform_config is provided, it should be used while skipping the automatic configuration building. The check on self.mappings seems incorrect. Consider removing the conditional ValueError to fix this bug.

        if self.transform_config is not None:
            return True

Comment on lines +116 to +135
def _fuse_norms(self, model: PreTrainedModel):
    for mapping in self.norm_mappings:
        targets = (mapping.norm, *mapping.linears)
        matches = dict()

        for name, module in model.named_modules():
            # match until we get a full set
            for target in targets:
                if is_match(name, module, target):
                    if target in matches:
                        raise ValueError("Cannot match twice")
                    matches[target] = module

            # once we have a full set, fuse and reset
            if all(target in matches for target in targets):
                fuse_norm_linears(
                    matches[mapping.norm],
                    (matches[target] for target in mapping.linears),
                )
                matches = dict()
Contributor


high

The current implementation of _fuse_norms is not robust and can lead to incorrect fusions. It iterates through all modules for each norm mapping configuration. If a model block is missing a module that is expected to be fused, the matches dictionary will not be reset. This can cause it to incorrectly accumulate modules from different blocks, leading to the fusion of layers that should not be fused together.

For example, if layer_1 is missing v_proj, matches will contain input_layernorm, q_proj, and k_proj from layer_1. When the loop proceeds to layer_2, it might find layer_2.v_proj, which would complete the set of targets and then incorrectly fuse modules from both layer_1 and layer_2.

A more robust approach would be to first group all matching modules by the model block (e.g., by decoder layer index) and then apply the fusion logic to each group independently. This ensures that fusions only happen between modules within the same block.
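
One possible shape for that grouping, sketched against the snippet above (is_match, fuse_norm_linears, and the mapping fields are the ones already used in this PR; the ".layers.<idx>." prefix parsing is a Llama-style naming assumption for illustration only):

import re
from collections import defaultdict

def _fuse_norms(self, model: PreTrainedModel):
    # is_match, fuse_norm_linears, and PreTrainedModel come from the surrounding file
    for mapping in self.norm_mappings:
        targets = (mapping.norm, *mapping.linears)

        # group matches by decoder block so a block with a missing target can
        # never be completed by modules from a neighboring block
        blocks = defaultdict(dict)
        for name, module in model.named_modules():
            for target in targets:
                if is_match(name, module, target):
                    prefix = re.match(r"(.*\.layers\.\d+)\.", name)
                    block = prefix.group(1) if prefix else ""
                    if target in blocks[block]:
                        raise ValueError("Cannot match twice")
                    blocks[block][target] = module

        # fuse each fully matched block independently
        for matches in blocks.values():
            if all(target in matches for target in targets):
                fuse_norm_linears(
                    matches[mapping.norm],
                    (matches[target] for target in mapping.linears),
                )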


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, help="Model stub to compress")
Contributor


medium

The --model_id argument is essential for this script to run. Consider making it a required argument to provide a clearer usage error to the user if it's not provided.

Suggested change
parser.add_argument("--model_id", type=str, help="Model stub to compress")
parser.add_argument("--model_id", type=str, required=True, help="Model stub to compress")

Comment on lines +104 to +107
transformed_output = model(input_ids)

print(f"Norm Diff {(orig_output-transformed_output).norm()}")
print(f"Norm {orig_output.norm()}, {transformed_output.norm()}")
Contributor


medium

This test calculates the original and transformed outputs but only prints their difference. Consider adding an assertion to automatically verify the behavior. This will make the test more effective at catching regressions.

Given the TODO on line 79, the difference might not be exactly zero, so torch.allclose with an appropriate tolerance would be suitable.

Suggested change
transformed_output = model(input_ids)
print(f"Norm Diff {(orig_output-transformed_output).norm()}")
print(f"Norm {orig_output.norm()}, {transformed_output.norm()}")
transformed_output = model(input_ids)
assert torch.allclose(orig_output, transformed_output, atol=1e-6), \
f"Norm Diff {(orig_output-transformed_output).norm()}"
