
[Tests] Spinquant dummy model tests #1647


Draft
wants to merge 30 commits into base: main

Conversation

kylesayrs
Collaborator

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

kylesayrs and others added 30 commits June 23, 2025 19:34
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> (×23)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> (×5)

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the SpinQuant transformation technique into the llmcompressor library, enhancing its capabilities for post-training quantization. It introduces a new modifier to apply these transformations, along with essential model preprocessing utilities for embedding normalization and norm-linear fusion. Furthermore, it refines the handling of tied word embeddings during model loading and saving, improving overall robustness and compatibility. The changes are validated through new examples and a dedicated test for a dummy model, confirming the successful integration and functionality of SpinQuant within a data-free compression pipeline.
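
As a minimal, hedged sketch of the data-free flow described above (the import path for SpinQuantModifier, the rotations argument, and the model stub are illustrative assumptions, not the PR's exact API):

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.transform import SpinQuantModifier  # assumed import path

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # example stub only

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# R1/R2 rotations act only on weights, so no dataset is passed and the
# pipeline should resolve to its data-free path for this recipe
recipe = SpinQuantModifier(rotations=["R1", "R2"])  # argument name is assumed
oneshot(model=model, recipe=recipe)

model.save_pretrained("Llama-3.2-1B-Instruct-spinquant")
tokenizer.save_pretrained("Llama-3.2-1B-Instruct-spinquant")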

Highlights

  • New SpinQuant Modifier: Introduced the SpinQuantModifier to apply rotation-based transformations (R1, R2) to LLM weights, making them more amenable to post-training quantization. This modifier orchestrates the necessary preprocessing steps.
  • Model Preprocessing Utilities: Added normalize_embedding and fuse_norm_linears functions within the modeling package. These utilities preprocess model layers (embeddings and norm-linear pairs) by centering embedding weights and fusing norm weights into subsequent linear layers, which is crucial for ensuring transform invariance in SpinQuant (a sketch of this fusion follows the list).
  • Robust Tied Embedding Handling: Refactored and improved the untie_word_embeddings utility to correctly handle and untie word embeddings, even when offloaded. This resolves previous issues and ensures broader compatibility with various model architectures and compression techniques.
  • Data-Free Pipeline Integration: Configured the oneshot pipeline to recognize SpinQuantModifier as a data-free operation. This allows SpinQuant to be applied without requiring a calibration dataset, streamlining the compression process.
  • Dummy Model Testing: Introduced a dedicated test with a simplified DummyModel that mimics an LLM architecture. This test validates the core SpinQuant transformation logic, ensuring its correctness and consistency on a minimal setup.
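
A hedged sketch of the norm-linear fusion idea above (the function name and signature are assumptions that mirror the fuse_norm_linears utility described in this PR, shown for an RMSNorm-style norm with no bias):

import torch

@torch.no_grad()
def fuse_norm_into_linears(norm: torch.nn.Module, linears) -> None:
    # y = W @ (g * x) == (W * diag(g)) @ x, so fold the norm scale g into the
    # input columns of each following linear, then reset the scale to ones
    for linear in linears:
        linear.weight.mul_(norm.weight)  # broadcasts over input features
    norm.weight.fill_(1.0)

Once the scale is the identity, an RMSNorm commutes with an orthogonal rotation (the RMS statistic is rotation invariant), which is the transform-invariance property the SpinQuant rotations rely on.
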
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces the SpinQuantModifier and related functionalities, including model fusion logic and utility scripts. I've found a couple of high-severity issues in the core modifier logic that could lead to incorrect behavior, specifically in the initialization and norm fusion methods. I've also suggested improvements to a new test file and a utility script to make them more robust. Addressing these points will significantly improve the quality and reliability of this new feature.

Comment on lines +57 to +61
if self.transform_config is not None:
    if self.mappings is not None:
        raise ValueError()

    return True
Contributor


high

There's a logic issue here. self.mappings is assigned on line 54, so the condition if self.mappings is not None: on line 58 will always be true when self.transform_config is provided. This causes a ValueError to be raised unconditionally and prevents a user-provided transform_config from ever being used.

The intention seems to be that if transform_config is provided, it should be used while skipping the automatic configuration building. The check on self.mappings seems incorrect. Consider removing the conditional ValueError to fix this bug.

        if self.transform_config is not None:
            return True

Comment on lines +116 to +135
def _fuse_norms(self, model: PreTrainedModel):
    for mapping in self.norm_mappings:
        targets = (mapping.norm, *mapping.linears)
        matches = dict()

        for name, module in model.named_modules():
            # match until we get a full set
            for target in targets:
                if is_match(name, module, target):
                    if target in matches:
                        raise ValueError("Cannot match twice")
                    matches[target] = module

            # once we have a full set, fuse and reset
            if all(target in matches for target in targets):
                fuse_norm_linears(
                    matches[mapping.norm],
                    (matches[target] for target in mapping.linears),
                )
                matches = dict()
Contributor


high

The current implementation of _fuse_norms is not robust and can lead to incorrect fusions. It iterates through all modules for each norm mapping configuration. If a model block is missing a module that is expected to be fused, the matches dictionary will not be reset. This can cause it to incorrectly accumulate modules from different blocks, leading to the fusion of layers that should not be fused together.

For example, if layer_1 is missing v_proj, matches will contain input_layernorm, q_proj, and k_proj from layer_1. When the loop proceeds to layer_2, it might find layer_2.v_proj, which would complete the set of targets and then incorrectly fuse modules from both layer_1 and layer_2.

A more robust approach would be to first group all matching modules by the model block (e.g., by decoder layer index) and then apply the fusion logic to each group independently. This ensures that fusions only happen between modules within the same block.
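
One possible shape for that grouping, sketched against the snippet above (is_match, fuse_norm_linears, and the mapping fields are the ones already used in this PR; the ".layers.<idx>." prefix parsing is a Llama-style naming assumption for illustration only):

import re
from collections import defaultdict

def _fuse_norms(self, model: PreTrainedModel):
    # is_match, fuse_norm_linears, and PreTrainedModel come from the surrounding file
    for mapping in self.norm_mappings:
        targets = (mapping.norm, *mapping.linears)

        # group matches by decoder block so a block with a missing target can
        # never be completed by modules from a neighboring block
        blocks = defaultdict(dict)
        for name, module in model.named_modules():
            for target in targets:
                if is_match(name, module, target):
                    prefix = re.match(r"(.*\.layers\.\d+)\.", name)
                    block = prefix.group(1) if prefix else ""
                    if target in blocks[block]:
                        raise ValueError("Cannot match twice")
                    blocks[block][target] = module

        # fuse each fully matched block independently
        for matches in blocks.values():
            if all(target in matches for target in targets):
                fuse_norm_linears(
                    matches[mapping.norm],
                    (matches[target] for target in mapping.linears),
                )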


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, help="Model stub to compress")
Contributor


medium

The --model_id argument is essential for this script to run. Consider making it a required argument to provide a clearer usage error to the user if it's not provided.

Suggested change
parser.add_argument("--model_id", type=str, help="Model stub to compress")
parser.add_argument("--model_id", type=str, required=True, help="Model stub to compress")

Comment on lines +104 to +107
transformed_output = model(input_ids)

print(f"Norm Diff {(orig_output-transformed_output).norm()}")
print(f"Norm {orig_output.norm()}, {transformed_output.norm()}")
Contributor


medium

This test calculates the original and transformed outputs but only prints their difference. Consider adding an assertion to automatically verify the behavior. This will make the test more effective at catching regressions.

Given the TODO on line 79, the difference might not be exactly zero, so torch.allclose with an appropriate tolerance would be suitable.

Suggested change
transformed_output = model(input_ids)
print(f"Norm Diff {(orig_output-transformed_output).norm()}")
print(f"Norm {orig_output.norm()}, {transformed_output.norm()}")
transformed_output = model(input_ids)
assert torch.allclose(orig_output, transformed_output, atol=1e-6), \
f"Norm Diff {(orig_output-transformed_output).norm()}"
