[Transforms] Initial Implementation #277

Conversation

dsikka (Collaborator) commented on Mar 11, 2025

Summary

Add initial support for transforms in compressed-tensors. The following PRs will enable the initial functionality:

  1. [Transforms] Transform Registry Support #274: Transform Registry and Hadamard utils support
  2. [Transforms] Transform Arg, Scheme, and Data support #275: Transform Arg, Scheme, Config, and Data support
  3. [Transforms] Apply, serialize, deserialize #276: enable support to apply transforms, serialize the transform parameters to disk, and deserialize for evaluation

Through this functionality, users can define a recipe that targets layers and their specific parameters and activations with transforms. Transforms are loaded from a registry, attached to the targeted layer, and applied to the layer's weights (for now) as well as during quantize-dequantize (QDQ) in the forward pass (e.g. during generation). Once the model is quantized, the transforms are saved to disk along with a transform_config; they can then be deserialized and loaded for evaluation.
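
As a rough sketch of the registry step described above (PR #274), a transform can be constructed by name. This follows the RegistryMixin pattern compressed-tensors already uses; the exact import path and call shape here are assumptions against the in-flight PRs, not a confirmed API.

from compressed_tensors.transforms import Transforms  # registry from #274

# Assumption: transform types register under the Transforms base class,
# mirroring the transform_type strings used in the schemes below.
hadamard = Transforms.load_from_registry("random-hadamard", size=2048)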

Examples:

  • With this functionality, we can now apply "QuIP-style" weight transforms
  • QuIP applies two transforms, defined as U(W)V.T, where U and V are Hadamard rotation matrices and W is the linear weight; the transforms are not fused into the weights (see the sketch after this list)
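
To make the U(W)V.T definition concrete, below is a minimal PyTorch sketch of applying randomized Hadamard rotations to a linear weight. This illustrates the math only, not the compressed-tensors implementation; the sign-randomized construction in random_hadamard is an assumption about what "random-hadamard" denotes.

import torch
from scipy.linalg import hadamard

def random_hadamard(size: int, seed: int = 0) -> torch.Tensor:
    # Sign-randomized, normalized Hadamard matrix; size must be a power of
    # two for scipy's construction. The result is orthogonal: H @ H.T == I.
    g = torch.Generator().manual_seed(seed)
    H = torch.tensor(hadamard(size), dtype=torch.float32)
    signs = torch.randint(0, 2, (size,), generator=g).float() * 2 - 1
    return (H * signs) / size ** 0.5

# QuIP-style rotation of a linear weight W (out_features x in_features):
# W' = U @ W @ V.T, with U sized to out_features and V to in_features.
out_f, in_f = 2048, 2048
W = torch.randn(out_f, in_f)
U, V = random_hadamard(out_f, seed=1), random_hadamard(in_f, seed=2)
W_rot = U @ W @ V.T

# Orthogonality makes the rotation invertible, so nothing is lost:
assert torch.allclose(U.T @ W_rot @ V, W, atol=1e-3)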

For Llama-3.2-1B-Instruct, the following recipe was applied and tested, adding QuIP-style rotations to all linear weights as part of Int4 and Int8 quantization.

Transform Args:

ignore = ["re:.*.mlp.down_proj$", "lm_head"]
module_targets = [ModuleTarget.WEIGHT.value]

targets = ["Linear"]  # 2048 * 2048
v_linear_args = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
    ignore=ignore,
    call_args={"transpose": True, "first": False},
)

targets = ["re:.*.mlp.down_proj$"]  # 8192 * 8192
v_down_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
    call_args={"transpose": True, "first": False},
)

targets = [
    "re:.*.attn.q_proj$",
    "re:.*.attn.o_proj$",
    "re:.*.mlp.down_proj$",
]  # 2048 * 2048
u_q_o_down_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

targets = ["re:.*.mlp.gate_proj$", "re:.*.mlp.up_proj$"]  # 8192 * 8192
u_gate_up_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

targets = ["re:.*.attn.k_proj$", "re:.*.attn.v_proj$"]  # 512 * 512
u_k_v_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

Transform Schemes, defining the different Hadamard rotations:

u_scheme_q_o_down_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_q_o_down_proj],
    transform_creation_args={"size": 2048},
)

u_scheme_gate_up_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_gate_up_proj],
    transform_creation_args={"size": 8192},
)

u_scheme_k_v_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_k_v_proj],
    transform_creation_args={"size": 512},
)

v1 = TransformationScheme(
    transform_type="random-hadamard",
    groups=[v_linear_args],
    transform_creation_args={"size": 2048},
)

v2 = TransformationScheme(
    transform_type="random-hadamard",
    groups=[v_down_proj],
    transform_creation_args={"size": 8192},
)

Transform Config, passed into the QuantizationModifier:

# QuIP Recipe with weight only quantization
config = TransformationConfig(
    transform_groups={
        "u_transform_k_v_proj": u_scheme_k_v_proj,
        "u_transform_q_o_down_proj": u_scheme_q_o_down_proj,
        "u_transform_gate_up_proj": u_scheme_gate_up_proj,
        "v1": v1, 
        "v2": v2
    }
)
  • Once set up, the TransformationConfig can be passed to the QuantizationModifier, where it is processed and applied to the layers through these compressed-tensors PRs (a minimal sketch follows this list)
  • Sample serialized model: nm-testing/Llama-3.2-1B-Instruct-W4A16-uncompressed-mse-hadamard. Note: this model is saved dense since there is currently no support for fusing in the transforms; this will be handled in a follow-up
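
For context, wiring the config into a quantization run might look like the sketch below. QuantizationModifier and its targets/scheme/ignore arguments exist in llm-compressor today; the transforms_config keyword is a hypothetical name for the hook these PRs add.

from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    transforms_config=config,  # hypothetical kwarg for the TransformationConfig above
)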

Next Steps:

  • Activation Transform Support
  • Fusing in transforms or further compressing them when saving to disk

Required PRs: #274, #275, #276 (listed above).

dsikka marked this pull request as ready for review on March 11, 2025, 21:31