[Transforms] Initial Implementation #277

Conversation

dsikka (Collaborator) commented on Mar 11, 2025

Summary

Add initial support for transforms in compressed-tensors. The following PRs will enable the initial functionality:

  1. [Transforms] Transform Registry Support #274: Transform Registry and Hadamard utils support
  2. [Transforms] Transform Arg, Scheme, and Data support #275: Transform Arg, Scheme, Config, and Data support
  3. [Transforms] Apply, serialize, deserialize #276: enable support to apply transforms, serialize the transform parameters to disk, and deserialize for evaluation

Through this functionality, users can define a recipe that targets layers and their specific parameters and activations with transforms. Transforms are loaded from a registry, attached to the targeted layer, and applied to the layer's weights (for now) as well as during quantize-dequantize (QDQ) in the forward pass (e.g. during generation). Once the model is quantized, the transforms are saved to disk along with a transform_config; they can then be deserialized and loaded for evaluation.
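
As a rough sketch of the registry step described above (PR #274), a transform can be constructed by name. This follows the RegistryMixin pattern compressed-tensors already uses; the exact import path and call shape here are assumptions against the in-flight PRs, not a confirmed API.

from compressed_tensors.transforms import Transforms  # registry from #274

# Assumption: transform types register under the Transforms base class,
# mirroring the transform_type strings used in the schemes below.
hadamard = Transforms.load_from_registry("random-hadamard", size=2048)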

Examples:

  • With this functionality, we can now apply "QuIP-style" weight transforms
  • QuIP applies two transforms, defined as U(W)V.T, where U and V are Hadamard rotation matrices and W is the linear weight; the transforms are not fused into the weights (see the sketch after this list)
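
To make the U(W)V.T definition concrete, below is a minimal PyTorch sketch of applying randomized Hadamard rotations to a linear weight. This illustrates the math only, not the compressed-tensors implementation; the sign-randomized construction in random_hadamard is an assumption about what "random-hadamard" denotes.

import torch
from scipy.linalg import hadamard

def random_hadamard(size: int, seed: int = 0) -> torch.Tensor:
    # Sign-randomized, normalized Hadamard matrix; size must be a power of
    # two for scipy's construction. The result is orthogonal: H @ H.T == I.
    g = torch.Generator().manual_seed(seed)
    H = torch.tensor(hadamard(size), dtype=torch.float32)
    signs = torch.randint(0, 2, (size,), generator=g).float() * 2 - 1
    return (H * signs) / size ** 0.5

# QuIP-style rotation of a linear weight W (out_features x in_features):
# W' = U @ W @ V.T, with U sized to out_features and V to in_features.
out_f, in_f = 2048, 2048
W = torch.randn(out_f, in_f)
U, V = random_hadamard(out_f, seed=1), random_hadamard(in_f, seed=2)
W_rot = U @ W @ V.T

# Orthogonality makes the rotation invertible, so nothing is lost:
assert torch.allclose(U.T @ W_rot @ V, W, atol=1e-3)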

For Llama-3.2-1B-Instruct, the following recipe was applied and tested, adding QuIP-style rotations to all linear weights as part of Int4 and Int8 quantization.

Transform Args:

ignore = ["re:.*.mlp.down_proj$", "lm_head"]
module_targets = [ModuleTarget.WEIGHT.value]

targets = ["Linear"]  # 2048 * 2048
v_linear_args = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
    ignore=ignore,
    call_args={"transpose": True, "first": False},
)

targets = ["re:.*.mlp.down_proj$"]  # 8192 * 8192
v_down_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
    call_args={"transpose": True, "first": False},
)

targets = [
    "re:.*.attn.q_proj$",
    "re:.*.attn.o_proj$",
    "re:.*.mlp.down_proj$",
]  # 2048 * 2048
u_q_o_down_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

targets = ["re:.*.mlp.gate_proj$", "re:.*.mlp.up_proj$"]  # 8192 * 8192
u_gate_up_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

targets = ["re:.*.attn.k_proj$", "re:.*.attn.v_proj$"]  # 512 * 512
u_k_v_proj = TransformationArgs(
    targets=targets,
    module_targets=module_targets,
)

Transform Schemes, defining the different Hadamard rotations:

u_scheme_q_o_down_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_q_o_down_proj],
    transform_creation_args={"size": 2048},
)

u_scheme_gate_up_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_gate_up_proj],
    transform_creation_args={"size": 8192},
)

u_scheme_k_v_proj = TransformationScheme(
    transform_type="random-hadamard",
    groups=[u_k_v_proj],
    transform_creation_args={"size": 512},
)

v1 = TransformationScheme(
    transform_type="random-hadamard",
    groups=[v_linear_args],
    transform_creation_args={"size": 2048},
)

v2 = TransformationScheme(
    transform_type="random-hadamard",
    groups=[v_down_proj],
    transform_creation_args={"size": 8192},
)

Transform Config, passed into the QuantizationModifier:

# QuIP Recipe with weight only quantization
config = TransformationConfig(
    transform_groups={
        "u_transform_k_v_proj": u_scheme_k_v_proj,
        "u_transform_q_o_down_proj": u_scheme_q_o_down_proj,
        "u_transform_gate_up_proj": u_scheme_gate_up_proj,
        "v1": v1, 
        "v2": v2
    }
)
  • Once set up, the TransformationConfig can be passed to the QuantizationModifier, where it is processed and applied to the layers through these compressed-tensors PRs (a minimal sketch follows this list)
  • Sample serialized model: nm-testing/Llama-3.2-1B-Instruct-W4A16-uncompressed-mse-hadamard. Note: this model is saved dense since there is currently no support for fusing in the transforms; this will be handled in a follow-up
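
For context, wiring the config into a quantization run might look like the sketch below. QuantizationModifier and its targets/scheme/ignore arguments exist in llm-compressor today; the transforms_config keyword is a hypothetical name for the hook these PRs add.

from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    transforms_config=config,  # hypothetical kwarg for the TransformationConfig above
)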

Next Steps:

  • Activation Transform Support
  • Fusing in transforms or further compressing them when saving to disk

Required PRs: #274, #275, #276 (listed above).

dsikka marked this pull request as ready for review on March 11, 2025, 21:31