
[WIP] [Transform] q_attn and k_cache locations #334


Closed
wants to merge 1 commit into main from kylesayrs/transform_attn_locations

Conversation

kylesayrs
Contributor

@kylesayrs kylesayrs commented May 31, 2025

Because attention is standardized in transformers via the AttentionInterface, this provides a convenient way to hook into attention and apply transforms at the q_attn and k_cache locations.
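
As a rough illustration of the idea (not the implementation in this PR), the sketch below shows how a custom attention function can be registered through the AttentionInterface and wrap the stock SDPA implementation, assuming a recent transformers release that exposes `AttentionInterface` and `sdpa_attention_forward`. The transform callables, the `"transformed_sdpa"` name, and the model id are hypothetical placeholders:

```python
from typing import Optional, Tuple

import torch
from transformers import AttentionInterface, AutoModelForCausalLM
from transformers.integrations.sdpa_attention import sdpa_attention_forward


def q_transform(query: torch.Tensor) -> torch.Tensor:
    # placeholder for a transform applied at the q_attn location
    return query


def k_transform(key: torch.Tensor) -> torch.Tensor:
    # placeholder for a transform applied at the k_cache location
    return key


def transformed_attention(
    module: torch.nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    **kwargs,
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    # apply the transforms, then delegate to the default SDPA attention
    query = q_transform(query)
    key = k_transform(key)
    return sdpa_attention_forward(module, query, key, value, attention_mask, **kwargs)


# register the custom implementation and select it when loading a model
AttentionInterface.register("transformed_sdpa", transformed_attention)
model = AutoModelForCausalLM.from_pretrained(
    "<model-id>",  # placeholder model id
    attn_implementation="transformed_sdpa",
)
```

Hooking in at the interface level means the transform applies uniformly across architectures, without patching each model's attention module individually.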

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/transform_factory May 31, 2025 05:49
Base automatically changed from kylesayrs/transform_factory to main June 10, 2025 15:24
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/transform_attn_locations branch from 96dee4e to da36ca6 Compare June 12, 2025 04:23
Collaborator

@dsikka dsikka left a comment


Is this ready for review?

@kylesayrs
Contributor Author

@kylesayrs kylesayrs closed this Jul 10, 2025