[Bugfix] Only quant-compress modules with weight quantization #387

kylesayrs · 2025-07-10T02:32:46Z

Purpose

Skip compression for attention modules, which have a quantization config but no weights to quantize
- Without these changes, attention modules are compressed. This is not a problem in itself (they have no weights, so compression is a no-op), but [Bugfix] Safeguard against submodule parameter deletion in decompress_model #347 fixes a bug that occurs when compression is attempted on non-leaf modules

Changes

Previously, map_module_to_scheme selected only leaf modules for quantization compression. It now selects only modules that have weight quantization
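The selection rule described above can be sketched roughly as follows. This is not the library's code: `Scheme` and `Module` are hypothetical stand-ins for the real scheme and torch module types, and the sketch assumes that an attention module's scheme carries activation settings with `weights` left unset.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class Scheme:
    """Hypothetical stand-in for a quantization scheme."""
    weights: Optional[dict] = None            # weight quantization args, if any
    input_activations: Optional[dict] = None  # activation quantization args, if any


@dataclass
class Module:
    """Hypothetical stand-in for a module tree (not torch.nn.Module)."""
    children: Dict[str, "Module"] = field(default_factory=dict)
    quantization_scheme: Optional[Scheme] = None

    def named_modules(self, prefix: str = "") -> Iterator[Tuple[str, "Module"]]:
        # Yield this module, then every descendant, with dotted names.
        yield prefix, self
        for name, child in self.children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(child_prefix)


def map_module_to_scheme(model: Module) -> Dict[str, Scheme]:
    # Select a module only if it has a scheme AND that scheme quantizes weights.
    # Leaf vs. non-leaf no longer matters for the selection.
    return {
        name: module.quantization_scheme
        for name, module in model.named_modules()
        if module.quantization_scheme is not None
        and module.quantization_scheme.weights is not None
    }


# Assumed layout: a non-leaf attention module with an activation-only scheme,
# wrapping a projection layer that has weight quantization.
q_proj = Module(quantization_scheme=Scheme(weights={"num_bits": 8}))
attn = Module(
    children={"q_proj": q_proj},
    quantization_scheme=Scheme(input_activations={"num_bits": 8}),
)
model = Module(children={"attn": attn})

selected = map_module_to_scheme(model)
print(list(selected))  # attention module is skipped, projection is kept
```

Under these assumptions, the attention module is skipped even though it has a scheme, while its weight-quantized child is still selected.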

Testing

KV cache tests pass with [Bugfix] infer_quantization_format when model only has activation quantization vllm-project/llm-compressor#1635

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

only compress modules with weight quantization

8e674e2

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs mentioned this pull request Jul 10, 2025

[Bugfix] infer_quantization_format when model only has activation quantization vllm-project/llm-compressor#1635

Open

kylesayrs changed the title ~~[Bugfix] Only comp~~ [Bugfix] Only quant-compress modules with weight quantization Jul 10, 2025
