[Bugfix] infer_quantization_format when model only has activation quantization #1635

Merged (4 commits) on Jul 11, 2025

Changes from all commits
```diff
@@ -3,10 +3,7 @@
 from compressed_tensors import CompressionFormat
 from compressed_tensors.config import SparsityStructure
 from compressed_tensors.quantization import QuantizationStrategy, QuantizationType
-from compressed_tensors.quantization.utils import (
-    is_model_quantized,
-    is_module_quantized,
-)
+from compressed_tensors.quantization.utils import is_module_quantized
 
 __all__ = ["infer_quantization_format"]
 
@@ -47,14 +44,14 @@ def infer_quantization_format(
     :param save_compressed: used to infer a quantization format if None is provided
     :return compression format appropriate for model
     """
-    if not is_model_quantized(model):
-        return None
-
     if quantization_format is not None:
         return quantization_format
 
+    weight_args, input_args = _get_unique_quant_args(model)
+    if len(weight_args) <= 0:
+        return None
+
     if save_compressed:
-        weight_args, input_args = _get_unique_quant_args(model)
         is_24_structure = (
             SparsityStructure(sparsity_structure) == SparsityStructure.TWO_FOUR
         )
```
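For readers skimming the diff, here is a minimal, self-contained sketch of the control-flow change. The `_toy` names and the list of `(weight_args, input_args)` tuples standing in for a model's modules are assumptions made for illustration, not llm-compressor's actual API; the real logic lives in `infer_quantization_format` and its `_get_unique_quant_args` helper shown above.

```python
from typing import List, Optional, Tuple


def _get_unique_quant_args_toy(modules) -> Tuple[List[str], List[str]]:
    """Stand-in for _get_unique_quant_args: collect distinct weight and
    input (activation) quantization args across modules."""
    weight_args: List[str] = []
    input_args: List[str] = []
    for weights, inputs in modules:
        if weights is not None and weights not in weight_args:
            weight_args.append(weights)
        if inputs is not None and inputs not in input_args:
            input_args.append(inputs)
    return weight_args, input_args


def infer_quantization_format_toy(
    modules,
    quantization_format: Optional[str] = None,
    save_compressed: bool = False,
) -> Optional[str]:
    # An explicitly requested format always wins.
    if quantization_format is not None:
        return quantization_format

    # The fix: inspect weight quantization args up front. A model with only
    # activation (input) quantization has no weight args, so there is no
    # weight compression format to infer and we return None early.
    weight_args, _input_args = _get_unique_quant_args_toy(modules)
    if len(weight_args) <= 0:
        return None

    # Weight-format inference (save_compressed, sparsity structure, etc.)
    # would continue here; a placeholder result keeps the sketch runnable.
    return "placeholder-weight-format" if save_compressed else None


# A model quantized only on activations (int8 inputs, unquantized weights):
print(infer_quantization_format_toy([(None, "int8_input")], save_compressed=True))
# -> None
```

Before this change, the early exit relied on `is_model_quantized(model)`, which is also true for activation-only quantization, so such a model fell through to weight-format inference even though it had no weight quantization args; checking `weight_args` directly closes that gap.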