RuntimeError when loading llmcompressor W8A8 quantized model: int8 dtype in weight initialization #390

@AdelineXinyi

Description

I'm trying to load the quantized model [RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8](https://huggingface.co/RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8) but encountering a dtype compatibility issue during model initialization. The model appears to be quantized using llmcompressor with W8A8 quantization scheme.

Note: I need to load this model without vLLM because I may need to add custom hooks for my research, so I'm looking for a direct loading method using transformers/llmcompressor.

Error Message

RuntimeError: expected a floating-point or complex dtype, but got dtype=torch.int8

Full Stack Trace:

File "/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 366, in _init_weights
    module.weight.data.normal_(mean=0.0, std=std)
File "/torch/_refs/__init__.py", line 6214, in normal_
    return normal(mean, std, self.shape, out=self, generator=generator)
...
RuntimeError: expected a floating-point or complex dtype, but got dtype=torch.int8

Analysis

The error occurs during model weight initialization: transformers calls normal_() on the model's weight tensors, but PyTorch's normal_() is only defined for floating-point (and complex) dtypes, and the quantized model stores int8 weights. (A minimal reproduction is sketched after the list below.)

Specific failure point:

  • File: modeling_qwen2_5_vl.py, line 366
  • Function: _init_weights()
  • Operation: module.weight.data.normal_(mean=0.0, std=std)
  • Issue: Trying to apply normal distribution to int8 tensors
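
For reference, the failure reproduces outside the loader in a couple of lines; this is only an illustration of the PyTorch restriction, not the loader's actual code path:

import torch

# normal_() is only defined for floating-point (and complex) tensors;
# the quantized checkpoint stores int8 weights, so the in-place init raises.
w = torch.zeros(4, 4, dtype=torch.int8)
w.normal_(mean=0.0, std=0.02)
# RuntimeError: expected a floating-point or complex dtype, but got dtype=torch.int8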

Model Information

Based on the model's config.json:

  • Quantization method: compressed-tensors
  • Format: int-quantized
  • Scheme: W8A8 (8-bit weights and activations)
  • Base model: Qwen/Qwen2.5-VL-7B-Instruct
  • Compression ratio: ~1.2x
  • Ignored layers: all visual layers (visual.blocks.*, visual.merger.*) plus lm_head
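
For completeness, here is a small snippet of my own (not from the model card) that downloads just config.json and prints the quantization_config block, which is where the values above come from:

import json
from huggingface_hub import hf_hub_download

# Fetch only the config file and show its quantization section.
path = hf_hub_download(
    "RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8", "config.json"
)
with open(path) as f:
    config = json.load(f)
print(json.dumps(config.get("quantization_config", {}), indent=2))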

What I've Tried

1. llmcompressor methods:

# Method 1: TraceableQwen2_5_VLForConditionalGeneration
from llmcompressor.transformers.tracing import TraceableQwen2_5_VLForConditionalGeneration

model_path = "RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8"
model = TraceableQwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

# Method 2: SparseAutoModelForCausalLM  
from llmcompressor.transformers import SparseAutoModelForCausalLM
model = SparseAutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

2. Standard transformers methods:

# Method 3: Various dtype configurations
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # Also tried: torch.float16, "auto", None
    trust_remote_code=True,
    device_map="auto"
)

# Method 4: AutoModelForCausalLM
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto"
)

All methods fail at the same weight-initialization step, so I wonder whether the model should be loaded with _fast_init=False or some other special parameter (a possible workaround sketch follows below).
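
As a stopgap, I could imagine patching _init_weights so it skips non-floating-point parameters; this is an untested sketch of my own, not a transformers or llmcompressor API, and it may well be the wrong layer to fix this at:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Untested workaround sketch: leave quantized (non-floating-point) weights
# alone so _init_weights never calls normal_() on an int8 tensor.
_orig_init_weights = Qwen2_5_VLForConditionalGeneration._init_weights

def _safe_init_weights(self, module):
    weight = getattr(module, "weight", None)
    if isinstance(weight, torch.Tensor) and not weight.is_floating_point():
        return  # int8 weights get their values from the checkpoint
    _orig_init_weights(self, module)

Qwen2_5_VLForConditionalGeneration._init_weights = _safe_init_weights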

Additional Observations

  1. Warning about ignored layers: The loader warns about missing visual layers, but this seems expected since they were ignored during quantization
  2. Model files exist: The quantized model directory contains the expected .safetensors files and configuration
  3. Original model works: The base Qwen/Qwen2.5-VL-7B-Instruct loads and works perfectly

Environment

  • Python: 3.10
  • PyTorch: 2.7.0+cu126
  • Transformers: 4.52.4
  • LLMCompressor: 0.6.0
  • Compressed-tensors: 0.10.2

This model was likely created using llmcompressor's oneshot quantization:

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W8A8", 
        sequential_targets=["Qwen2_5_VLDecoderLayer"],
        ignore=["lm_head", "re:visual.*"],
    ),
]
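
The model card does not show the calibration call itself; a rough, hypothetical invocation would look like this (the dataset name and calibration settings are my guesses, not taken from the card):

# Hypothetical invocation; dataset and sample counts are assumptions.
oneshot(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    dataset="open_platypus",  # assumption: some registered calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)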

If this is more of an llmcompressor-specific model loading issue rather than a transformers compatibility issue, please let me know and I'll file this issue in the llmcompressor repository instead.
