Labels: bug (Something isn't working)
Describe the bug
NVFP4 quantization of Qwen3-235B-A22B (a MoE model) fails: during calibration, llm-compressor traces the model with torch.fx, and tracing breaks when it reaches the `torch.vmap` call inside transformers' `masking_utils`, because the traced inputs are `HFProxy` objects rather than tensors.
Expected behavior
`oneshot` should complete calibration and produce an NVFP4-quantized model without tracing errors.
Environment

| package | version |
|---|---|
| llm-compressor | 0.6.0 (latest PyPI) |
| transformers | 4.43.0 |
| torch | 2.3.1 + CUDA 12.1 |
| python | 3.11 |
| GPU | A100 / H100 (same error on both) |
To Reproduce

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import datasets

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "Qwen/Qwen3-235B-A22B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tok = AutoTokenizer.from_pretrained(model_id)

# Tiny calibration split; the failure happens before calibration finishes.
ds = datasets.load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:8]")

recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=8,
)  # --> TraceError / vmap-proxy error (see below)
```
Errors

```
ValueError: vmap(wrapped, in_dims=(0, None, None, None), ...)(<inputs>):
Got in_dim=0 for an input but the input is of type
<class 'transformers.utils.fx.HFProxy'>. We cannot vmap over non-Tensor arguments
```
Additional context
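The failure class can be reproduced without llm-compressor: FX tracing replaces tensors with proxy objects, and `torch.vmap` refuses to map over non-Tensor inputs. Below is a minimal sketch; plain `torch.fx.symbolic_trace` stands in for transformers' HF tracer, and `toy_mask` is a hypothetical stand-in for the vmap-based code in `masking_utils`.

```python
# Minimal sketch of the failure class, independent of llm-compressor.
# Assumptions: torch.fx.symbolic_trace stands in for transformers'
# HFTracer, and `toy_mask` stands in for the vmap-based mask builder
# in transformers' masking_utils.
import torch
import torch.fx


def toy_mask(scores: torch.Tensor) -> torch.Tensor:
    # vmap over the leading (batch) dimension, as masking_utils does.
    return torch.vmap(lambda s: s * 2, in_dims=0)(scores)


class ToyModel(torch.nn.Module):
    def forward(self, x):
        return toy_mask(x)


try:
    # During symbolic tracing, `x` is a torch.fx.Proxy, not a Tensor,
    # so vmap raises: "Got in_dim=0 for an input but the input is of
    # type <class 'torch.fx.proxy.Proxy'> ..."
    torch.fx.symbolic_trace(ToyModel())
except ValueError as e:
    print(e)
```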