
nvfp4 w4a4 doesn't work on Qwen/Qwen3-235B-A22B #1624

@ehartford

Description


Describe the bug
NVFP4 (W4A4) quantization fails on Qwen/Qwen3-235B-A22B (MoE) because llm-compressor's sequential FX tracing feeds transformers HFProxy objects into the torch.vmap calls inside transformers' masking_utils, and vmap rejects non-Tensor arguments.
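
For context, recent transformers versions build attention masks by vmap-ing a scalar mask predicate over batch, head, query and key indices; the outermost vmap uses in_dims=(0, None, None, None), which matches the signature in the traceback below. The sketch that follows imitates that pattern with illustrative names (it is not the library's exact code): it works with real index tensors, but during FX tracing an HFProxy lands in the batch slot and vmap refuses it.

import torch

def causal(batch_idx, head_idx, q_idx, kv_idx):
    # scalar predicate: may query position q_idx attend to key position kv_idx?
    return kv_idx <= q_idx

# Nest one vmap per index dimension, innermost (kv) first and batch last,
# roughly mirroring how masking_utils expands the predicate into a 4D mask.
fn = causal
fn = torch.vmap(fn, in_dims=(None, None, None, 0))  # kv index
fn = torch.vmap(fn, in_dims=(None, None, 0, None))  # query index
fn = torch.vmap(fn, in_dims=(None, 0, None, None))  # head index
fn = torch.vmap(fn, in_dims=(0, None, None, None))  # batch index (the one in the error)

mask = fn(torch.arange(2), torch.arange(1), torch.arange(4), torch.arange(4))
print(mask.shape)  # torch.Size([2, 1, 4, 4])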

Expected behavior
The oneshot run should complete and produce an NVFP4 (W4A4) compressed checkpoint of Qwen/Qwen3-235B-A22B, with only lm_head left unquantized.

Environment

package          version
llm-compressor   0.6.0 (latest PyPI)
transformers     4.43.0
torch            2.3.1 + CUDA 12.1
python           3.11
GPU              A100 / H100 (same on both)

To Reproduce

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
import torch, datasets

model_id  = "Qwen/Qwen3-235B-A22B"
model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tok       = AutoTokenizer.from_pretrained(model_id)
ds        = datasets.load_dataset("HuggingFaceH4/ultrachat_200k",
                                  split="train_sft[:8]")  # tiny calib

recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])
oneshot(model=model, dataset=ds, recipe=recipe, max_seq_length=2048,
        num_calibration_samples=8)          # --> TraceError / vmap-proxy error

Errors

ValueError: vmap(wrapped, in_dims=(0, None, None, None), ...)(<inputs>):
Got in_dim=0 for an input but the input is of type
<class 'transformers.utils.fx.HFProxy'>. We cannot vmap over non-Tensor arguments
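
The error itself can be reproduced outside llm-compressor: torch.vmap validates its inputs before calling the wrapped function, and any argument mapped with in_dims=0 that is not a torch.Tensor (here, transformers' HFProxy standing in for a tensor during tracing) raises this exact ValueError. Minimal sketch, with a placeholder class in place of HFProxy:

import torch

def f(x):
    return x + 1

class NotATensor:  # stand-in for transformers.utils.fx.HFProxy
    pass

try:
    torch.vmap(f, in_dims=(0,))(NotATensor())
except ValueError as e:
    print(e)
    # vmap(f, in_dims=(0,), ...)(<inputs>): Got in_dim=0 for an input but the
    # input is of type <class '__main__.NotATensor'>. We cannot vmap over
    # non-Tensor arguments ...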

Additional context
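
A possible workaround sketch, untested and with assumptions flagged: recent llm-compressor releases expose a pipeline argument on oneshot(); if the installed 0.6.0 build accepts it, forcing the non-tracing "basic" pipeline should skip the sequential FX trace of masking_utils entirely, at the cost of calibrating the whole model in memory at once.

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=8,
    pipeline="basic",  # assumption: kwarg supported by this release; avoids FX tracing
)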
