Merged
Changes from 2 commits
41 changes: 34 additions & 7 deletions docs/source/3x/PT_MXQuant.md
@@ -83,19 +83,46 @@ The exponent (exp) is equal to torch.floor(torch.log2(amax)), MAX is the represe

## Get Started with Microscaling Quantization API

To quantize a model with Microscaling data types, use the AutoRound quantization API as follows.

```python
from neural_compressor.torch.quantization import AutoRoundConfig, prepare, convert
from transformers import AutoModelForCausalLM, AutoTokenizer

fp32_model = AutoModelForCausalLM.from_pretrained(
"facebook/opt-125m",
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", trust_remote_code=True)
output_dir = "./saved_inc"

# quantization configuration
quant_config = AutoRoundConfig(
tokenizer=tokenizer,
nsamples=32,
seqlen=32,
iters=20,
scheme="MXFP4", # MXFP4, MXFP8
export_format="auto_round",
output_dir=output_dir, # default is "temp_auto_round"
)

# quantize the model and save to output_dir
model = prepare(model=fp32_model, quant_config=quant_config)
model = convert(model)

# loading
model = AutoModelForCausalLM.from_pretrained(output_dir, torch_dtype="auto", device_map="auto")

# inference
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```
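The MXFP4 and MXFP8 schemes above scale each block of values by a shared power-of-two factor before rounding elements to the low-bit format. As a rough illustration only (a sketch, not the library's kernel), the shared exponent follows the formula quoted earlier in this document, `exp = torch.floor(torch.log2(amax))`; rounding to the element data type is omitted here:

```python
import torch


def mx_scale_block(block: torch.Tensor):
    """Sketch of the Microscaling shared-scale step for one block of values."""
    # amax: largest magnitude in the block (clamped to avoid log2(0))
    amax = block.abs().max().clamp_min(torch.finfo(block.dtype).tiny)
    # shared power-of-two exponent: exp = floor(log2(amax))
    exp = torch.floor(torch.log2(amax))
    scale = 2.0**exp
    # divide by the shared scale; a real kernel would then round each
    # element to the MXFP4 / MXFP8 element data type
    return block / scale, exp


scaled, exp = mx_scale_block(torch.tensor([0.5, 2.0, -3.0]))
print(exp.item())  # 1.0, since floor(log2(3.0)) = 1
```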

## Examples

- PyTorch [huggingface models](/examples/pytorch/multimodal-modeling/quantization/auto_round/llama4)


## Reference