What's Changed
- [Experimental] Mistral-format FP8 quantization by @mgoin in #1359
- [Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in #1528
- [Examples] [Bugfix] Fix debug message by @kylesayrs in #1529
- [Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in #1538
- [AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in #1536
- fix qwen 2.5 VL multimodal example by @brian-dellabetta in #1541
- [Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in #1531
- [Tests][NVFP4] Add e2e nvfp4 test by @dsikka in #1543
- [Examples] Use more robust splits by @kylesayrs in #1544
- [Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in #1532
- [Example] Fix Qwen VL ignore list by @arunmadhusud in #1545
- [Tests] Fix Qwen2.5-VL-7B-Instruct Recipe by @dsikka in #1548
- [Bugfix] Fix gemma2 generation by @kylesayrs in #1552
- fix skipif check on tests involving gated HF models by @brian-dellabetta in #1553
- [NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in #1554
- oneshot entrypoint update by @ved1beta in #1445
- LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in #1562
- [Performance] Sequential onloading by @kylesayrs in #1263
- [BugFix] Explicitly set gpu_memory_utilization by @rahul-tuli in #1560
- Add Axolotl blog link by @rahul-tuli in #1563
- [Bugfix] Fix multigpu dispatch_for_generation by @kylesayrs in #1567
- [Testing] Set VLLM_WORKER_MULTIPROC_METHOD for e2e testing by @dsikka in #1569
- [BugFix] Fix quantization_2of4_sparse_w4a16 example by @shanjiaz in #1565
- [Pipelines] infer model device with optional override by @kylesayrs in #1572
- bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in #1581
New Contributors
- @arunmadhusud made their first contribution in #1545
Full Changelog: 0.5.2...0.6.0