Release v0.6.0 · vllm-project/llm-compressor

What's Changed

[Experimental] Mistral-format FP8 quantization by @mgoin in #1359
[Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in #1528
[Examples] [Bugfix] Fix debug message by @kylesayrs in #1529
[Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in #1538
[AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in #1536
fix qwen 2.5 VL multimodal example by @brian-dellabetta in #1541
[Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in #1531
[Tests][NVFP4] Add e2e nvfp4 test by @dsikka in #1543
[Examples] Use more robust splits by @kylesayrs in #1544
[Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in #1532
[Example] Fix Qwen VL ignore list by @arunmadhusud in #1545
[Tests] Fix Qwen2.5-VL-7B-Instruct Recipe by @dsikka in #1548
[Bugfix] Fix gemma2 generation by @kylesayrs in #1552
fix skipif check on tests involving gated HF models by @brian-dellabetta in #1553
[NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in #1554
oneshot entrypoint update by @ved1beta in #1445
LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in #1562
[Performance] Sequential onloading by @kylesayrs in #1263
[BugFix] Explicitly set gpu_memory_utilization by @rahul-tuli in #1560
Add Axolotl blog link by @rahul-tuli in #1563
[Bugfix] Fix multigpu dispatch_for_generation by @kylesayrs in #1567
[Testing] Set VLLM_WORKER_MULTIPROC_METHOD for e2e testing by @dsikka in #1569
[BugFix] Fix quantization_2of4_sparse_w4a16 example by @shanjiaz in #1565
[Pipelines] infer model device with optional override by @kylesayrs in #1572
bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in #1581

New Contributors

@arunmadhusud made their first contribution in #1545

Full Changelog: 0.5.2...0.6.0
