Intel Neural Compressor Release 3.3
- Highlights
- Features
- Improvements
- Bug Fixes
- Validated Hardware
- Validated Configurations
Highlights
- Aligned with Gaudi SW Release 1.20, bringing improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
- VLM INT4 weight-only quantization support in transformers-like API on Intel CPU/GPU
Features
- Saving vLLM-compatible FP8 models on Gaudi
- FP8 per-channel Q/DQ and GC integration on Gaudi
- FP8 quantization for mixture-of-experts (MoE) modules on Gaudi
- Saving Hugging Face-compatible weight-only INT4 format on Gaudi
- VLM quantization with AutoRound in transformers-like API on Intel CPU/GPU
- Accuracy-aware tuning on PT2E, including mixed-precision support
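To make the weight-only INT4 feature above concrete, here is a minimal, framework-free sketch of symmetric per-channel INT4 quantization. The function names and the group-free layout are illustrative only; Intel Neural Compressor's actual kernels pack weights and typically use group-wise scales, which this sketch omits.

```python
# Illustrative sketch of symmetric per-channel INT4 weight-only quantization.
# Not Intel Neural Compressor's implementation; real INT4 paths pack two
# 4-bit values per byte and use group-wise scales.

def quantize_int4_per_channel(weights):
    """Quantize each output channel (row) of a 2-D weight matrix to INT4.

    Returns (int_weights, scales); the symmetric INT4 range is [-8, 7].
    """
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row) or 1.0
        scale = max_abs / 7.0  # map the largest magnitude onto 7
        q_rows.append([max(-8, min(7, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate FP weights from INT4 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_int4_per_channel(w)
w_hat = dequantize(q, s)
```

Per-channel scales keep the quantization error of each output channel bounded by half its own scale, which is why weight-only INT4 preserves accuracy better than a single per-tensor scale.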
Improvements
- FP8 multi-device (Gaudi & GPU) infrastructure support
- Support saving scalar scales on Gaudi
Bug Fixes
- Fix incorrect hf_device_map setting in the transformers-like API
- Fix missing IPEX CPU dependency in a transformers-like API example
- Fix device mapping issue in GPTQ on Llama models
- Fix saving issue in weight-only per-channel quantization
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1100)
- Intel® Arc™ B-Series Graphics GPU (B580)
Validated Configurations
- CentOS 8.4 & Ubuntu 24.04 & Windows 11
- Python 3.9, 3.10, 3.11, 3.12
- PyTorch/IPEX 2.3, 2.4, 2.5