Intel Neural Compressor Release 3.3
- Highlights
- Features
- Improvements
- Bug Fixes
- Validated Hardware
- Validated Configurations
Highlights
- Aligned with Gaudi SW Release 1.20, bringing improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
- VLM INT4 weight-only quantization support in transformers-like API on Intel CPU/GPU
Features
- Saving vLLM-compatible FP8 models on Gaudi
- FP8 per-channel Q/DQ and GC integration on Gaudi
- FP8 quantization for mixture-of-experts (MoE) modules on Gaudi
- Saving Hugging Face-compatible weight-only INT4 format on Gaudi
- VLM quantization with AutoRound in transformers-like API on Intel CPU/GPU
- Accuracy-aware tuning on PT2E, including mixed-precision support
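To make the weight-only INT4 feature above concrete, here is a minimal, framework-free sketch of symmetric per-channel INT4 quantization. The function names and the group-free layout are illustrative only; Intel Neural Compressor's actual kernels pack weights and typically use group-wise scales, which this sketch omits.

```python
# Illustrative sketch of symmetric per-channel INT4 weight-only quantization.
# Not Intel Neural Compressor's implementation; real INT4 paths pack two
# 4-bit values per byte and use group-wise scales.

def quantize_int4_per_channel(weights):
    """Quantize each output channel (row) of a 2-D weight matrix to INT4.

    Returns (int_weights, scales); the symmetric INT4 range is [-8, 7].
    """
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row) or 1.0
        scale = max_abs / 7.0  # map the largest magnitude onto 7
        q_rows.append([max(-8, min(7, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate FP weights from INT4 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_int4_per_channel(w)
w_hat = dequantize(q, s)
```

Per-channel scales keep the quantization error of each output channel bounded by half its own scale, which is why weight-only INT4 preserves accuracy better than a single per-tensor scale.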
Improvements
- FP8 multi-device (Gaudi & GPU) infrastructure support
- Support saving scalar scales on Gaudi
Bug Fixes
- Fix incorrect hf_device_map setting in the transformers-like API
- Fix missing IPEX CPU dependency in a transformers-like API example
- Fix device mapping issue in GPTQ on Llama models
- Fix saving issue in weight-only per-channel quantization
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1100)
- Intel® Arc™ B-Series Graphics GPU (B580)
Validated Configurations
- CentOS 8.4 & Ubuntu 24.04 & Windows 11
- Python 3.9, 3.10, 3.11, 3.12
- PyTorch/IPEX 2.3, 2.4, 2.5