
Intel Neural Compressor Release 3.3

Released by @thuang6 on 04 Mar 08:55 · commit 679def0
  • Highlights
  • Features
  • Improvements
  • Bug Fixes
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned with Intel Gaudi SW Release 1.20, bringing improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
  • INT4 weight-only quantization support for vision-language models (VLMs) in the Transformers-like API on Intel CPU/GPU (see the sketch after this list)
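As a rough illustration of the new VLM weight-only path, the sketch below follows the existing Transformers-like API pattern of passing a quantization config to from_pretrained; the checkpoint name, the AutoRoundConfig arguments, and the use of AutoModelForCausalLM (rather than a dedicated VLM class) are assumptions, not the exact shipped API.

```python
# A minimal sketch of INT4 weight-only quantization through the
# Transformers-like API; the checkpoint name and AutoRoundConfig arguments
# are illustrative, and the exact model class for a given VLM may differ.
from neural_compressor.transformers import AutoModelForCausalLM, AutoRoundConfig

# Configure INT4 weight-only quantization with AutoRound.
quant_config = AutoRoundConfig(
    bits=4,          # INT4 weights
    group_size=128,  # per-group quantization granularity (illustrative)
)

# Quantize while loading; for a VLM, the corresponding vision-language model
# class exported by neural_compressor.transformers would be used instead.
q_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # illustrative checkpoint
    quantization_config=quant_config,
)

# Save in the Hugging Face compatible weight-only INT4 format.
q_model.save_pretrained("llama-3.2-1b-int4")
```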

Features

  • Saving vLLM-compatible FP8 models on Gaudi (see the FP8 sketch after this list)
  • FP8 per-channel Q/DQ and GC integration on Gaudi
  • FP8 quantization for mixture of experts (MoE) modules on Gaudi
  • Saving weight-only INT4 models in the Hugging Face compatible format on Gaudi
  • VLM quantization with AutoRound in the Transformers-like API on Intel CPU/GPU
  • Accuracy-aware tuning for the PT2E path, including mixed-precision support
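For the Gaudi FP8 items above, the sketch below shows a minimal measure-then-quantize flow, assuming the neural_compressor.torch.quantization entry points (FP8Config, prepare, finalize_calibration, convert) keep the shape they had in earlier 3.x releases; the toy model, calibration data, and FP8Config argument names are illustrative assumptions, and in practice the two phases are often run as separate passes sharing dumped statistics.

```python
# A minimal sketch of FP8 quantization on Gaudi; the toy model, calibration
# data, and FP8Config arguments below are illustrative assumptions.
import torch
import habana_frameworks.torch.core as htcore  # registers the HPU device (Gaudi software stack)
from neural_compressor.torch.quantization import (
    FP8Config,
    convert,
    finalize_calibration,
    prepare,
)


class ToyMLP(torch.nn.Module):
    """Stand-in model; a real workload would load an HPU-placed HF model."""

    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(64, 64)
        self.fc2 = torch.nn.Linear(64, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = ToyMLP().to("hpu")
calib_batches = [torch.randn(4, 64).to("hpu") for _ in range(8)]

# Phase 1: measurement -- insert observers and collect calibration statistics.
measure_cfg = FP8Config(fp8_config="E4M3", mode="MEASURE")  # argument names assumed
model = prepare(model, measure_cfg)
with torch.no_grad():
    for batch in calib_batches:
        model(batch)
finalize_calibration(model)  # persists the collected scales/statistics

# Phase 2: quantization -- swap measured ops for FP8 kernels using those scales.
quant_cfg = FP8Config(fp8_config="E4M3", mode="QUANTIZE")
model = convert(model, quant_cfg)
```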

Improvements

  • FP8 multi-device (Gaudi & GPU) infrastructure support
  • Support saving scalar scales on Gaudi

Bug Fixes

  • Fix incorrect hf_device_map setting for Transformers-like API
  • Fix missing IPEX CPU dependency in Transformers-like API example
  • Fix device mapping issue found in GPTQ on Llama models
  • Fix saving issue in weight-only per-channel quantization

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and Gaudi 3)
  • Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1100)
  • Intel® Arc™ B-Series Graphics GPU (B580)

Validated Configurations

  • CentOS 8.4 & Ubuntu 24.04 & Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.3, 2.4, 2.5