
Commit 2809ff7

Highlight WOQ support on Gaudi2 in Readme (#1680)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
1 parent b2d607f

1 file changed: README.md (48 additions, 7 deletions)

@@ -25,6 +25,9 @@ In particular, the tool provides the key features, typical examples, and open co
* Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)
## What's New
* [2024/03] A new SOTA approach, [AutoRound](https://github.com/intel/auto-round) Weight-Only Quantization, is now available for LLMs on the [Intel Gaudi2 AI accelerator](https://habana.ai/products/gaudi2/).
## Installation
### Install from PyPI
@@ -36,23 +39,61 @@ pip install neural-compressor
## Getting Started
Setting up the environment:
```bash
pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.
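As a quick sanity check (a minimal sketch, not part of the original README), you can confirm that the installed versions meet the requirements above:

```python
# Minimal sanity check: confirm installed versions meet the requirements above.
import torch
import transformers
import neural_compressor

print("neural-compressor:", neural_compressor.__version__)  # expect >= 2.3
print("transformers:", transformers.__version__)            # expect >= 4.34.0
print("torch:", torch.__version__)
```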
### Weight-Only Quantization (LLMs)
The following example code demonstrates Weight-Only Quantization on LLMs. It supports Intel CPU, Intel Gaudi2 AI Accelerator, and Nvidia GPU; the best available device is selected automatically.
To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; please refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
```bash
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04//habanalabs/pytorch-installer-2.1.1:latest

# Check the container ID
docker ps

# Log into the container
docker exec -it <container_id> bash

# Install optimum-habana
pip install --upgrade-strategy eager optimum[habana]

# Install Intel Neural Compressor and auto_round
pip install neural-compressor auto_round
```
Run the example:
```python
from transformers import AutoModel, AutoTokenizer

from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit
from neural_compressor.adaptor.torch_utils.auto_round import get_dataloader

model_name = "EleutherAI/gpt-neo-125m"
float_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
dataloader = get_dataloader(tokenizer, seqlen=2048)

woq_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all ops
            "weight": {
                "dtype": "int",
                "bits": 4,
                "algorithm": "AUTOROUND",
            },
        }
    },
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```
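As an optional follow-up to the example above (a hedged sketch: the `save` call follows the usual Intel Neural Compressor model API, and the output directory is illustrative), the quantized model can be persisted for later use:

```python
# Continues from the example above: persist the quantized model for reuse.
# "./saved_results" is an illustrative output directory.
quantized_model.save("./saved_results")
```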
**Note:**
To try INT4 model inference, please use [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) directly, which leverages Intel Neural Compressor for model quantization.
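For illustration, here is a hedged sketch of such INT4 inference, following the usage shown in that project's README; the model name and prompt below are placeholders, not from the original:

```python
# Sketch of INT4 weight-only inference with Intel Extension for Transformers.
# The model name and prompt below are illustrative placeholders.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# load_in_4bit=True applies weight-only INT4 quantization on load
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

input_ids = tokenizer("Once upon a time,", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```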
### Static Quantization (Non-LLMs)