
Commit cf847c1

Update README with xpu quantization steps (#1412)
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
1 parent a4a2f57 commit cf847c1

File tree

4 files changed (+73, -7 lines)


docs/source/quantization.md

Lines changed: 1 addition & 1 deletion
@@ -466,7 +466,7 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow, ONNX Ru
 <td align="left">IPEX</td>
 <td align="left">OneDNN</td>
 <td align="left">"ipex"</td>
-<td align="left">cpu</td>
+<td align="left">cpu | gpu</td>
 </tr>
 <tr>
 <td rowspan="5" align="left">ONNX Runtime</td>
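For orientation, this one-word table change is the documentation core of the commit: the IPEX backend is now listed as supporting GPU as well as CPU. Below is a minimal sketch of how the target device is selected at the API level, assuming the Neural Compressor 2.x `PostTrainingQuantConfig` API; `model` and `calib_dataloader` are placeholders, as in the README example later on this page.

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# Pick the IPEX backend and the target device; with this commit the docs
# list both "cpu" and "gpu" for the "ipex" backend.
conf = PostTrainingQuantConfig(backend="ipex", device="gpu")

# model and calib_dataloader are assumed to exist (placeholders).
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_results")
```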

examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/README.md

Lines changed: 69 additions & 3 deletions
@@ -16,6 +16,8 @@ python -m pip install intel_extension_for_pytorch -f https://software.intel.com/
 > Note: Intel® Extension for PyTorch* has a PyTorch version requirement. Please check the detailed information via the URL below.
 
 # Quantization
+
+## 1. Quantization with CPU
 If the IPEX version is 1.12 or higher, please install transformers 4.19.0.
 ```shell
 python run_qa.py
@@ -32,6 +34,72 @@ python run_qa.py
 >
 > /path/to/checkpoint/dir is the path to the finetune output_dir
 
+## 2. Quantization with XPU
+Please build an IPEX docker container with the following steps; see also the [official guide](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master/docker).
+#### 2.1 Build the Container and Set Environment Variables
+```bash
+wget https://raw.githubusercontent.com/intel/intel-extension-for-pytorch/xpu-master/docker/Dockerfile.xpu
+wget https://raw.githubusercontent.com/intel/intel-extension-for-pytorch/xpu-master/docker/build.sh
+./build.sh xpu-flex
+
+export IMAGE_NAME=intel/intel-extension-for-pytorch:xpu-flex
+export VIDEO=$(getent group video | sed -E 's,^video:[^:]*:([^:]*):.*$,\1,')
+export RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,')
+test -z "$RENDER" || RENDER_GROUP="--group-add ${RENDER}"
+```
+
+#### 2.2 Run the Container
+```bash
+docker run --rm \
+    -v .:/workspace \
+    --group-add ${VIDEO} \
+    ${RENDER_GROUP} \
+    --device=/dev/dri \
+    --ipc=host \
+    -e http_proxy=$http_proxy \
+    -e https_proxy=$https_proxy \
+    -e no_proxy=$no_proxy \
+    -it $IMAGE_NAME bash
+```
+
+#### 2.3 Environment Settings
+Please set up the oneAPI Base Toolkit environment as follows:
+```bash
+bash l_BaseKit_p_2024.0.0.49261_offline.sh -a -s --eula accept --components intel.oneapi.lin.tbb.devel:intel.oneapi.lin.ccl.devel:intel.oneapi.lin.mkl.devel:intel.oneapi.lin.dpcpp-cpp-compiler --install-dir ${HOME}/intel/oneapi
+source ./20240921_xmainrel/env/vars.sh
+# source ${HOME}/intel/oneapi/compiler/latest/env/vars.sh
+source ${HOME}/intel/oneapi/mkl/latest/env/vars.sh
+source ${HOME}/intel/oneapi/tbb/latest/env/vars.sh
+export MKL_DPCPP_ROOT=${MKLROOT}
+export LD_LIBRARY_PATH=${MKL_DPCPP_ROOT}/lib:${MKL_DPCPP_ROOT}/lib64:${MKL_DPCPP_ROOT}/lib/intel64:${LD_LIBRARY_PATH}
+export LIBRARY_PATH=${MKL_DPCPP_ROOT}/lib:${MKL_DPCPP_ROOT}/lib64:${MKL_DPCPP_ROOT}/lib/intel64:$LIBRARY_PATH
+```
+Prebuilt wheel files are available for Python 3.8, 3.9, 3.10, and 3.11.
+```bash
+conda install intel-extension-for-pytorch=2.0.110 pytorch=2.0.1 -c intel -c conda-forge
+```
+You can run a simple sanity test to confirm that the correct versions are installed and that the software stack can detect the hardware on your system. The command prints the installed PyTorch and IPEX versions, plus information about the detected GPU card(s).
+```bash
+source {DPCPPROOT}/env/vars.sh
+source {MKLROOT}/env/vars.sh
+python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
+```
+Please also refer to this [tutorial](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.0.110%2Bxpu) to check system requirements and install dependencies.
+
+
+#### 2.4 Quantization Command
+```shell
+python run_qa.py \
+    --model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
+    --dataset_name squad \
+    --do_eval \
+    --max_seq_length 384 \
+    --doc_stride 128 \
+    --xpu \
+    --tune \
+    --output_dir ./savedresult
+```
+
 # Tutorial of How to Enable NLP Model with Intel® Neural Compressor.
 ### Intel® Neural Compressor supports two usages:
 
@@ -80,6 +148,4 @@ q_model = quantization.fit(model,
 calib_dataloader=eval_dataloader,
 eval_func=eval_func)
 q_model.save(training_args.output_dir)
-```
-
-
+```

examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/run_qa.py

Lines changed: 1 addition & 1 deletion
@@ -687,7 +687,7 @@ def eval_func(model):
     example_inputs = get_example_inputs(model, eval_dataloader)
     model = ipex.optimize(model)
     with torch.no_grad():
-        model = torch.jit.trace(model, example_inputs=example_inputs, strict=False)
+        model = torch.jit.trace(model, example_kwarg_inputs=example_inputs, strict=False)
         model = torch.jit.freeze(model)
 
     if model_args.benchmark or model_args.accuracy_only:
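Some context on this one-line fix: HuggingFace models take their tensors as keyword arguments, and `torch.jit.trace` treats `example_inputs` as positional arguments, while `example_kwarg_inputs` (available since PyTorch 2.0) takes a dict keyed by `forward()`'s parameter names. A minimal sketch of the distinction, assuming PyTorch >= 2.0, with a toy module standing in for the real model:

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, input_ids, attention_mask):
        return input_ids * attention_mask

m = Toy().eval()
example = {"input_ids": torch.ones(1, 4, dtype=torch.long),
           "attention_mask": torch.ones(1, 4, dtype=torch.long)}

# example_inputs takes a tuple of positional arguments; example_kwarg_inputs
# takes a dict keyed by forward()'s parameter names, which is exactly the
# shape of a HuggingFace tokenizer/dataloader batch.
traced = torch.jit.trace(m, example_kwarg_inputs=example, strict=False)
print(traced(**example))
```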

examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/run_quant.sh

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ function run_tuning {
         --dataset_name squad \
         --do_eval \
         --max_seq_length 384 \
-        --no_cuda \ # remove if using xpu
+        --no_cuda \
         --tune \
         --output_dir $tuned_checkpoint
     fi
@@ -55,7 +55,7 @@ function run_tuning {
         --dataset_name squad \
         --do_eval \
         --max_seq_length 384 \
-        --no_cuda \ # remove if using xpu
+        --no_cuda \
         --tune \
         --output_dir $tuned_checkpoint
     fi
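The removed inline comments were a real bug, not just noise: in bash, a backslash only continues a line when it is the very last character, so in `--no_cuda \ # remove if using xpu` the backslash escapes the following space instead, the `#` word ends the command there, and the `--tune` and `--output_dir` lines below run as separate, failing commands. A small demonstration with a hypothetical `echo` stand-in for the real command:

```bash
#!/usr/bin/env bash
# Broken: "\ " escapes the space instead of continuing the line, the "#"
# then starts a comment that ends the command, and "--b" on the next
# line executes as a separate (failing) command.
echo --a \ # inline comment
  --b

# Fixed: the backslash is the last character, so the line continues.
echo --a \
  --b
```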
