
Commit 047b657 (parent: d154c5d)

Update GPTQ example's documentation (#1457)

Signed-off-by: YIYANGCAI <yiyang.cai@intel.com>

File tree: 2 files changed (+78, -14 lines)


examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/README.md

Lines changed: 78 additions & 1 deletion
@@ -31,6 +31,7 @@ python run_clm_no_trainer.py \
     --alpha 1.0 \
     --output_dir "saved_results" \
     --ipex
+
 ```
 
 **Notes**: Smooth quantization here is based on torch.jit. Without past key values in example_inputs, the quantized model cannot be used for text generation. For text-generation tasks, please refer to [link](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization)
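
The note above means the traced model's signature must include the KV cache. A minimal sketch, assuming the Hugging Face transformers API, of how example_inputs carrying past key values might be built for GPT-J before jit tracing (names, shapes, and input layout are illustrative, not the script's actual code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: build example_inputs that include past key values so a
# torch.jit trace preserves the KV-cache inputs needed for text generation.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torchscript=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
inputs = tokenizer("Intel Neural Compressor is", return_tensors="pt")

num_layers = model.config.n_layer            # 28 for GPT-J-6B
num_heads = model.config.n_head              # 16 for GPT-J-6B
head_dim = model.config.n_embd // num_heads  # 4096 / 16 = 256

# One (key, value) pair per layer; zero-length tensors stand in for an empty cache.
past_key_values = tuple(
    (torch.zeros(1, num_heads, 0, head_dim), torch.zeros(1, num_heads, 0, head_dim))
    for _ in range(num_layers)
)
example_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "past_key_values": past_key_values,
}
```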
@@ -47,8 +48,23 @@ python run_clm_no_trainer.py \
     --woq_algo RTN \
     --woq_enable_mse_search \
     --output_dir "saved_results"
+
+# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
+python run_clm_no_trainer.py \
+    --model EleutherAI/gpt-j-6B \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
 ```
-**Notes**: Weight-only quantization based on fake quantization is previewly supported and supports RTN, GPTQ[1], AWQ[2], TEQ algorithms. For more details, please refer to [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md)
+**Notes**: Weight-only quantization based on fake quantization is supported as a preview and covers the RTN, GPTQ[1], AWQ[2], and TEQ algorithms. For more details, please refer to [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md). Our GPTQ API supports various CLMs, including GPT-J, OPT, BLOOM, LLaMA, Falcon, MPT, ChatGLM, etc. Simply replace the "--model" argument to quantize a different CLM with GPTQ.
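
For the Python-level view, a minimal sketch of the weight-only GPTQ configuration that the CLI flags above map onto, assuming the neural_compressor 2.x PostTrainingQuantConfig API (exact recipe keys may differ by release; `model` and `calib_dataloader`, e.g. built from NeelNanda/pile-10k, are assumed to be defined elsewhere):

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Sketch of a weight-only GPTQ config; key names mirror the CLI flags above.
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {                       # match all operator types
            "weight": {
                "bits": 4,            # --woq_bits 4
                "group_size": 128,    # --woq_group_size 128
                "scheme": "asym",     # --woq_scheme asym
                "algorithm": "GPTQ",  # --woq_algo GPTQ
            },
        },
    },
    recipes={
        "gptq_args": {
            "pad_max_length": 2048,   # --gptq_pad_max_length 2048
            "use_max_length": True,   # --gptq_use_max_length
        },
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```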
5268

5369

5470
#### Accuracy with lm_eval
@@ -79,6 +95,21 @@ python run_clm_no_trainer.py \
     --ipex \
     --output_dir "saved_results" \
     --int8_bf16_mixed
+
+# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
+python run_clm_no_trainer.py \
+    --model facebook/opt-1.3b \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
 ```
 
 #### Accuracy with lm_eval
@@ -108,6 +139,21 @@ python run_clm_no_trainer.py \
     --ipex \
     --output_dir "saved_results" \
     --int8_bf16_mixed
+
+# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
+python run_clm_no_trainer.py \
+    --model meta-llama/Llama-2-7b-hf \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
 ```
 
 #### Accuracy with lm_eval
@@ -134,6 +180,21 @@ python run_clm_no_trainer.py \
     --sq \
     --alpha 0.5 \
     --output_dir "saved_results"
+
+# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
+python run_clm_no_trainer.py \
+    --model bigscience/bloom-560m \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
 ```
 #### Accuracy with lm_eval
 ```bash
@@ -149,6 +210,7 @@ python run_clm_no_trainer.py \
 ```
 
 ### Falcon-7b
+#### Quantization
 ```bash
 # "--sq" is used to enable smooth quant
 python run_clm_no_trainer.py \
@@ -157,6 +219,21 @@ python run_clm_no_trainer.py \
     --sq \
     --alpha 0.5 \
     --output_dir "saved_results"
+
+# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
+python run_clm_no_trainer.py \
+    --model tiiuae/falcon-7b-instruct \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
 ```
 #### Accuracy with lm_eval
 ```bash

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run-gptq-llm.sh

Lines changed: 0 additions & 13 deletions
This file was deleted.
