examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/README.md
78 additions & 1 deletion
@@ -31,6 +31,7 @@ python run_clm_no_trainer.py \
    --alpha 1.0 \
    --output_dir "saved_results" \
    --ipex
+
```

**Notes**: Smooth quantization here is based on torch.jit. Without past key values in example_inputs, the quantized model cannot be used for text generation. For the text-generation task, please refer to [link](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization).
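If you need a reference point for what such example_inputs can look like, the snippet below is a rough, illustrative sketch only: it assumes the standard Hugging Face cache layout for GPT-J (one (key, value) tensor pair per layer) and is not taken from run_clm_no_trainer.py.

```python
# Illustrative sketch (assumptions, not from this example): building
# example_inputs that include past_key_values so a torch.jit-traced GPT-J
# remains usable for text generation. Cache layout and config attribute
# names below are assumptions based on the usual HF GPT-J conventions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torchscript=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model.eval()

inputs = tokenizer("Intel Neural Compressor is", return_tensors="pt")
head_dim = model.config.n_embd // model.config.n_head

# Zero-length past per layer at trace time: batch=1, n_head heads, 0 cached tokens.
past_key_values = tuple(
    (
        torch.zeros(1, model.config.n_head, 0, head_dim),
        torch.zeros(1, model.config.n_head, 0, head_dim),
    )
    for _ in range(model.config.n_layer)
)

example_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "past_key_values": past_key_values,
}
```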
@@ -47,8 +48,23 @@ python run_clm_no_trainer.py \
    --woq_algo RTN \
    --woq_enable_mse_search \
    --output_dir "saved_results"
+
+# "--woq_algo GPTQ" is used to enable GPTQ algorithms
+python run_clm_no_trainer.py \
+    --model EleutherAI/gpt-j-6B \
+    --dataset NeelNanda/pile-10k \
+    --seed 0 \
+    --quantize \
+    --approach weight_only \
+    --woq_algo GPTQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --gptq_pad_max_length 2048 \
+    --gptq_use_max_length \
+    --gptq_debug
```
-**Notes**: Weight-only quantization based on fake quantization is previewly supported and supports RTN, GPTQ[1], AWQ[2], TEQ algorithms. For more details, please refer to [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md)
+**Notes**: Weight-only quantization based on fake quantization is supported as a preview and covers the RTN, GPTQ[1], AWQ[2], and TEQ algorithms. For more details, please refer to [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md). Our GPTQ API supports various CLMs, including GPT-J, OPT, BLOOM, LLaMA, Falcon, MPT, ChatGLM, etc. Simply replace the "--model" argument with another model to quantize a different CLM with GPTQ.
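As a rough reference, the weight-only RTN command above corresponds approximately to the neural-compressor Python-API configuration below; the op_type_dict layout follows the weight-only doc linked in the notes, but treat this as an illustrative sketch rather than the script's exact code path.

```python
# Rough sketch of the neural-compressor 2.x Python API for weight-only
# quantization, mirroring --woq_bits 4 --woq_scheme asym --woq_group_size 128
# above. An assumption-based illustration, not run_clm_no_trainer.py itself.
from neural_compressor import PostTrainingQuantConfig, quantization
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply the weight settings to all matched ops
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "RTN",  # switch to "GPTQ" (plus a calib dataloader) for GPTQ
            },
        },
    },
)
q_model = quantization.fit(model, conf)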

#### Accuracy with lm_eval
@@ -79,6 +95,21 @@ python run_clm_no_trainer.py \
    --ipex \
    --output_dir "saved_results" \
    --int8_bf16_mixed
+
+# "--woq_algo GPTQ" is used to enable GPTQ algorithms
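For a manual accuracy check outside the script, a harness-based evaluation might look like the sketch below; the model and task names are illustrative, follow lm-evaluation-harness v0.3 conventions, and are not this example's exact setup.

```python
# Hedged sketch (assumptions, not from this example): measuring accuracy with
# the EleutherAI lm-evaluation-harness directly, as an alternative to the
# script's built-in lm_eval path.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=EleutherAI/gpt-j-6B",
    tasks=["lambada_openai"],
    batch_size=8,
)
print(results["results"])
```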