## LLMs Quantization Recipes

Intel® Neural Compressor supports advanced quantization technologies for large language models (LLMs), including SmoothQuant (SQ) and Weight-Only Quantization (WOQ),
and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with [PyTorch](https://pytorch.org/),
[Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch), and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
This document publishes the specific recipes we achieved for popular LLMs, helping users quickly obtain an optimized LLM with accuracy loss limited to 1%.
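
For intuition on the first technique: SmoothQuant migrates quantization difficulty from activations to weights by rescaling each input channel with a factor s derived from the activation and weight ranges; before quantization the transform is mathematically lossless. The following is a minimal NumPy sketch of the idea only (all names are illustrative; it is not the Intel® Neural Compressor implementation):

```python
import numpy as np

def smooth(x, w, alpha=0.5):
    """SmoothQuant-style smoothing for a linear layer y = x @ w.

    x: activations, shape (tokens, in_features)
    w: weights, shape (in_features, out_features)
    Divides each activation channel by s and multiplies the matching
    weight row by s, so the layer output is unchanged.
    """
    act_max = np.abs(x).max(axis=0)            # per-channel activation range
    w_max = np.abs(w).max(axis=1)              # per-channel weight range
    s = act_max**alpha / w_max**(1.0 - alpha)  # smoothing factor per channel
    s = np.where(s == 0, 1.0, s)               # guard against dead channels
    return x / s, w * s[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
x[:, 3] *= 50.0                                # simulate an outlier channel
w = rng.normal(size=(8, 4))
x_s, w_s = smooth(x, w)
```

After smoothing, the outlier channel's activation range shrinks (making activations easier to quantize), while `x_s @ w_s` still equals `x @ w` up to floating-point error.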

> Notes:
>
> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor), and the evaluation functions are provided by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
> - The model list is continuously updated; expect to find more LLMs in the future.

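
Weight-only quantization, by contrast, keeps activations in floating point and stores only the weights as low-bit integers. A minimal NumPy sketch of symmetric per-channel INT8 weight quantization (illustrative only, not the library's kernels; the INT4 recipes add group-wise scales and other refinements):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-output-channel INT8 weight quantization.

    w: weight matrix, shape (out_features, in_features).
    Returns int8 weights plus one float scale per output channel.
    """
    # One scale per row, chosen so the largest |w| maps to 127.
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0.0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())  # bounded by half a scale step
```
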
## IPEX key models
|
14 |
| -| Models | SQ INT8 | WOQ INT8 | WOQ INT4 | |
15 |
| -|:-------------------------:|:-------:|:--------:|:--------:| |
16 |
| -| EleutherAI/gpt-j-6b | ✔ | ✔ | ✔ | |
17 |
| -| facebook/opt-1.3b | ✔ | ✔ | ✔ | |
18 |
| -| facebook/opt-30b | ✔ | ✔ | ✔ | |
19 |
| -| meta-llama/Llama-2-7b-hf | ✔ | ✔ | ✔ | |
20 |
| -| meta-llama/Llama-2-13b-hf | ✔ | ✔ | ✔ | |
21 |
| -| meta-llama/Llama-2-70b-hf | ✔ | ✔ | ✔ | |
22 |
| -| tiiuae/falcon-40b | ✔ | ✔ | ✔ | |
23 |
| - |

| Models                          | SQ INT8 | WOQ INT8 | WOQ INT4 |
| :-----------------------------: | :-----: | :------: | :------: |
| EleutherAI/gpt-j-6b             | ✔       | ✔        | ✔        |
| facebook/opt-1.3b               | ✔       | ✔        | ✔        |
| facebook/opt-30b                | ✔       | ✔        | ✔        |
| meta-llama/Llama-2-7b-hf        | WIP     | ✔        | ✔        |
| meta-llama/Llama-2-13b-hf       | ✔       | ✔        | ✔        |
| meta-llama/Llama-2-70b-hf       | ✔       | ✔        | ✔        |
| tiiuae/falcon-7b                | ✔       | ✔        | ✔        |
| tiiuae/falcon-40b               | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan-13B-Chat  | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan2-13B-Chat | ✔       | ✔        | ✔        |
| baichuan-inc/Baichuan2-7B-Chat  | ✔       | ✔        | ✔        |
| bigscience/bloom-1b7            | ✔       | ✔        | ✔        |
| databricks/dolly-v2-12b         | ✖       | ✔        | ✖        |
| EleutherAI/gpt-neox-20b         | ✖       | ✔        | ✖        |
| mistralai/Mistral-7B-v0.1       | ✖       | ✔        | ✔        |
| THUDM/chatglm2-6b               | WIP     | ✔        | WIP      |
| THUDM/chatglm3-6b               | WIP     | ✔        | WIP      |

**Detailed recipes can be found [HERE](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/llm_quantization_recipes.md).**

> Notes:
>
> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
> - The WIP recipes will be published soon.