
Commit 24419c9: update LLM recipes (#1692)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Parent: f1fb63c

1 file changed: docs/source/llm_recipes.md (+34, -21 lines)
## LLMs Quantization Recipes

Intel® Neural Compressor supports advanced large language model (LLM) quantization technologies, including SmoothQuant (SQ) and Weight-Only Quant (WOQ),
and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with [PyTorch](https://pytorch.org/),
[Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch), and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
This document aims to publish the specific recipes we achieved for the popular LLMs and to help users quickly get an optimized LLM within 1% accuracy loss.
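As a hedged illustration of the idea behind SmoothQuant, here is a minimal, dependency-free sketch of the per-channel scaling math (all names are illustrative; this is not the Intel® Neural Compressor implementation). Activation outliers are migrated into the weights via a scale `s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)`, leaving the matmul result unchanged while shrinking the activations' dynamic range:

```python
# Sketch of SmoothQuant-style outlier migration (illustrative only):
# Y = X @ W == (X / s) @ (s * W), but X / s is easier to quantize.

def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    """Per-input-channel smoothing scales from activation/weight abs-max stats."""
    return [(xa ** alpha) / (wa ** (1.0 - alpha))
            for xa, wa in zip(x_absmax, w_absmax)]

def apply_smoothing(x, w, scales):
    """Divide activation columns and multiply weight rows by the scales."""
    x_s = [[v / s for v, s in zip(row, scales)] for row in x]
    w_s = [[v * scales[i] for v in row] for i, row in enumerate(w)]
    return x_s, w_s

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Toy example: channel 1 of X carries a large outlier.
x = [[1.0, 100.0], [2.0, -80.0]]   # activations, shape (2, 2)
w = [[0.5, -0.2], [0.01, 0.03]]    # weights, shape (2, 2)
x_absmax = [max(abs(row[j]) for row in x) for j in range(2)]
w_absmax = [max(abs(v) for v in w[j]) for j in range(2)]
scales = smooth_scales(x_absmax, w_absmax, alpha=0.5)
x_s, w_s = apply_smoothing(x, w, scales)

# The product is unchanged (up to float rounding), while the smoothed
# activations have a much smaller dynamic range than the originals.
y, y_s = matmul(x, w), matmul(x_s, w_s)
print(y, y_s, max(abs(v) for row in x_s for v in row))
```

The `alpha` knob (0.5 here) trades off how much of the quantization difficulty is shifted from activations to weights; the verified recipes tune such hyperparameters per model.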

> Notes:
>
> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor) and the evaluation functions by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
> - The model list is continuously updated; expect to find more LLMs in the future.

## IPEX key models
| Models                          | SQ INT8 | WOQ INT8 | WOQ INT4 |
| :-----------------------------: | :-----: | :------: | :------: |
| EleutherAI/gpt-j-6b             |         |          |          |
| facebook/opt-1.3b               |         |          |          |
| facebook/opt-30b                |         |          |          |
| meta-llama/Llama-2-7b-hf        |   WIP   |          |          |
| meta-llama/Llama-2-13b-hf       |         |          |          |
| meta-llama/Llama-2-70b-hf       |         |          |          |
| tiiuae/falcon-7b                |         |          |          |
| tiiuae/falcon-40b               |         |          |          |
| baichuan-inc/Baichuan-13B-Chat  |         |          |          |
| baichuan-inc/Baichuan2-13B-Chat |         |          |          |
| baichuan-inc/Baichuan2-7B-Chat  |         |          |          |
| bigscience/bloom-1b7            |         |          |          |
| databricks/dolly-v2-12b         |         |          |          |
| EleutherAI/gpt-neox-20b         |         |          |          |
| mistralai/Mistral-7B-v0.1       |         |          |          |
| THUDM/chatglm2-6b               |   WIP   |          |   WIP    |
| THUDM/chatglm3-6b               |   WIP   |          |   WIP    |
**Detail recipes can be found [HERE](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/llm_quantization_recipes.md).**
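A common building block behind WOQ INT4 recipes is round-to-nearest (RTN) quantization with group-wise scales. The following is a hedged, dependency-free sketch of that general technique (illustrative names; not the Intel® Neural Compressor implementation, which the detailed recipes above configure per model):

```python
# Sketch of symmetric group-wise INT4 round-to-nearest weight quantization.

def quantize_int4_rtn(weights, group_size=4):
    """Quantize a flat list of float weights to symmetric INT4 per group.

    Returns (codes, scales); each code is an integer in [-8, 7] and each
    group of `group_size` weights shares one float scale.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # avoid zero scale
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4(codes, scales, group_size=4):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

w = [0.12, -0.40, 0.33, 0.05, 2.0, -1.5, 0.7, 0.1]
codes, scales = quantize_int4_rtn(w)
w_hat = dequantize_int4(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(codes, max_err)
```

Smaller group sizes lower the reconstruction error at the cost of storing more scales; the published recipes pick such settings per model to stay within the 1% accuracy-loss target.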

> Notes:
>
> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
> - The WIP recipes will be published soon.
