
Commit 67fac4b

phymbert and ggerganov authored

docs : how to add a model (ggml-org#6565)

* docs: how to add a model
* docs: model: typo and docs
* docs: model: add precision on RoPE
* docs: model: rephrasing README.md
* docs: model: rephrasing README.md
* docs: model: README.md fix trailing spaces
* docs : some fixes
* Update README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 parent 29122d3 commit 67fac4b

File tree

- README.md
- docs/HOWTO-add-model.md

2 files changed: +119 -0 lines


README.md

Lines changed: 2 additions & 0 deletions
@@ -122,6 +122,8 @@ Typically finetunes of the base models below are supported as well.
 - [x] [SEA-LION](https://huggingface.co/models?search=sea-lion)
 - [x] [GritLM-7B](https://huggingface.co/GritLM/GritLM-7B) + [GritLM-8x7B](https://huggingface.co/GritLM/GritLM-8x7B)
 
+(instructions for supporting more models: [HOWTO-add-model.md](./docs/HOWTO-add-model.md))
+
 **Multimodal models:**
 - [x] [LLaVA 1.5 models](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [LLaVA 1.6 models](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)

docs/HOWTO-add-model.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
## Add a new model architecture to `llama.cpp`
Adding a model requires a few steps:
1. Convert the model to GGUF
2. Define the model architecture in `llama.cpp`
3. Build the GGML graph implementation
After following these steps, you can open a PR.
It is also important to check that the examples and the main ggml backends (CUDA, Metal, CPU) are working with the new architecture, especially:
- [main](../examples/main)
- [imatrix](../examples/imatrix)
- [quantize](../examples/quantize)
- [server](../examples/server)
### 1. Convert the model to GGUF
This step is done in Python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
Depending on the model architecture, you can use either [convert.py](../convert.py) or [convert-hf-to-gguf.py](../convert-hf-to-gguf.py).
The convert script reads the model configuration, tokenizer, and tensor names and data, and converts them to GGUF metadata and tensors.
The required steps for an HF model are:
1. Add the `Model.register` decorator to a new `Model` subclass, for example:
```python
@Model.register("MyModelForCausalLM")
class MyModel(Model):
    model_arch = gguf.MODEL_ARCH.GROK
```
2. Define the layout of the GGUF tensors in [constants.py](../gguf-py/gguf/constants.py)
Add an enum entry in `MODEL_ARCH`, the model's human-friendly name in `MODEL_ARCH_NAMES` and the GGUF tensor names in `MODEL_TENSORS`.
Example for the `falcon` model:
```python
MODEL_ARCH.FALCON: [
    MODEL_TENSOR.TOKEN_EMBD,
    MODEL_TENSOR.OUTPUT_NORM,
    MODEL_TENSOR.OUTPUT,
    MODEL_TENSOR.ATTN_NORM,
    MODEL_TENSOR.ATTN_NORM_2,
    MODEL_TENSOR.ATTN_QKV,
    MODEL_TENSOR.ATTN_OUT,
    MODEL_TENSOR.FFN_DOWN,
    MODEL_TENSOR.FFN_UP,
]
```
3. Map the original tensor names to their standardized GGUF equivalents
As a general rule, before adding a new tensor name to GGUF, make sure an equivalent name does not already exist.
Once you have found the GGUF tensor name equivalent, add it to the [tensor_mapping.py](../gguf-py/gguf/tensor_mapping.py) file.
If the tensor name is part of a repeated layer/block, the placeholder `bid` substitutes the block index.
Example for the normalization tensor in attention layers:
```python
block_mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
    # Attention norm
    MODEL_TENSOR.ATTN_NORM: (
        "gpt_neox.layers.{bid}.input_layernorm",  # gptneox
        "transformer.h.{bid}.ln_1",               # gpt2 gpt-j refact qwen
        "transformer.blocks.{bid}.norm_1",        # mpt
        ...
    ),
}
```
`transformer.blocks.{bid}.norm_1` will be mapped to `blk.{bid}.attn_norm` in GGUF.
Depending on the model configuration, tokenizer, code and tensor layout, you will have to override:
- `Model#set_gguf_parameters`
- `Model#set_vocab`
- `Model#write_tensors`
NOTE: Tensor names must end with the `.weight` suffix; that is the convention, and several tools like `quantize` rely on it.
### 2. Define the model architecture in `llama.cpp`
The model params and tensor layout must be defined in `llama.cpp` (a rough sketch follows the list):
1. Define a new `llm_arch`
2. Define the tensor layout in `LLM_TENSOR_NAMES`
3. Add any non-standard metadata in `llm_load_hparams`
4. Create the tensors for inference in `llm_load_tensors`
5. If the model has a RoPE operation, add the rope type in `llama_rope_type`
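As an illustration, here is a minimal sketch of what steps 1, 2 and 5 can look like, modeled on the existing entries in `llama.cpp`. `LLM_ARCH_MYMODEL`, the `"mymodel"` name and the chosen tensor set are hypothetical; copy the exact table types and surrounding entries from the code itself:

```cpp
// step 1: a new enum value for the architecture (LLM_ARCH_MYMODEL is hypothetical)
enum llm_arch {
    // ... existing architectures ...
    LLM_ARCH_MYMODEL,
};

// the string stored as general.architecture in the GGUF file
static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
    // ...
    { LLM_ARCH_MYMODEL, "mymodel" },
};

// step 2: the tensor layout -- the base names must match what the convert script wrote
static const std::map<llm_arch, std::map<llm_tensor, std::string>> LLM_TENSOR_NAMES = {
    // ...
    {
        LLM_ARCH_MYMODEL,
        {
            { LLM_TENSOR_TOKEN_EMBD,  "token_embd" },
            { LLM_TENSOR_OUTPUT_NORM, "output_norm" },
            { LLM_TENSOR_OUTPUT,      "output" },
            { LLM_TENSOR_ATTN_NORM,   "blk.%d.attn_norm" },
            { LLM_TENSOR_ATTN_QKV,    "blk.%d.attn_qkv" },
            { LLM_TENSOR_ATTN_OUT,    "blk.%d.attn_output" },
            { LLM_TENSOR_FFN_DOWN,    "blk.%d.ffn_down" },
            { LLM_TENSOR_FFN_UP,      "blk.%d.ffn_up" },
        },
    },
};

// step 5: if the model uses RoPE, report its flavor in llama_rope_type's switch:
//     case LLM_ARCH_MYMODEL: return LLAMA_ROPE_TYPE_NORM; // or LLAMA_ROPE_TYPE_NEOX
```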
NOTE: The dimensions in `ggml` are typically in the reverse order of the PyTorch dimensions.
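For example, a token-embedding matrix that PyTorch reports as `[n_vocab, n_embd]` is created in `ggml` with the two dimensions swapped. A small fragment, with the variable names only for illustration:

```cpp
// fragment: assumes a valid ggml_context * ctx and model dims n_embd, n_vocab
// PyTorch: tok_embd.shape == [n_vocab, n_embd]   (rows, cols)
// ggml:    ne == { n_embd, n_vocab }             (ne[0] is the fastest-varying dimension)
struct ggml_tensor * tok_embd = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_vocab);
```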
### 3. Build the GGML graph implementation
This is the most fun part: you have to provide the inference graph implementation of the new model architecture in `llama_build_graph`.
Have a look at existing implementations like `build_llama`, `build_dbrx` or `build_bert`.
When implementing a new graph, please note that the underlying `ggml` backends might not support all of its operations; support for missing backend operations can be added in another PR.
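If you have not worked with `ggml` directly before, the following standalone sketch shows the pattern every `build_*` function is made of: create tensors, chain operations into new nodes, then expand the graph from the final node and compute it. This is not llama.cpp code; the buffer size, shapes and values are arbitrary:

```cpp
#include "ggml.h"
#include <stdio.h>

int main(void) {
    // small scratch context; llama.cpp manages its own contexts and backends
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a 3x4 weight (ne = {4, 3}) and a 4-element input: ne[0] is the contracted dim
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(w, 1.0f);
    ggml_set_f32(x, 2.0f);

    // chain an operation: y = w @ x -- this only records a node, nothing runs yet
    struct ggml_tensor * y = ggml_mul_mat(ctx, w, x);

    // build the graph from the final node and compute it on the CPU
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    for (int i = 0; i < 3; ++i) {
        printf("y[%d] = %.1f\n", i, ggml_get_f32_1d(y, i)); // 8.0 = 4 * (1 * 2)
    }

    ggml_free(ctx);
    return 0;
}
```

In a real `build_*` function the tensors come from the loaded model and the batch, and the graph is returned to the caller rather than computed on the spot.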
## GGUF specification
https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
## Resources
- YaRN RoPE scaling https://github.com/ggerganov/llama.cpp/pull/2268
- support Baichuan serial models https://github.com/ggerganov/llama.cpp/pull/3009
- support attention bias https://github.com/ggerganov/llama.cpp/pull/4283
- Mixtral support https://github.com/ggerganov/llama.cpp/pull/4406
- BERT embeddings https://github.com/ggerganov/llama.cpp/pull/5423
- Grok-1 support https://github.com/ggerganov/llama.cpp/pull/6204
- Command R Plus support https://github.com/ggerganov/llama.cpp/pull/6491
- support arch DBRX https://github.com/ggerganov/llama.cpp/pull/6515
- How to convert HuggingFace model to GGUF format https://github.com/ggerganov/llama.cpp/discussions/2948
