
Commit 85e4e2b

Fix CI, scripts, readme files
1 parent dc24e7e commit 85e4e2b

9 files changed: 30 additions, 127 deletions


CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1305,7 +1305,7 @@ set_target_properties(llama PROPERTIES PUBLIC_HEADER ${CMAKE_CURRENT_SOURCE_DIR}
 install(TARGETS llama LIBRARY PUBLIC_HEADER)

 install(
-    FILES convert.py
+    FILES convert-hf-to-gguf.py
     PERMISSIONS
         OWNER_READ
         OWNER_WRITE

README.md

Lines changed: 4 additions & 3 deletions
@@ -690,7 +690,8 @@ Building the program with BLAS support may lead to some performance improvements

 To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.

-Note: `convert.py` does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
+Note: `convert.py` has been moved to `examples/convert-no-torch.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
+It does not support LLaMA 3; you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.

 ```bash
 # obtain the official LLaMA model weights and place them in ./models
@@ -707,10 +708,10 @@ ls ./models
 python3 -m pip install -r requirements.txt

 # convert the model to ggml FP16 format
-python3 convert.py models/mymodel/
+python3 convert-hf-to-gguf.py models/mymodel/

 # [Optional] for models using BPE tokenizers
-python convert.py models/mymodel/ --vocab-type bpe
+python convert-hf-to-gguf.py models/mymodel/ --vocab-type bpe

 # quantize the model to 4-bits (using Q4_K_M method)
 ./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
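For original Meta-format (`.pth`) `Llama/Llama2/Mistral` checkpoints, the relocated legacy script is the one to use instead. A minimal sketch, assuming the weights sit in `./models/llama-2-7b` (a hypothetical path), following the same flags this commit uses in scripts/convert-gg.sh:

# convert original .pth weights with the relocated legacy converter
python3 examples/convert-no-torch.py ./models/llama-2-7b --outfile ./models/llama-2-7b/ggml-model-f16.gguf --outtype f16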

ci/run.sh

Lines changed: 2 additions & 2 deletions
@@ -282,7 +282,7 @@ function gg_run_open_llama_3b_v2 {
     (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_QKK_64=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
     (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

-    python3 ../convert.py ${path_models}
+    python3 ../examples/convert-no-torch.py ${path_models}

     model_f16="${path_models}/ggml-model-f16.gguf"
     model_q8_0="${path_models}/ggml-model-q8_0.gguf"

@@ -417,7 +417,7 @@ function gg_run_open_llama_7b_v2 {
     (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
     (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

-    python3 ../convert.py ${path_models}
+    python3 ../examples/convert-no-torch.py ${path_models}

     model_f16="${path_models}/ggml-model-f16.gguf"
     model_q8_0="${path_models}/ggml-model-q8_0.gguf"

docs/HOWTO-add-model.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Also, it is important to check that the examples and main ggml backends (CUDA, M
 ### 1. Convert the model to GGUF

 This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
-Depending on the model architecture, you can use either [convert.py](../convert.py) or [convert-hf-to-gguf.py](../convert-hf-to-gguf.py).
+Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-no-torch.py](../examples/convert-no-torch.py) (for `llama/llama2` models in `.pth` format).

 The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
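A rough illustration of that choice (the model paths are hypothetical; flags and output names follow the conventions used elsewhere in this commit):

# Hugging Face-format checkpoints (config.json, tokenizer, safetensors/bin): torch-based converter
python3 convert-hf-to-gguf.py models/mymodel-hf --outfile models/mymodel-hf/ggml-model-f16.gguf --outtype f16

# original llama/llama2 .pth checkpoints: relocated legacy converter
python3 examples/convert-no-torch.py models/mymodel-pth --outfile models/mymodel-pth/ggml-model-f16.gguf --outtype f16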

examples/llava/MobileVLM-README.md

Lines changed: 2 additions & 2 deletions
@@ -54,10 +54,10 @@ python ./examples/llava/convert-image-encoder-to-gguf \
     --projector-type ldpv2
 ```

-4. Use `convert.py` to convert the LLaMA part of LLaVA to GGUF:
+4. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:

 ```sh
-python ./convert.py path/to/MobileVLM-1.7B
+python ./examples/convert-no-torch.py path/to/MobileVLM-1.7B
 ```

 5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`
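Step 5 itself is not part of this hunk; a minimal sketch of it, assuming the converter wrote the default ggml-model-f16.gguf into the model directory (an assumption), using the same quantize calling convention seen in scripts/pod-llama.sh:

# quantize the converted LLaMA part from fp16 to q4_k
./quantize path/to/MobileVLM-1.7B/ggml-model-f16.gguf path/to/MobileVLM-1.7B/ggml-model-q4_k.gguf q4_k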

examples/llava/README.md

Lines changed: 3 additions & 3 deletions
@@ -50,10 +50,10 @@ python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
 python ./examples/llava/convert-image-encoder-to-gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
 ```

-5. Use `convert.py` to convert the LLaMA part of LLaVA to GGUF:
+5. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:

 ```sh
-python ./convert.py ../llava-v1.5-7b --skip-unknown
+python ./examples/convert-no-torch.py ../llava-v1.5-7b --skip-unknown
 ```

 Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.

@@ -92,7 +92,7 @@ python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projecto

 6) Then convert the model to gguf format:
 ```console
-python ./convert.py ../llava-v1.6-vicuna-7b/ --skip-unknown
+python ./examples/convert-no-torch.py ../llava-v1.6-vicuna-7b/ --skip-unknown
 ```

 7) And finally we can run the llava-cli using the 1.6 model version:

examples/make-ggml.py

Lines changed: 0 additions & 98 deletions
This file was deleted.

scripts/convert-gg.sh

Lines changed: 10 additions & 10 deletions
@@ -3,20 +3,20 @@
 set -e

 # LLaMA v1
-python3 convert.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16

 # LLaMA v2
-python3 convert.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16

 # Code Llama
-python3 convert.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
-python3 convert.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
+python3 examples/convert-no-torch.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16

 # Falcon
 python3 convert-falcon-hf-to-gguf.py ../falcon/falcon-7b 1

scripts/pod-llama.sh

Lines changed: 7 additions & 7 deletions
@@ -75,7 +75,7 @@ if [ "$1" -eq "1" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_k.gguf q4_k

@@ -90,7 +90,7 @@ if [ "$1" -eq "2" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_k.gguf q4_k

@@ -105,7 +105,7 @@ if [ "$1" -eq "3" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_k.gguf q4_k

@@ -120,7 +120,7 @@ if [ "$1" -eq "4" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_k.gguf q4_k

@@ -135,7 +135,7 @@ if [ "$1" -eq "5" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_k.gguf q4_k

@@ -150,7 +150,7 @@ if [ "$1" -eq "6" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_k.gguf q4_k

@@ -165,7 +165,7 @@ if [ "$1" -eq "7" ]; then

     cd /workspace/llama.cpp

-    python3 convert.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
+    python3 examples/convert-no-torch.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16

     ./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_0.gguf q4_0
     ./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_k.gguf q4_k
