Merge branch 'master' of github.com:ggerganov/llama.cpp into grammar-example
* 'master' of github.com:ggerganov/llama.cpp:
  convert : remove bug in convert.py permute function (ggml-org#3364)
  make-ggml.py : compatibility with more models and GGUF (ggml-org#3290)
  gguf : fix a few general keys (ggml-org#3341)
  metal : reusing llama.cpp logging (ggml-org#3152)
  build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (ggml-org#3342)
  readme : add some recent perplexity and bpw measurements to READMEs, link for k-quants (ggml-org#3340)
  cmake : fix build-info.h on MSVC (ggml-org#3309)
  docs : fix typo in CLBlast_DIR var. (ggml-org#3330)
  nix : add cuda, use a symlinked toolkit for cmake (ggml-org#3202)
make-ggml.py:

-- --model: (Required) The directory of the downloaded Hugging Face model or the name of the Hugging Face model repository. If the model directory does not exist, it will be downloaded from the Hugging Face model hub.
+- model: (Required) The directory of the downloaded Hugging Face model or the name of the Hugging Face model repository. If the model directory does not exist, it will be downloaded from the Hugging Face model hub.
+- --model_type: (Required) The type of the model to be converted. Choose from llama, starcoder, falcon, baichuan, or gptneox.
 - --outname: (Optional) The name of the output model. If not specified, the last part of the model directory path or the Hugging Face model repo name will be used.
 - --outdir: (Optional) The directory where the output model(s) will be stored. If not specified, '../models/{outname}' will be used.
 - --quants: (Optional) The types of quantization to apply. This should be a space-separated list. The default is 'Q4_K_M Q5_K_S'.
 - --keep_fp16: (Optional) If specified, the FP16 model will not be deleted after the quantized models are created.

-Quant types:
+Old quant types (some base model types require these):
 - Q4_0: small, very high quality loss - legacy, prefer using Q3_K_M
 - Q4_1: small, substantial quality loss - legacy, prefer using Q3_K_L
 - Q5_0: medium, balanced quality - legacy, prefer using Q4_K_M
 - Q5_1: medium, low quality loss - legacy, prefer using Q5_K_M
+
+New quant types (recommended):
 - Q2_K: smallest, extreme quality loss - not recommended
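For illustration, a hypothetical invocation of the updated interface might look like the following (the repo name is a placeholder, not something taken from this commit); note that the model is now a positional argument rather than a --model flag:

    python make-ggml.py meta-llama/Llama-2-7b-hf --model_type llama --quants Q4_K_M Q5_K_S --keep_fp16

This would convert the model to an FP16 GGUF file, produce the two listed quantized variants, and keep the FP16 file because --keep_fp16 is given.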
-parser=argparse.ArgumentParser(description='Convert/Quantize HF to GGML. If you have the HF model downloaded already, pass the path to the model dir. Otherwise, pass the Hugging Face model repo name. You need to be in the /examples folder for it to work.')
-parser.add_argument('--model', required=True, help='Downloaded model dir or Hugging Face model repo name')
+parser=argparse.ArgumentParser(description='Convert/Quantize HF models to GGUF. If you have the HF model downloaded already, pass the path to the model dir. Otherwise, pass the Hugging Face model repo name. You need to be in the /examples folder for it to work.')
+parser.add_argument('model', help='Downloaded model dir or Hugging Face model repo name')
+parser.add_argument('--model_type', required=True, choices=['llama', 'starcoder', 'falcon', 'baichuan', 'gptneox'], help='Type of the model to be converted. Choose from llama, starcoder, falcon, baichuan, or gptneox.')
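To see how the new arguments combine with the documented defaults, here is a minimal, self-contained sketch; the fallback logic and variable names below are assumptions for illustration, not code from the commit:

    import argparse
    import os

    parser = argparse.ArgumentParser(description='Convert/Quantize HF models to GGUF.')
    parser.add_argument('model', help='Downloaded model dir or Hugging Face model repo name')
    parser.add_argument('--model_type', required=True,
                        choices=['llama', 'starcoder', 'falcon', 'baichuan', 'gptneox'])
    parser.add_argument('--outname', help='Output model name (optional)')
    parser.add_argument('--outdir', help='Output directory (optional)')
    parser.add_argument('--quants', nargs='*', default=['Q4_K_M', 'Q5_K_S'],
                        help='Quantization types to apply')
    parser.add_argument('--keep_fp16', action='store_true',
                        help='Do not delete the FP16 model after quantizing')
    args = parser.parse_args()

    # Documented defaults: outname falls back to the last path component of the
    # model dir or repo name, and outdir falls back to '../models/{outname}'.
    outname = args.outname or os.path.basename(args.model.rstrip('/'))
    outdir = args.outdir or f'../models/{outname}'
    print(f'Converting {args.model} ({args.model_type}) -> {outdir} as {args.quants}')

Making the model a positional argument keeps the one mandatory input flag-free, while everything with a sensible default remains an optional flag.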