How to further quantize GGUF to Q4 format using llama.cpp? #10680
Unanswered
jasonsu123 asked this question in Q&A
Dear all,
I am using a Windows environment.
Currently, I can successfully convert safetensors files from Hugging Face into Q8_0 GGUF format with the following command:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q8_0.gguf --outtype q8_0
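For completeness, I can also produce an unquantized f16 GGUF the same way by changing the output type; my understanding is that an f16 (or f32) file is the usual starting point for further quantization:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf --outtype f16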
However, how can I quantize the model to other levels (Q2 through Q6) or to K-quant variants such as K_M and K_S?
I tried modifying the command based on the example above,
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q4_0.gguf --outtype q4_0
but it failed with an error.
It seems the separate ./quantize tool is required for this step, for example:
./quantize ./models/Bailong-instruct-7B-f16.gguf ./models/Bailong-instruct-7B-v0.1-Q5_K_M.gguf q5_k_m
Since I am a Windows user, though, how should I adapt this command to work in my environment?
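My best guess at the Windows equivalent is below. This is only a sketch, assuming llama.cpp was built locally with CMake so the binaries end up under build\bin\Release, and that a recent release is used, where the tool is named llama-quantize.exe rather than quantize.exe:
build\bin\Release\llama-quantize.exe D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-Q5_K_M.gguf Q5_K_M
Is this the right approach?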
Thank you!