How to further quantize GGUF to Q4 format using llama.cpp? #10680
Unanswered
jasonsu123 asked this question in Q&A
Dear all,
I am using a Windows environment.
Currently, I can successfully convert safetensors files from Hugging Face into Q8_0 GGUF format with the following command:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q8_0.gguf --outtype q8_0
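For completeness, I can also produce an unquantized f16 GGUF the same way by changing the output type; my understanding is that an f16 (or f32) file is the usual starting point for further quantization:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf --outtype f16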
However, how can I quantize the model to other levels (Q2 through Q6) or to K-quant variants such as K_M and K_S?
I tried modifying the command based on the example above,
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q4_0.gguf --outtype q4_0
but it failed with an error.
It seems the separate ./quantize tool is required for this step, for example:
./quantize ./models/Bailong-instruct-7B-f16.gguf ./models/Bailong-instruct-7B-v0.1-Q5_K_M.gguf q5_k_m
Since I am a Windows user, though, how should I adapt this command to work in my environment?
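My best guess at the Windows equivalent is below. This is only a sketch, assuming llama.cpp was built locally with CMake so the binaries end up under build\bin\Release, and that a recent release is used, where the tool is named llama-quantize.exe rather than quantize.exe:
build\bin\Release\llama-quantize.exe D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-Q5_K_M.gguf Q5_K_M
Is this the right approach?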
Thank you!