Challenges in Quantizing llama.cpp Models on Windows #10730
jasonsu123 asked this question in Q&A · Unanswered · 1 comment, 3 replies
-
🤖: Sure, here's a concise guide to help you through the process on Windows 10:
This should help you quantize your model.
👨: Btw, if step 2 fails, you can download the pre-built executables from https://github.com/ggerganov/llama.cpp/releases
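In that case, a minimal sketch of using the pre-built binaries from CMD would look something like this (the folder and model names are only placeholders, and Q4_0 is just an example target type):
cd C:\path-to-unzipped-llama.cpp-release
llama-quantize.exe D:\Ollama\model-f16.gguf D:\Ollama\model-Q4_0.gguf Q4_0
Note that llama-quantize normally expects an unquantized (f32/f16/bf16) GGUF as input; requantizing an already-quantized file needs the --allow-requantize flag.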
-
Hello everyone,
Previously, I asked how to convert a safetensors model from the Hugging Face website into a GGUF file. Someone later shared instructional resources, and I can now convert it to a GGUF file using the convert_hf_to_gguf.py script from llama.cpp.
The process is as follows:
First I enter the required commands in CMD, and then, from the downloaded llama.cpp folder, I run the following command:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q8_0.gguf --outtype q8_0
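(From the script's help output, it can also produce an f16 GGUF by changing the outtype, which I gather is the usual starting point for further quantization; the command would presumably look like this, though I have only tried q8_0 so far:)
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf --outtype f16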
However, I'm unable to take the quantization any further.
For example, when I try to quantize to q4_0, I get this error:
error: argument --outtype: invalid choice: 'q4_0' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto')
It seems that I need to use the ./quantize or ./llama-quantize command shown in the tutorial examples.
However, I'm using Windows 10, so how can I modify these commands to work in my terminal?
It seems that the quantization process can only be done in a Linux environment, but I'm a programming newbie and don't know how to compile the quantize tool and then use it to quantize the GGUF model.
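From the tutorials, my rough understanding of the steps is something like the following, run from inside the llama.cpp folder, but I haven't been able to verify it on Windows (I'm assuming CMake and the Visual Studio Build Tools are installed, and the file names are just examples):
cmake -B build
cmake --build build --config Release
build\bin\Release\llama-quantize.exe D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-f16.gguf D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-Q4_0.gguf Q4_0
Is this the right idea, or is there a simpler way to do it on Windows?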
Could someone please provide a simple tutorial on how to do this?
I would really appreciate it.
Thank you.