Description
I was trying to convert a GGUF model, but there is a major limitation that the script does not handle.
RKLLM is only capable of converting GGUF files whose output weights are quantized as q_4_0 or fp16. What matters is not the generic GGUF quant named in the filename but the quant of the output weight.
For example, the Qwen 2.5 GGUF files from q_2 to q_6 all have a q_6 output weight. Only the q_8 file has a q_8 output quant, and the fp16 file has fp16.
In practice there is no way to convert Qwen2.5 from GGUF other than fp16... In this case I would use HF instead.
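A minimal sketch of the check I have in mind, assuming we already have a mapping from tensor name to quant type name for a given GGUF file. The names `SUPPORTED_OUTPUT_TYPES`, `find_output_type` and `is_convertible` are hypothetical, and so are the tensor names `output.weight` / `token_embd.weight` (the second one as a fallback for models with tied embeddings). The sample mappings are only illustrative, not read from real files.

```python
# Output tensor quant types RKLLM can convert, per the description above.
SUPPORTED_OUTPUT_TYPES = {"Q4_0", "F16"}


def find_output_type(tensor_types):
    """Return the quant type name of the output tensor, or None if not found.

    Assumption: the output projection is stored as "output.weight"; if it is
    missing (tied embeddings), fall back to "token_embd.weight".
    """
    for name in ("output.weight", "token_embd.weight"):
        if name in tensor_types:
            return tensor_types[name]
    return None


def is_convertible(tensor_types):
    """True when the output tensor uses a quant type RKLLM accepts."""
    return find_output_type(tensor_types) in SUPPORTED_OUTPUT_TYPES


# Illustrative mappings only (not taken from real files).
low_bit_file = {"token_embd.weight": "Q4_K", "output.weight": "Q6_K"}
fp16_file = {"token_embd.weight": "F16", "output.weight": "F16"}

print(is_convertible(low_bit_file))  # output weight is Q6_K, so not convertible
print(is_convertible(fp16_file))     # output weight is F16, so convertible
```

The mapping itself could probably be built with the `gguf` Python package (`GGUFReader`), but that is an assumption on my side, not something the script does today.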
The second issue is that the GGUF logic downloads all files, while the GGUF loading function expects only one specific filename. It should probably use q_4_0; however, in a real scenario we should first determine whether the output quant is actually q_4_0. I don't know if the HF library has an option to read metadata? That is where the detailed per-weight information is stored.
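For the filename part, a rough sketch of picking a single file from the repo listing instead of downloading everything. `pick_gguf_file` and the tag order are hypothetical, and the filenames in the example are made up. Matching on the filename only narrows the candidates; the output quant of the chosen file still has to be verified.

```python
def pick_gguf_file(filenames, preferred=("q4_0", "f16", "fp16")):
    """Return the first .gguf filename matching a preferred quant tag, else None.

    Tags are tried in order, so a q4_0 file wins over an fp16 one.
    Matching is a case-insensitive substring test on the filename.
    """
    ggufs = [f for f in filenames if f.lower().endswith(".gguf")]
    for tag in preferred:
        for f in ggufs:
            if tag in f.lower():
                return f
    return None


# Made-up repo listing for illustration.
files = ["README.md", "model-q2_k.gguf", "model-Q4_0.gguf", "model-fp16.gguf"]
print(pick_gguf_file(files))
```

The selected name could then be passed to a single-file download (for example `hf_hub_download(repo_id, filename)` from `huggingface_hub`) rather than fetching the whole repo.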