GGUF #1

@imkebe

Description

I was trying to convert a GGUF model, but there is a major limitation that the script is not handling.
RKLLM is only capable of converting models whose output weights are quantized as q4_0 or fp16. What matters is not the generic GGUF quant named in the filename, but the quant type of the output weight tensor itself.

For example, the Qwen 2.5 GGUF files from q_2 through q_6 all have q_6 output weights. Only the q_8 file has a q_8 output quant, and the fp16 file has fp16.

In effect, there is no way to convert Qwen 2.5 from GGUF other than fp16... In that case I would use HF instead.

The second issue is that the GGUF logic downloads all files, while the GGUF loading function expects one specific filename. It should probably use q_4_0, but in a real scenario we should first determine whether the output quant actually is q_4_0. I don't know if the HF library has an option to read metadata? The detailed per-tensor weight information is stored there.
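As a sketch of the check I mean: the GGUF container stores the quant type of every tensor in its header, so after downloading just the one file (e.g. via `huggingface_hub.hf_hub_download` with an explicit `filename`), the output-weight type can be read without loading the model. Below is a minimal stdlib-only parser, assuming the standard GGUF v3 layout and the usual `output.weight` tensor name; the type codes and names are my assumptions from ggml, and in practice the `gguf` package's `GGUFReader` would do this more robustly.

```python
import struct

# Subset of ggml tensor type codes (assumed from ggml's enum; verify against your version)
GGML_TYPES = {0: "F32", 1: "F16", 2: "Q4_0", 8: "Q8_0", 14: "Q6_K"}

def _read_str(buf, off):
    # GGUF strings: uint64 length followed by UTF-8 bytes
    (n,) = struct.unpack_from("<Q", buf, off)
    off += 8
    return buf[off:off + n].decode("utf-8"), off + n

def _skip_value(buf, off, vtype):
    # Fixed-size scalar types: uint8..float64 and bool
    sizes = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
    if vtype in sizes:
        return off + sizes[vtype]
    if vtype == 8:  # string
        _, off = _read_str(buf, off)
        return off
    if vtype == 9:  # array: uint32 element type, uint64 count, then elements
        etype, count = struct.unpack_from("<IQ", buf, off)
        off += 12
        for _ in range(count):
            off = _skip_value(buf, off, etype)
        return off
    raise ValueError(f"unknown GGUF value type {vtype}")

def output_weight_quant(path, tensor_name="output.weight"):
    """Return the quant type name of the output weight tensor, or None if absent."""
    with open(path, "rb") as f:
        buf = f.read()
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    assert magic == b"GGUF", "not a GGUF file"
    off = 24
    for _ in range(n_kv):  # skip the metadata key/value section
        _, off = _read_str(buf, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        off = _skip_value(buf, off + 4, vtype)
    for _ in range(n_tensors):  # tensor infos follow the KV section
        name, off = _read_str(buf, off)
        (n_dims,) = struct.unpack_from("<I", buf, off)
        off += 4 + 8 * n_dims  # skip dim count and uint64 dims
        ggml_type, _data_offset = struct.unpack_from("<IQ", buf, off)
        off += 12
        if name == tensor_name:
            return GGML_TYPES.get(ggml_type, f"type_{ggml_type}")
    return None
```

With a check like `output_weight_quant(path) in ("Q4_0", "F16")`, the downloader could reject unsupported files up front instead of failing mid-conversion.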
