
Which quantization algo is used under the hood in GGML? #69

Answered by cmp-nct
tarunmcom asked this question in Q&A

GGML has its own wide range of quantization options; GPTQ is PyTorch-only.
From what I remember of benchmarks, the available K-type quantizers outperform it. Q4_1 should be lower quality than GPTQ, but you can always go up to 5 or 6 bit, or down to around 2.5 bit.
q4_0 is a fairly simple conversion with a single delta (scale) value per block; q4_1 adds a second per-block parameter on top of that; q4_k additionally organizes blocks into 256-entry super-blocks.
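To make the difference concrete, here is a minimal Python sketch of the block-quantization idea behind these formats: a q4_0-style scheme stores one scale ("delta") per block of weights, while a q4_1-style scheme stores a scale plus a minimum. This is illustrative only and does not reproduce GGML's actual kernels, bit packing, or block layout; all function names here are made up for clarity.

```python
def quantize_q4_0_style(block):
    """One scale (delta) per block; signed 4-bit codes in [-8, 7].
    Simplified sketch, not GGML's real q4_0 kernel."""
    amax = max(abs(x) for x in block)
    d = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(x / d))) for x in block]
    return d, q

def dequantize_q4_0_style(d, q):
    return [d * v for v in q]

def quantize_q4_1_style(block):
    """Scale plus minimum per block; unsigned 4-bit codes in [0, 15].
    Simplified sketch, not GGML's real q4_1 kernel."""
    vmin, vmax = min(block), max(block)
    d = (vmax - vmin) / 15.0 if vmax > vmin else 1.0
    q = [max(0, min(15, round((x - vmin) / d))) for x in block]
    return d, vmin, q

def dequantize_q4_1_style(d, vmin, q):
    return [d * v + vmin for v in q]

# Round-trip a 32-weight block and measure the worst-case error.
block = [-1.0 + 2.0 * i / 31 for i in range(32)]

d0, q0 = quantize_q4_0_style(block)
err0 = max(abs(a - b) for a, b in zip(block, dequantize_q4_0_style(d0, q0)))

d1, m1, q1 = quantize_q4_1_style(block)
err1 = max(abs(a - b) for a, b in zip(block, dequantize_q4_1_style(d1, m1, q1)))
```

The extra per-block minimum is why q4_1 generally reconstructs asymmetric weight distributions more accurately than q4_0 at the same bit width; the K-type formats extend this idea with another level of scales shared across a super-block.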

For Falcon 7B I currently only support Q-type quantization; for 40B, both Q and QK types.
