-
When I convert to a 4-bit quantised model, I want to know: which quantization algorithm is used under the hood in GGML? Is it GPTQ or a simple weight conversion?
-
GGML has its own wide range of quantization options; GPTQ is PyTorch-only.
From what I remember of the benchmarks, the available K-type quantizers outperform it. Q4_1 should be lower quality than GPTQ, but you can always go up to 5 or 6 bit, or down to 2.5 bit.
q4_0 is a quite simple conversion with a per-block delta (scale) value, q4_1 adds a second per-block value (an offset) on top of the scale, and q4_K adds a 256-element super-block structure on top of that.
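To make the q4_0 description concrete, here is a minimal sketch in C of delta-based 4-bit block quantization. It follows the general scheme of ggml's q4_0 as described above (one float scale per 32-weight block, 4-bit codes packed two per byte), but it is a simplified illustration, not ggml's actual code, and the helper names here are hypothetical.

```c
/* Sketch of q4_0-style block quantization: one "delta" scale per
 * 32-weight block, weights stored as 4-bit values. Illustrative only. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32  /* weights per block */

typedef struct {
    float   d;          /* delta: per-block scale factor */
    uint8_t qs[QK / 2]; /* 32 weights packed two 4-bit values per byte */
} block_q4_0;

static void quantize_block_q4_0(const float *x, block_q4_0 *out) {
    /* Find the value with the largest magnitude in the block. */
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < QK; i++) {
        if (fabsf(x[i]) > amax) { amax = fabsf(x[i]); max = x[i]; }
    }

    /* Map that extreme value to -8 so quantized codes fit in [-8, 7]. */
    const float d  = max / -8.0f;
    const float id = (d != 0.0f) ? 1.0f / d : 0.0f;
    out->d = d;

    for (int i = 0; i < QK / 2; i++) {
        /* Quantize two weights, shift into [0, 15], pack into one byte. */
        int q0 = (int)(x[2 * i + 0] * id + 8.5f);
        int q1 = (int)(x[2 * i + 1] * id + 8.5f);
        if (q0 < 0) q0 = 0; if (q0 > 15) q0 = 15;
        if (q1 < 0) q1 = 0; if (q1 > 15) q1 = 15;
        out->qs[i] = (uint8_t)(q0 | (q1 << 4));
    }
}

/* Dequantization: weight ~ d * (q - 8). A q4_1-style format would
 * additionally store a per-block offset m and reconstruct as d*q + m. */
static float dequantize(const block_q4_0 *b, int i) {
    const int byte = b->qs[i / 2];
    const int q = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    return b->d * (float)(q - 8);
}

int main(void) {
    float x[QK];
    for (int i = 0; i < QK; i++) x[i] = sinf((float)i); /* dummy weights */

    block_q4_0 b;
    quantize_block_q4_0(x, &b);

    for (int i = 0; i < 4; i++)
        printf("x[%d] = % .4f  ~ % .4f\n", i, x[i], dequantize(&b, i));
    return 0;
}
```

The q4_K super-block idea builds on the same mechanism: blocks like the one above are grouped into 256-element super-blocks that share higher-precision scaling parameters, which is where the quality advantage over the plain Q types comes from.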
For Falcon 7B I currently only support the Q types; for 40B, both the Q and QK (K-quant) types.