-
When I convert to a 4-bit quantised model, I want to know: which quantization algorithm is used under the hood in GGML? Is it GPTQ or a simple weight conversion?
-
GGML has its own wide range of quantization options; GPTQ is PyTorch-only.
From what I remember of the benchmarks, the available K-type quantizers outperform it. Q4_1 should be lower quality than GPTQ, but you can always go up to 5 or 6 bit, or down to 2.5 bit.
q4_0 is a quite simple conversion with a per-block delta (scale) value, q4_1 adds a second per-block value (an offset) on top of the scale, and q4_K adds a 256-element super-block structure on top of that.
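To make the q4_0 description concrete, here is a minimal sketch in C of delta-based 4-bit block quantization. It follows the general scheme of ggml's q4_0 as described above (one float scale per 32-weight block, 4-bit codes packed two per byte), but it is a simplified illustration, not ggml's actual code, and the helper names here are hypothetical.

```c
/* Sketch of q4_0-style block quantization: one "delta" scale per
 * 32-weight block, weights stored as 4-bit values. Illustrative only. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32  /* weights per block */

typedef struct {
    float   d;          /* delta: per-block scale factor */
    uint8_t qs[QK / 2]; /* 32 weights packed two 4-bit values per byte */
} block_q4_0;

static void quantize_block_q4_0(const float *x, block_q4_0 *out) {
    /* Find the value with the largest magnitude in the block. */
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < QK; i++) {
        if (fabsf(x[i]) > amax) { amax = fabsf(x[i]); max = x[i]; }
    }

    /* Map that extreme value to -8 so quantized codes fit in [-8, 7]. */
    const float d  = max / -8.0f;
    const float id = (d != 0.0f) ? 1.0f / d : 0.0f;
    out->d = d;

    for (int i = 0; i < QK / 2; i++) {
        /* Quantize two weights, shift into [0, 15], pack into one byte. */
        int q0 = (int)(x[2 * i + 0] * id + 8.5f);
        int q1 = (int)(x[2 * i + 1] * id + 8.5f);
        if (q0 < 0) q0 = 0; if (q0 > 15) q0 = 15;
        if (q1 < 0) q1 = 0; if (q1 > 15) q1 = 15;
        out->qs[i] = (uint8_t)(q0 | (q1 << 4));
    }
}

/* Dequantization: weight ~ d * (q - 8). A q4_1-style format would
 * additionally store a per-block offset m and reconstruct as d*q + m. */
static float dequantize(const block_q4_0 *b, int i) {
    const int byte = b->qs[i / 2];
    const int q = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    return b->d * (float)(q - 8);
}

int main(void) {
    float x[QK];
    for (int i = 0; i < QK; i++) x[i] = sinf((float)i); /* dummy weights */

    block_q4_0 b;
    quantize_block_q4_0(x, &b);

    for (int i = 0; i < 4; i++)
        printf("x[%d] = % .4f  ~ % .4f\n", i, x[i], dequantize(&b, i));
    return 0;
}
```

The q4_K super-block idea builds on the same mechanism: blocks like the one above are grouped into 256-element super-blocks that share higher-precision scaling parameters, which is where the quality advantage over the plain Q types comes from.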
For Falcon 7B I currently only support the Q types; for 40B, both the Q and QK (K-quant) types.