General questions around quant methods and types #6561
younesbelkada started this conversation in General
Replies: 1 comment
- Check out #5063 for more detailed information.
Hi everyone,
First of all, thanks for this great project and for all the very useful information available through the discussions and PRs.
I would like to start understanding how the underlying quantization methods in llama.cpp work. I might be missing important details, so please correct me at any point!
I started learning about the internals of GGUF quants here: #1684. My questions are mostly about the 3/4-bit quantization schemes, not the recent addition of 1-bit quants.
My understanding is that the core building block of GGUF quants is group-wise (block-wise) quantization, is this correct? In that case, are the activations always kept in half / full precision?
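To make sure I understand the group-wise idea, here is my own toy sketch of what I imagine a 4-bit block quant looks like (one scale per block of 32 weights, quants packed two per byte). This is just my illustration of the concept, not llama.cpp's actual kernels, so please correct anything that is off:

```cpp
// Toy sketch of block-wise (group-wise) quantization, loosely in the spirit
// of a 4-bit GGUF type. Not the actual llama.cpp implementation.
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int BLOCK_SIZE = 32;  // one scale shared by 32 weights (my assumption)

struct BlockQ4 {
    float   scale;                   // the real formats store this more compactly (e.g. fp16)
    uint8_t quants[BLOCK_SIZE / 2];  // two 4-bit values packed per byte
};

BlockQ4 quantize_block(const float *x) {
    // symmetric toy scheme: pick the scale so the largest magnitude maps to +/-7
    float amax = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; ++i) amax = std::max(amax, std::fabs(x[i]));

    BlockQ4 b{};
    b.scale = amax / 7.0f;
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;

    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        auto q = [&](float v) {
            // round to the nearest 4-bit level and store with an offset of 8
            return (uint8_t)(std::clamp((int)std::lround(v * inv), -8, 7) + 8);
        };
        b.quants[i / 2] = q(x[i]) | (q(x[i + 1]) << 4);
    }
    return b;
}

// Dequantize back to float; the activations themselves would stay fp16/fp32.
void dequantize_block(const BlockQ4 &b, float *out) {
    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        out[i]     = ((int)(b.quants[i / 2] & 0x0F) - 8) * b.scale;
        out[i + 1] = ((int)(b.quants[i / 2] >> 4)   - 8) * b.scale;
    }
}
```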
Some quant methods seem to quantize the scales as well - do you have a rough idea of the overhead / trade-off this introduces compared to keeping the scales unquantized?
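To make the question concrete, here is my back-of-the-envelope arithmetic for why I assume the scales are quantized at all (the sub-block / super-block sizes below are my own assumptions for illustration, not a statement about any specific GGUF type). What I am less sure about is the cost side, i.e. how much accuracy or extra decode work this trades away:

```cpp
// Back-of-the-envelope comparison of the extra bits per weight spent on scales,
// with and without quantizing the sub-block scales. Sizes are assumed, not
// taken from any particular GGUF type.
#include <cstdio>

int main() {
    const int super_block = 256;                      // weights per super-block (assumed)
    const int sub_block   = 16;                       // weights per sub-block, one scale each (assumed)
    const int n_scales    = super_block / sub_block;  // 16 sub-block scales

    // Option A: every sub-block scale stored directly as fp16
    const double bits_a = n_scales * 16.0;

    // Option B: sub-block scales quantized to 6 bits, plus one fp16 super-block scale
    const double bits_b = n_scales * 6.0 + 16.0;

    printf("fp16 scales     : %.3f extra bits per weight\n", bits_a / super_block);  // 1.000
    printf("quantized scales: %.3f extra bits per weight\n", bits_b / super_block);  // 0.438
    return 0;
}
```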
If I understood correctly, different quant schemes are usually combined - for example, the query layer could be quantized with Q3_K while the key layer uses Q5_K. I first thought this was architecture-specific, meaning each arch has its own combination, but that does not seem to be the case - so I was wondering how the combination of quant types per tensor is determined? (I tried to phrase my mental model as a small sketch below, after the screenshots.)
E.g., below are two screenshots from two different models derived from the same base model (mistral-7b); as you can see, the combinations of quant types look different.
Below is for TheBloke/CapybaraHermes-2.5-Mistral-7B
Below is for NousResearch/Hermes-2-Pro-Mistral-7B
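To phrase the question more precisely, this is roughly the mental model I have of how the per-tensor choice could work: some function of the requested overall quant type plus the tensor's name/role. The rule and names below are entirely hypothetical, I have not checked them against llama.cpp's actual selection logic, so I would love to know how it really works:

```cpp
// Hypothetical sketch of per-tensor quant type selection, only to illustrate
// my question. Not taken from llama.cpp's code.
#include <string>

enum class QuantType { Q3_K, Q4_K, Q5_K, Q6_K };

QuantType pick_tensor_type(const std::string &tensor_name, QuantType requested) {
    // Hypothetical rule: "sensitive" tensors get bumped to a higher-bit type,
    // everything else keeps the requested type.
    const bool is_output = tensor_name.find("output.weight") != std::string::npos;
    const bool is_attn_v = tensor_name.find("attn_v")        != std::string::npos;

    if (is_output) return QuantType::Q6_K;
    if (is_attn_v && requested == QuantType::Q3_K) return QuantType::Q5_K;
    return requested;
}
```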
cc @ikawrakow @ggerganov
Thanks so much, and let me know if there is another discussion / issue I might have overlooked!