Q4_K Quantization Scheme adaptation #6760
Unanswered · wilderfield asked this question in Q&A
-
@ikawrakow I’m wondering if you might be able to help me with this math/quantization question. If you have more important things to do I completely understand.
-
So I see that to dequantize a weight in Q4_K format I have to do:
y = s * q - m
y is the dequantized weight (float)
s is the scale (float)
q is the quantized weight (int4)
m is the zero point offset (float)
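In code, that per-weight dequantization looks something like this (a minimal sketch; in the real Q4_K format s and m are derived per sub-block, but I'm ignoring the block structure here):

```python
def dequant_scale_min(q: int, s: float, m: float) -> float:
    # llama.cpp-style asymmetric scheme: q is an int4 in [0, 15],
    # s is a float scale, m a float offset subtracted after scaling
    return s * q - m
```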
Now the challenge is that I have hardware that expects the following scheme:
y = s * (q - z)
The difference from the scheme above is that z is an integer zero point (int4).
This is also the scheme that PyTorch uses: https://pytorch.org/blog/quantization-in-practice/
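As a sketch in the same style, the scheme my hardware expects:

```python
def dequant_zero_point(q: int, s: float, z: int) -> float:
    # Hardware/PyTorch-style asymmetric scheme: the integer zero
    # point z is subtracted from q before scaling
    return s * (q - z)
```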
I want to use the parameters from llama.cpp with my hardware, so I try to do some math...
Setting the two equations equal gives s * q - m = s * (q - z); the s * q terms cancel, leaving m = s * z, so z = m / s.
Since z has to be an integer, I take z = round(m/s).
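Concretely, the per-block conversion I apply looks like this (a sketch; the clamp to [0, 15] is my addition, since z has to be representable as an int4):

```python
def convert_params(s: float, m: float) -> tuple[float, int]:
    # From s*q - m == s*(q - z): the s*q terms cancel, leaving
    # m == s*z, so z = m/s. z has to be an int4, so I round it
    # (and clamp, since m/s is not guaranteed to land in [0, 15]).
    z = round(m / s)
    z = max(0, min(15, z))
    return s, z
```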
When I simulate this adaptation, I get catastrophic accuracy loss, even without hardware involved.
Is there something fundamentally wrong with this math?
Is it not possible to reconcile these two quantization schemes?
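For reference, this is roughly how I measure the mismatch in simulation (a self-contained sketch with made-up block parameters, not the actual llama.cpp tensors):

```python
import random

def simulate_block_error(s: float, m: float, n: int = 32) -> float:
    # Dequantize the same int4 values under both schemes and compare:
    # the per-weight gap is m - s*round(m/s) plus any clamping error
    z = max(0, min(15, round(m / s)))
    qs = [random.randrange(16) for _ in range(n)]
    err = [abs((s * q - m) - (s * (q - z))) for q in qs]
    return max(err)

# Example: a block where m/s falls far outside the int4 range
print(simulate_block_error(s=0.01, m=1.5))  # z clamps to 15 -> large error
```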