k-quants are scary! #4140
KerfuffleV2 started this conversation in Show and tell
Replies: 1 comment 3 replies
- Well... when running something like a 3.6TB model, 1.X-bit quantization might come in handy :) But getting perplexity tests is going to be interesting :)
- Story time, because I just got something working!
A while back, I added the ability for `convert.py` to convert models directly to `Q8_0` format. `Q8_0` quantization is pretty simple. (My original version had explicit loops and such; cebtenzzre numpy-ized it and made it actually perform fast enough to be usable.)
Anyway, I've been planning to port other quantizations to Python just to mess around with them. My ultimate goal is to make a 1-bit K-quant type quantization, even though I know it will be useless. I finally got around to it today.
Oh man, k-quants are so much more complicated than something like `Q8_0`. I finally got it working and producing the same output as the C version (at least for a block that just has -128 through 127). Feast your eyes on this:
Expand... if you dare!
Obviously it's not useful at all in its current hacked-together, just-barely-working state, but it actually does seem to produce the same output as the C version.
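The collapsed code above isn't reproduced here, and the real k-quant formats additionally pack bits and store per-sub-block mins. But as a hypothetical toy (not the author's code, and not any actual `Q*_K` layout), here is the structural reason k-quants are hairier than `Q8_0`: they use *two* levels of scales, with the per-sub-block scales themselves quantized to small integers against one float "super scale" per 256-weight super-block.

```python
import numpy as np

QK_K = 256      # k-quant super-block size
SUB_BLOCK = 16  # each super-block splits into 16 sub-blocks of 16 weights


def quantize_two_level(x: np.ndarray):
    """Toy two-level quantization sketch (k-quant flavored, not a real format).

    x: 1-D float32 array whose length is a multiple of 256.
    Returns (d, scales, q): a float super-scale per super-block,
    6-bit integer sub-block scales, and 4-bit-range signed quants.
    """
    sb = x.reshape(-1, QK_K // SUB_BLOCK, SUB_BLOCK)
    # Level 1: per-sub-block absmax.
    sub_amax = np.abs(sb).max(axis=2)                   # shape (n, 16)
    # Level 2: one float scale per super-block; sub-scales become 0..63 ints.
    d = sub_amax.max(axis=1) / 63.0                     # shape (n,)
    inv_d = np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)
    scales = np.round(sub_amax * inv_d[:, None]).astype(np.uint8)
    # Quantize the weights against the *reconstructed* sub-block scale,
    # exactly the extra indirection that makes k-quants fiddly.
    eff = d[:, None] * scales                           # ~= sub_amax
    inv_eff = np.divide(1.0, eff, out=np.zeros_like(eff), where=eff > 0)
    q = np.round(sb * inv_eff[:, :, None] * 7.0)
    q = q.clip(-7, 7).astype(np.int8)
    return d, scales, q


def dequantize_two_level(d, scales, q):
    """Undo both levels: q -> sub-block scale -> super-block scale."""
    eff = d[:, None] * scales / 7.0
    return (q * eff[:, :, None]).reshape(-1)
```

Getting the C and Python versions to agree bit-for-bit means matching rounding behavior at both scale levels, which is where most of the pain comes from.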