-
According to this paper, per-channel quant works better for the K state. Per-channel quant is a column-major quantization of matrix A. So I wonder whether llama.cpp supports this per-channel quant.
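To make the terminology concrete, here is a minimal sketch of per-channel (column-wise) int8 quantization: each column of A gets its own scale, so an outlier in one channel does not inflate the error of every other channel. `quantize_per_channel` and `dequantize` are hypothetical helpers for illustration, not llama.cpp APIs (llama.cpp's real quant types use small fixed-size blocks with their own scales).

```python
import numpy as np

def quantize_per_channel(A, bits=8):
    """Per-channel (column-major) quantization sketch:
    one scale per column of A."""
    qmax = 2 ** (bits - 1) - 1                   # 127 for int8
    scales = np.abs(A).max(axis=0) / qmax        # one scale per column
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero columns
    q = np.clip(np.round(A / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

A = np.random.randn(64, 8).astype(np.float32)
q, scales = quantize_per_channel(A)
A_hat = dequantize(q, scales)
print(np.abs(A - A_hat).max())  # per-element error bounded by scale/2
```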
-
I am researching quantization in llama.cpp. It seems all quant types quantize weights along one row of matrix A and one column of matrix B when calculating C = A x B. Does llama.cpp provide an SGEMM for column-major quantization?
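The reason row-of-A / column-of-B quantization is the common layout: in C = A x B, a row of A and a column of B both run along the shared k dimension, so the integer dot product can be rescaled with a single multiply per output element. A simplified sketch under assumed names (`quantize_per_row` is hypothetical; llama.cpp actually scales small blocks within each row, not whole rows):

```python
import numpy as np

def quantize_per_row(M, bits=8):
    """Row-major quantization sketch: one scale per row of M."""
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(M).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(M / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

A = np.random.randn(4, 16).astype(np.float32)
B = np.random.randn(16, 3).astype(np.float32)

qA, sA = quantize_per_row(A)    # one scale per row of A
qB, sB = quantize_per_row(B.T)  # one scale per column of B

# Integer GEMM over the shared k dimension, then rescale each
# output element by the product of the two matching scales.
C_int = qA.astype(np.int32) @ qB.astype(np.int32).T
C_hat = C_int.astype(np.float32) * sA * sB.T
print(np.abs(A @ B - C_hat).max())  # small approximation error
```

A column-major (per-channel) quant of A would instead put a different scale on every term of the dot product, so the kernel could no longer factor the scales out of the inner loop, which is why it needs a different SGEMM path.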