See https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization for a breakdown of K-quantization; more information about the K-quants is in the pull requests.
Hello Team,
I have been learning about transformer quantization and am particularly interested in full-integer 8-bit quantization. Which quantization schemes (e.g. GPTQ, AWQ, SmoothQuant) are supported for full-integer quantization in llama.cpp? For context, TFLite uses symmetric int8 quantization for calibration and inference. Or does llama.cpp support any quantization type as long as the model is in GGUF or GGML format? I'd really appreciate any help or pointers. Thanks.
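To make "symmetric int8" concrete, here is a minimal sketch of block-wise symmetric int8 quantization in pure Python. It assumes blocks of 32 values, each stored as one float scale `max|x| / 127` plus int8 codes in [-127, 127]. As far as I understand, llama.cpp's Q8_0 type follows a similar per-block layout, but the function names here are hypothetical and this is not llama.cpp's code. A per-tensor scheme applies the same formula with a single scale for the whole tensor. The dot product shows the "full integer" part: codes are multiplied and accumulated as integers, and the float scales are applied once at the end.

```python
import math

BLOCK_SIZE = 32  # assumption: 32 values share one scale


def quantize_block_symmetric_int8(values):
    """Symmetric int8: x ~= scale * q, with q in [-127, 127] and zero-point 0."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        # An all-zero block needs no scale; every code is 0.
        return 0.0, [0] * len(values)
    scale = amax / 127.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def dequantize_block(scale, q):
    """Map int8 codes back to approximate float values."""
    return [scale * qi for qi in q]


def quantize_tensor(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each with its own scale."""
    return [
        quantize_block_symmetric_int8(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]


def dequantize_tensor(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for scale, q in blocks:
        out.extend(dequantize_block(scale, q))
    return out


def int8_block_dot(a_block, b_block):
    """Dot product of two quantized blocks using integer accumulation."""
    scale_a, q_a = a_block
    scale_b, q_b = b_block
    acc = sum(x * y for x, y in zip(q_a, q_b))  # pure integer multiply-accumulate
    return scale_a * scale_b * acc  # float rescale applied once per block
```

For example, `quantize_tensor([math.sin(i) for i in range(64)])` yields two `(scale, codes)` pairs, and each reconstructed value is within `scale / 2` of the original. A per-block scale limits the damage an outlier can do to one block, at the cost of storing one extra scale per 32 values.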