There are a lot of quantization options for weights; I wonder whether there is a quantization process for activations as well?
Replies: 4 comments 4 replies
-
I guess there may be some unpredictable trouble in applying integer quantization to activations, since some non-linear operators may cause math-related issues.
-
-
How can we verify for sure what the activation quantization is?
@KerfuffleV2
The term "activations" refers to the intermediate results obtained during the evaluation of the transformer; it does not mean the 1D tensors in the model. This is terminology that I also learned only recently.
The activations in `ggml` are generally quantized as follows:
- They are not quantized yet with Metal.
- On the CPU, even though `src1` has type `F32`, it is still being quantized internally in the matrix multiplication call: https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364ce0aff98dbefe9bd9c3721cc9/ggml.c#L11333-L11349
- The activations are always quantized to 8 bits (`.vec_dot_type`): https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364…