No quantization for activations? #3349

Answered by ggerganov
Z-KN asked this question in Q&A
Sep 27, 2023 · 4 comments · 4 replies

@KerfuffleV2
The term "activations" refers to the intermediate results obtained during the evaluation of the transformer. It does not mean the 1D tensors in the model. This is terminology that I also learned only recently.

The activations in ggml are generally quantized when:

- running on the CPU
- running with CUDA

They are not quantized yet with Metal.

On the CPU, even though src1 has type F32, it is still quantized internally inside the matrix multiplication call:

https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364ce0aff98dbefe9bd9c3721cc9/ggml.c#L11333-L11349

The activations are always quantized to 8 bits (see .vec_dot_type):

https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364…
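
To make the mechanism concrete, here is a minimal standalone sketch of the idea, not the actual ggml code: `block_q8`, `quantize_row_q8`, and `vec_dot_q8` are simplified stand-ins for ggml's Q8_0 block type, its row-quantization routine, and the `.vec_dot` kernels, assuming a Q8_0-like format (blocks of 32 int8 values with one scale each).

```c
// Standalone illustration of how an F32 activation row can be quantized
// to an 8-bit block format (analogous to ggml's Q8_0: blocks of 32
// values, one scale per block) before the dot product with the weights.
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK8 32 /* values per block, matching ggml's QK8_0 */

typedef struct {
    float  d;        // per-block scale
    int8_t qs[QK8];  // quantized values in [-127, 127]
} block_q8;

// Quantize k floats (k must be a multiple of QK8) into 8-bit blocks.
static void quantize_row_q8(const float *x, block_q8 *y, int k) {
    for (int i = 0; i < k / QK8; i++) {
        float amax = 0.0f; // largest magnitude in this block
        for (int j = 0; j < QK8; j++) {
            const float v = fabsf(x[i * QK8 + j]);
            if (v > amax) amax = v;
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK8; j++) {
            y[i].qs[j] = (int8_t) roundf(x[i * QK8 + j] * id);
        }
    }
}

// Dot product of two quantized rows: integer multiply-accumulate inside
// each block, then rescale by the product of the two block scales.
static float vec_dot_q8(const block_q8 *x, const block_q8 *y, int k) {
    float sum = 0.0f;
    for (int i = 0; i < k / QK8; i++) {
        int32_t isum = 0;
        for (int j = 0; j < QK8; j++) {
            isum += (int32_t) x[i].qs[j] * (int32_t) y[i].qs[j];
        }
        sum += x[i].d * y[i].d * (float) isum;
    }
    return sum;
}

int main(void) {
    float a[QK8], b[QK8], exact = 0.0f;
    for (int j = 0; j < QK8; j++) {
        a[j] = 0.1f * (float)(j - 16); // stand-in activation row
        b[j] = 0.05f * (float) j;      // stand-in weight row
        exact += a[j] * b[j];
    }
    block_q8 qa, qb;
    quantize_row_q8(a, &qa, QK8);
    quantize_row_q8(b, &qb, QK8);
    printf("f32 dot = %f, quantized dot = %f\n",
           (double) exact, (double) vec_dot_q8(&qa, &qb, QK8));
    return 0;
}
```

Roughly, the benefit of quantizing src1 like this is that the inner loop can run on int8 multiply-accumulates against the already-quantized weights, instead of dequantizing the weights back to F32 for every element.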

Answer selected by Z-KN