
How is KV cache discard done in llama.cpp? #14026

Answered by ggerganov
DavidZyy asked this question in Q&A

The data remains in the memory buffer and is still computed during attention, but its contribution is discarded by the attention mask.
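For illustration, here is a minimal single-head attention sketch (not the llama.cpp source; the toy dimensions, layout, and variable names are assumptions) showing how a cell that stays in the KV buffer can still be neutralized: the mask adds -INFINITY to its score, softmax then assigns it zero weight, and it contributes nothing to the output.

```cpp
// Minimal sketch: masked attention over a small KV buffer.
// Cell 1 is "discarded" logically but its data is still in memory.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n_kv   = 4; // cells currently held in the KV buffer
    const int d_head = 2; // toy head dimension

    std::vector<std::vector<float>> K = {{1, 0}, {9, 9}, {0, 1}, {1, 1}};
    std::vector<std::vector<float>> V = {{1, 0}, {9, 9}, {0, 1}, {1, 1}};
    std::vector<float> q = {1, 1};

    // mask: 0 for visible cells, -INFINITY for discarded/masked cells
    std::vector<float> mask = {0.0f, -INFINITY, 0.0f, 0.0f};

    // scores[i] = q.k_i / sqrt(d_head) + mask[i]
    std::vector<float> scores(n_kv);
    for (int i = 0; i < n_kv; ++i) {
        float dot = 0.0f;
        for (int j = 0; j < d_head; ++j) dot += q[j] * K[i][j];
        scores[i] = dot / std::sqrt((float) d_head) + mask[i];
    }

    // softmax: exp(-INFINITY) == 0, so the masked cell gets zero weight
    float max_s = -INFINITY, sum = 0.0f;
    for (float s : scores) if (s > max_s) max_s = s;
    std::vector<float> w(n_kv);
    for (int i = 0; i < n_kv; ++i) { w[i] = std::exp(scores[i] - max_s); sum += w[i]; }
    for (int i = 0; i < n_kv; ++i) w[i] /= sum;

    // output = sum_i w[i] * V_i; the masked cell contributes nothing,
    // even though its K/V data was read and multiplied like any other cell
    std::vector<float> out(d_head, 0.0f);
    for (int i = 0; i < n_kv; ++i)
        for (int j = 0; j < d_head; ++j) out[j] += w[i] * V[i][j];

    printf("weights: %.3f %.3f %.3f %.3f\n", w[0], w[1], w[2], w[3]);
    printf("output : %.3f %.3f\n", out[0], out[1]);
    return 0;
}
```

Note that cell 1 has the largest raw dot product with the query, yet its softmax weight comes out as exactly 0: the buffer is never physically cleared, the mask alone removes its influence.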
