
How is KV cache discard done in llama.cpp? #14026

Answered by ggerganov
DavidZyy asked this question in Q&A

The data remains in the memory buffer and is still computed during attention, but its contribution is discarded by the attention mask.
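For illustration, here is a minimal single-head attention sketch (not the llama.cpp source; the toy dimensions, layout, and variable names are assumptions) showing how a cell that stays in the KV buffer can still be neutralized: the mask adds -INFINITY to its score, softmax then assigns it zero weight, and it contributes nothing to the output.

```cpp
// Minimal sketch: masked attention over a small KV buffer.
// Cell 1 is "discarded" logically but its data is still in memory.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n_kv   = 4; // cells currently held in the KV buffer
    const int d_head = 2; // toy head dimension

    std::vector<std::vector<float>> K = {{1, 0}, {9, 9}, {0, 1}, {1, 1}};
    std::vector<std::vector<float>> V = {{1, 0}, {9, 9}, {0, 1}, {1, 1}};
    std::vector<float> q = {1, 1};

    // mask: 0 for visible cells, -INFINITY for discarded/masked cells
    std::vector<float> mask = {0.0f, -INFINITY, 0.0f, 0.0f};

    // scores[i] = q.k_i / sqrt(d_head) + mask[i]
    std::vector<float> scores(n_kv);
    for (int i = 0; i < n_kv; ++i) {
        float dot = 0.0f;
        for (int j = 0; j < d_head; ++j) dot += q[j] * K[i][j];
        scores[i] = dot / std::sqrt((float) d_head) + mask[i];
    }

    // softmax: exp(-INFINITY) == 0, so the masked cell gets zero weight
    float max_s = -INFINITY, sum = 0.0f;
    for (float s : scores) if (s > max_s) max_s = s;
    std::vector<float> w(n_kv);
    for (int i = 0; i < n_kv; ++i) { w[i] = std::exp(scores[i] - max_s); sum += w[i]; }
    for (int i = 0; i < n_kv; ++i) w[i] /= sum;

    // output = sum_i w[i] * V_i; the masked cell contributes nothing,
    // even though its K/V data was read and multiplied like any other cell
    std::vector<float> out(d_head, 0.0f);
    for (int i = 0; i < n_kv; ++i)
        for (int j = 0; j < d_head; ++j) out[j] += w[i] * V[i][j];

    printf("weights: %.3f %.3f %.3f %.3f\n", w[0], w[1], w[2], w[3]);
    printf("output : %.3f %.3f\n", out[0], out[1]);
    return 0;
}
```

Note that cell 1 has the largest raw dot product with the query, yet its softmax weight comes out as exactly 0: the buffer is never physically cleared, the mask alone removes its influence.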
