kv_cache confusion, build_k_shift but no build_v_shift #7887

Answered by ggerganov
mjkpolo asked this question in Q&A

In attention, token positions are encoded via RoPE (i.e. rotations of the query and key vectors). Since RoPE rotations compose additively (rotating by position p and then by a delta d is the same as rotating by p + d), we can "shift" cached keys by applying RoPE with the delta between the new and old positions. We don't apply this to the values (V) because they are not RoPEd explicitly.
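
To make the additive property concrete, here is a minimal sketch in plain Python. It is not the llama.cpp implementation: the `rope` helper, the adjacent-pair rotation layout, the base of 10000 and the sample key are all assumptions chosen for illustration. It checks that rotating an already-rotated key by the position delta gives the same vector as applying RoPE to the raw key at the new position.

```python
import math

def rope(vec, pos, base=10000.0):
    """Hypothetical helper: rotate pairs (x0,x1), (x2,x3), ... by pos * theta_i."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # per-pair frequency (assumed standard RoPE schedule)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

# Illustrative raw key vector (made-up values) and positions.
k = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
old_pos, new_pos = 37, 5

cached = rope(k, old_pos)                   # what a KV cache would hold
shifted = rope(cached, new_pos - old_pos)   # apply only the position delta
fresh = rope(k, new_pos)                    # RoPE recomputed from the raw key

# Largest element-wise difference; only floating-point rounding is expected.
max_err = max(abs(a - b) for a, b in zip(shifted, fresh))
print(max_err)
```

The raw key is never needed for the shift, which is why it can be applied directly to the cache. In this sketch everything is double precision; a cache stored at lower precision would add its own rounding error on top.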

This operation is not mathematically equivalent to recomputing the new context from scratch, but it is much faster and, empirically, seems to produce reasonable results, although it is not obvious why.
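
One property that may help explain this: with RoPE, a query-key score depends only on the difference between the two positions, so shifting every remaining position by the same delta leaves the scores among those tokens unchanged. The sketch below checks this for a single query-key pair. The `rope` and `dot` helpers and all values are hypothetical, and the sketch does not model the actual source of the mismatch, namely that cached keys and values in deeper layers were computed from hidden states produced under the old context.

```python
import math

def rope(vec, pos, base=10000.0):
    """Hypothetical helper: rotate pairs (x0,x1), (x2,x3), ... by pos * theta_i."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up query and key vectors.
q = [0.8, 0.1, -0.5, 1.3]
k = [-0.2, 0.6, 0.9, -1.1]

# Query at position 40, key at position 37: relative distance 3.
score_before = dot(rope(q, 40), rope(k, 37))
# Both positions shifted by -32: relative distance is still 3.
score_after = dot(rope(q, 8), rope(k, 5))

diff = abs(score_before - score_after)
print(diff)
```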

Answer selected by mjkpolo