I see in [...]. Where does the delta come into play for [...]? Also, what happens when [...]?
In the attention, the positions of the tokens are encoded via RoPE (i.e. rotations of the hidden state). Since RoPE is additive in the position (rotating by the old position and then by the delta is the same as rotating by the new position), we can "shift" cached keys by applying RoPE with the delta between the new and old positions. We don't apply it to the values (V) because they are not RoPEd.

This operation is not mathematically equivalent to recomputing the new context from scratch, but it is much faster and, somewhat surprisingly, seems to produce reasonable results.
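A minimal NumPy sketch of the idea (not the actual implementation; the pairing convention, function names, and toy shapes here are illustrative assumptions). It shows that rotating already-RoPEd keys by the position delta gives the same result as re-applying RoPE from scratch at the new positions, which is why only the cached K needs to be touched:

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply RoPE to x of shape [n_tokens, head_dim] at the given positions.
    Each dimension pair (2i, 2i+1) is rotated by pos * theta_i."""
    n, d = x.shape
    theta = base ** (-np.arange(0, d, 2) / d)        # per-pair frequencies, shape [d/2]
    ang = np.asarray(pos)[:, None] * theta[None, :]  # rotation angles, shape [n, d/2]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
n_tokens, head_dim = 4, 8
k_raw = rng.standard_normal((n_tokens, head_dim))  # pre-RoPE keys (hypothetical; not stored in practice)

old_pos = np.arange(n_tokens) + 10      # positions the cache was built with
delta = -6                              # shift, e.g. after discarding old context
new_pos = old_pos + delta

k_cached = rope_rotate(k_raw, old_pos)                        # what sits in the K cache (already RoPEd)
k_shifted = rope_rotate(k_cached, np.full(n_tokens, delta))   # rotate the cached keys by the delta only
k_recomputed = rope_rotate(k_raw, new_pos)                    # RoPE from scratch at the new positions

print(np.allclose(k_shifted, k_recomputed))  # True: rotation by old_pos then delta == rotation by new_pos
```

The equality only covers the K encoding itself; the attention outputs still differ from a full recompute because the hidden states feeding Q/K/V were produced under the old context, which is the caveat above.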