Commit 3aa78f7 (parent 704a0a8), branch: HybridCache

fix: Fix indexing into k_l for recurrent cache with filter

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

1 file changed (+2 −2 lines)

src/llama-kv-cache.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -1946,8 +1946,8 @@ llama_kv_cache_recurrent::llama_kv_cache_recurrent(
         ggml_tensor * v = ggml_new_tensor_1d(ctx, type_v, n_embd_v_gqa*kv_size);
         ggml_format_name(k, "cache_k_l%d", i);
         ggml_format_name(v, "cache_v_l%d", i);
-        k_l.push_back(k);
-        v_l.push_back(v);
+        k_l[i] = k;
+        v_l[i] = v;
     }

     // allocate tensors and initialize the buffers to avoid NaNs in the padding
```
