tests : add non-cont K,V FA tests #14756


Open

ggerganov wants to merge 1 commit into master

Conversation

@ggerganov ggerganov (Member) commented Jul 18, 2025

cont #14363

With the introduction of the split KV cache, the K and V tensors passed to the FA (flash attention) operator can now be non-contiguous. Add tests in test-backend-ops to cover this case.
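
For reference, a rough sketch of how a non-contiguously allocated K (or V) tensor can be built with the ggml API, e.g. by viewing every other head of an over-allocated buffer. This is not the code added in this PR; the function name and shapes are illustrative only:

```cpp
// Hypothetical sketch: construct a K/V tensor that is NOT contiguously
// allocated, which is the situation the new FA tests need to exercise.
#include "ggml.h"

static struct ggml_tensor * make_non_cont_kv(struct ggml_context * ctx,
        int64_t head_dim, int64_t n_kv, int64_t n_head_kv) {
    // over-allocate: twice the number of KV heads that the test actually uses
    struct ggml_tensor * big = ggml_new_tensor_4d(ctx, GGML_TYPE_F16,
            head_dim, n_kv, 2*n_head_kv, 1);

    // view only every other head: the view's elements are spread over a region
    // larger than a densely packed tensor of the same shape would occupy, so
    // ggml_is_contiguously_allocated(view) is false
    struct ggml_tensor * view = ggml_view_4d(ctx, big,
            head_dim, n_kv, n_head_kv, 1,
            big->nb[1],               // byte stride between rows  (unchanged)
            2*big->nb[2],             // byte stride between heads (skip every other head)
            2*big->nb[2]*n_head_kv,   // byte stride between batches
            0);

    // 'view' would then be passed as the K (or V) argument of
    // ggml_flash_attn_ext() in the test case
    return view;
}
```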

This issue was reported here: #14363 (comment)

The failure can be reproduced with the CUDA backend using the following command:

make -j && LLAMA_SET_ROWS=1 ./bin/llama-parallel -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF -np 8 -ns 128 -s 1 -c 4096 -fa -ngl 99 --top-k 1 -ctk q8_0 -ctv q8_0
0.02.205.072 I common_init_from_params: setting dry_penalty_last_n to ctx_size = 4608
0.02.205.072 W common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.02.233.146 I No new questions so proceed with build-in defaults.
0.02.233.146 I 

0.02.240.868 I main: Simulating parallel requests from clients:
0.02.240.870 I main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
0.02.240.870 I 
0.02.240.870 I Processing requests ...

0.02.241.045 I main: clearing the KV cache
0.02.248.040 I Client   0, seq    0, junk =    0, prompt = 284, started decoding ...
0.02.254.999 I Client   1, seq    1, junk =    0, prompt = 284, started decoding ...
0.02.262.112 I Client   2, seq    2, junk =    0, prompt = 284, started decoding ...
0.02.269.266 I Client   3, seq    3, junk =    0, prompt = 290, started decoding ...
0.02.276.355 I Client   4, seq    4, junk =    0, prompt = 288, started decoding ...
0.02.283.337 I Client   5, seq    5, junk =    0, prompt = 285, started decoding ...
0.02.290.405 I Client   6, seq    6, junk =    0, prompt = 286, started decoding ...
0.02.297.367 I Client   7, seq    7, junk =    0, prompt = 284, started decoding ...
/home/ggerganov/development/github/llama.cpp/ggml/src/ggml-cuda/template-instances/../fattn-common.cuh:748: GGML_ASSERT(ggml_is_contiguously_allocated(K)) failed
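
For context, the assertion that fires checks whether the tensor's elements occupy a single dense block of memory. A simplified sketch of that check (the real implementation lives in ggml; this only illustrates the idea):

```cpp
// Hedged illustration of what the failed assertion verifies: a tensor is
// "contiguously allocated" when its data spans exactly as many bytes as a
// densely packed tensor of the same shape and type would.
#include "ggml.h"

static bool is_contiguously_allocated_sketch(const struct ggml_tensor * t) {
    const size_t dense_bytes =
        (size_t) ggml_nelements(t) * ggml_type_size(t->type) / (size_t) ggml_blck_size(t->type);
    return ggml_nbytes(t) == dense_bytes;
}
```

A strided K/V view like the one sketched above fails this check, which is why the CUDA FA kernel aborts here.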

cc @JohannesGaessler

@github-actions github-actions bot added the testing Everything test related label Jul 18, 2025
@ggerganov ggerganov mentioned this pull request Jul 18, 2025