tests : add non-cont K,V FA tests #14756
Open
+11
−5
cont #14363
With the introduction of a split KV cache, the K and V tensors passed to FA can now be non-contiguous. Add tests in test-backend-ops to cover this.
This issue was reported here: #14363 (comment)
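As an illustration of what "non-contiguous" means here, the following is a minimal standalone sketch and not the ggml implementation. The `View` struct and the helpers `make_packed`, `slice_dim1`, and `is_contiguous` are hypothetical names; only the `ne` (extent per dimension) and `nb` (stride in bytes per dimension) naming follows ggml's convention. The shape used in the usage note below is an assumed example, not the actual cache layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified 4-D tensor view (not a ggml type).
// ne[i] = number of elements in dimension i, nb[i] = stride in bytes.
struct View {
    std::array<int64_t, 4> ne;
    std::array<size_t, 4>  nb;
};

// Build a densely packed view: each stride is the previous stride
// multiplied by the previous extent.
View make_packed(std::array<int64_t, 4> ne, size_t elem_size) {
    View v{ne, {}};
    size_t stride = elem_size;
    for (int i = 0; i < 4; ++i) {
        v.nb[i] = stride;
        stride *= static_cast<size_t>(ne[i]);
    }
    return v;
}

// Take the first n entries along dimension 1 while keeping the parent's
// strides, the way a view into a larger buffer would.
View slice_dim1(const View & parent, int64_t n) {
    assert(n > 0 && n <= parent.ne[1]);
    View v = parent;
    v.ne[1] = n;
    return v;
}

// A view is contiguous when its strides match the packed strides for its
// own shape. Dimensions of extent 1 are ignored since their stride is
// never used to step between elements.
bool is_contiguous(const View & v, size_t elem_size) {
    size_t expected = elem_size;
    for (int i = 0; i < 4; ++i) {
        if (v.ne[i] != 1 && v.nb[i] != expected) {
            return false;
        }
        expected *= static_cast<size_t>(v.ne[i]);
    }
    return true;
}
```

For example, with an assumed parent shape of `{128, 4096, 8, 1}` and 2-byte elements, `slice_dim1(parent, 256)` keeps the parent's stride in dimension 2 and is therefore reported as non-contiguous, while a slice covering all 4096 entries stays contiguous. A kernel that assumes packed K and V would read the wrong memory for the first kind of view.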
It can be reproduced on the CUDA backend with this command:
make -j && LLAMA_SET_ROWS=1 ./bin/llama-parallel -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF -np 8 -ns 128 -s 1 -c 4096 -fa -ngl 99 --top-k 1 -ctk q8_0 -ctv q8_0
cc @JohannesGaessler