
Regression in unified KV cache appears after llama.cpp release b5912 in b5913 #2045

@akarasulu

Description


This issue concerns the llama-cpp-python community but was filed on the llama.cpp tracker first: ggml-org/llama.cpp#14847.

I wanted to bring it to your attention, and I can move the issue here if this is the more appropriate tracker. For convenience, the original issue description is reproduced below:

Running llama-cpp-python against llama.cpp built after release b5912 (starting with b5913) fails with:

llama.cpp/src/llama-kv-cache-unified.cpp:222: GGML_ASSERT(seq_id >= 0 && (size_t) seq_id < seq_to_stream.size()) failed

This appears to be a regression in sequence-ID handling or in the unified KV cache logic that affects external bindings. It is consistent with the extensive KV cache rework in b5913 that prepares the K/V buffers for separation.

NOTE: llama-cli runs successfully, but running llama-cpp-python against llama.cpp with the same model fails.
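For reference, a minimal reproduction sketch of the kind of llama-cpp-python usage that hits the assert against an affected llama.cpp build. The model path and parameters below are placeholders, not the exact configuration from the original report:

```python
# Minimal reproduction sketch (hypothetical model path and parameters).
# Against llama.cpp builds after b5912, the process aborts during decoding with:
#   llama-kv-cache-unified.cpp:222: GGML_ASSERT(seq_id >= 0 && (size_t) seq_id < seq_to_stream.size()) failed
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path to any GGUF model
    n_ctx=2048,
)

# Any simple completion is enough to exercise the unified KV cache path.
out = llm("Hello, world", max_tokens=16)
print(out["choices"][0]["text"])
```

The same model loaded through llama-cli completes without error, which is what points at the bindings-facing sequence-ID path rather than the model itself.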
