This issue concerns the llama-cpp-python community but was filed on the llama.cpp tracker first: ggml-org/llama.cpp#14847.
I just wanted to bring it to your attention. I can relocate the issue if it is more relevant here. For your convenience, the issue description is reproduced here:
Running llama-cpp-python against a llama.cpp build newer than b5912 (i.e. b5913 or later) results in:
It appears to be a regression in sequence ID handling or unified KV cache logic that affects external bindings. This would be consistent with the extensive KV-cache work in b5913 that prepares the K/V buffers for separation.
NOTE: llama-cli runs successfully with the same model, but llama-cpp-python against the same llama.cpp build fails.
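For reference, a minimal sketch of the kind of binding usage that hits this path. The model path, prompt, and context size are placeholders, not taken from the original report; any GGUF model that loads cleanly under llama-cli at the same revision should be a fair comparison.

```python
# Minimal reproduction sketch; assumes llama-cpp-python built against a
# llama.cpp checkout at b5913 or later. Paths and prompt are placeholders.
from llama_cpp import Llama

# Load the same GGUF model that works under llama-cli at this revision.
llm = Llama(model_path="./model.gguf", n_ctx=2048)

# A single short completion exercises the KV-cache / sequence handling path
# discussed above.
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```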