Context reuse / context shift for long prompts #451
Replies: 4 comments 8 replies
-
This is a very useful use case, and it is the reason I have been switching back and forth between ik_llama.cpp and llama.cpp. I have noticed this works seamlessly with llama.cpp. I always thought I was doing something wrong and it was user error, but apparently it is not! Thank you for mentioning it here.
-
This would be a massive win for me. Currently, prompt processing (PP) is the millstone around my neck (for which you have had to endure many of my ignorant comments in support of a solution). KV cache reuse and tool calling would open up whole new worlds.
-
Glad to see that others are also interested in this feature! I was about to open an issue myself, but I noticed that @saood06 is already looking into something similar here — so now it's just a matter of waiting. By the way, @saood06, if you need any help with testing, I'd be happy to assist.
-
Might have to do it myself.
-
Hi! I'm coming from koboldcpp, and I've been testing this fork because of its optimizations.
One feature I found very useful in koboldcpp is its context shift functionality, which helps when working with very long context windows.
I noticed that llama.cpp implemented something similar in PR #9866, which allows reusing the prompt cache more efficiently instead of reprocessing the entire prompt every time the context overflows. I searched through this repo but couldn't find an equivalent implementation.
Here’s the issue I’m currently facing:
My question is:
Thanks!
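For readers unfamiliar with the feature being requested, the core idea behind prompt-cache reuse is: when a new prompt shares a token prefix with what is already in the KV cache, only the divergent suffix needs to be evaluated. The sketch below is purely illustrative of that idea; the function names are hypothetical and are not ik_llama.cpp or llama.cpp APIs.

```python
def common_prefix_len(cached: list[int], new: list[int]) -> int:
    """Length of the shared token prefix between the cached prompt and the new one."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_evaluate(cached: list[int], new: list[int]) -> list[int]:
    """Only the suffix past the shared prefix must be (re)processed.

    Everything before it can keep its existing KV-cache entries,
    which is what makes long, mostly-stable prompts cheap to extend.
    """
    keep = common_prefix_len(cached, new)
    return new[keep:]


# Example: system prompt + chat history (the shared prefix) is unchanged;
# only the tokens of the newest user turn need prompt processing.
cached = [1, 5, 9, 12, 30]
new = [1, 5, 9, 12, 44, 47]
print(tokens_to_evaluate(cached, new))  # [44, 47]
```

Context shift as implemented in koboldcpp goes a step further by also discarding the oldest tokens and shifting the remaining cache when the window overflows, but the prefix-matching step above is the part that avoids full reprocessing.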