Hey, I currently use llama-server with Open-WebUI. When the context is long, I have to wait a long time before the model starts writing an answer every time I ask something. This is not the case with llama-cli in conversation mode, where follow-up questions don't make the model reprocess the whole prompt. Edit: #4347 seems to be the fix, but can I specify cache_prompt=True when starting the server, or do I need to send it with every request?
For now I just changed the default value for cache_prompt to true and rebuilt, since I can't find which parameter to pass to llama-server for it.
You need to send `cache_prompt` with every request.
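For reference, here is a minimal sketch of what that looks like against the server's /completion endpoint, written in Python. It assumes the default address http://localhost:8080 and non-streaming output; adjust the URL and generation settings to match how you actually launch llama-server.

```python
# Minimal sketch: pass cache_prompt with every /completion request to llama-server.
# Assumes the server is reachable at the default http://localhost:8080.
import requests

LLAMA_SERVER_URL = "http://localhost:8080/completion"


def ask(prompt: str) -> str:
    """Send a completion request, asking the server to keep the prompt in the KV cache."""
    payload = {
        "prompt": prompt,
        "n_predict": 256,
        # Reuse the already-evaluated prompt prefix on the next request
        # instead of reprocessing the whole conversation from scratch.
        "cache_prompt": True,
    }
    response = requests.post(LLAMA_SERVER_URL, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()["content"]


if __name__ == "__main__":
    history = "User: Summarize the plot of Hamlet.\nAssistant:"
    first = ask(history)
    # The follow-up shares the long prefix, so only the new tokens need evaluation.
    print(ask(history + first + "\nUser: Who is Ophelia?\nAssistant:"))
```

If the client sitting in front of the server (here Open-WebUI) gives you no way to add this field to its requests, then changing the default in the source and rebuilding, as described above, is the workaround.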