-
Does llama.cpp focus only on inference computation, without managing session history or multi-session isolation? In other words, does it generate output for a single input without considering previous conversation history, so that the user has to merge the conversation history into a single input themselves?
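To make "merging history into a single input" concrete, here is a minimal sketch of a client that owns the history and flattens it into one prompt per request, using llama-server's /completion endpoint. The endpoint, the "prompt"/"n_predict" fields, and the "content" field in the response follow my reading of the server docs; the server address and the role-prefix chat template are illustrative assumptions (real models expect their own chat template).

```python
# Sketch: the client keeps the full history and resends it flattened into
# a single prompt on every request. Endpoint and field names follow
# llama-server's /completion API; the "role: text" template is made up.
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed default llama-server address

history = []  # list of (role, text) turns, kept entirely client-side

def ask(user_text: str) -> str:
    history.append(("user", user_text))
    # Merge every previous turn into one prompt string for this request.
    prompt = "".join(f"{role}: {text}\n" for role, text in history) + "assistant:"
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=json.dumps({"prompt": prompt, "n_predict": 128}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["content"]
    history.append(("assistant", answer))
    return answer
```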
-
In a conversational scenario, suppose there are two chat windows, chat1 and chat2, interacting with the llama.cpp server over HTTP, and the server uses only a single HTTP thread.
How are these two conversations and their histories handled?
Is each conversation's history stored in a separate memory buffer?
Is there a risk of memory overflow?
When llama.cpp performs inference, does it load the context (including the chat history) before generating?
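For reference, here is a hedged sketch of how chat1 and chat2 can be isolated purely on the client side, with each request carrying its own full history. It uses the server's OpenAI-compatible /v1/chat/completions endpoint; the per-session buffers, the session IDs, and the assumption that the server is stateless across HTTP requests are all client-side choices in this sketch, not behavior guaranteed by llama-server.

```python
# Sketch under the assumption that llama-server is stateless across HTTP
# requests: each chat window keeps its own message buffer and resends it
# in full, so isolation and memory growth live on the client side.
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed default llama-server address

sessions: dict[str, list[dict]] = {
    "chat1": [],  # separate per-window buffers: no sharing between chats
    "chat2": [],
}

def chat(session_id: str, user_text: str) -> str:
    messages = sessions[session_id]
    messages.append({"role": "user", "content": user_text})
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

# chat1 and chat2 never see each other's history:
# chat("chat1", "Hello from window 1")
# chat("chat2", "Hello from window 2")
```

In this arrangement the "memory overflow" risk is really context-window overflow: each buffer grows without bound until the flattened history exceeds the model's context size, so a real client would truncate or summarize old turns.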