-
Does llama.cpp focus only on inference computation, without managing session history or multi-session isolation? In other words, does it generate output for a single input without considering previous conversation history, so that the user has to merge the conversation history into a single input themselves?
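To make "merging history into a single input" concrete, here is a minimal sketch of a client that owns the history and flattens it into one prompt per request, using llama-server's /completion endpoint. The endpoint, the "prompt"/"n_predict" fields, and the "content" field in the response follow my reading of the server docs; the server address and the role-prefix chat template are illustrative assumptions (real models expect their own chat template).

```python
# Sketch: the client keeps the full history and resends it flattened into
# a single prompt on every request. Endpoint and field names follow
# llama-server's /completion API; the "role: text" template is made up.
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed default llama-server address

history = []  # list of (role, text) turns, kept entirely client-side

def ask(user_text: str) -> str:
    history.append(("user", user_text))
    # Merge every previous turn into one prompt string for this request.
    prompt = "".join(f"{role}: {text}\n" for role, text in history) + "assistant:"
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=json.dumps({"prompt": prompt, "n_predict": 128}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["content"]
    history.append(("assistant", answer))
    return answer
```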
-
In a conversational scenario, suppose there are two chat windows, chat1 and chat2, interacting with the llama.cpp server over HTTP, and the server uses only a single HTTP thread.
How are these two conversations and their histories handled?
Is each conversation's history stored in a separate memory buffer?
Is there a risk of memory overflow?
When llama.cpp performs inference, does it load the context (including the chat history) before generating?
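For reference, here is a hedged sketch of how chat1 and chat2 can be isolated purely on the client side, with each request carrying its own full history. It uses the server's OpenAI-compatible /v1/chat/completions endpoint; the per-session buffers, the session IDs, and the assumption that the server is stateless across HTTP requests are all client-side choices in this sketch, not behavior guaranteed by llama-server.

```python
# Sketch under the assumption that llama-server is stateless across HTTP
# requests: each chat window keeps its own message buffer and resends it
# in full, so isolation and memory growth live on the client side.
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed default llama-server address

sessions: dict[str, list[dict]] = {
    "chat1": [],  # separate per-window buffers: no sharing between chats
    "chat2": [],
}

def chat(session_id: str, user_text: str) -> str:
    messages = sessions[session_id]
    messages.append({"role": "user", "content": user_text})
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

# chat1 and chat2 never see each other's history:
# chat("chat1", "Hello from window 1")
# chat("chat2", "Hello from window 2")
```

In this arrangement the "memory overflow" risk is really context-window overflow: each buffer grows without bound until the flattened history exceeds the model's context size, so a real client would truncate or summarize old turns.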