I'm working on a chat interface powered by server.cpp (an incredible little tool 👏). I noticed that if I only append to the prompt, like the chat example app does, it responds very quickly (within 2 seconds on my MacBook), as if it is continuing where it left off. But if I change the beginning of the prompt, it takes a long time to respond (roughly linear in the total prompt length). My intuition is that it caches the state and blows away the cache if the previous prompt is not a prefix of the new prompt. Can anyone explain what's happening there at a high level? I was trying it with Alpaca-style prompt formats, where it's common to put conversation history in the
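For intuition, here's a minimal sketch (not server.cpp's actual code) of the prefix-cache behavior being described: only the tokens past the longest common prefix with the previously evaluated prompt need to be re-evaluated, so an append-only prompt is cheap while a changed beginning forces re-evaluating everything.

```python
# Illustrative sketch of prefix caching: the number of tokens that must be
# re-evaluated is the new prompt length minus the shared leading prefix.

def common_prefix_len(cached, new):
    """Number of leading tokens shared by the cached and new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached = ["<sys>", "Hello", "Hi", "there"]            # previous prompt + generated text
appended = cached + ["How", "are", "you?"]            # append-only: reuse everything cached
changed = ["<sys2>", "Hello", "Hi", "there", "How"]   # changed start: cache is useless

print(len(appended) - common_prefix_len(cached, appended))  # 3 tokens to evaluate
print(len(changed) - common_prefix_len(cached, changed))    # 5 tokens to evaluate
```

This is why the slowdown looks roughly linear in total prompt length: with no shared prefix, every token of the new prompt has to go through the model again.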
The server reuses the previously evaluated state if the start of the string is the same. Try to format the prompt so that the changing parts come at the end. Note that the text generated by the model is also part of this cache, which is why append-only is the fastest. Or use a GPU, which can evaluate even long prompts almost instantly.
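As an example of "changing parts at the end", here's a hypothetical Alpaca-style prompt builder (the `SYSTEM` text and `build_prompt` helper are made up for illustration): the fixed instruction stays at the very start, and each new turn is appended after the model's previous response, so every request extends the previous prompt.

```python
# Hypothetical prompt builder: a fixed system preamble first, then the
# conversation history, then the newest user turn. Because the model's
# own response is appended right where it was generated, the next prompt
# is a pure extension of the previous one (cache-friendly).

SYSTEM = "Below is a conversation between a user and an assistant.\n\n"

def build_prompt(history, user_msg):
    turns = "".join(
        f"### Instruction:\n{u}\n### Response:\n{a}\n" for u, a in history
    )
    return SYSTEM + turns + f"### Instruction:\n{user_msg}\n### Response:\n"

p1 = build_prompt([], "Hello")
# Suppose the model answered exactly "Hi there!"; the next prompt then
# starts with the entire previous prompt plus that answer.
p2 = build_prompt([("Hello", "Hi there!")], "How are you?")
print(p2.startswith(p1))  # True: the cached state can be fully reused
```

If instead the history were summarized or truncated at the top of the prompt, the shared prefix would break and the whole prompt would be re-evaluated.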