server.cpp append only prompting? Fast chats with Alpaca prompt format. #2648

Answered by SlyEcho
psugihara asked this question in Q&A
The server reuses the previously evaluated state when the new prompt starts with the same text as the previous one. Try to format the prompt so that the changing parts come at the end. Note that the text generated by the model is also part of this cache, which is why append-only prompting is the fastest.
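
To illustrate why append-only is cheapest, here is a rough sketch of the prefix-reuse idea. This is not the actual server.cpp code. The function names (`common_prefix_len`, `tokens_to_evaluate`, `toy_tokenize`) are made up for illustration, and the whitespace "tokenizer" is an assumption standing in for the model's real tokenizer.

```python
def common_prefix_len(cached, new):
    """Count the leading tokens shared by both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_evaluate(cached, new):
    """Tokens that still need evaluation if the shared prefix is reused."""
    return len(new) - common_prefix_len(cached, new)


def toy_tokenize(text):
    # Hypothetical stand-in: a real server compares model token IDs.
    return text.split()


# State left over from the previous turn (prompt plus generated reply).
cached = toy_tokenize("### Instruction: say hi ### Response: hi there")

# Append-only: the new prompt starts with exactly what is already cached.
append_only = toy_tokenize(
    "### Instruction: say hi ### Response: hi there "
    "### Instruction: and bye ### Response:"
)

# Changing the start of the prompt means no prefix can be reused.
edited_start = toy_tokenize(
    "System: be brief ### Instruction: say hi ### Response: hi there "
    "### Instruction: and bye ### Response:"
)

print(tokens_to_evaluate(cached, append_only))   # only the appended tokens
print(tokens_to_evaluate(cached, edited_start))  # the whole prompt again
```

In the append-only case only the newly added turn has to be evaluated. Once something near the start changes, everything after the first differing token has to be evaluated again.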

Or use a GPU, which can evaluate even long prompts nearly instantly.

Answer selected by Green-Sky