Caching #4341
-
I feel like I'm being a pest, but that's not my intention. I just want to know whether the new OAI-like endpoint supports prompt caching. Thanks!
-
It looks like it always just turns prompt caching on: https://github.com/ggerganov/llama.cpp/blob/5f6e0c0dff1e7a89331e6b25eca9a9fd71324069/examples/server/api_like_OAI.py#L80-L84 Whether it actually works or not, I don't know. You could modify what that script sends to the server example there if you need different behaviour. A rough sketch of the idea is below.
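For reference, a minimal sketch of the kind of payload that proxy script forwards to the server's native /completion endpoint, with prompt caching forced on. Everything except cache_prompt (the port, n_predict, temperature values) is an illustrative assumption, not copied from the script:

```python
import requests

# Hypothetical, trimmed-down version of what api_like_OAI.py does:
# build a llama.cpp /completion request and always enable prompt caching.
LLAMA_SERVER = "http://127.0.0.1:8080"  # assumes the server example runs locally


def make_post_data(prompt: str) -> dict:
    return {
        "prompt": prompt,
        "n_predict": 256,       # illustrative default
        "temperature": 0.7,     # illustrative default
        "cache_prompt": True,   # always on, as in the linked lines
    }


resp = requests.post(f"{LLAMA_SERVER}/completion", json=make_post_data("Hello!"))
print(resp.json()["content"])
```

Dropping or flipping the cache_prompt flag in that dict is the place to experiment if caching is causing problems.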
-
@ggerganov has fixed it in branch gg/server-oai-cache-prompt. Works very well now. See #4329. Makes it feasible to work on large-ish docs and chats interactively with 7B models running on my Mac mini. |
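For anyone who wants to try it, here is a minimal sketch of a chat request against the server's built-in OAI-compatible endpoint. It assumes the branch honours a cache_prompt field in the request body the same way the native /completion endpoint does; check #4329 for the exact behaviour, since caching may simply be enabled by default there:

```python
import requests

LLAMA_SERVER = "http://127.0.0.1:8080"  # server example running locally

# OpenAI-style chat payload; the extra cache_prompt field is an assumption
# carried over from the /completion API and may be unnecessary on the branch.
payload = {
    "model": "local",  # shape compatibility only; the server uses its loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the document I pasted earlier."},
    ],
    "cache_prompt": True,
}

resp = requests.post(f"{LLAMA_SERVER}/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```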