Replies: 2 comments
-
Would you consider running the inference as part of the bootstrapping process of your app? That should cache the prompts.
-
@calvintwr This post was made back in April, when prompt caching was not supported, but it was implemented a short time after.
-
When you start the program, it has to load the model, then tokenize the prompt and run it through the model. The time to load the model has been mostly eliminated. Tokenizing is also very fast. It would be nice if it were possible to cache the result of running the prompt through the model and load it on startup as well.
I write long prompts because they give better results, but it can take a few minutes for the whole thing to process before the window is responsive. With this change, it should be nearly instant.
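To illustrate the idea end to end, here is a minimal sketch assuming the llama-cpp-python bindings (its `Llama` class exposes `save_state()` and `load_state()`); the model path, cache file name, and prompt below are placeholders, and pickling the state object is just one way to persist it:

```python
# Minimal sketch: evaluate a long prompt once, persist the resulting model
# state, and restore it on later startups instead of re-processing the prompt.
# Assumes the llama-cpp-python bindings; paths and the prompt are placeholders.
import os
import pickle

from llama_cpp import Llama

MODEL_PATH = "model.gguf"        # placeholder model file
STATE_PATH = "prompt_state.pkl"  # on-disk cache of the evaluated prompt
LONG_PROMPT = "You are a helpful assistant. <long instructions here>"

llm = Llama(model_path=MODEL_PATH, n_ctx=4096)

if os.path.exists(STATE_PATH):
    # Fast path: restore the state saved by a previous run, skipping the
    # minutes-long prompt evaluation entirely.
    with open(STATE_PATH, "rb") as f:
        llm.load_state(pickle.load(f))
else:
    # Slow path (first run only): tokenize and evaluate the prompt, then
    # save the state so the next startup is nearly instant.
    llm.eval(llm.tokenize(LONG_PROMPT.encode("utf-8")))
    with open(STATE_PATH, "wb") as f:
        pickle.dump(llm.save_state(), f)

# Generation can now continue from the cached prompt without re-evaluating it.
```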