keep_alive parameter #2175
insureteach started this conversation in General
Hello,
I am using the chat completion endpoint.
The response to the first request takes a while (up to 200 s) because the system needs to load the model; after that, responses are quite fast. The problem is that after about 5 minutes the model is unloaded, so the next request is again as slow as the first one. I noticed there is an initialization parameter, KEEP_ALIVE, in settings.yaml that is set to 5m (five minutes). I tried to change it, with no success: if I use any value other than 5m, the system stops working (in the GUI I get "missing a require arguments: messages"). If I set the parameter back to 5m, it works again.

I deploy the system with Docker using this command:
docker compose --profile ollama-cuda up -d --build
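For reference, this is roughly the part of settings.yaml I have been editing. The surrounding key names are written from memory and may not match my file exactly; only keep_alive is the value in question:

ollama:
  llm_model: llama3          # example model name, not necessarily the one I use
  api_base: http://ollama:11434
  keep_alive: 5m             # default value; changing this is what breaks my requests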
How can I force the system to keep the model loaded in memory?
Thank you.
Stefano.
System configuration:
i9 14900
64 GB RAM
RTX 5070 12 GB