keep_alive parameter #2175
insureteach started this conversation in General
Hello,
I am using the chat completion endpoint.
The response to the first request takes a while (up to 200 s) because the system needs to load the model; after that, responses are quite fast. The problem is that after about 5 minutes the model is unloaded, so the next request is again as slow as the first one. I noticed there is an initialization parameter, KEEP_ALIVE, in settings.yaml that is set to 5m (five minutes). I tried to change it, with no success: if I use any value other than 5m, the system stops working (in the GUI I get "missing a require arguments: messages"). If I set the parameter back to 5m, it works again.

I deploy the system with Docker using this command:
docker compose --profile ollama-cuda up -d --build
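For reference, this is roughly the part of settings.yaml I have been editing. The surrounding key names are written from memory and may not match my file exactly; only keep_alive is the value in question:

ollama:
  llm_model: llama3          # example model name, not necessarily the one I use
  api_base: http://ollama:11434
  keep_alive: 5m             # default value; changing this is what breaks my requests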
How can I force the system to keep the model loaded in memory?
Thank you.
Stefano.
System configuration:
i9 14900
64 GB RAM
RTX 5070 12 GB