Toggle to disable dynamic context window (dynamically calculated num_ctx) #434
Replies: 2 comments
-
In addition to being able to set this to false, the ability to set num_ctx to a specific value would also solve this issue and give users full control over model loading/unloading.
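To illustrate what that could look like (the setting name below is hypothetical, not an option paperless-ai exposes today), a fixed value that is identical on every request would let Ollama load the model once and reuse it:

```typescript
// Illustrative only: FIXED_NUM_CTX is a hypothetical user setting, not an
// existing paperless-ai option. Sending the same num_ctx on every call
// means Ollama loads the model once with that context size and reuses it.
const FIXED_NUM_CTX = 8192;

const requestOptions = {
  options: { num_ctx: FIXED_NUM_CTX }, // identical on every request, so no reload
};

console.log(JSON.stringify(requestOptions));
```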
-
THIS!!! Quick research shows that shouldn't be the case; the model should be kept loaded for 5 minutes by default. Ollama Server Log:
-
Is your feature request related to a problem? Please describe.
With each document that is analyzed, the LLM is unloaded from my Ollama instance. This is because the API calls pass a different num_ctx parameter each time the Ollama LLM is put to work. I'd like to be able to disable this and let paperless-ai call my Ollama instance without setting num_ctx in the API call, which would allow all documents to be analyzed without reloading the LLM each time. This would massively increase performance, as it allows all documents to be analyzed in a single flow without unloading/reloading the LLM between each document.
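For illustration, a minimal sketch of the two calling patterns, assuming a local Ollama instance on the default port and Node's built-in fetch; the model name and the analyze helper are placeholders, not paperless-ai's actual code:

```typescript
// Minimal sketch, not paperless-ai's actual client code. Assumes a local
// Ollama instance on the default port and a placeholder model name.
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function analyze(prompt: string, numCtx?: number): Promise<string> {
  const body: Record<string, unknown> = {
    model: "llama3.3:70b", // placeholder model name
    prompt,
    stream: false,
  };
  // A different num_ctx per request forces Ollama to reload the model with
  // a new context size; omitting options keeps the server-side default.
  if (numCtx !== undefined) {
    body.options = { num_ctx: numCtx };
  }
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  return data.response as string;
}

// Current behaviour: num_ctx varies per document, so every call reloads the model.
//   await analyze(documentOneText, 4096);
//   await analyze(documentTwoText, 8192);
// Requested behaviour: no num_ctx, so the model stays resident across documents.
//   await analyze(documentOneText);
//   await analyze(documentTwoText);
```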
Describe the solution you'd like
I'd love to have a dropdown/toggle in the settings labeled "dynamic context window" which can be set to false. Switching that setting from true to false would result in paperless-ai calling the Ollama API with the default context window that is defined on the server side, preventing the constant LLM reloading.
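A rough sketch of how such a toggle could be wired up; the setting and helper names are made up for illustration and do not exist in paperless-ai today:

```typescript
// Hypothetical wiring for the requested toggle.
interface OllamaOptions {
  num_ctx?: number;
}

function buildOptions(dynamicContextWindow: boolean, estimatedDocTokens: number): OllamaOptions | undefined {
  if (!dynamicContextWindow) {
    // Toggle set to false: send no options at all, so Ollama uses the
    // context window configured on the server and reuses the loaded model.
    return undefined;
  }
  // Toggle set to true (current behaviour): size the context per document,
  // which is what triggers the reloads described above.
  return { num_ctx: Math.max(2048, estimatedDocTokens) };
}

console.log(buildOptions(false, 6000)); // undefined -> server-side default
console.log(buildOptions(true, 6000));  // { num_ctx: 6000 }
```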
Additional context
I run Llama 3.3 70B on my Ollama server. Unloading and reloading a model of that size takes longer than running the query requested by paperless-ai. Paperless-ai also interferes with other apps such as OpenWebUI and other services that use Ollama with the default context window: all of my other applications use Ollama with the default context window and never unload the LLM from VRAM. Once paperless-ai starts running, all those other services effectively lose access to the LLM.