Toggle to disable dynamic context window (dynamically calculated num_ctx) #434
Replies: 2 comments
-
In addition to being able to set this to false, the ability to set num_ctx to a specific value would also solve this issue and give users full control over model loading/unloading.
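To illustrate what that could look like (the setting name below is hypothetical, not an option paperless-ai exposes today), a fixed value that is identical on every request would let Ollama load the model once and reuse it:

```typescript
// Illustrative only: FIXED_NUM_CTX is a hypothetical user setting, not an
// existing paperless-ai option. Sending the same num_ctx on every call
// means Ollama loads the model once with that context size and reuses it.
const FIXED_NUM_CTX = 8192;

const requestOptions = {
  options: { num_ctx: FIXED_NUM_CTX }, // identical on every request, so no reload
};

console.log(JSON.stringify(requestOptions));
```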
-
THIS!!! Quick research shows that shouldn't be the case; the model should be kept loaded for 5 minutes by default. Ollama Server Log:
-
Is your feature request related to a problem? Please describe.
With each document that is analyzed, the LLM is unloaded from my Ollama instance. This is because the API calls pass a different num_ctx parameter each time the Ollama LLM is put to work. I'd like to be able to disable this and let paperless-ai call my Ollama instance without setting num_ctx in the API call, which would allow all documents to be analyzed without reloading the LLM each time. This would massively increase performance, as it allows all documents to be analyzed in a single flow without unloading/reloading the LLM between each document.
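For illustration, a minimal sketch of the two calling patterns, assuming a local Ollama instance on the default port and Node's built-in fetch; the model name and the analyze helper are placeholders, not paperless-ai's actual code:

```typescript
// Minimal sketch, not paperless-ai's actual client code. Assumes a local
// Ollama instance on the default port and a placeholder model name.
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function analyze(prompt: string, numCtx?: number): Promise<string> {
  const body: Record<string, unknown> = {
    model: "llama3.3:70b", // placeholder model name
    prompt,
    stream: false,
  };
  // A different num_ctx per request forces Ollama to reload the model with
  // a new context size; omitting options keeps the server-side default.
  if (numCtx !== undefined) {
    body.options = { num_ctx: numCtx };
  }
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  return data.response as string;
}

// Current behaviour: num_ctx varies per document, so every call reloads the model.
//   await analyze(documentOneText, 4096);
//   await analyze(documentTwoText, 8192);
// Requested behaviour: no num_ctx, so the model stays resident across documents.
//   await analyze(documentOneText);
//   await analyze(documentTwoText);
```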
Describe the solution you'd like
I'd love to have a dropdown/toggle in the settings labeled "dynamic context window" which can be set to false. Switching that setting from true to false would result in paperless-ai calling the Ollama API with the default context window that is defined on the server side, preventing the constant LLM reloading.
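A rough sketch of how such a toggle could be wired up; the setting and helper names are made up for illustration and do not exist in paperless-ai today:

```typescript
// Hypothetical wiring for the requested toggle.
interface OllamaOptions {
  num_ctx?: number;
}

function buildOptions(dynamicContextWindow: boolean, estimatedDocTokens: number): OllamaOptions | undefined {
  if (!dynamicContextWindow) {
    // Toggle set to false: send no options at all, so Ollama uses the
    // context window configured on the server and reuses the loaded model.
    return undefined;
  }
  // Toggle set to true (current behaviour): size the context per document,
  // which is what triggers the reloads described above.
  return { num_ctx: Math.max(2048, estimatedDocTokens) };
}

console.log(buildOptions(false, 6000)); // undefined -> server-side default
console.log(buildOptions(true, 6000));  // { num_ctx: 6000 }
```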
Additional context
I run Llama 3.3 70B on my Ollama server. Unloading and reloading a model of that size takes longer than running the query requested by paperless-ai. Paperless-ai also interferes with other apps such as OpenWebUI and other services that use Ollama with the default context window: all of my other applications use Ollama with the default context window and never unload the LLM from VRAM. Once paperless-ai starts running, all those other services effectively lose access to the LLM.