[draft] new prompt templates - nearly 100% completion: ollama + qwen2.5-instruct:32b (14b/7b feasible too) #327
Replies: 3 comments 2 replies
-
Hi, Best Regards
-
I'm still sticking with it - haven't added that many more documents lately, so I didn't get around to implementing any changes yet ... I have set up an automation for adding the OCR tags for new documents, however ... that's also how I got around re-doing everything again and again for testing purposes. Feel free to try and report back! I guess we might need some code changes for better results. Yesterday I've been looking into letta ... not sure if that's overkill, but I'll see.
-
Short update, since I'm currently developing an LLM-based search engine that makes use of these models too (https://github.com/thiscantbeserious/llm-search-agent).
Although it operates at a much simpler language level, this should be fine for document titles and such. I'll keep you updated with my observations, since I'm planning to implement a testing pipeline there to benchmark accuracy and reliability against my prompts.
-
From my testing I found that deepseek-r1 wasn't able to properly follow the instructions, especially in regards to parsing the date. That's why I tried a few different models and landed at qwen2.5:32b-instruct (7b and 14b work too, just with slightly more errors). This produces the most reliable results for me - including documents that weren't properly OCR'ed - given my instructions.
I ran multiple models against the full 178 documents multiple times, across different document types including handwriting.
Key take-away is that we might need a few improvements on the code layer to improve results for other models. For example, the first thing I'd add is a simple date parser that extracts the last date matching YYYY-MM-DD, to solve the issue of slightly-off responses with additional noise. Models like deepseek-r1 would also improve considerably if we were able to use multiple prompts to fine-tune results, so that's another possible improvement I see on the code layer. And most importantly, we should allow different models for each field - say you know that an instruct model is far better at following instructions like date parsing, but you want a more refined model like deepseek-r1 to work on the title.
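The date-parser idea above could be sketched like this (a minimal standalone example, not part of paperless-gpt - the function name is made up for illustration): take the model's raw output and keep only the last substring that matches YYYY-MM-DD, so trailing reasoning noise no longer breaks the field.

```python
import re

def extract_last_iso_date(text):
    """Return the last YYYY-MM-DD match in a model's output, or None.

    Taking the *last* match helps with models like deepseek-r1 that
    emit reasoning noise containing earlier, irrelevant dates.
    """
    matches = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return matches[-1] if matches else None

# Example: noisy output with an intermediate date before the final answer
print(extract_last_iso_date("The letterhead says 2023-11-02, so the date is: 2024-03-12"))
# -> 2024-03-12
```

One could extend this with a plausibility check (e.g. reject dates in the future) before writing the field back to paperless.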
That's the reasoning behind my choice and some outlook on possible PRs / improvements I might see ...
I noticed that the most important change was moving the Content to the top of the template, so that the instructions are not overwritten under any circumstances.
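That ordering looks roughly like this (a sketch only, assuming paperless-gpt's Go-template syntax - the placeholder name and instruction wording here are illustrative, not the actual templates from the repo linked below):

```
{{.Content}}

---

The document is shown above. Follow these instructions exactly:
1. Respond with a single concise title in the document's language.
2. Output nothing but the title.
```

Putting the document content first means a long or badly OCR'ed document can no longer push the instructions out of the model's effective attention at the end of the prompt.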
- paperless-gpt `TOKEN_LIMIT` set at: 3000
- Ollama default context length via `OLLAMA_CONTEXT_LENGTH`: 8096 (4048 should be fine too ...)
- Language of documents (and system): German
- Documents tested: 179 so far and counting
- Success rate: nearly 100% - quality I would rate at 70-80% of the results being OK, even on really sub-par OCRs ...
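For reference, those settings map to environment variables along these lines (a sketch assuming a docker-compose setup; only `TOKEN_LIMIT` and `OLLAMA_CONTEXT_LENGTH` are taken from the values above - the other variable names are assumptions and may differ in your paperless-gpt version):

```yaml
services:
  paperless-gpt:
    environment:
      LLM_PROVIDER: ollama          # assumed variable name
      LLM_MODEL: qwen2.5:32b-instruct
      TOKEN_LIMIT: "3000"           # value from this report
  ollama:
    environment:
      OLLAMA_CONTEXT_LENGTH: "8096" # value from this report
```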
I created and refined the prompts with the models themselves, to make sure they're in a format the models work best with - last but not least, I manually fine-tuned and re-tested them until I was somewhat satisfied ...
You can find the most recent version of the templates in my Git-Repo:
https://github.com/thiscantbeserious/paperless-gpt-prompts