How to configure backend settings of document ingestion #644

djdameln · 2025-06-18T11:49:15Z

djdameln
Jun 18, 2025

Hi, thanks for open-sourcing this awesome project!

I have been playing around with an experimental RAG workflow, for which I want to customize the document ingestion behaviour of the DocumentSearch module. More specifically, I want to disable OCR to speed up document ingestion for faster prototyping. Effectively, this means that I need to disable OCR in the PdfPipelineOptions used by the DocumentConverter instance of the Docling backend.

However, it seems that the various sets of pipeline options used by the DocumentConverter are hardcoded in the implementation of DoclingDocumentParser. As a result, I had to create a new subclass of DocumentParser in which I customize the Docling PipelineOptions to my preferences.

Is this the correct workflow, or is there a simpler way to configure the ingestion settings to my liking? It seems a bit overkill to define a new subclass just to change a module's configuration.

(Similarly, I would like to change the chunking behaviour of my document parser, which I believe would require a similar solution.)
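
For reference, the OCR setting described in the question can be sketched directly against Docling as shown below. This is a minimal sketch, not the project's own code: the helper name `build_pdf_converter` is made up for illustration, and the import paths follow Docling's documented layout, which may differ between Docling versions. It only shows how a `DocumentConverter` with OCR disabled is built; how that converter gets wired into DoclingDocumentParser is the open point of this thread and is not shown.

```python
def build_pdf_converter(do_ocr: bool = False):
    """Build a Docling DocumentConverter with OCR switched on or off for PDFs.

    `build_pdf_converter` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so that this helper can be defined without Docling installed.
    # Import paths are an assumption based on Docling's documented module layout.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Start from the default PDF pipeline options and change only the OCR flag.
    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = do_ocr

    # Register the customized options for the PDF input format only.
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        }
    )
```

Calling `build_pdf_converter()` should return a converter that skips OCR for PDFs, and `build_pdf_converter(do_ocr=True)` keeps it on.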

Answered by mhordynski

mhordynski · 2025-06-26T17:57:17Z

mhordynski
Jun 26, 2025
Maintainer

Hi @djdameln! Sorry for the late response. I haven't properly configured notifications from GH discussions.

You are absolutely right, we should allow options to be passed to the Docling parser. I've created an issue for that, and we'll add this to the next release: #662

To answer the question:

> Is this the correct workflow, or is there a simpler way to configure the ingestion settings to my liking? It seems a bit overkill to define a new subclass just to change a module's configuration.

While DocumentParser is designed to be subclassed and extended by users, this particular case should be possible without subclassing, and soon will be.

Thanks for contributing!
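
To illustrate the difference between the two approaches discussed here, below is a rough, dependency-free sketch. Every class name in it (`PipelineOptions`, `HardcodedParser`, `NoOcrParser`, `ConfigurableParser`) is a hypothetical stand-in and not the actual API of this project or of Docling; the design that lands via #662 may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical stand-in for backend pipeline options."""
    do_ocr: bool = True


class HardcodedParser:
    """Mimics a parser whose options are fixed inside the implementation."""

    def _options(self) -> PipelineOptions:
        return PipelineOptions()

    def parse(self, path: str) -> str:
        opts = self._options()
        return f"parsed {path} (ocr={'on' if opts.do_ocr else 'off'})"


class NoOcrParser(HardcodedParser):
    """The workaround from the question: a subclass that changes one setting."""

    def _options(self) -> PipelineOptions:
        return PipelineOptions(do_ocr=False)


class ConfigurableParser:
    """The alternative: options are passed in, so no subclass is needed."""

    def __init__(self, options: Optional[PipelineOptions] = None) -> None:
        # Fall back to the defaults when the caller passes nothing.
        self.options = options if options is not None else PipelineOptions()

    def parse(self, path: str) -> str:
        return f"parsed {path} (ocr={'on' if self.options.do_ocr else 'off'})"


# Both routes end up with OCR disabled; only the second avoids a new class.
via_subclass = NoOcrParser().parse("report.pdf")
via_options = ConfigurableParser(PipelineOptions(do_ocr=False)).parse("report.pdf")
```

Subclassing remains useful when the parsing logic itself has to change, while constructor options cover the case where only a setting differs.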
