Title: How to extract PDF content with page numbers using Docling? #1598

luedblra · 2025-05-16T09:06:56Z

Hi community,

I have a question regarding PDF processing with Docling. I'm trying to convert a PDF and would like to retain the page number information for the extracted content.

Is there a way to either:

Extract content page by page, or

Have the extracted text indicate which page each section belongs to?

Any guidance or suggestions would be greatly appreciated. Thanks in advance!

https://docling-project.github.io/docling/reference/document_converter/

onpillow · 2025-05-18T23:22:29Z

You could try converting a specific page with:

result = converter.convert(Path(pdf_path), page_range=(page_no, page_no))

In my case, it isn't obvious which page a given piece of content came from when the whole PDF is converted at once. So I use page_range to convert one page at a time, add the page number to the resulting Markdown, and repeat the same process for the remaining pages.
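
The page-by-page loop described above could look roughly like the sketch below. It is only an illustration under some assumptions: `add_page_markers`, `convert_page_by_page`, and the `<!-- page N -->` marker format are hypothetical names and choices of mine, not part of Docling; the page count is passed in by the caller rather than detected; and `page_range` is assumed to be 1-based and inclusive.

```python
from pathlib import Path


def add_page_markers(pages: dict[int, str]) -> str:
    """Join per-page Markdown into one string, prefixing each page with a marker.

    The HTML-comment marker is an arbitrary choice for this sketch.
    """
    parts = []
    for page_no in sorted(pages):
        parts.append(f"<!-- page {page_no} -->\n\n{pages[page_no].strip()}")
    return "\n\n".join(parts)


def convert_page_by_page(pdf_path: str, num_pages: int) -> str:
    """Convert each page separately and return Markdown with page markers.

    num_pages must be supplied by the caller (assumption: it is known up front).
    """
    # Imported lazily so add_page_markers can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    pages = {}
    for page_no in range(1, num_pages + 1):
        # Restrict the conversion to a single page, as in the snippet above.
        result = converter.convert(Path(pdf_path), page_range=(page_no, page_no))
        pages[page_no] = result.document.export_to_markdown()
    return add_page_markers(pages)
```

Calling `convert_page_by_page("report.pdf", num_pages=3)` (with a placeholder file name) would run three single-page conversions. This is slower than converting the whole file once, but each chunk of Markdown is then unambiguously tied to its page.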
