How docling differentiates between scanned image-page and embedded image #1540

mudassir206 · 2025-05-07T06:14:35Z


We have been working with Arabic PDF files. The docling pipeline works fine at the moment, but we are looking for configuration options that improve text extraction from PDFs.
We would be grateful if anyone could answer this.
The goals are:
1. Differentiate between a scanned image page and an embedded image
2. Extract text from embedded images
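
To make the first goal concrete, here is a minimal sketch of one common heuristic: treat a page as scanned when a single image covers almost the whole page and the native text layer is nearly empty. This is not docling's API and not necessarily how docling decides. `PageInfo`, `classify_page`, and both thresholds are hypothetical names and values chosen for illustration, and the inputs are assumed to come from whatever PDF parser is already in use.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Bounding box as (x0, y0, x1, y1) in page units (e.g. PDF points).
Box = Tuple[float, float, float, float]


@dataclass
class PageInfo:
    """Hypothetical per-page summary; not a docling type."""
    width: float
    height: float
    text_chars: int          # characters found in the PDF's native text layer
    image_boxes: List[Box]   # bounding boxes of images placed on the page


def box_area(box: Box) -> float:
    """Area of a box, clamped to zero for degenerate boxes."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def classify_page(
    page: PageInfo,
    coverage_threshold: float = 0.9,  # assumed value, tune on real data
    max_text_chars: int = 20,         # assumed value, tune on real data
) -> str:
    """Return 'scanned_page', 'embedded_image', or 'text_only'."""
    page_area = page.width * page.height
    if page_area <= 0:
        raise ValueError("page must have a positive area")

    # Fraction of the page covered by the largest single image.
    largest = max((box_area(b) for b in page.image_boxes), default=0.0)
    coverage = largest / page_area

    # One near-full-page image plus (almost) no native text: likely a scan.
    if coverage >= coverage_threshold and page.text_chars <= max_text_chars:
        return "scanned_page"
    # Otherwise any image present is treated as embedded in a digital page.
    if page.image_boxes:
        return "embedded_image"
    return "text_only"
```

With a classification like this, a scanned page could be sent to full-page OCR, while an embedded image could be cropped and passed to OCR on its own, which corresponds to the second goal.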

mudassir206 · 2025-05-07T07:56:59Z


@dolfim-ibm I would be grateful if you could look into this.
