Hello @mophilly. The DocumentLoader works separately from the Splitter; I made sure of that in each one. It works great with Docling. The one I would not advise is MarkItDown, because it doesn't allow splitting by page out of the box, whereas Docling does. To understand a DocumentLoader, just read this: it always uses one function, load, that returns an array of pages, each page with content and an image (if vision is enabled). If you want to use a splitter, just take a look at: PS: Sorry for the delay, I'm finishing another article, and this one is a big boy!
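To make the page structure described above concrete, here is a minimal sketch of that contract. `SketchLoader` and its arguments are hypothetical stand-ins, not the ExtractThinker classes. The assumption is only that `load` yields a list of pages, each a dict with `content` and, in vision mode, `image`.

```python
from typing import Any, Dict, List, Optional


class SketchLoader:
    """Hypothetical stand-in for a DocumentLoader (not the real class)."""

    def __init__(self, vision: bool = False) -> None:
        self.vision = vision

    def load(
        self,
        pages_text: List[str],
        page_images: Optional[List[bytes]] = None,
    ) -> List[Dict[str, Any]]:
        """Return one dict per page: always 'content', plus 'image' in vision mode."""
        pages: List[Dict[str, Any]] = []
        for index, text in enumerate(pages_text):
            page: Dict[str, Any] = {"content": text}
            # Attach the image only when vision is on and images were supplied.
            if self.vision and page_images is not None:
                page["image"] = page_images[index]
            pages.append(page)
        return pages


text_pages = SketchLoader().load(["page one text", "page two text"])
vision_pages = SketchLoader(vision=True).load(["page one text"], [b"fake-image-bytes"])
```

Because every page comes back as a separate entry, a splitter can work on the list without knowing which loader produced it.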
-
Hello, @enoch3712. Docling was easy to implement in this project. It outputs pages, which is nice. I would like additional output options, such as JSON, CSS, and HTML, which Docling supports. How might this fit into the vision for the project? Adding output options to def load() in document_loader_docling.py seems like one place to put that. On or about line 231 is the assignment of conv_result, which later in the code provides export_to_markdown. A branching statement could be placed there to support other formats. On the other hand, raising the scope of conv_result to the class level would allow additional methods to invoke specific output types. That might result in more concise code and more flexibility.
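As a rough sketch of the branching idea, a format-to-exporter lookup could replace a chain of if/else statements. `FakeDocument` and `export_document` are hypothetical names used so the example runs without Docling installed. The method names mirror Docling's document export methods (`export_to_markdown`, `export_to_html`, `export_to_dict`), but how they would be wired into `load()` is an assumption.

```python
import json
from typing import Callable, Dict


class FakeDocument:
    """Hypothetical stand-in for the document held by conv_result."""

    def export_to_markdown(self) -> str:
        return "# Title"

    def export_to_html(self) -> str:
        return "<h1>Title</h1>"

    def export_to_dict(self) -> dict:
        return {"title": "Title"}


def export_document(document, output_format: str = "markdown") -> str:
    """Dispatch to the exporter registered for output_format."""
    exporters: Dict[str, Callable[[], str]] = {
        "markdown": document.export_to_markdown,
        "html": document.export_to_html,
        # JSON is produced by serializing the dict export.
        "json": lambda: json.dumps(document.export_to_dict()),
    }
    try:
        exporter = exporters[output_format]
    except KeyError:
        raise ValueError(f"Unsupported output format: {output_format!r}") from None
    return exporter()


html_output = export_document(FakeDocument(), "html")
json_output = export_document(FakeDocument(), "json")
```

Keeping the mapping in one place means a new format is a one-line addition, whether the lookup lives inside `load()` or in separate methods on the class.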
-
The Docling project seems like a great fit for many IDP cases. I have just arrived at the need to split large files for submission to an LLM. It appears that splitting is a fundamental element in the Docling examples.
Is splitting, as expressed in ExtractThinker, a complement to Docling, or would using Docling replace it?