Inconsistent Markdown Output from export_to_markdown() vs Per-Page Export #1575
shubham-777 asked this question in Q&A
Issue: Markdown Export from PDF Misaligns Text and Tables, and Skips Text Items
I'm encountering issues when exporting PDF content to Markdown. Specifically:
1. Inconsistent Output Between Full Document Export and Per-Page Export
When I call export_to_markdown() on the full document, the output includes all expected text content. However, when I export each page separately, some TextItems are randomly skipped. Even if I write each page’s Markdown to a separate file, the output differs from the full export and some text content is missing.
2. Content Order Changes – Table Position is Wrong
In the PDF, the logical content sequence is:
But the Markdown output from export_to_markdown() (without specifying a page) is:
As you can see, the table is misplaced at the end.
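To pin down which text items are dropped and where the table ends up, two small pure-Python checks can be run on the exported strings. This is a minimal sketch: it does not call docling, and the function names are hypothetical. It assumes you already have the full-document Markdown from export_to_markdown() and one Markdown string per page (recent docling-core versions appear to accept a page_no argument on export_to_markdown(), but please verify this for your version).

```python
from typing import Iterable, List, Tuple


def missing_lines(full_md: str, per_page_mds: Iterable[str]) -> List[str]:
    """Return non-empty lines of the full export that occur in no per-page export."""
    page_lines = set()
    for md in per_page_mds:
        page_lines.update(line.strip() for line in md.splitlines() if line.strip())
    return [
        line.strip()
        for line in full_md.splitlines()
        if line.strip() and line.strip() not in page_lines
    ]


def observed_order(markdown: str, expected: List[str]) -> Tuple[List[str], List[str]]:
    """Locate each expected snippet in `markdown`.

    Returns (order, missing): the snippets that were found, sorted by where
    they first occur, and the snippets that were not found at all.
    """
    positions = {snippet: markdown.find(snippet) for snippet in expected}
    missing = [s for s, pos in positions.items() if pos == -1]
    found = sorted((pos, s) for s, pos in positions.items() if pos != -1)
    return [s for _, s in found], missing


if __name__ == "__main__":
    # Synthetic example: the table should come second but is exported last.
    full_md = "# Intro\n\nSome text\n\nConclusion\n\n| a | b |"
    per_page = ["# Intro\n\nSome text", "| a | b |"]
    print(missing_lines(full_md, per_page))  # ['Conclusion']
    print(observed_order(full_md, ["# Intro", "| a | b |", "Conclusion"]))
    # (['# Intro', 'Conclusion', '| a | b |'], [])
```

The line comparison is deliberately coarse: any line that is formatted differently between the two exports (for example a table row) will also be reported, so treat the result as a starting point for comparison rather than proof of dropped content.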
3. generate_multimodal_pages() Produces Inconsistent or Missing Outputs
When using generate_multimodal_pages(), I receive None for certain page numbers.
Pipeline Configuration
Package Versions
Note: I have also tried the newer version 2.31.0, but it didn't help.
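To make the version report easy to reproduce, the installed versions can be printed with importlib.metadata from the Python standard library. This is a minimal sketch; the package names in the list are assumptions about which docling packages are relevant here, so adjust them to your setup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Package names are assumptions; list whatever your pipeline actually uses.
    for name, ver in installed_versions(["docling", "docling-core", "docling-parse"]).items():
        print(f"{name}: {ver or 'not installed'}")
```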
Has anyone else faced similar issues or found a workaround for generate_multimodal_pages()? Any help or insights would be appreciated!
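For point 3, one way to narrow down the None outputs is to record, per page, which of the values produced by generate_multimodal_pages() are None. The sketch below is a generic helper and does not call docling: it assumes you have already turned each yielded result into a (page_no, fields) pair, and the field names in the example are hypothetical placeholders rather than docling's actual names.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def pages_with_none(
    records: Iterable[Tuple[int, Mapping[str, Any]]],
) -> Dict[int, List[str]]:
    """Map each page number to the names of its fields that are None.

    `records` yields (page_no, fields) pairs; pages where every field is
    populated are left out of the result.
    """
    report: Dict[int, List[str]] = {}
    for page_no, fields in records:
        none_fields = [name for name, value in fields.items() if value is None]
        if none_fields:
            report[page_no] = none_fields
    return report


if __name__ == "__main__":
    # Synthetic data; the field names are hypothetical placeholders.
    records = [
        (1, {"text": "Intro", "markdown": "# Intro", "image": b"..."}),
        (2, {"text": None, "markdown": None, "image": b"..."}),
    ]
    print(pages_with_none(records))  # {2: ['text', 'markdown']}
```

Seeing whether the same fields are missing on every affected page, or only on some, can help separate a configuration problem from a per-page parsing problem.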