Hi, I am trying to extract information from a PDF that contains tables, columns, and hierarchical tables. Can you help me with how to do this? #264
Replies: 7 comments 4 replies
-
Sorry, I do not understand at all what you mean. Please reword your post.
-
I'm trying to extract information from a PDF that contains tables, columns, and hierarchical structures. I'm having trouble preserving the layout and structure during extraction.
-
I am a bit out of ideas how to say this in order to reach you:
-
Sorry, now that you shared your code, I realize that you are not using PyMuPDF or PyMuPDF4LLM at all, but other packages.
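For reference, a minimal sketch of what table extraction with PyMuPDF itself could look like. It assumes a PyMuPDF version that provides `page.find_tables()`; the helper names `rows_to_markdown` and `extract_tables` are hypothetical and not part of the library.

```python
def rows_to_markdown(rows):
    """Render table rows (lists of cell strings or None) as a Markdown table."""
    def clean(cell):
        # Empty cells come back as None; newlines and pipes would break a Markdown row.
        return "" if cell is None else str(cell).replace("\n", " ").replace("|", "\\|").strip()

    if not rows:
        return ""
    # The first row is treated as the header (an assumption about the table layout).
    header, *body = [[clean(c) for c in row] for row in rows]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def extract_tables(pdf_path):
    """Yield (page_number, markdown_table) for every table PyMuPDF detects."""
    import pymupdf  # imported here so rows_to_markdown also works without PyMuPDF installed

    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            for table in page.find_tables().tables:
                # Table.extract() returns the cell text as a list of rows.
                yield page.number + 1, rows_to_markdown(table.extract())
```

Calling `for page_no, md in extract_tables("input.pdf"): print(page_no, md)` (with a placeholder path) would print each detected table page by page. Detection quality depends on how the tables are drawn in the PDF, so the result should be checked against the source document.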
-
I am also using PyMuPDF:

    import pymupdf4llm
    # Assumed import: the original post does not show where MarkdownTextSplitter comes from
    from langchain_text_splitters import MarkdownTextSplitter

    # Get the Markdown text for all pages
    md_text = pymupdf4llm.to_markdown("/content/ZKBio CVSecurity_V6.4.0_R_Datasheet_202411.pdf")

    splitter = MarkdownTextSplitter(chunk_size=1200, chunk_overlap=200)
    splitter.create_documents([md_text])
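One possible reason structure gets lost in a pipeline like this is that a size-based splitter may cut a table in the middle. The following is a rough sketch of a table-aware splitter, not the behavior of `MarkdownTextSplitter` or of PyMuPDF4LLM. It assumes tables appear in the Markdown as consecutive lines starting with `|`, and the function name `split_markdown_keep_tables` is hypothetical.

```python
def split_markdown_keep_tables(md_text, chunk_size=1200):
    """Split Markdown into chunks of roughly chunk_size characters without cutting a table."""
    # Group lines into blocks: a run of consecutive table rows is one block,
    # every other line is its own block.
    blocks, table = [], []
    for line in md_text.splitlines():
        if line.lstrip().startswith("|"):
            table.append(line)
            continue
        if table:
            blocks.append("\n".join(table))
            table = []
        blocks.append(line)
    if table:
        blocks.append("\n".join(table))

    # Greedily pack blocks into chunks; a block is never split, so a table
    # larger than chunk_size ends up as a single oversized chunk.
    chunks, current = [], ""
    for block in blocks:
        candidate = block if not current else current + "\n" + block
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With the `md_text` from the snippet above, `split_markdown_keep_tables(md_text, chunk_size=1200)` returns a list of strings in which each table stays whole. Unlike the splitter in the post, this sketch does not add any overlap between chunks.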
-
help me |