
Expanding the size of graphics (negative margin) #292

Closed, answered by JorjMcKie
PK109 asked this question in Q&A

As an immediate workaround, you could transfer the clustering results onto each page before pymupdf4llm processes it, like this:

import pymupdf, pymupdf4llm, pathlib

doc = pymupdf.open("test.pdf")
myheaders = pymupdf4llm.IdentifyHeaders(doc)  # compute header info once, not per page
md = ""
for page in doc:
    clusters = page.cluster_drawings()  # bounding boxes of clustered vector graphics
    for bb in clusters:
        page.draw_rect(bb, width=0.2)  # put extra border around detected graphics
    md += pymupdf4llm.to_markdown(
        doc, pages=[page.number], hdr_info=myheaders, write_images=True
    )
pathlib.Path("test1.md").write_text(md, encoding="utf-8")
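For readers wondering what the clustering step and the "negative margin" from the title amount to geometrically, here is a simplified pure-Python sketch. It is not PyMuPDF's implementation of `Page.cluster_drawings()`. It assumes rectangles are plain `(x0, y0, x1, y1)` tuples, and the helper names `cluster_rects` and `expand_rect` and the `tol`/`margin` parameters are hypothetical. Nearby rectangles are merged until no two clusters lie within `tol` of each other, and each resulting box can then be grown outward by a margin, clamped to the page.

```python
# Hypothetical, simplified sketch (NOT PyMuPDF's cluster_drawings algorithm).
# Rectangles are (x0, y0, x1, y1) tuples; y grows downward as in PyMuPDF.
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


def near(a: Rect, b: Rect, tol: float) -> bool:
    """True if a and b overlap or are within `tol` of each other on both axes."""
    return (
        a[0] - tol <= b[2] and b[0] - tol <= a[2]
        and a[1] - tol <= b[3] and b[1] - tol <= a[3]
    )


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def cluster_rects(rects: List[Rect], tol: float = 3.0) -> List[Rect]:
    """Greedily merge near rectangles; repeat until a pass merges nothing."""
    clusters = list(rects)
    merged = True
    while merged:
        merged = False
        result: List[Rect] = []
        for r in clusters:
            for i, c in enumerate(result):
                if near(r, c, tol):
                    result[i] = union(r, c)  # absorb r into an existing cluster
                    merged = True
                    break
            else:
                result.append(r)  # r starts a new cluster
        clusters = result
    return clusters


def expand_rect(r: Rect, margin: float, page: Rect) -> Rect:
    """Grow r by `margin` on every side, clamped to the page rectangle."""
    return (
        max(r[0] - margin, page[0]),
        max(r[1] - margin, page[1]),
        min(r[2] + margin, page[2]),
        min(r[3] + margin, page[3]),
    )


page = (0, 0, 612, 792)
boxes = cluster_rects([(100, 100, 150, 150), (152, 100, 200, 150), (400, 400, 420, 420)])
print(boxes)                               # [(100, 100, 200, 150), (400, 400, 420, 420)]
print(expand_rect(boxes[0], 5, page))      # (95, 95, 205, 155)
```

Merging repeats until a pass makes no changes, so a chain of rectangles that each lie close to the next ends up in a single cluster. With real PyMuPDF objects, `tuple(bb)` turns a `pymupdf.Rect` into such a tuple, and the expanded result can be handed back to `page.draw_rect` as a rect-like value.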

Replies: 3 comments 2 replies

Answer selected by PK109
Category: Q&A
Labels: None yet
2 participants