Is there any way to copy and replace texts between two pdfs? #4624
Replies: 2 comments 3 replies
-
You can do this:
All of the above points 3 to 5 can be done using PyMuPDF. Identifying the rectangles (green, red) is up to you, but I assume this won't be difficult, since you mentioned a "fixed layout". What will be the result of the above: If you can let us have two example PDFs (left / right), we could make a demo script.
-
The following is about the best we can do. It turns out that the watermarks cannot be preserved: the one in the target area is erased, and the one in the source area is copied over along with the text.
import pymupdf
doca = pymupdf.open("A.pdf")
docb = pymupdf.open("B.pdf")
pagea = doca[0]
# find "Reprinted" rectangle on source page
r0 = pagea.search_for("Reprinted")[0]
r1 = pagea.search_for("FORM OF PAYMENT")[0]
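# widen the rectangle to the full page width
r0.x0 = 0
r0.x1 = pagea.rect.x1
# extend it down to 5 points above the top-left of "FORM OF PAYMENT"
r0 |= r1.tl - 5
recta = +r0  # unary plus makes a copy of the rectangle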
# remove everything outside the rectangle (temporarily)
pagea.clip_to_rect(recta)
pageb = docb[0]
# find target rectangle on B.pdf
# we cannot use the same rectangle because they have different dimensions
r0 = pageb.search_for("VAT REG")[0]
r1 = pageb.search_for("FORM OF PAYMENT")[0]
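# target spans from 10 points below "VAT REG" to 5 points above "FORM OF PAYMENT"
bottom = r1.y0 - 5
r1.y0 = r0.y1 + 10
r1.y1 = bottom
# use the full page width
r1.x0 = 0
r1.x1 = pageb.rect.x1
rectb = +r1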
# empty target rectangle
pageb.add_redact_annot(rectb)
pageb.apply_redactions()
# copy source rectangle content
# we do not keep aspect ratio to fill the complete target rectangle
pageb.show_pdf_page(rectb, doca, 0, clip=recta, keep_proportion=False)
docb.ez_save("output.pdf")
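Since the script above passes keep_proportion=False, the source area is stretched to fill the target rectangle, so text may come out slightly wider or taller than in A.pdf. A minimal sketch for checking how strong that stretch is, using plain Python only. The names `Box`, `stretch_factors` and `distortion` are hypothetical helpers, not part of PyMuPDF:

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in PDF points: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def stretch_factors(source: Box, target: Box) -> tuple[float, float]:
    """Return (horizontal, vertical) scale applied when source is stretched to fill target."""
    if source.width <= 0 or source.height <= 0:
        raise ValueError("source rectangle must have positive width and height")
    return target.width / source.width, target.height / source.height


def distortion(source: Box, target: Box) -> float:
    """Ratio of horizontal to vertical scale; 1.0 means the aspect ratio is unchanged."""
    sx, sy = stretch_factors(source, target)
    return sx / sy
```

A PyMuPDF rectangle behaves like a sequence of four floats, so something like `distortion(Box(*recta), Box(*rectb))` should work with the rectangles from the script. If the value is far from 1.0, it may be worth adjusting the target rectangle or accepting empty margins with keep_proportion=True.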
-
Hi, guys. I don't know if this can be done with PyMuPDF. I want to copy the text within one area of A.pdf (only the text inside a given rectangle) and use it to replace the text in a specific area of B.pdf, adapting the text to match B.pdf's formatting exactly and without changing the watermarks in B.pdf.
In my case, the scenario is limited to receipt-type PDFs, which have a fixed format. I want to extract the text within the green box (in A.pdf) and use it to replace the text within the red box (in B.pdf), as shown in the image below. The rest of B.pdf should appear as untouched as possible, and the watermark should remain intact (in out.pdf); see the blue box, which was edited manually using the commercial software WSP.