Description
This took a long time to debug. I am creating a PDF that also includes another PDF. After reading that PDF with tcpdi and saving the new file, the imported PDF showed incorrect information.
It turns out that this PDF, created by a customer in InDesign, contains several "EOF" markers (%%EOF in the raw file). A PDF viewer shows the correct data, but tcpdi parses the file incorrectly. I do not know how the original file came to have multiple EOF markers or exactly how that works; I only know it was created with InDesign. I could verify the cause by opening the PDF in a text editor, removing one of the EOF "parts", and saving the file, after which the viewer also showed the wrong data.
The discrepancy seems to come from how a PDF viewer reads and displays the file and how tcpdi reads and imports it. I can of course let my customers know about this, but I would also like to make sure that tcpdi behaves as expected when these problems occur. Is this a known problem?
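To check whether a given file is affected, the end-of-file markers can be listed with a short script. This is a minimal Python sketch, not part of tcpdi. It assumes the markers are standard %%EOF lines, each preceded by a startxref offset, which is what appended revisions (PDF incremental updates) normally produce. The name find_revisions is made up for illustration.

```python
import re
from typing import List, Tuple

# Assumption: every section of the file ends with "startxref <offset> %%EOF".
REVISION_END = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_revisions(data: bytes) -> List[Tuple[int, int]]:
    """Return (xref_offset, end_position) for each startxref/%%EOF pair.

    xref_offset is the number written after "startxref";
    end_position is the byte index just past the matching %%EOF.
    """
    return [(int(m.group(1)), m.end()) for m in REVISION_END.finditer(data)]


# Usage, with a hypothetical file name:
#
#     with open("contract.pdf", "rb") as handle:
#         for offset, end in find_revisions(handle.read()):
#             print(offset, end)
```

More than one entry means the file has several sections. A viewer normally starts from the last startxref in the file, so a parser that picks up an earlier one could see older content. Whether that is what happens inside tcpdi is not confirmed here.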
Example files: below is a PDF file that shows the price "9 500 kronor inklusive moms" (9,500 kronor including VAT) in the second square:
Avtal_Nyanslutning_Hoor_Maglehill_Fiber_Privat_9_500-2.pdf
This file displays correctly in a PDF reader. However, if I open it in a text editor, remove one part delimited by EOF, and save it, this is how a PDF viewer shows it:
Avtal_Nyanslutning_Hoor_Maglehill_Fiber_Privat_9_500-2 copy.pdf
I haven't "edited" the file, only removed some part delimited by EOF.
The problem is that when I read the first (original) file with tcpdi, the result looks like the second file, presumably because of a difference in how tcpdi handles the EOF markers.
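The manual edit described above can also be done with a script, which makes it easier to produce a repeatable test case. This is a minimal Python sketch under one assumption: the extra sections were appended to the end of the file, so cutting the file right after an earlier %%EOF yields an earlier version of the document. truncate_after_eof is a hypothetical helper, not a tcpdi function.

```python
def truncate_after_eof(data: bytes, keep: int) -> bytes:
    """Keep everything up to and including the keep-th %%EOF marker (1-based).

    Assumption: later sections are appended updates, so dropping them
    leaves an older but still self-consistent file.
    """
    marker = b"%%EOF"
    ends = []
    pos = data.find(marker)
    while pos != -1:
        ends.append(pos + len(marker))
        pos = data.find(marker, pos + len(marker))
    if not 1 <= keep <= len(ends):
        raise ValueError(f"file has {len(ends)} markers, cannot keep {keep}")
    # Finish with a newline so the marker stays on its own line.
    return data[: ends[keep - 1]] + b"\n"


# Usage, with hypothetical file names:
#
#     with open("original.pdf", "rb") as src:
#         older = truncate_after_eof(src.read(), keep=1)
#     with open("first_section_only.pdf", "wb") as dst:
#         dst.write(older)
```

If the truncated file shows the same wrong price in a viewer that tcpdi produces from the original, that would support the idea that tcpdi is reading an earlier section of the file instead of the last one.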