make link copying more tolerant when adding page #1103
In #1082 and other issues relating to annotations we're running into the constraints of the current model for building a PDF document. Currently we skip all link-type annotations; I think we can support copying links whose destination is outside the current document. However, the more I look at this code, the more I think we need a radical redesign of how document building is done, because it has been pushed far beyond its current capabilities. I'll detail my thinking in the related PR.
Proposed redesign notes
This results in a large number of issues: #635 #1082 #1000 #704 #720 #1035 #947 #936 #912 #878 #1083 etc. This is obviously unsustainable and prevents the library from evolving to support other requested features, such as editing AcroForms, in future. I believe we should change how we think about adding existing documents to a builder, as well as try to remove, as far as possible, the copy from method.
Track added documents
Each
Challenges
The main challenges here are the rewriting of indirect references on addition and final save. This needs to be approached carefully since every reference needs to be updated recursively. We also need to somehow do this at the last possible moment since if you have the following code where pages are added with interleaving:
You don't want to apply remapping of indirect refs in
How do we handle resource inheritance in the page tree? Ideally we defer this to the last possible moment, i.e. the written document does not support resource inheritance, each leaf page defines its own resource dictionary.
Also consider the name dictionary which can be used to point to pages. If the pointer is to a page that never gets added that will cause further problems. We need some way to track which pages are added then go back and prune them if they are missing. But this is also the case for e.g. annotations and forms.
AI notes
LLMs are very useful when it comes to spec stuff due to my inability to keep all 1,310 pages of the spec in my head. In terms of dictionary merge resolution Ched GPT highlighted the following areas to consider:
Conflict Resolution Strategies by Dictionary
Document Information Dictionary (trailer /Info)
Catalog Dictionary (/Catalog)
Name Dictionary (/Names)
Resource Dictionary (/Resources)
Outlines (/Outlines)
AcroForm (/AcroForm)
Threads, Metadata, MarkInfo, ViewerPreferences
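To make the deferred remapping idea from the Challenges notes concrete, here is a minimal sketch in Python using a toy object model. Nothing in it is PdfPig's API: `Ref`, `find_refs`, `remap` and `save` are hypothetical names, and PDF objects are modelled as plain dicts, lists and strings. The assumption is that the builder only records which pages were added, and new object numbers are assigned in one pass at save time, so interleaved additions from several documents cannot collide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy indirect reference: source document id plus object number (hypothetical)."""
    doc: str
    num: int


def find_refs(value):
    """Yield every Ref nested anywhere inside a toy PDF value."""
    if isinstance(value, Ref):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_refs(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def remap(value, mapping):
    """Return a copy of value with every Ref replaced via mapping, recursively."""
    if isinstance(value, Ref):
        return mapping[value]
    if isinstance(value, dict):
        return {key: remap(item, mapping) for key, item in value.items()}
    if isinstance(value, list):
        return [remap(item, mapping) for item in value]
    return value


def save(page_refs, objects):
    """Assign output object numbers only at save time.

    page_refs: page Refs in output order (documents may interleave).
    objects:   dict mapping every source Ref to its toy object.
    Returns a dict of output object number -> rewritten object.
    """
    mapping = {}
    order = []
    stack = list(reversed(page_refs))
    while stack:
        ref = stack.pop()
        if ref in mapping:
            continue  # already numbered: shared objects and cycles end here
        mapping[ref] = Ref("out", len(mapping) + 1)
        order.append(ref)
        stack.extend(reversed(list(find_refs(objects[ref]))))
    return {mapping[ref].num: remap(objects[ref], mapping) for ref in order}
```

Because the mapping is built from the pages that were actually added, objects that no added page reaches are simply never written, and an object shared by two pages of the same source document is emitted once.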
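The resource inheritance question above can be illustrated the same way. The sketch below assumes a toy page tree of nested Python dicts (not PdfPig types; `flatten` and `INHERITABLE` are hypothetical names) and pushes the inheritable page attributes down so that every leaf page carries its own `Resources`. It relies on the PDF rule that an omitted inheritable attribute takes the value from the nearest ancestor that defines it, with no merging between levels.

```python
# Page attributes that the PDF specification allows to be inherited.
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


def flatten(node, inherited=None):
    """Return the leaf pages under node, each with inherited attributes made explicit."""
    inherited = dict(inherited or {})
    # A value defined on this node overrides whatever an ancestor supplied.
    for key in INHERITABLE:
        if key in node:
            inherited[key] = node[key]
    if node.get("Type") == "Pages":
        pages = []
        for kid in node.get("Kids", []):
            pages.extend(flatten(kid, inherited))
        return pages
    # Leaf page: drop the parent link and fill in anything it did not define itself.
    leaf = {key: value for key, value in node.items() if key != "Parent"}
    for key, value in inherited.items():
        leaf.setdefault(key, value)
    return [leaf]
```

Note that the flattened pages share the inherited dictionaries by reference rather than copying them, which matches the idea of writing one resource object and pointing several pages at it.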
@BobLd interested in your thoughts on the proposed redesign please, since you've had a couple of years dealing with these issues too. Any thoughts on things to look out for?
@EliotJones I agree with your proposed plan, and it makes a lot of sense to aim for 0.1.12 for the full refactoring. I've had limited exposure to the Creating/Editing part of the PdfPig API so far, so unfortunately I have limited opinion here.