Move to_{html,xhtml,xml,text,json} methods from Page to TextPage
#143
This moves `to_{html,xhtml,xml,text,json}` from `Page` to `TextPage`, which is more in line with how the C API itself works. This allows specifying the `TextPageFlags` like PyMuPDF does, depending on the format requested, when no explicit flags are passed. On top of that, it has the added benefit of not having to construct a `fz_stext_page` twice when calling `to_text` and `to_json`, for example (very visible in the `tests/test_issues.rs` file and the `examples/`).

To fix #69 the code in the issue can now be adapted to:
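As a rough illustration of the "construct once" point (a hypothetical mock, not the mupdf-rs API and not the #69 fix), the sketch below uses stand-in `Page` and `TextPage` types whose names only loosely mirror the real ones. The conversion counter stands in for the expensive `fz_page` -> `fz_stext_page` step; because the exporters live on `TextPage`, exporting both text and JSON triggers that step only once.

```rust
use std::cell::Cell;

// Hypothetical stand-ins: these only mimic the shape of the mupdf-rs types;
// the real ones wrap `fz_page` / `fz_stext_page` through FFI.
struct Page {
    // Counts how often the (expensive) page -> text page conversion runs.
    conversions: Cell<u32>,
}

struct TextPage {
    text: String,
}

impl Page {
    // Mock conversion; `_flags` is unused here and only marks where flags would go.
    fn to_text_page(&self, _flags: u32) -> TextPage {
        self.conversions.set(self.conversions.get() + 1);
        TextPage { text: "Hello".to_string() }
    }
}

impl TextPage {
    // Exporters live on TextPage, so they reuse the already-built structure.
    fn to_text(&self) -> String {
        self.text.clone()
    }

    fn to_json(&self) -> String {
        format!("{{\"text\":\"{}\"}}", self.text)
    }
}

// Builds the text page once and runs two exporters on it.
fn export_both(page: &Page) -> (String, String) {
    let text_page = page.to_text_page(0);
    (text_page.to_text(), text_page.to_json())
}

fn main() {
    let page = Page { conversions: Cell::new(0) };
    let (text, json) = export_both(&page);
    println!("{text} {json} conversions={}", page.conversions.get());
}
```

With the exporters on `Page` instead, each call would have had to rebuild the text page internally, so the counter would reach 2 for the same output.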
To fully match PyMuPDF's behaviour here, `TextPageFlags::PRESERVE_LIGATURES | TextPageFlags::PRESERVE_WHITESPACE | TextPageFlags::CLIP | TextPageFlags::PRESERVE_IMAGES | TextPageFlags::USE_CID_FOR_UNKNOWN_UNICODE` would need to be passed. Therefore it might be worth adding either

a) A shortcut function like (this would have the benefit of keeping the `Page::to_html` function in the same place, even if with slightly changed behaviour). This would, for example, fix #69 without requiring a code change.

or
b) An alias like `TextPageFlags::DISPLAY = flags_from_above` (which would prevent people from doing the `fz_page` -> `fz_stext_page` conversion more often than they need to, just because they don't see it hidden inside the `Page::to_html` function).

I'm unsure myself which of these would be better, but that addition could come in a future PR anyway.
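A minimal sketch of both options, assuming a bitflags-style `TextPageFlags`. Everything here is a hypothetical, self-contained mock: the `DISPLAY` constant, the shortcut `Page::to_html`, and the mock `TextPage::to_html` output are illustrations, and the bit values are placeholders rather than MuPDF's actual `FZ_STEXT_*` values.

```rust
use std::ops::BitOr;

// Hypothetical stand-in for a bitflags-style `TextPageFlags`.
// Bit values are illustrative placeholders, not MuPDF's real constants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TextPageFlags(u32);

impl TextPageFlags {
    const PRESERVE_LIGATURES: Self = Self(1);
    const PRESERVE_WHITESPACE: Self = Self(2);
    const PRESERVE_IMAGES: Self = Self(4);
    const CLIP: Self = Self(64);
    const USE_CID_FOR_UNKNOWN_UNICODE: Self = Self(128);

    // Option b): a hypothetical alias bundling the flags listed above.
    const DISPLAY: Self = Self(
        Self::PRESERVE_LIGATURES.0
            | Self::PRESERVE_WHITESPACE.0
            | Self::CLIP.0
            | Self::PRESERVE_IMAGES.0
            | Self::USE_CID_FOR_UNKNOWN_UNICODE.0,
    );

    // True if every bit of `other` is set in `self`.
    fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for TextPageFlags {
    type Output = Self;
    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}

// Mock text page that only records which flags it was built with.
struct TextPage {
    flags: TextPageFlags,
}

impl TextPage {
    // Mock exporter: emits the flag value instead of real HTML.
    fn to_html(&self) -> String {
        format!("<!-- flags={} -->", self.flags.0)
    }
}

struct Page;

impl Page {
    fn to_text_page(&self, flags: TextPageFlags) -> TextPage {
        TextPage { flags }
    }

    // Option a): a hypothetical shortcut keeping `Page::to_html` in place
    // while building the text page with the display flags internally.
    fn to_html(&self) -> String {
        self.to_text_page(TextPageFlags::DISPLAY).to_html()
    }
}

fn main() {
    let page = Page;
    println!("{}", page.to_html());
}
```

The trade-off shows up in the mock: the shortcut in a) still hides one text-page construction per call, while the alias in b) keeps the `to_text_page` step explicit so callers can reuse one `TextPage` for several exports.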