propagate parent tag context downward to improve runtime #191

chrispy-snps
Collaborator

@chrispy-snps chrispy-snps commented Feb 9, 2025

Fixes #190.

Improves runtime by propagating the parent tag context downward into children. Python set() objects are used because membership tests are fast and deduplication is automatic.

In addition, the following "pseudo-parent" tag names are propagated:

  • _inline - parents include a heading, td, or th tag
  • _noformat - parents include a pre, code, kbd, or samp tag
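
To illustrate the idea, here is a minimal standalone sketch of downward context propagation. It is not the code in this pull request: the `Node` class, `child_context()`, and `walk()` are hypothetical names made up for the example, and the sketch assumes "heading" means `h1` through `h6`.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: "heading" means h1..h6.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
INLINE_TRIGGERS = HEADINGS | {"td", "th"}
NOFORMAT_TRIGGERS = {"pre", "code", "kbd", "samp"}


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML element."""
    name: str
    children: list = field(default_factory=list)


def child_context(node_name, parent_tags):
    """Return the set of parent tags that the children of node_name see."""
    tags = set(parent_tags)  # copy, so sibling subtrees do not share state
    tags.add(node_name)
    if node_name in INLINE_TRIGGERS:
        tags.add("_inline")
    if node_name in NOFORMAT_TRIGGERS:
        tags.add("_noformat")
    return tags


def walk(node, parent_tags=frozenset(), out=None):
    """Visit the tree top-down, recording each node's parent-tag context."""
    out = [] if out is None else out
    out.append((node.name, set(parent_tags)))
    tags = child_context(node.name, parent_tags)  # computed once per node
    for child in node.children:
        walk(child, tags, out)
    return out


tree = Node("div", [
    Node("h1", [Node("em")]),
    Node("pre", [Node("code", [Node("b")])]),
])
contexts = dict(walk(tree))  # tag names are unique in this small tree
```

Each node answers "am I inside a heading?" or "am I inside a pre?" with a single set membership test instead of walking back up through its ancestors, which is where the savings for deeply nested documents would come from.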

For one of my large HTML test cases with a nested <div>, <article>, and <section> hierarchy, runtime improves from 502 seconds to 261 seconds.

This pull request changes the interface of the convert_*() functions by replacing the convert_as_inline Boolean parameter with a parent_tags set parameter, so it should probably be released as part of a major version bump.
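
As a rough sketch of what this means for converter code, the example below contrasts the two parameter styles. The functions are simplified, hypothetical stand-ins (they are not the library's actual methods, which also receive the element and live on the converter class); only the `convert_as_inline` and `parent_tags` parameter names come from this pull request.

```python
def convert_hn_old(level, text, convert_as_inline):
    """Old style: a single Boolean says whether to render inline."""
    if convert_as_inline:
        return text
    return "#" * level + " " + text + "\n\n"


def convert_hn_new(level, text, parent_tags):
    """New style: the same decision is a membership test on a set."""
    if "_inline" in parent_tags:
        return text
    return "#" * level + " " + text + "\n\n"


def convert_em_new(text, parent_tags):
    """The set also carries other context, such as _noformat."""
    if "_noformat" in parent_tags:
        return text  # leave text inside pre/code/kbd/samp unformatted
    return "*" + text + "*"


block = convert_hn_new(2, "Title", set())
in_cell = convert_hn_new(2, "Title", {"table", "tr", "td", "_inline"})
in_pre = convert_em_new("x", {"pre", "_noformat"})
```

Custom converters that accepted `convert_as_inline` would need the same kind of signature update, which is why the change is not backward compatible.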

@chrispy-snps chrispy-snps force-pushed the chrispy/propagate-contexts-downward branch from a5b09f1 to b48c482 on February 10, 2025 12:46
@chrispy-snps chrispy-snps changed the title from "propagate a set of parent tag names downward to improve runtime" to "propagate parent tag context downward to improve runtime" on Feb 10, 2025
@chrispy-snps chrispy-snps force-pushed the chrispy/propagate-contexts-downward branch from b48c482 to f9efec0 on February 10, 2025 13:07
Signed-off-by: chrispy <chrispy@synopsys.com>
@chrispy-snps chrispy-snps force-pushed the chrispy/propagate-contexts-downward branch from f9efec0 to 9848a04 on February 10, 2025 13:08
Collaborator

@AlexVonB AlexVonB left a comment

Nice work! Refactoring the ['pre', 'code', 'kbd', 'samp'] list into _noformat makes sense. Is this the change that speeds up #162? Let's go!

@AlexVonB
Collaborator

Ah, one last thing: the README examples for the custom converters still use the convert_as_inline parameter.

@chrispy-snps chrispy-snps force-pushed the chrispy/propagate-contexts-downward branch from caf0ec9 to 02a2bca on February 17, 2025 14:10
Signed-off-by: Chris Papademetrious <chrispy@synopsys.com>
@chrispy-snps
Collaborator Author

@AlexVonB - good catch on the README.rst file, thank you! This pull request improves runtime for hierarchically deep HTML structures (as opposed to #186, which improves runtime for broad flat HTML structures).

@AlexVonB, @AlextheYounga - this branch now includes #186. I did some local correctness and performance testing, but I would appreciate any additional testing you can provide so we can merge this with confidence.

@chrispy-snps chrispy-snps merged commit 5655f27 into matthewwithanm:develop Feb 18, 2025
1 check passed
@chrispy-snps
Collaborator Author

@AlexVonB, @AlextheYounga - I had one more look through the code and felt pretty good about it, so I merged it.

Wuhall pushed a commit to Wuhall/python-markdownify that referenced this pull request May 21, 2025
Successfully merging this pull request may close these issues.

Improve runtime for parent element context checking