make conversion non-destructive to soup; improve div/article/section handling #184

chrispy-snps
Collaborator

This merge request does the following:

  • Makes convert_soup() non-destructive (the soup object is left unmodified)
  • Implements block-element newline separation for <div>, <article>, and <section> elements
  • Fixes #107 (div mis-converted)

Unit tests are updated.
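
As a rough illustration of what "non-destructive" means here, the sketch below checks that a converter leaves its input tree unchanged by serializing the tree before and after conversion. It is not the code from this merge request: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, and to_text, destructive_to_text, and check_non_destructive are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET


def to_text(node):
    """Collect the text of an element tree without modifying it."""
    parts = [node.text or ""]
    for child in node:
        parts.append(to_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def destructive_to_text(node):
    """Same result, but empties the tree as a side effect (the behavior to avoid)."""
    text = to_text(node)
    node.clear()
    return text


def check_non_destructive(convert, markup):
    """Run convert() on a parsed tree and verify that the tree is unchanged."""
    tree = ET.fromstring(markup)
    before = ET.tostring(tree)
    result = convert(tree)
    after = ET.tostring(tree)
    assert before == after, "converter modified its input tree"
    return result
```

Calling check_non_destructive(to_text, "<div>foo<p>bar</p>baz</div>") returns "foobarbaz", while passing destructive_to_text raises an AssertionError. A unit test for convert_soup() could follow the same pattern by comparing str(soup) before and after the call, assuming the soup is a BeautifulSoup object.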

Regarding #107, I believe that block-element newline separation, not line continuation, is the correct behavior for <div>, <article>, and <section> elements, because all three are block elements. The following HTML example shows that in both the <p> and <div> cases, a browser separates "foo" from "bar" as distinct blocks rather than joining them as a <br /> line continuation would:

<!DOCTYPE html>
<html>
 <head>
  <title>Page Title</title>
  <style>

p, div {
 margin-top: 1em;
 margin-bottom: 1em;
 border: 1px black dotted;
 background-color: yellow;
}

  </style>
 </head>
 <body>

  <p>foo</p>
  <p>bar<br />baz</p>

  foo
  <div>bar<br />baz</div>

 </body>
</html>
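
The intended separation can also be sketched in Python with only the standard library. This is a simplified illustration under stated assumptions, not markdownify's implementation: BlockSketch and convert are hypothetical names, only the tags discussed here are treated as blocks, blocks are joined with a blank line, and <br /> is rendered as a Markdown hard line break (two trailing spaces plus a newline).

```python
import re
from html.parser import HTMLParser

# Assumption for this sketch: only these tags count as block elements.
BLOCK_TAGS = {"p", "div", "article", "section"}


class BlockSketch(HTMLParser):
    """Collect text into blocks; a block boundary occurs at every block tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks of text
        self.current = []  # pieces of the block being built

    def flush(self):
        """Close the block being built, dropping it if it is empty."""
        text = "".join(self.current).strip()
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()                 # block separation
        elif tag == "br":
            self.current.append("  \n")  # line continuation within a block

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        # Collapse runs of whitespace, as HTML rendering does.
        self.current.append(re.sub(r"\s+", " ", data))


def convert(html):
    parser = BlockSketch()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)
```

With this sketch, convert("<p>foo</p><p>bar<br />baz</p>") and convert("foo<div>bar<br />baz</div>") both return "foo\n\nbar  \nbaz": the <div> starts a new block in the same way the second <p> does, and only the <br /> produces a line continuation.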

…handling

Signed-off-by: chrispy <chrispy@synopsys.com>
@chrispy-snps chrispy-snps requested a review from AlexVonB February 1, 2025 23:24
@chrispy-snps
Collaborator Author

@jsm28 - I am interested in your feedback on this pull request, if you have time.

Collaborator

@AlexVonB AlexVonB left a comment

Love the idea of not changing the soup object!

@jsm28
Contributor

jsm28 commented Feb 4, 2025

I agree that paragraph separation is appropriate for these three tags.

@chrispy-snps chrispy-snps merged commit 3026602 into matthewwithanm:develop Feb 4, 2025
1 check passed
@chrispy-snps chrispy-snps deleted the chrispy/make-conversion-nondestructive branch February 17, 2025 13:44
Wuhall pushed a commit to Wuhall/python-markdownify that referenced this pull request May 21, 2025
…handling (matthewwithanm#184)

Signed-off-by: chrispy <chrispy@synopsys.com>
Development

Successfully merging this pull request may close these issues.

div mis-converted
3 participants