Skip to content

Escape right square brackets #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jsm28
Copy link
Contributor

@jsm28 jsm28 commented Feb 5, 2025

As discussed in #148, right square brackets need to be escaped for correct handling in links in some cases. This is an alternative to that PR that is based on current sources and adds the requested testcase, since there was no response to the request in that PR for a testcase to be added to the testsuite.

As discussed in matthewwithanm#148, right square brackets need to be escaped for
correct handling in links in some cases.  This is an alternative to
that PR that is based on current sources and adds the requested
testcase, since there was no response to the request in that PR for a
testcase to be added to the testsuite.
@chrispy-snps
Copy link
Collaborator

@jsm28 - thinking out loud - what do you think of escaping ] only inside convert_a() instead of everywhere?

@jsm28
Copy link
Contributor Author

jsm28 commented Feb 8, 2025

Literally in convert_a seems like the wrong level to attempt that, since convert_a receives already-converted text for its contents (and if the ] is inside a code span - <a href="..."><code>]</code></a> in the source HTML - then escaping would be wrong). Passing down information from process_text about whether there is a parent <a> might be a possibility.

@chrispy-snps
Copy link
Collaborator

@jsm28 - you're right, thanks for pointing that out.

We are increasingly finding cases where functions need to make decisions based on enclosing elements, but find_parent() is computationally expensive (especially when called within process_tag() and process_text() for every element in the document).

I think we need a generalized and efficient context-propagation mechanism similar to convert_as_inline, but passed as a set or dict. Maybe it could be just the set of ancestor tag names themselves, or maybe it could be property-based (like 'block or inline or table).

We could test for find_parent('a') in process_text() for optimal escaping behavior, then migrate it (and all the other ancestry tests) to this yet-to-be-developed context propagation mechanism in the future. The unit tests from this issue could be used to ensure the behavior remains unchanged.

@chrispy-snps
Copy link
Collaborator

@jsm28 - I opened #191 to capture the idea for downward context propagation. This fix could be adapted in that approach as follows:

if '_noformat' not in parent_tags:
    text = self.escape(text)

    if 'a' in parent_tags:
        # also escape ']'
        pass

@chrispy-snps
Copy link
Collaborator

@jsm28 - would you be willing to convert your fix to be conditional on being within an <a> element using find_parent() (until the context propagation stuff lands, if it does)?

@chrispy-snps
Copy link
Collaborator

@jsm28 - I merged develop into your branch, then opened this pull request for your pull request branch:

#1: only escape closing square brackets inside tags

Let me know what you think. My preference is to avoid unnecessary escaping, but if you can think of a scenario that breaks this proposal, then we shouldn't do it.

@chrispy-snps chrispy-snps merged commit c7329ac into matthewwithanm:develop Feb 19, 2025
1 check passed
@chrispy-snps
Copy link
Collaborator

@jsm28 - upon reflection, I think your original proposed change to always escape closing brackets was more straightforward. This is also what Pandoc does:

$ echo '<p>[test]</p>' | pandoc --from html --to markdown
\[test\]

I reverted the code back to your original proposal and merged it. Thanks for your contribution!

Wuhall pushed a commit to Wuhall/python-markdownify that referenced this pull request May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants