Quadratic behavior when parsing smart quotes

Reproduce with:

    python3 -c 'print("\x27x " * 30000 + "\x27\"x" * 30000)' |\
        time build/src/cmark --smart >/dev/null

After unquoting, this input looks like

    'x 'x 'x 'x 'x '"x'"x'"x'"x'"x

This also works with emphasis:

    python3 -c 'print("\"x " * 30000 + "\"_x" * 30000)' |\
        time build/src/cmark --smart >/dev/null

When matching a pair of smart quotes, we currently don't remove delimiters (emphasis or other smart quotes) between the quotes. This can lead to quadratic behavior when processing deeply nested quotes interspersed with emphasis or different quotes.

A simple way to fix this issue is to delete delimiters between smart quotes but this changes the parsing rules. Another, more complicated approach is to handle single quotes, double quotes and emphasis in separate lists.

This is the only remaining issue with quadratic complexity in the parser that I could find with a specialized fuzzer similar to the "quadratic" fuzzers in cmark-gfm. Note that there are still some issues in the serializers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Quadratic behavior when parsing smart quotes #521

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Quadratic behavior when parsing smart quotes #521

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions