Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic behavior when parsing emphasis #389

Closed
nwellnhof opened this issue Jul 13, 2021 · 0 comments
Closed

Quadratic behavior when parsing emphasis #389

nwellnhof opened this issue Jul 13, 2021 · 0 comments

Comments

@nwellnhof
Copy link
Contributor

nwellnhof commented Jul 13, 2021

Parsing a *t sequence followed by a _t*_ sequence exhibits quadratic behavior:

$ for n in 5000 10000 20000; do python3 -c "print('*t '*$n+'_t*_ '*$n)" |time -f "$n elems: %e secs" build/src/cmark >/dev/null; done
5000 elems: 0.19 secs
10000 elems: 0.88 secs
20000 elems: 5.45 secs

Sample pattern

*t *t *t *t _t*_ _t*_ _t*_ _t*_

Analysis

  • After matching a * closer, the preceding _ opener is removed.
  • After parsing a _ closer, no opener is found. The corresponding openers_bottom entry is set to preceding * opener.
  • After matching the next * closer, the corresponding opener is removed. This is the entry in openers_bottom.
  • The entry in openers_bottom becomes a dangling pointer.
  • Subsequent searches for an opener for _ closers will test against the dangling pointer (which is undefined behavior).
  • Since the opener was removed, there will never be a match.
  • The openers_bottom optimization stops working, resulting in quadratic behavior with abusive input patterns.

Possible solutions

  • Adjust openers_bottom when removing delimiters (somewhat expensive).
  • Number the delimiters and store the delimiter position instead of a pointer in openers_bottom.
nwellnhof added a commit to nwellnhof/cmark that referenced this issue Jul 13, 2021
Delimiters can be deleted, so store delimiter positions instead of
pointers in `openers_bottom`. Besides causing undefined behavior when
reading a dangling pointer, this could also result in quadratic
behavior when parsing emphasis.

Fixes commonmark#389.
nwellnhof added a commit to nwellnhof/cmark that referenced this issue Jul 13, 2021
Delimiters can be deleted, so store delimiter positions instead of
pointers in `openers_bottom`. Besides causing undefined behavior when
reading a dangling pointer, this could also result in quadratic
behavior when parsing emphasis.

Fixes commonmark#389.
nwellnhof added a commit to nwellnhof/cmark that referenced this issue Jul 13, 2021
Delimiters can be deleted, so store delimiter positions instead of
pointers in `openers_bottom`. Besides causing undefined behavior when
reading a dangling pointer, this could also result in quadratic
behavior when parsing emphasis.

Fixes commonmark#389.
nwellnhof added a commit to nwellnhof/cmark that referenced this issue Jul 13, 2021
Delimiters can be deleted, so store delimiter positions instead of
pointers in `openers_bottom`. Besides causing undefined behavior when
reading a dangling pointer, this could also result in quadratic
behavior when parsing emphasis.

Fixes commonmark#389.
@jgm jgm closed this as completed in ed0a4bf Jul 16, 2021
jgm added a commit that referenced this issue Jul 16, 2021
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Jul 23, 2021
Changelog:
Properly indent block-level contents of list items in man (openembedded#258).
commonmark/cmark#258
This handles nested lists as well as items with multiple paragraphs.
The change requires addition of a new field block_number_in_list_item
to cmark_renderer, but this does not change the public API.

Fix quadratic behavior when parsing emphasis (openembedded#389, Nick
Wellnhofer). Delimiters can be deleted, so store delimiter positions
instead of pointers in openers_bottom. Besides causing undefined
behavior when reading a dangling pointer, this could also result
in quadratic behavior when parsing emphasis.
commonmark/cmark#389

Fix quadratic behavior when parsing smart quotes (openembedded#388, Nick Wellnhofer).
Remove matching smart quote delimiters. Otherwise, the same opener
could be found over and over, preventing the openers_bottom
optimization from kicking in and leading to quadratic behavior when
processing lots of quotes.
commonmark/cmark#388

Modify CMake configuration so that the project can be built with
older versions of CMake (openembedded#384, Saleem Abdulrasool). (In 0.30.0,
some features were used that require CMake >= 3.3.) The cost of this
backwards compatibility is that developers must now explicitly invoke
cmark_add_compile_options when a new compilation target is added.
commonmark/cmark#384

Remove a comma at the end of an enumerator list, which was flagged
by clang as a C++11 extension.

make_man_page.py: use absolute path with CDLL. This avoids the error
"file system relative paths not allowed in hardened programs."

Include cmark version in cmark(3) man page (instead of LOCAL).

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Jul 23, 2021
Changelog:
Properly indent block-level contents of list items in man (openembedded#258).
commonmark/cmark#258
This handles nested lists as well as items with multiple paragraphs.
The change requires addition of a new field block_number_in_list_item
to cmark_renderer, but this does not change the public API.

Fix quadratic behavior when parsing emphasis (openembedded#389, Nick
Wellnhofer). Delimiters can be deleted, so store delimiter positions
instead of pointers in openers_bottom. Besides causing undefined
behavior when reading a dangling pointer, this could also result
in quadratic behavior when parsing emphasis.
commonmark/cmark#389

Fix quadratic behavior when parsing smart quotes (openembedded#388, Nick Wellnhofer).
Remove matching smart quote delimiters. Otherwise, the same opener
could be found over and over, preventing the openers_bottom
optimization from kicking in and leading to quadratic behavior when
processing lots of quotes.
commonmark/cmark#388

Modify CMake configuration so that the project can be built with
older versions of CMake (openembedded#384, Saleem Abdulrasool). (In 0.30.0,
some features were used that require CMake >= 3.3.) The cost of this
backwards compatibility is that developers must now explicitly invoke
cmark_add_compile_options when a new compilation target is added.
commonmark/cmark#384

Remove a comma at the end of an enumerator list, which was flagged
by clang as a C++11 extension.

make_man_page.py: use absolute path with CDLL. This avoids the error
"file system relative paths not allowed in hardened programs."

Include cmark version in cmark(3) man page (instead of LOCAL).

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
halstead pushed a commit to openembedded/meta-openembedded that referenced this issue Jul 27, 2021
Changelog:
Properly indent block-level contents of list items in man (#258).
commonmark/cmark#258
This handles nested lists as well as items with multiple paragraphs.
The change requires addition of a new field block_number_in_list_item
to cmark_renderer, but this does not change the public API.

Fix quadratic behavior when parsing emphasis (#389, Nick
Wellnhofer). Delimiters can be deleted, so store delimiter positions
instead of pointers in openers_bottom. Besides causing undefined
behavior when reading a dangling pointer, this could also result
in quadratic behavior when parsing emphasis.
commonmark/cmark#389

Fix quadratic behavior when parsing smart quotes (#388, Nick Wellnhofer).
Remove matching smart quote delimiters. Otherwise, the same opener
could be found over and over, preventing the openers_bottom
optimization from kicking in and leading to quadratic behavior when
processing lots of quotes.
commonmark/cmark#388

Modify CMake configuration so that the project can be built with
older versions of CMake (#384, Saleem Abdulrasool). (In 0.30.0,
some features were used that require CMake >= 3.3.) The cost of this
backwards compatibility is that developers must now explicitly invoke
cmark_add_compile_options when a new compilation target is added.
commonmark/cmark#384

Remove a comma at the end of an enumerator list, which was flagged
by clang as a C++11 extension.

make_man_page.py: use absolute path with CDLL. This avoids the error
"file system relative paths not allowed in hardened programs."

Include cmark version in cmark(3) man page (instead of LOCAL).

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
anticomputer pushed a commit to github/cmark-gfm that referenced this issue Jan 23, 2023
Delimiters can be deleted, so store delimiter positions instead of
pointers in `openers_bottom`. Besides causing undefined behavior when
reading a dangling pointer, this could also result in quadratic
behavior when parsing emphasis.

Fixes commonmark#389.
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024
Changelog:
Properly indent block-level contents of list items in man (#258).
commonmark/cmark#258
This handles nested lists as well as items with multiple paragraphs.
The change requires addition of a new field block_number_in_list_item
to cmark_renderer, but this does not change the public API.

Fix quadratic behavior when parsing emphasis (#389, Nick
Wellnhofer). Delimiters can be deleted, so store delimiter positions
instead of pointers in openers_bottom. Besides causing undefined
behavior when reading a dangling pointer, this could also result
in quadratic behavior when parsing emphasis.
commonmark/cmark#389

Fix quadratic behavior when parsing smart quotes (#388, Nick Wellnhofer).
Remove matching smart quote delimiters. Otherwise, the same opener
could be found over and over, preventing the openers_bottom
optimization from kicking in and leading to quadratic behavior when
processing lots of quotes.
commonmark/cmark#388

Modify CMake configuration so that the project can be built with
older versions of CMake (#384, Saleem Abdulrasool). (In 0.30.0,
some features were used that require CMake >= 3.3.) The cost of this
backwards compatibility is that developers must now explicitly invoke
cmark_add_compile_options when a new compilation target is added.
commonmark/cmark#384

Remove a comma at the end of an enumerator list, which was flagged
by clang as a C++11 extension.

make_man_page.py: use absolute path with CDLL. This avoids the error
"file system relative paths not allowed in hardened programs."

Include cmark version in cmark(3) man page (instead of LOCAL).

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024
Changelog:
Properly indent block-level contents of list items in man (#258).
commonmark/cmark#258
This handles nested lists as well as items with multiple paragraphs.
The change requires addition of a new field block_number_in_list_item
to cmark_renderer, but this does not change the public API.

Fix quadratic behavior when parsing emphasis (#389, Nick
Wellnhofer). Delimiters can be deleted, so store delimiter positions
instead of pointers in openers_bottom. Besides causing undefined
behavior when reading a dangling pointer, this could also result
in quadratic behavior when parsing emphasis.
commonmark/cmark#389

Fix quadratic behavior when parsing smart quotes (#388, Nick Wellnhofer).
Remove matching smart quote delimiters. Otherwise, the same opener
could be found over and over, preventing the openers_bottom
optimization from kicking in and leading to quadratic behavior when
processing lots of quotes.
commonmark/cmark#388

Modify CMake configuration so that the project can be built with
older versions of CMake (#384, Saleem Abdulrasool). (In 0.30.0,
some features were used that require CMake >= 3.3.) The cost of this
backwards compatibility is that developers must now explicitly invoke
cmark_add_compile_options when a new compilation target is added.
commonmark/cmark#384

Remove a comma at the end of an enumerator list, which was flagged
by clang as a C++11 extension.

make_man_page.py: use absolute path with CDLL. This avoids the error
"file system relative paths not allowed in hardened programs."

Include cmark version in cmark(3) man page (instead of LOCAL).

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant