Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New pathological parsing case (quadratic) #178

Closed
jgm opened this issue Jan 4, 2017 · 10 comments
Closed

New pathological parsing case (quadratic) #178

jgm opened this issue Jan 4, 2017 · 10 comments

Comments

@jgm
Copy link
Member

jgm commented Jan 4, 2017

Found by @raphlinus:

'a**b' + 'c* ' * n

That is,

a**bc* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* c* ...
@jgm
Copy link
Member Author

jgm commented Jan 4, 2017

Note, we don't get a similar problem with

a***bc* c* c* c* c* ...

because in this case the opening delimiter span quickly gets used up for matches. In the above case, it does not, because of the "multiple of 3" rule. So we have to keep coming back to it.

@jgm
Copy link
Member Author

jgm commented Jan 4, 2017

The problem is that when a match fails because of the "multiple of 3" rule, we can't set openers_bottom to rule out future matches to that opener.

@jgm
Copy link
Member Author

jgm commented Jan 4, 2017

One possible solution would be to make openers_bottom a 2D array, with the first dimension the delim char and the second the original delim run length % 3.

jgm added a commit that referenced this issue Jan 4, 2017
The new "multiple of 3" rule defeats one of our optimizations.
@jgm jgm closed this as completed in a8d5bd4 Jan 4, 2017
@mity
Copy link
Contributor

mity commented Jan 4, 2017

@jgm Sorry to comment something what's closed but... where is the quadratic complexity coming from?

There is only one potential opener, the initial **. All the other marks are only potential closers so they do not need to be remembered in the openers list. So the list of potential openers should always have just a single member.

@raphlinus
Copy link

@mity Yeah, my current code is also not quadratic on this, it doesn't push runs that are can_close but not can_open. But you can make my code quadratic too with a similar test case, say `'***a' * n.

I haven't looked at the patch code or played with the implementation, but is it enough to parametrize on just the match char and the length mod 3? It seems to me there are more cases, either the potential match is both can_open and can_close, or not. You'll get a different match depending on whether the closer is both.

@jgm
Copy link
Member Author

jgm commented Jan 4, 2017 via email

@mity
Copy link
Contributor

mity commented Jan 4, 2017

@raphlinus Well, I don't know internals of Cmark. I'm just interested because of my own iplementation where I do not need any complex handling beyond the algorithm described in appendix of CommonMark spec, and I just want to be sure I haven't overlooked something...

My way of thinking is as follows:

  1. Lets ignore _ and consider only * as the rule of three applies only to that.

  2. To make a quadratic complexity, you have to saturate the list of unresolved openers with arbitrary count of marks which are then never actually resolved but all subsequent potential closer marks have still to walk the list and check them.

  3. Openers not inside a word cannot saturate the list because they may be resolved on first occasion with any closer with the same character.

  4. Openers inside a word are always also a potential closers. Given the rule of three and its modulo-3 arithmetics, there cannot be more then three such unresolved marks in the list of potential openers at the same time. And that makes something like O(3 * n), not O(n^2).

Therefore I am wondering whether I miss something important or whether the fix is actually a workaround of some another issue inside Cmark which means the list is saturated with something what should not be in the list on the 1st place.

@mity
Copy link
Contributor

mity commented Jan 4, 2017

@jgm That explains it. Ignore mu latest post then, I wrote that before seeing your comment.

@jgm
Copy link
Member Author

jgm commented Jan 4, 2017 via email

@raphlinus
Copy link

@jgm I think that's enough information. When touching a closing run that is can_close only, you look at both the mod-3 and the can_open case, and take the later of the two? I was thinking of an array of 6 rather than an array of 4, where that logic was done at write time, but I think yours is better. Thanks!

jgm added a commit to commonmark/commonmark.js that referenced this issue Aug 30, 2019
taku0 added a commit to taku0/cmark-el that referenced this issue Dec 28, 2019
commonmark/commonmark.js@903f35e
Author: John MacFarlane <jgm@berkeley.edu>
Date:   Fri Aug 30 16:24:48 2019 -0700

Fix pathological case commonmark/cmark#178.

commonmark/commonmark.js@34b8f65
Author: John MacFarlane <jgm@berkeley.edu>
Date:   Fri Aug 30 16:28:09 2019 -0700
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants