Correctly process footnotes when autolink is enabled #227
Superseded by #229.
Ideally, this PR closes #121
As noted in #121, when the autolink extension and the footnotes extension are both turned on, `cmark-gfm` fails to correctly process footnotes whose reference label contains the character `w` or `_`.

This PR resolves this by ignoring nodes that occur after the `[` and the `^`, and just copying the text between the `^` and the `]` for use as our footnote reference label.

Why?
In practice, this occurs because when we parsed the text for footnote references, we assumed that there weren't any nodes in between the `[` character and the rest of the `^footnote]` label.

I did not investigate the autolinker, but this is what I suspect was happening:
- On `[`, we cut a new text node and insert it into the parser's AST.
- On `]`, we look back and try to match the previous characters into a link or some other markdown feature. If nothing else matches, we try to look for a footnote reference.
- Before the `]`, the autolinker greedily looks for `_` or `w` (and, I think, a few other characters) while trying to guess if it's a URL. Upon finding a `w`, it inserts a new text node in the hopes that it might find a `www.foo.com` URL as it keeps parsing.

Given the following input:
The footnote reference code expected to see the following nodes in the parser: `Hello world.`, `[`, `^what-a-world]`

but instead it saw: `Hello world.`, `[`, `^`, `what-a-`, `world]`.

Because this parser operates one character at a time, and it is being called at the `close_bracket`, once we've found that we're dealing with a footnote reference it should be safe to disregard any additional nodes in the AST, because by definition they would not occur after the closing bracket.
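
To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not the C code in `cmark-gfm`: text nodes are modelled as plain strings, and the function names `label_from_single_node` and `label_from_span` are hypothetical, chosen only for this illustration.

```python
from typing import List, Optional


def label_from_single_node(nodes: List[str]) -> Optional[str]:
    """Old assumption: the node right after '[' holds the whole '^label]'."""
    for i, text in enumerate(nodes[:-1]):
        if text == "[":
            nxt = nodes[i + 1]
            if nxt.startswith("^") and nxt.endswith("]") and len(nxt) > 2:
                return nxt[1:-1]
    return None


def label_from_span(nodes: List[str]) -> Optional[str]:
    """Approach described in this PR: ignore how the text after '[' was
    split into nodes and copy everything between '^' and the final ']'."""
    # We are called at the closing bracket, so walk back to the nearest '['.
    for i in range(len(nodes) - 1, -1, -1):
        if nodes[i] == "[":
            tail = "".join(nodes[i + 1:])
            if tail.startswith("^") and tail.endswith("]") and len(tail) > 2:
                return tail[1:-1]
            return None
    return None


# Node lists taken from the description above.
expected_nodes = ["Hello world.", "[", "^what-a-world]"]
seen_nodes = ["Hello world.", "[", "^", "what-a-", "world]"]

print(label_from_single_node(expected_nodes))
print(label_from_single_node(seen_nodes))
print(label_from_span(seen_nodes))
```

With the split node list, the single-node lookup finds no label, while the span-based lookup recovers the same label it would have found had the autolinker not cut extra nodes.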