Correctly process footnotes when autolink is enabled #227

Closed
wants to merge 5 commits into from

@phillmv (Member) commented Aug 11, 2021

Superseded by #229.


Ideally, this PR closes #121

As noted in #121, when the autolink and footnotes extensions are both enabled, cmark-gfm fails to correctly process footnotes whose reference label contains the character w or _.

This PR resolves the problem by ignoring any nodes that occur after the [ and the ^, and instead copying the text between the ^ and the ] for use as the footnote reference label.

Why?

This happens because the code that parses footnote references assumed there were no nodes between the [ character and the rest of the ^footnote] label.

I did not investigate the autolinker, but this is what I suspect was happening:

  • The parser inspects any given line of markdown character by character
  • Upon seeing the opening bracket [, we cut a new text node and insert it into the parser's AST
  • Upon seeing the closing bracket ], we look back and try to match the preceding characters as a link or some other markdown feature. If nothing else matches, we look for a footnote reference.
  • Before we reach the closing bracket ], the autolinker greedily looks for _ or w (and, I think, a few other characters) while trying to guess whether it is looking at a URL. Upon finding a w, it inserts a new text node in the hope that it will find a www.foo.com URL as it keeps parsing.

Given the following input:

Hello world.[^what-a-world]

[^what-a-world]: example.

The footnote reference code expected to see the following nodes in the parser: Hello world., [, ^what-a-world]

but instead it saw: Hello world., [, ^ what-a-, world].

Because the parser operates one character at a time, and this code is called at the close_bracket, once we know we're dealing with a footnote reference it should be safe to disregard any additional nodes in the AST, because by definition they would not occur after the closing bracket.
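To illustrate the idea, here is a minimal Python sketch. It is not the C code in cmark-gfm. Text nodes are modelled as plain strings, `footnote_label` is a hypothetical helper, and the exact way the label gets split is an assumption based on the example above. The PR copies the text between the ^ and the ]; this sketch joins the node strings instead, which yields the same label in the toy model.

```python
def footnote_label(nodes):
    """Return the footnote label from text nodes ending at the closing bracket.

    `nodes` is a toy stand-in for the parser's text nodes (plain strings);
    the real parser works on AST nodes, not Python lists.
    """
    # Scan backwards for the most recent opening-bracket node.
    for i in range(len(nodes) - 1, -1, -1):
        if nodes[i] == "[":
            break
    else:
        return None  # no opening bracket at all

    # Join everything after "[" so extra splits made by another
    # extension (assumed here to be the autolinker) no longer matter.
    text = "".join(nodes[i + 1:])
    if not text.startswith("^") or not text.endswith("]"):
        return None  # not a footnote reference

    label = text[1:-1]
    return label or None


# The node layout the footnote code expected, and one possible split layout.
expected = ["Hello world.", "[", "^what-a-world]"]
split = ["Hello world.", "[", "^", "what-a-", "world]"]

print(footnote_label(expected))
print(footnote_label(split))
```

Both calls produce the same label, because the helper never relies on the label living in a single node.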

@phillmv (Member, Author) commented Aug 23, 2021

Closing this PR in favour of #229, which incorporates a few additional bug fixes.

@phillmv phillmv closed this Aug 23, 2021
Successfully merging this pull request may close these issues.

Autolink and footnote extensions incorrectly process footnotes with certain letters