Support Doc Comments and General Cleanup of Scanner #195

ProfDoof · 2023-09-27T20:31:22Z

Hi all!

I was looking at some previous PRs for doc comments like #168 and thought I would take a crack at it addressing some of the issues raised by @the-mikedavis in the other PR to fix #147. I scope creeped myself while I was doing it though and also separated each of the processing steps in the scanner.c file out into their own functions to make it easier to read the file. I was struggling quite a bit to read it myself. I also went through and did some cleanup of functions based on what I read in the documentation for tree-sitter.

If I need to split this into 2 separate PRs, let me know, and I will address that as able.

Quick question though, is there a good way for me to benchmark my changes against the original solution? I want to make sure that the extra processing cost we would be incurring is worth it.

Changes Made

Grammar

Added an error sentinel value to the externals array. See the comments in the scanner.c for explanation.
Changed from handling the entirety of block comment parsing in the external scanner to just the inner block/outer block signifier scanning and the block comment ending scanning
Split the line comment into 3 cases.
1. The line starts with 4 or more slashes which makes what looks like a doc comment actually a regular comment
2. The line starts with 2 slashes and either one more slash or at least 1 bang which makes the comment a regular comment
3. The line starts with only 2 slashes and everything after that is content
Added hidden rules for line doc comment, line inner doc comment, and line outer doc comment to be used in aliases when scanning a line comment
Added hidden rules for block doc comment, and external scanner handling for block inner and outer doc comment detection.
Aliased the line doc comment and block doc comment to a generic doc comment node
Aliased the inner line and block comment nodes to a generic inner doc comment node
Aliased the outer line and block comment nodes to a generic outer doc comment node

Scanner

Switched everywhere where the code was using lexer->lookahead == 0 to instead use eof(lexer) which calls the eof function on the lexer. This is recommended in the tree-sitter documentation I read.
Added a skip function similar to the advance function and switched parts of the code from manually calling advance on the lexer to using either the advance function or the skip function.
Added a markEnd function for the same reason as above.
Converted the block comment processing from a flag boolean to using an enum based state machine. It made it easier for me to read and catch edge cases.
In the block comment processing, I now first check whether there's a signifier of one of the doc comment possibilities and if there is I return the correct one. If there isn't, then I just pass it to the block comment processing part of the process function and completely parse the block comment and then return it with the end marked correctly.
In block comment processing, previously, the behavior if an EOF was found was to return false. I changed that to return true. This is because for the purposes of tree sitting, if we found an EOF that means we are dealing with an unterminated block comment and the user likely wants that highlighted. If we were the rust parser obviously we would probably choose a different behavior but for our purposes here that behavior seems more reasonable to me. However, I'm more than willing to revert that change. I will need to just change a line or two.
Started using markEnd during the block comment processing and moved the block comment processing to the beginning of the scanner function to ensure that we don't skip over whitespace by accident.
Added inlines to the lexer caller functions. Not sure that's actually worth it but it's there now. I can remove it if desired.

Test

Added tests for all the cases that I saw in the previous PRs. If there's any that I missed that you think I need to double check, feel free to include them and I will attempt to correct them at my earliest convenience.

ProfDoof · 2023-09-29T14:20:19Z

I have fixed the lint problems locally but now I'm hunting down the parser fuzzing error.

ProfDoof · 2023-09-29T20:35:10Z

The fuzzing error seen in the failure on here was caused by the shift in the install process for tree-sitter-cli from installing the api.h and parser.h files to only installing the api.h file. This means that you need to use the relative path to import the parser.h created by tree-sitter generate instead. That needs to be fixed in the vim repo as well it looks like.

However, there was also a recent update to the tree-sitter-fuzz-action which will likely cause an error as well. I will go make an issue on there and link it here.

ProfDoof · 2023-09-29T23:01:59Z

I looked deeper and make install is the one that no longer installs parser.h but cargo and npm still do. So, it's not an issue right now but may become one later.

ProfDoof · 2023-10-04T01:37:55Z

Hey, just wanted to check what the expected timeline on looking at this is.

maplant · 2023-10-23T14:51:36Z

Would love to have this added. Not parsing doc comments is very strange

ProfDoof · 2023-12-08T16:52:33Z

Hi! I wanted to check back in on this as it's been sitting in my backlog for a couple of month now. @amaanq is this something that you would need to look at?

TrueDoctor · 2024-01-04T19:29:56Z

For me this seems to only highlight the first two characters of the doc comment, #168 does not have the same problem

ProfDoof · 2024-01-05T19:46:10Z

For me this seems to only highlight the first two characters of the doc comment, #168 does not have the same problem

Would you mind posting the code you were using to test it? I am willing to fix it, I just need to know what it is. @TrueDoctor

TrueDoctor · 2024-01-11T10:45:20Z

Would you mind posting the code you were using to test it? I am willing to fix it, I just need to know what it is. @TrueDoctor

I hooked it up to the helix editor by changing the url of the rust grammer to point to your pr and then adding
(line_doc_comment) @comment.block.documentation
to the file runtime/queries/rust/highlights.scm and then starting the editor (sometimes you have to manually ask helix to recompile the grammer by doing hx --grammar fetch and hx --grammar build)

ProfDoof · 2024-01-13T01:26:22Z

@TrueDoctor

Oh, I see the issue I think. line_doc_comment isn't a valid selection for the grammar I crafted. Instead, you want to add

(line_comment doc: (_)) @comment.block.documentation
(line_comment) @comment.line
(block_comment doc: (_)) @comment.block.documentation
(block_comment) @comment.block

or

(line_comment (doc_comment)) @comment.block.documentation
(line_comment) @comment.line
(block_comment (doc_comment)) @comment.block.documentation
(block_comment) @comment.block

Or something like that. I'm not an expert on crafting queries yet. Once I have time again (don't do a PhD, you'll never have your own life) I can try to drum up some better versions.

Oh, if you want to differentiate between inner and outer doc comments you can as well. You would do that like this I think:

(line_comment (doc_comment outer: (_))) @comment.block.documentation
(line_comment (doc_comment inner: (_))) @comment.block.documentation
(line_comment) @comment.line
(block_comment (doc_comment outer: (_))) @comment.block.documentation
(block_comment (doc_comment inner: (_))) @comment.block.documentation
(block_comment) @comment.block

or

(line_comment (doc_comment (outer_doc_comment))) @comment.block.documentation.outer
(line_comment (doc_comment (inner_doc_comment))) @comment.block.documentation.inner
(line_comment) @comment.line
(block_comment (doc_comment (outer_doc_comment))) @comment.block.documentation.outer
(block_comment (doc_comment (inner_doc_comment))) @comment.block.documentation.inner
(block_comment) @comment.block

TrueDoctor · 2024-01-14T15:24:48Z

Oh, sorry I think made a small error, I think I actually used the (doc_comment) @comment.block.documentation and just copy pasted the line from my current config without checking. Otherwise the editor would complain about missing nodes iirc

ProfDoof · 2024-01-16T02:04:11Z

@TrueDoctor

The doc_comment does not contain the contents of the node IIRC. Instead, doc_comment contains just the characters needed to determine if I'm dealing with an inner or outer comment (theoretically, it's been a while since I wrote this code). The rest of the contents are contained within the line_comment or block_comment. There might be a different way to consider it though. I'm welcome to suggestions.

amaanq · 2024-02-16T09:27:38Z

I'm sorry for taking so long to get to this, I had a bit of a hiatus November-January and then was just busy afterwards and forgot about this

This looks really good FYI, so thanks a ton @ProfDoof! There's a few things I changed around but otherwise I think we can get this in, most notably:

rename block comment end to '*/', the external scanner can recognize and parse literal tokens just fine, and it's better for those looking to leverage the start/end of a comments in their queries
mark all non-exported functions static to avoid potential collisions when being compiled with other parsers/c libraries
hide the second '//' if it's a regular comment via a regex
move generated file changes to their own commit

Might've forgotten something, but that's the meat of it. Thanks!

clason · 2024-02-16T09:32:37Z

Would be helpful to add queries for this (makes it easier for downstream to include).

amaanq · 2024-02-16T09:34:07Z

not much really needed tbh, there's still queries for comments but this enables querying for doc comments (~~which is not in the list of upstream's standard captures~~ nvm it is) to be done much more easily

There's @comment.documentation highlight queries now

TrueDoctor · 2024-02-16T09:42:09Z

🥳

aryx requested review from amaanq and maxbrunsfeld September 28, 2023 07:22

amaanq force-pushed the master branch from a411a29 to 99c6922 Compare November 17, 2023 11:38

clason mentioned this pull request Feb 15, 2024

Add parsing of doc comments #168

Closed

amaanq force-pushed the support_doc_comments branch 2 times, most recently from 1a77348 to 5390f15 Compare February 16, 2024 09:18

amaanq force-pushed the support_doc_comments branch from 5390f15 to 7dcab5e Compare February 16, 2024 09:27

ProfDoof and others added 2 commits February 16, 2024 04:36

feat: support line and block doc comments

962f802

chore: generate

efddc8e

amaanq force-pushed the support_doc_comments branch from 7dcab5e to efddc8e Compare February 16, 2024 09:36

amaanq merged commit 6b7ccc9 into tree-sitter:master Feb 16, 2024
5 checks passed

This was referenced Feb 16, 2024

Doc strings parser #38

Closed

Differentiate // from /// #175

Closed

the-mikedavis mentioned this pull request Feb 17, 2024

Expose doc comment contents #212

Merged

DavisVaughan mentioned this pull request Sep 16, 2024

Highlight ROxygen tags r-lib/tree-sitter-r#68

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Doc Comments and General Cleanup of Scanner #195

Support Doc Comments and General Cleanup of Scanner #195

ProfDoof commented Sep 27, 2023

ProfDoof commented Sep 29, 2023

ProfDoof commented Sep 29, 2023

ProfDoof commented Sep 29, 2023

ProfDoof commented Oct 4, 2023

maplant commented Oct 23, 2023

ProfDoof commented Dec 8, 2023

TrueDoctor commented Jan 4, 2024

ProfDoof commented Jan 5, 2024

TrueDoctor commented Jan 11, 2024

ProfDoof commented Jan 13, 2024 •

edited

Loading

TrueDoctor commented Jan 14, 2024

ProfDoof commented Jan 16, 2024

amaanq commented Feb 16, 2024 •

edited

Loading

clason commented Feb 16, 2024

amaanq commented Feb 16, 2024 •

edited

Loading

TrueDoctor commented Feb 16, 2024

Support Doc Comments and General Cleanup of Scanner #195

Support Doc Comments and General Cleanup of Scanner #195

Conversation

ProfDoof commented Sep 27, 2023

Changes Made

Grammar

Scanner

Test

ProfDoof commented Sep 29, 2023

ProfDoof commented Sep 29, 2023

ProfDoof commented Sep 29, 2023

ProfDoof commented Oct 4, 2023

maplant commented Oct 23, 2023

ProfDoof commented Dec 8, 2023

TrueDoctor commented Jan 4, 2024

ProfDoof commented Jan 5, 2024

TrueDoctor commented Jan 11, 2024

ProfDoof commented Jan 13, 2024 • edited Loading

TrueDoctor commented Jan 14, 2024

ProfDoof commented Jan 16, 2024

amaanq commented Feb 16, 2024 • edited Loading

clason commented Feb 16, 2024

amaanq commented Feb 16, 2024 • edited Loading

TrueDoctor commented Feb 16, 2024

ProfDoof commented Jan 13, 2024 •

edited

Loading

amaanq commented Feb 16, 2024 •

edited

Loading

amaanq commented Feb 16, 2024 •

edited

Loading