Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

language-definitions AutoHotkey: Modified block comment matching #1703

Closed
wants to merge 5 commits into from

Conversation

nnnik
Copy link

@nnnik nnnik commented Jan 13, 2019

The block comment matching has been fixed.
I also made the behavior as close as possible to, how AutoHotkey actually parses block comments as of 1.1.30.01.
The parser looks for a line starting with /* (excluding whitespace).
Then starting from the next line it will look for */ to end the comment.
The closing tag seems to be entirely optional and the entire remaining file will become a comment block if no ending tag is found.
Any code directly after a starting sequence is ignored until the parser encounters a new line.

Modified comment matching
Block comments now need to start on their own line
Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing!

However before we can merge this, we will have to make a few changes.

The comment pattern has two parts:

  1. (^[^";\n]*("[^"\n]*?"[^"\n]*?)*)(?:;.*$) to match single-line comments and
  2. ^\s*\/\*[^\n]*(?:\n(?:[^*]|\*[^\/])*(?:\*\/)?)? to match multi-line comments.

You only changed the second part but we might as well take this chance to improve the first one as well, so let's start with that:

The single-line comment pattern can be simplified to ;.*. The $ at the end is redundant (see .) and the lookbehind group (^[^";\n]*("[^"\n]*?"[^"\n]*?)*) is to prevent matching ;s inside strings.
I removed the lookbehind group because Prism has a better way to deal with that: greedy matching. So making string greedy will have the same effect and makes the pattern simpler.

Now to multi-line comments:

First of all, there are 3 possible end of line sequences: \n, \r, and \r\n. So the pattern doesn't support \r line ends.

Second, the requirement for */ to close the comment is to be at the start of a line (plus spaces). This means that the following is a comment:

/*
 * I don't end here */ but there
 */

Knowing that, the pattern should be (^\s*)\/\*[\s\S]*?^\s*\*\/.
(Leading and trailing spaces should not be part of any token, so I use the lookbehind group to remove them.)

This doesn't yet include comments which aren't closed by */ and end at the end of file, so let's add them:
(^\s*)\/\*(?:[\s\S]*?^\s*\*\|[\s\S]*)/.

Please verify that this works and please extend the comment test. I haven't thoroughly tested it.

Also, are non-space characters allowed on the same line after the closing */? E.g.

/*
 */ where do I end?
 */

@nnnik
Copy link
Author

nnnik commented Jan 13, 2019

However before we can merge this, we will have to make a few changes.

The comment pattern has two parts:

(^[^";\n]("[^"\n]?"[^"\n]?))(?:;.$) to match single-line comments and
^\s
/*[^\n](?:\n(?:[^*]|*[^\/])(?:*/)?)? to match multi-line comments.
You only changed the second part but we might as well take this chance to improve the first one as well, so let's start with that:

The single-line comment pattern can be simplified to ;.. The $ at the end is redundant (see .) and the lookbehind group (^[^";\n]("[^"\n]?"[^"\n]?)*) is to prevent matching ;s inside strings.
I removed the lookbehind group because Prism has a better way to deal with that: greedy matching. So making string greedy will have the same effect and makes the pattern simpler.

I dont know about the single line comments for now. I haven't looked into them extensively.
Especially since I haven't checked how extensive the string detection is - especially with the 2 syntaxes around and continuation sections etc. - so I'm going to leave them as it stands and might work on them at another time.

Now to multi-line comments:

First of all, there are 3 possible end of line sequences: \n, \r, and \r\n. So the pattern doesn't support \r line ends.

I kind of assumed that the parser would handle the newlines accordingly. AutoHotkey can deal with all types of Unicode newlines - sadly thats not a feature javascript can handle apparently.
For now I guess it would be fine if we only cover all \r and \n options.

Second, the requirement for */ to close the comment is to be at the start of a line (plus spaces). This means that the following is a comment:

Thats incorrect - the latest AHK Version use exactly the behavior I implemented.
I didn't read the source code though - I only ran several tests with a few AHK scripts.
Namely:

/**/
Msgbox this is still a comment
And will go on without closing.
Msgbox /*
Msgbox This is not a comment */
/* This is all part of the
Comment */Msgbox this is code
Msgbox the following is recognized as a comment:
/*

So this would be correct:

/*
 * I do end here */ Msgbox not there

(Leading and trailing spaces should not be part of any token, so I use the lookbehind group to remove them.)

Oh I will fix that

I will also add more unit tests to verify the new features.

Split comment-blocks from single-line comments
Removed leading whitespace from match
Added unit tests for new features
@nnnik
Copy link
Author

nnnik commented Jan 13, 2019

I added multiple regexes to the comment matching in order to split of multi-line comments from single-line comments.
I have one issue with the unit tests though. For multiline testing I added a few new tests which involve new-line characters. Locally those were encoded as \r\n and I had to match that in the string. If somebody changes those line-endings locally (for reasons I cannot understand) the unit tests will actually fail.
Which way would you suggest around this? Or is it fine and I'm just worrying for nothing?

@nnnik
Copy link
Author

nnnik commented Jan 13, 2019

So I went ahead and looked into single line comments and it seems that the only requirements that AHK has regarding single line comments are:

  • ; must follow a whitespace character or new-line
  • continuation sections disregard ; for comments unless set in their options

Which actually means that strings will be torn apart by ; if it follows a whitespace
See:

FileDelete,  test.txt
FileAppend, "This is a comment: ; inside the comment"
, test.txt
FileAppend, "`nThis is not a comment:; still just text"`n
, test.txt
;FileAppend,  % "`n This is a comment and would cause an error if it wasn't commented out (commentception?) ; inside the comment"
;, test.txt
FileAppend, 
(
;This is not a comment
Its just text and more text
/*
No comments here
*/
), test.txt

Will result in the creation of test.txt with the content:

"This is a comment:"
This is not a comment:; still just text"
;This is not a comment
Its just text and more text
/*
No comments here
*/

It will be hard to detect whether "" is currently inside an expression or not so highlighting it all the time doesn't seem to be wrong. Since ; interupts it not highlighting it then seems to be the safest option.
E.g.:

Msgbox This is text
Msgbox "This is still text" this too "and also this"
Msgbox % "This is text" thisIsAVariable
SoundBeep "13", thisIsAVariable ;since both parameters can be expressions

Single line comments can be reduced to:

/(^|\s);.*/m

using this logic.

@nnnik
Copy link
Author

nnnik commented Jan 13, 2019

I greatly simplified single-line comment matching and made the behavior equivalent to autohotkeys own behavior. Strings don't take precedence over comments in my browser but I will probably modify the regex for strings in order to cover that case.
Sadly greedy cannot be used since continuation sections exist.

@nnnik
Copy link
Author

nnnik commented Jan 15, 2019

This one is a minor fix that fixes if( and while(

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

},
'comment': [
{
pattern: /(^|\s);.*/m,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the m flag.
\s will also match \n and \r. Do keep ^ though.

lookbehind: true
},
{
pattern: /(^\s*)\/\*[^\n\r]*(?:(\n|\r|\r\n)(?:[^*]|\*[^\/])*(?:\*\/)?)?/m,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

/(^\s*)\/\*.*$(?:[\s\S]*?\*\/|[\s\S]*)/m

Like this, we can just let JS handle all different kinds of line breaks.

lookbehind: true
},
{
pattern: /(^\s*)\/\*[^\n\r]*(?:(\n|\r|\r\n)(?:[^*]|\*[^\/])*(?:\*\/)?)?/m,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be greedy because of:

/* ; what now?
*/

@joedf
Copy link
Contributor

joedf commented Apr 22, 2019

Has this been integrated into v1.16?

@RunDevelopment
Copy link
Member

There are some comments that haven't been addressed yet which is why this PR is still open.
The soonest Prism version these changes will be available in will most likely be v1.17.0.

@joedf
Copy link
Contributor

joedf commented Apr 23, 2019

There are some comments that haven't been addressed yet which is why this PR is still open.
The soonest Prism version these changes will be available in will most likely be v1.17.0.

Okay cool, thanks! 👍

@RunDevelopment RunDevelopment added this to the 1.17.0 milestone Jul 17, 2019
@RunDevelopment
Copy link
Member

@nnnik Is this still being worked on or shall I take over?

@mAAdhaTTah
Copy link
Member

@RunDevelopment Gonna drop this from 1.17.0, if we want to pub today.

@mAAdhaTTah mAAdhaTTah removed this from the 1.17.0 milestone Jul 21, 2019
@RunDevelopment
Copy link
Member

@mAAdhaTTah Agreed.

@joedf
Copy link
Contributor

joedf commented May 25, 2020

What's wrong with this PR? anything missing?
We are currently using a customized version of prismjs v1.15 in the meantime...
(over at https://www.autohotkey.com/boards/)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants