-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend cpp keyword set #3178
Extend cpp keyword set #3178
Conversation
We can do that when done in a separate commit at the end so it doesn't complicate the review. We also should go ahead and explode |
Would be nice to have this in v11. :) Probably getting close to release. |
Ha, I'm working on it as we speak ;-) Still figuring out some of the JS stuff going on in the language file. I will push a new version later this evening (evening for me, that is). |
Hey! I've cleaned up some parts according to Josh's comments above. However, three tests fail for some reasons not completely clear to me. The first one is an autodetection error from a C++ comment test that is detected as 'sqf' now, and another one where Arduino C++ is detected as C++ instead of Arduino. This latter one probably also triggers the last test to fail. Furthermore I've got a few remarks/questions:
At least I checked all the examples and have added a few TODOs for this language parsing file. I might be able to pick these up in a couple of weeks (I'm too busy now). It feels like this language parsing file may need a proper update/refresh anyway. |
I can look into these, probably Arduino needs to be updated based on other changes you made here.
We'd need to expand the regex to match the
Before the list was mixed (having more than functions) so it needed to be here so the non-functions could still be found and counted (since FUNCTION_DISPATCH wouldn't cover them). If it now only contains functions then it no longer needs to be in
That's because the grammar has too many nested modes... and keywords don't inherit, so if you want the keywords to be present in every sub-mode then you have to keep specifying them over and over. Flattening the grammar to avoid all these nested modes would help with this.
Why does it need to be beautified? I mean you could put the list in an array and then built it with the
Perhaps, though I thought I remembered @klmr saying something about not highlighting these, or am I thinking of something else?
This is sufficient. The list is only for heuristics - all functions are highlighted either way. Someone using the library for serious work should be tagging languages in any case, NOT relying on auto-detect which is not always reliable. This looks like it's in a fairly good spot if I can fix the test issues. We'll see about getting this merged then we can discuss what's next for another PR. |
It looks like the keywords list still includes some types... |
Mmm. Yes, you're right. Let me fix that. I initially went for "this is the list of reserved keywords in C++" but it contains type keywords as well of course. In the context of a highlighting tool it doesn't matter if a keyword is reserved or not of course. |
This is fixed now.
This can be fixed in later versions I would say.
Yeah, that's exactly what I meant. I left it in for now, this could be picked up later on.
Hmmm. OK, I can look up this conversation for a later version. For now they are in the functions list (which is not entirely correct).
OK. I think the highlighting is working quite well now. I've been evaluating the test examples and didn't see any weirdness. It turns out quite a bit of the things didn't work before (many of the keyword functions). Mostly because of the lack of matching for In a next PR I would like to dig in a little deeper to simplify the grammar. It really feels like some parts of it are dead, wrong or not up-to-date with the latest C++ standard. |
Oh, and by the way, the Arduino 'default' markup test fails again. In the developer markup tester it is auto-detected as JavaScript. I think it's not detected as Arduino code in the tests as well. |
Because Arduino is ALSO cpp and uses that same grammar... so you need to update those markup tests as well (keywords are now Please don't rewrite history here but just add commits so I can more easily review.
No matter, markup tests have nothing to do with auto-detection.
I don't see a problem... run |
I see, yes. I now fixed it using the results from the test output of Node, because the 'Arduino' language highlighting is not in the developer markup checker.
Uhm, I'm not sure what I did wrong then -- this was not my intention. I pulled in your commits with
OK. |
It's whatever you choose to build... |
No worries could just be me reading it wrong... a lot of the more git-savvy contributors do that and often it makes reviews harder. |
} | ||
|
||
<span class="hljs-comment">// Disable overload for already valid operands.</span> | ||
<span class="hljs-keyword">template</span><<span class="hljs-keyword">class</span> <span class="hljs-title class_">T</span>, class = std::<span class="hljs-keyword">enable_if_t</span><!impl::is_streamable_v<<span class="hljs-keyword">const</span> T &> && std::is_convertible_v<<span class="hljs-keyword">const</span> T &, std::wstring_view>>> | ||
<span class="hljs-keyword">template</span><<span class="hljs-keyword">class</span> <span class="hljs-title class_">T</span>, <span class="hljs-type">class</span> = std::<span class="hljs-type">enable_if_t</span><!impl::is_streamable_v<<span class="hljs-keyword">const</span> T &> && std::is_convertible_v<<span class="hljs-keyword">const</span> T &, std::<span class="hljs-type">wstring_view</span>>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe class is a type here? We should probably remove it from the list of types and (in a follow-up PR) add it instead with a class [IDENT]
multi-matcher rule as done in many other languages - allowing us to also flag the class name as a title.class
, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also pretty sure class
is not a type but a keyword... you use class to specify classes, it's not uses (as a type) to declare a variable class a
... you'd say NameOfClass a
... so in that usage:
class Booger {
}
It would be a keyword.
For our intents and purposes it's also a keyword here:
template <class T> // template argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right. I'm still trying to get used to the internal conflicts between the language lawyer and the highlight mindset. I'll fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to move auto
, class
, struct
, union
and enum
, as they're all somewhere in the twilight zone of being a keyword/storage specifier and a type specifier. I don't have a strong opinion about this regarding the highlighting class, but language-strictly spoken they're probably more type than keyword. But at least they're considered 'special' for the highlighter -- this is the most important fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And indeed in templates class
serves a double role as a typename specifier. With some regular expression magic this could be made a special case, but it's fine like it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to move auto, class, struct, union and enum,
I'd say broadly these are all keywords (from our highlighting POV and historically how we treat them across many grammars). We don't currently differentiate between "type of storage" vs "type of type"... I think the context of their usage is relevant as well... lets take a made up class-based language (very TypeScript like):
(this is me just thinking about this seat of my pants)
// define a class
class Person { }
// a new variable of class Person
var bob : class<Person> = Person.new(...)
The first class
is a keyword
. You could argue the second class
is a bit closer to type
(and that could be worth discussing), but for convenience we'd probably just call it a keyword, rather than try to contextually parse the context. Person
of course would be title.class
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe class is a type here? We should probably remove it from the list of types and (in a follow-up PR) add it instead with a
class [IDENT]
multi-matcher rule as done in many other languages - allowing us to also flag the class name as atitle.class
, etc...
Could you point me at a language definition file that uses these multi-matcher rules, or at least a language definition that is a very good example of how a language definition should be done? I'm going to try to study this stuff for the next PR. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at Wren.
Very very close. :) I think just fix the class issue and this is good to go. |
@klmr Would you mind quickly reviewing the list of keywords/types/hints just as a sanity check incase there are other issues I should flag but I'm not because I don't know C++? |
Ah, yes I see what you mean. I was not aware of the merge strategy that you have. For some reason I assumed you don't like merge commits. Will you squash all the commits from this PR at merge-time? |
Well, typically the scope of the PRs is small enough that squash works well. It's not set in stone. If someone has a very clean commit history and I see real value in each individual commit then I'll rebase/merge it. 95% of the time we squash.
I don't like messy commit histories.
Most likely. About rebasing though: What is hard about rebasing is it's hard to review the changes on GitHub because they keep changing... when I'm working with someone on a PR and they push a tiny change/fix I want to see JUST that change (often it might be 1-2 lines)... that way I can confirm it matches what I was expecting. Often when someone instead rebases you lose track of "what just changed" because they smartly merged it back into their prior commits. Rebasing can be great (as the very last step)... and along the way add |
I would probably have given you "auto" as a type (I flagged it before in my head but didn't deal it worth mentioning), but I don't have a strong argument either way. :-) |
@krisvanrens You should consider joining our Discord if you want to dig into C++ further in the future. I'd be happy to explain what I mean by a "flatter" syntax. I don't know if C++ is "wrong/outdated" now so much as "overly-complex". Certainly possible it's also both. :) The |
I will definitely join the Discord channel for info. I've got a very busy month ahead, but after that there's time for this. Thanks for your patience anyway :-) My guess is that if I take on the C++ grammar, the C-one should be doable as well. |
Still needs changelog entries. :) |
As Josh alluded to I strongly disagree with this. I’ve explained the reasons for this here: #3093 (comment) — tl;dr: standard library types and functions are not reserved words, and highlighting them in builtins leads to numerous false positives. They also don’t have any special semantics compared to user-defined types and functions, so giving them special highlighting even where correct serves little purpose in making code more readable; on the contrary, it creates a distraction by emphasising an irrelevant distinction. Compare this with IDEs and source code editors: the good ones don’t highlight these identifiers either (or rather, they highlight them the same as user-defined types and functions). In fact, previous revisions of the C++ grammar underwent a rigorous cleanup to get rid of arbitrary identifiers that get highlighted incorrectly and I strongly advocate keeping it that way. |
I'm wrapping this up and merging now that we seem at a nice spot... This PR is already too large. Any remaining work can fall into a second PR. |
OK. I had a change ready for the remark I made to Konrad about the list of additional types. I can put it into a future PR. |
What is a "trait helper template", can someone show me an example? |
Traits in C++ are types which describe properties of other types. They’re used for compile-time metaprogramming. Trait helpers are convenience wrappers for trait classes ending in There are other trait helpers which describe types rather than values (e.g. |
Is the variety such that we couldn't handle this with a smarter regex? IE, exclude |
Perhaps we could detect the template arguments and the |
That we shouldn't overly concern ourselves with them, perhaps they never come up at all. I worry we're being too pre-emptive here. That can often result in bloated grammars with questionable benefits. I'm not yet convinced we need ANY further work here on I'd prefer to wait until someone raises an actual concern vs adding another list of keywords... there is a cost to all these explicit lists, they increase the file size of our bundle, etc... |
I’m not sure I understand what that should accomplish: users are free to define their own names ending in The clash between POSIX and C++ naming conventions is unfortunate but, ultimately, not something that can be solved (nor IMHO a real issue). |
These situations are different though... in that the problem with standard library is we can't (or don't want to) include a large enough list... so we will always fail to highlight SOME standard library things making the highlighting appear inconsistent... false negatives. Resolving that doesn't add any complexity to the grammar. This |
Mmm. Yes. I'm not sure what the signal "no one has filed an issue" in the light of type traits in C++ really means. It's an advanced feature not everyone dabbles with (from what I can tell looking around me), multiplied with the fact that there's only a few people filing issues (again, my feeling) 😏 I would very much agree to keep the grammar as simple as possible. However I'm still divided as to why or why not highlight other conventions if the Regarding the Is there a difference in how important the highlighting of 'types' vs. 'functions' vs. 'constants' etc. is considered for Highlight.js? |
That’s one problem. The other being that highlighting standard library names is not useful and creates many false positives when these names aren’t reserved and are routinely overridden in user code. Regarding the If you don’t want to include all names from /u?int(_fast|_least)?(8|16|32|64)_t|u?int(max|ptr)_t|size_t|ptrdiff_t|nullptr_t/ That captures all types defined in |
I think we're conflating things a bit here. Our goal is not to only highlight built-ins or reserved types. If we know If we wish to draw a distinction between built-in type aliases and user-defined ones we should consider doing that with classification - not be just ignoring the user-defined ones. The false positive argument of trait helpers does have some merit... could we exclude those with just a negative look-ahead for |
Oh I agree, that would be awesome. The issue is that this introduces (substantial, not just occasional) inconsistencies since most user-defined types (/type aliases) are would not be highlighted for C++, since there is simply no way, based on syntax alone, to construct a heuristic with (IMHO) acceptable error rate to recognise type names in C++. There is just no sufficiently widespread convention. Even code that uses |
Yes, that should be possible. There might be cases where this fails but these would be very rare and wouldn’t be catastrophic. |
Well, now you've moved me right back to the middle again. LOL. If |
The typename postfix |
I'd be ok with this in this use case, but generally I try and stay away from this for lack of maintainability... but this is clear enough. We'd probably want |
Definitely, the regex wasn’t meant to be production ready since I wasn’t sure in which context it would be applied. In general we would want word boundaries, correct. |
In my opinion this lack of maintainability, or rather: static definition of additional types is also a signal to future maintainers: "there's no room for an extensive list of other types here, we're being conservative about it." I would have worked for me 😉 |
I read it more as:
Though as I said this one ain't so bad. If we want to signal future maintainers though I'd prefer we write a readable note in the comments. :-) |
Added a set of commonly used C++ keywords/identifiers. We can't add them all of course, but things like
optional
andvariant
which are common vocabulary types since 2017 should be in there in my opinion.I would also like to order the lists in the language file lexicographically, making extension easier (without having to search). Is this OK? I will update this PR if desired.
Changes
Checklist /
TODO
CHANGES.md
.