-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add .bas
extension to VB6 and heuristics for VB6/VBA
#6355
Conversation
Looks like with the new samples, the FreeBasic samples are now detected as VB6:
@XusinboyBekchanov, are there any good samples you'd suggest to add for FreeBasic? |
I would argue that the situation around VBA/VB6 is more akin to JavaScript. The language is the same language regardless of runtime environment (browser, Node.js) or loaded libraries and global objects. My original intention was to use VBA/VB6 as the language name, but Linguist doesn't like slashes in language names so I went with VBA. AFAICT the only argument for VB6 as a separate language is that there are some file extensions and file formats unique to VB6. |
.bas
extension to VB6 and heuristics for VB6/VBA
That strikes me as a combination of two equally-valid names for the same technology. In which case, either
Interesting analogy. 😉 Early versions of Internet Explorer actually did support VBA (or something with "Visual Basic" in its name), but it was discontinued due to lack of adoption and the growing popularity of JavaScript. |
That would be VBScript, which (I would argue) is indeed a separate language. Most notably, because -- like JavaScript -- it has no notion of compile-time types. |
|
Thanks for the feedback @zspitz! This means that unlike For this reason, I would argue that we should keep them as seperate languages. From the PR description:
|
Plasma Generation.bas uses a unique keyword for FreeBasic Shared: https://freebasic.net/wiki/KeyPgShared The try_catch_throw.bas uses the #include keyword as specified in the heuristic: https://github.com/github/linguist/blob/master/lib/linguist/heuristics.yml#L116 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a brief flashback to high school accounting classes (my only experience with Microsoft Excel) and recall formulae being case-insensitive. A quick Google search suggests Visual Basic—or one of its confuseables1—is case-insensitive. If so, these regexes should begin with (?i)
to mark them as case-insensitive matches. I'll leave this up to you to decide; I'm out-of-my-depth when it comes to Microsoft technologies named /Visual\s*(Studio|Basic)/.
Footnotes
-
These technologies wouldn't be Microsoft if they weren't given confusing, forgettable and/or easily-rebranded names. Or in the case of Excel's Turing-complete formula functions, no programming language name whatsoever. ↩
This needs to be added for FreeBasic. |
For VBA, you also need to add: Attribute VB_Name = |
Thanks for the clarification @XusinboyBekchanov, so my undestanding is that the supplied sample What if we added this sample below instead which is longer and more in line with a good sample to illustrate the use of the "shared" keyword? |
The one provided in your example has the same #include. In this case, it is determined by this feature. |
Folks, let's try to avoid nested quote-blocks (preferably by quoting only the relevant parts of a comment). They make this discussion much harder to follow, not to mention encumbering CTRL+F searches to find a specific part of a user's original reply. |
@XusinboyBekchanov, from what I understand, the main goal of the samples are to help the classifer make a guess when the heuristics aren't conclusive. So, it's important that the samples represent FreeBasic code "in the wild" to help classify real code and not contrived examples. I'm going to change the sample |
@XusinboyBekchanov I've added this heuristic in the form of the named pattern "vb-module" as it is both used by VBA and VB6 |
@Alhadis, thanks for letting me know this is possible. While it is true that Excel formulas are case-insensitive, it is not always true for VBA. For instance, the reserved keywords and compiler constants will have a specific capitalization that the VBA IDE will enforce as soon as you exit the line in the editor. For this reason, I don't think we should worry about the case insensitivity of VBA. The only place it can play a role is with variable names (sometimes), but that shouldn't apply to the heuristics used here. |
Seriously? (An editor that does that would shit me right up the wall…)
Agreed, provided there's no substantial risk of these files being authored in other editing environments. However, @XusinboyBekchanov mentioned earlier that FreeBASIC's heuristics should be marked case-insensitive (where it makes sense to do so), although that might warrant a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find smaller samples. Hint: if the PR is not displaying the full file content in the diff, it's probably unnecessarily large.
I think we're also going to need another FreeBasic sample or two (depends on how things behave after you replace the large samples) as the classifier is now detecting one of the current FreeBasic samples as Visual Basic 6.0:
FreeBasic/try_catch_throw.bas BAD (Visual Basic 6.0)
@lildude, thanks for your patience on this one. At least I didn't go with a 8000+ lines sample this time, so there's progress 😅. |
Classifier cross-validation is still failing for these:
|
Adding the following sample for BASIC fixed the issue with However, the issue with FreeBasic didn't improve by adding Picture.bas as a sample 😔 |
a9e03cf
to
f348586
Compare
@lildude, I'd be interested to know if you have any more suggestions regarding this issue with samples for FreeBasic (FB) and the failing classifier tests. Just to recap : What was the original impact of the PR?:
Which samples I've tried to add:
So, after adding 2 more samples, we only have 1/4 passing the test. What should be done in this case? |
You have provided several different heuristics. They have Attribute VB_Name = "ClassOrModuleName" inside them too. |
@XusinboyBekchanov, I'm not sure I understand your comment. With the current heuristics, a file will need to match the
|
Maybe OR? |
Tweak some more heuristics Simplify Range heuristic to account for cases like Range(B & row_number) Edit heuristics
Fix issue with `\b` and make sure we match with `VBAObject`
@Alhadis, I believe that the suggestions you've made were addressed. Can you please have a look and let me know if there is something I missed or something else that could be improved? |
Hey @DecimalTurn, sorry about the wait. 👋 You've been very patient. Yup, this LGTM. @lildude, any thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me. Thanks for your effort and patience.
Note: this PR will not be merged until close to when the next release is made. See here for more details.
This PR:
.bas
as an extension for VB6.bas
extensionDescription
This is the 3rd instalment of a 3 part series related to VB6 (and VBA).
For more context:
Part 1: Add Visual Basic 6.0 as its own language
Part 2: Add .frm and .cls extensions to VB6
The reason why I've waited a bit to add the
.bas
extension was because it is more tricky than for the extensions in part 2. Indeed,.bas
is used for a module file and the attributes at the top of those files are identical between VBA and VB6.ie.:
Attribute VB_Name = "ClassOrModuleName"
So, we can't use it to separate the two, but we can at least use it to distinguish VBA/VB6 from other languages using the
.bas
file extension (see new heuristic with named patternvb-module
).Hence, the general approach that I'm proposing is the same idea as what was used for the
.h
extension as suggested by @lildude:From #5824 (comment)
This comment refers to part of the
troubleshooting.md
file reproduced here with the appropriate substitution for the case of.bas
:Difference between VBA and VB6
Alright, now let's get to the meat of it: what are the identifiable differences between VBA and VB6?
a. Option Private Module
b. Option Compare Database
These 3 sections appear in the
vba
named pattern at the bottom of the heuristic file. I don't think there is a need for debate on (1) and (2) as those differences between VBA and VB6 are directly based on syntax and are well documented. However, I could see some disagreement regarding section (3) as there is a bit more of a grey area there.Regarding the choices made for (3), I've included the invocation of a VBA library. This means that just having "Word" in the code is not enough to trigger the regex, you need to invoke a method/property/constant on it like (e.g.
Word.Application
). I've also included multiple common objects that are part of the DLLs VBA has natively access to.Potential issues with section 3
The issue with this is that some VB6 code can reference the DLLs in question and then be "wrongly" labelled as VBA. I'm using quotes around "wrongly" because a case could be made that if you are using VB to automate an office application, you are basically writing VBA code at this point 😅.
Anyhow, I'm open to tweaking the list of objects included in case there is too many false positives, but since this approach gives VB6 as the default outcome for vb-like modules, it seems fair to give some extra chances to VBA.
@fafalone, @XusinboyBekchanov, @Gagniuc and @zspitz, I would be interested to have your opinion on this since you've been involved in discussions regarding this topic in the past. Cheers!
Checklist:
I am adding a new extension to a language. (VB6)
Licence: MIT
License: GPL2
License: GNU Lesser General Public
License: GNU GPLv2
License: GNU GPLv2
I am fixing a misclassified language (VBA)
License: MIT