
3.4 Applies_to (languages)

Gabe Stocco edited this page Oct 17, 2023 · 8 revisions

Overview

In the JSON definition of a rule, the applies_to field restricts rule processing to a whitelist of programming languages. This is mainly a performance measure. It avoids needlessly searching files of other languages for patterns that only occur in one specific language.

For example, LoadLibrary is a function used in C and C++ but not in other languages. We would therefore restrict its search to those languages and skip the rule for PHP files to save scanning time. In other cases, such as 'SHA2' or 'X509' for cryptography, the pattern is likely to appear in files of every programming language we scan. For those rules you can either list the languages the rule applies to or leave the field blank to search all supported language file types.
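The filtering behavior described above comes down to a simple membership test. The following is a minimal Python sketch, not Application Inspector's actual implementation. The function name rule_applies and the file/language pairs are hypothetical. It assumes, as the paragraph above states, that an empty or missing applies_to means the rule runs against every supported language.

```python
from typing import Optional, Sequence


def rule_applies(applies_to: Optional[Sequence[str]], language: str) -> bool:
    """Return True if a rule with this applies_to list should run on a file of `language`.

    Hypothetical helper: an empty or missing list means "all languages".
    """
    if not applies_to:
        return True
    return language in applies_to


# Hypothetical scan set: (file name, detected language ID) pairs.
files = [
    ("loader.c", "c"),
    ("engine.cpp", "cpp"),
    ("index.php", "php"),
]

load_library_rule = ["c", "cpp", "objective-c"]  # LoadLibrary: C-family only
crypto_rule = []                                 # SHA2/X509: left blank, scan everything

print([f for f, lang in files if rule_applies(load_library_rule, lang)])
# ['loader.c', 'engine.cpp']
print([f for f, lang in files if rule_applies(crypto_rule, lang)])
# ['loader.c', 'engine.cpp', 'index.php']
```

The PHP file is never searched for the LoadLibrary rule. That skipped work is where the scanning-time savings come from.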

In terms of data structure, applies_to is an array of strings naming the languages, for example:

"applies_to": [
    "c",
    "cpp",
    "objective-c"
]
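Because applies_to is just a JSON array of strings, a rule definition can be checked for that shape before it is used. The following is a hypothetical Python check, not part of Application Inspector. The surrounding rule object (an id field next to applies_to) is illustrative only and does not reproduce the full rule schema.

```python
import json

# Illustrative rule fragment; only applies_to mirrors the example above.
rule_text = """
{
    "id": "EXAMPLE000001",
    "applies_to": [
        "c",
        "cpp",
        "objective-c"
    ]
}
"""


def read_applies_to(rule: dict) -> list:
    """Return the rule's applies_to list after checking it is an array of strings.

    A missing field is treated as an empty list (no language restriction).
    """
    value = rule.get("applies_to", [])
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("applies_to must be a JSON array of strings")
    return value


rule = json.loads(rule_text)
print(read_applies_to(rule))  # ['c', 'cpp', 'objective-c']
```

A common mistake this check catches is writing "applies_to": "c" as a bare string instead of a one-element array.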

Language Support

The degree of support for language-specific patterns varies across the languages listed below. Again, many patterns are relevant across programming languages, and the default Rules are written with that in mind. However, many identifiable feature patterns are unique to a given language. Searching for a given feature, function, or API across all supported programming languages has no real negative consequence other than performance. If a pattern will only be found with high accuracy in a given language, that language should be used to filter out other file types. A further goal is to reduce false positive matches as much as possible by making search patterns precise and limiting them to the languages where they can appear.

For languages that needed more exact Rule patterns, we added those patterns to our default Rules, starting with the following:

Java, C, C++, C#, Python, JavaScript/Node, Ruby, Objective-C, PowerShell, various build and solution files, and others in more limited amounts. The tool also processes solutions that mix these languages. Support for more language-specific detection patterns will continue to be added.

The language names are usually based on the langID values in Visual Studio Code, which are unfortunately not centrally defined. Visual Studio Code builds its list of values from individual language definitions, and because those definitions are created independently, the langIDs are represented with significant inconsistency. Below is the list of langIDs as of this writing. As more VS Code language definitions are created, this list will expand.

If you are writing a rule for a language not in this list, you can extend Application Inspector's language support by supplying custom languages.json and comments.json files. These tell Application Inspector how to identify files for that language and how to distinguish commented code from active code blocks. See Analyze Command Usage on the wiki.
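To make the two jobs of those files concrete, here is a deliberately simplified, hypothetical Python sketch. It identifies a file's language from its extension, then removes comments so that only active code would be searched. The JSON shapes (name and extensions for languages; language, inline, prefix, and suffix for comments) are assumptions modeled loosely on the repository files. Check the real languages.json and comments.json before writing custom ones. The sketch also ignores complications such as comment markers inside string literals.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical, simplified stand-ins for languages.json and comments.json.
LANGUAGES_JSON = """
[
    {"name": "c", "extensions": [".c", ".h"]},
    {"name": "cpp", "extensions": [".cpp", ".hpp", ".cc"]},
    {"name": "python", "extensions": [".py"]}
]
"""
COMMENTS_JSON = """
[
    {"language": ["c", "cpp"], "inline": "//", "prefix": "/*", "suffix": "*/"},
    {"language": ["python"], "inline": "#"}
]
"""


def detect_language(file_name: str, languages: list) -> Optional[str]:
    """Return the language ID whose extensions match the file (case-insensitive), or None."""
    suffix = Path(file_name).suffix.lower()
    for entry in languages:
        if suffix in entry.get("extensions", []):
            return entry["name"]
    return None


def comment_style(language: Optional[str], styles: list) -> dict:
    """Return the comment style entry that lists this language (empty dict if none)."""
    for style in styles:
        if language in style.get("language", []):
            return style
    return {}


def strip_comments(source: str, style: dict) -> str:
    """Remove block comments, then inline comments, leaving only active code.

    Simplification: markers inside string literals are not handled.
    """
    if style.get("prefix") and style.get("suffix"):
        block = re.escape(style["prefix"]) + r".*?" + re.escape(style["suffix"])
        source = re.sub(block, "", source, flags=re.DOTALL)
    if style.get("inline"):
        source = re.sub(re.escape(style["inline"]) + r"[^\n]*", "", source)
    return source


languages = json.loads(LANGUAGES_JSON)
styles = json.loads(COMMENTS_JSON)

code = """\
/* LoadLibrary("old.dll"); */
// LoadLibrary("debug.dll");
HMODULE h = LoadLibrary("real.dll");
"""

lang = detect_language("src/loader.cc", languages)
active = strip_comments(code, comment_style(lang, styles))
print(lang)                          # cpp
print(active.count("LoadLibrary"))   # 1
```

Only the uncommented LoadLibrary call remains after stripping. That is why accurate comment definitions matter for a new language: without them, commented-out code would also produce matches.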

This list is derived from the canonical languages.json file in the repository: languages.json. Comment styles are defined in the comments.json file.

  • c
  • cpp
  • csharp
  • fsharp
  • vb
  • python
  • html
  • javascript
  • javascriptreact
  • typescript
  • typescriptreact
  • coffeescript
  • dart
  • java
  • kotlin
  • scala
  • objective-c
  • swift
  • perl
  • perl6
  • ruby
  • lua
  • groovy
  • go
  • rust
  • jade
  • clojure
  • r
  • php
  • powershell
  • shellscript
  • wincmdscript
  • sql
  • yaml
  • package.json
  • nugetpkg
  • VSSolution
  • VSProject
  • pom.xml
  • build.xml
  • build.gradle
  • build.make.xml
  • jenkins
  • project.clj
  • sbt
  • build.py
  • typescript-config
  • json
  • terraform
  • .config
  • Package.appxmanifest
  • XML
  • plaintext
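The langIDs above are inconsistent in form. For example, the list uses cpp rather than c++ and mixes casing such as VSSolution and XML. A typo in applies_to may therefore keep a rule from running on the files you intended. A small, hypothetical Python check like the one below can flag unrecognized entries. It hard-codes only a subset of the list above and assumes exact, case-sensitive matching. Confirm that matching behavior against Application Inspector itself.

```python
# A subset of the language IDs listed above; extend with the full list as needed.
KNOWN_LANGUAGE_IDS = {
    "c", "cpp", "csharp", "java", "javascript", "typescript",
    "python", "ruby", "objective-c", "powershell", "VSSolution", "XML",
}


def unknown_languages(applies_to: list) -> list:
    """Return applies_to entries that are not recognized language IDs.

    Matching is exact and case-sensitive here, which is an assumption.
    """
    return [lang for lang in applies_to if lang not in KNOWN_LANGUAGE_IDS]


print(unknown_languages(["c", "c++", "objective-c"]))  # ['c++']
print(unknown_languages(["CSharp", "csharp"]))         # ['CSharp']
```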

See the tags section of this wiki for more details on how language type applies to pattern matching.