Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files/IncludingNonPHPFile: various improvements #586

Merged
merged 15 commits into from
Sep 24, 2020

Conversation

jrfnl
Copy link
Collaborator

@jrfnl jrfnl commented Sep 24, 2020

This PR is the result of a full review of the WordPressVIPMinimum.Files.IncludingNonPHPFile sniff.

It will be easiest to understand the different changes and their impact by reviewing this PR on the individual commits.

Fixes #515

Commit Details

Files/IncludingNonPHPFile: minor efficiency fix [1]

  • Move the extensions against which checks are being run to private properties.
  • Removes the . from the part in the regex which is being remembered.
    Note: The . before the extension is still required, it is just no longer remembered.
  • Changes the extension checks from using the relatively expensive in_array() to the lightweight isset().

Files/IncludingNonPHPFile: minor efficiency fix [2]

The regex will work just the same without the .*.

Files/IncludingNonPHPFile: improve the existing unit tests

  • The existing unit tests only tested against require_once and include, while the sniff looks for all variations.
    The newly added unit tests make sure that all four variations have at least one positive/negative test associated with it.
  • Add some capitalization variations to the include|require[_once] keywords.

Files/IncludingNonPHPFile: improve the error message

  • Make sure the include|require[_once] keyword is always referred to in lower case in the error message.
  • Make the error message more informative by displaying the text snippet which triggered the error.

Files/IncludingNonPHPFile: bug fix [1] - case sensitivity of file extension

The regex used to retrieve an extension, would retrieve it case-insensitively, but the previous in_array() check (now isset()), checked the extension in a case-sensitive manner.

Whether the file extension is used in upper, lower or mixed case, does not matter to the include as long as a matching file is found, so the sniff should check the extension in a case insensitive manner.

As things were, a file name like file.PHP would trigger a false positive for the IncludingNonPHPFile error and a file called file.Css would incorrectly throw the IncludingNonPHPFile error instead of the IncludingSVGCSSFile error which it should have thrown.

Fixed now.

Tested by adjusting some of the existing unit tests.

Files/IncludingNonPHPFile: bug fix [2] - allow for "phar" file extension

PHAR (PHP Archive) files should be regarded as PHP files for the purposes of this sniff.

Includes unit tests.

Fixes #478

Files/IncludingNonPHPFile: bug fix [3] - extensions in interpolated strings

The sniff looks for Tokens::$stringTokens when examining the statement.

The Tokens::$stringTokens array contains two tokens:

  1. T_CONSTANT_ENCAPSED_STRING - these are either single or double quoted text strings without variables within them.
  2. T_DOUBLE_QUOTED_STRING - these are double quoted text strings with interpolated variables in the text string.

As things were, the quotes would be stripped off the first, but not off the contents of T_DOUBLE_QUOTED_STRING-type tokens.

As the regex matches extensions at the very end of the text string and doesn't allow for a double quote at the end, this effectively meant that text in double quoted strings would never be recognized as containing a non-PHP extension. (= false negative)

Includes unit tests.

Files/IncludingNonPHPFile: bug fix [4] - fix bleed through /end of statement

An include/require statement can be used within template files to include another template.
In that case, the statement can be ended by a PHP close tag without there ever being a semi-colon.

Even though the findNext() function call indicated that it only wants to look "locally", the findNext() method itself does not take a PHP close tag into account when determining whether the statement has ended. This could be considered a bug upstream, but that's another matter.

In practice, this meant that when an include|require[_once] statement ended on a PHP close tag, the findNext() would continue looking for text string tokens and could report an error on a text string which is unrelated to the include|require[_once] statement. (= false positive).

Determining the end of the statement properly before calling findNext() fixes this, however, the findEndOfStatement() method will return the last non-whitespace token in a statement, which for statements nested in brackets can be a token which is part of the statement.

As the findNext() method does not look at the $end token, we need to +1 the result of findEndOfStatement() to make sure the whole statement is considered.

Includes unit tests.

Files/IncludingNonPHPFile: efficiency fix [3]

To prevent an assignment in condition, the same function call was executed twice.

By changing the code around slightly and using a do - while loop instead of a while, the duplicate function call can be removed.

Files/IncludingNonPHPFile: efficiency fix [4]/bug fix [5] - extension at end of statement

An include|require[_once] statement can only include one (1) file and the file extension of that file will normally be at the end of the statement.

So far, the sniff would walk from the start of the statement to the end, while walking from the end of the statement to the start, will get us a result more quickly and more accurately.

Includes unit tests. The first would be a false negative, the second a false positive without this fix.

Files/IncludingNonPHPFile: efficiency fix [5]/bug fix [6] - one error per statement

An include|require[_once] statement can only include one (1) file, so there should only ever be one file extension in the statement and by extension, only ever be one error for any particular include|require[_once] statement.

Once a file extension has been found which generates an error, we can therefore stop sniffing that statement.
This is similar to the sniff breaking off checking the statement once a text string with .php as a file extension is found, as was already done.

Includes unit tests.

Files/IncludingNonPHPFile: add some additional unit tests

... to document the behaviour of the sniff for certain situations.

Files/IncludingNonPHPFile: remove unused use statement

Files/IncludingNonPHPFile: minor simplification

preg_match() will return false for an invalid regex, 0 for "no match" and 1 for a match as it only looks for one match - in contrast to preg_match_all()-.

As the regex uses an "end of text string" anchor, using preg_match() is the right function, but that also means we can simplify the return value check a little.

Files/IncludingNonPHPFile: fix class docblocks

The existing text was unrelated to the actual sniff. Probably legacy copy/paste.

* Move the extensions against which checks are being run to private properties.
* Removes the `.` from the part in the regex which is being remembered.
    Note: The `.` before the extension is still required, it is just no longer _remembered_.
* Changes the extension checks from using the relatively expensive `in_array()` to the lightweight `isset()`.
The regex will work just the same without the `.*`.
* The existing unit tests only tested against `require_once` and `include`, while the sniff looks for all variations.
    The newly added unit tests make sure that all four variations have at least one positive/negative test associated with it.
* Add some capitalization variations to the `include|require[_once]` keywords.
* Make sure the `include|require[_once]` keyword is always referred to in lower case in the error message.
* Make the error message more informative by displaying the text snippet which triggered the error.
…ension

The regex used to retrieve an extension, would retrieve it case-insensitively, but the previous `in_array()` check (now `isset()`), checked the extension in a case-sensitive manner.

Whether the file extension is used in upper, lower or mixed case, does not matter to the include as long as a matching file is found, so the sniff should check the extension in a case insensitive manner.

As things were, a file name like `file.PHP` would trigger a false positive for the `IncludingNonPHPFile` error and a file called `file.Css` would incorrectly throw the `IncludingNonPHPFile` error instead of the `IncludingSVGCSSFile` error which it should have thrown.

Fixed now.

Tested by adjusting some of the existing unit tests.
PHAR (PHP Archive) files should be regarded as PHP files for the purposes of this sniff.

Includes unit tests.

Fixes 478
…trings

The sniff looks for `Tokens::$stringTokens` when examining the statement.

The `Tokens::$stringTokens` array contains two tokens:
1. `T_CONSTANT_ENCAPSED_STRING` - these are either single or double quoted text strings without variables within them.
2. `T_DOUBLE_QUOTED_STRING` - these are double quoted text strings with interpolated variables in the text string.

As things were, the quotes would be stripped off the first, but not off the contents of `T_DOUBLE_QUOTED_STRING`-type tokens.

As the regex matches extensions at the very end of the text string and doesn't allow for a double quote at the end, this effectively meant that text in double quoted strings would never be recognized as containing a non-PHP extension. (false negative)

Includes unit tests.
…atement

An `include`/`require` statement can be used within template files to include another template.
In that case, the statement can be ended by a PHP close tag without there ever being a semi-colon.

Even though the `findNext()` function call indicated that it only wants to look "locally", the `findNext()` method itself does not take a PHP close tag into account when determining whether the statement has ended. This could be considered a bug upstream, but that's another matter.

In practice, this meant that when an `include|require[_once]` statement ended on a PHP close tag, the `findNext()` would continue looking for text string tokens and could report an error on a text string which is unrelated to the `include|require[_once]` statement. (false positive).

Determining the end of the statement properly before calling `findNext()` fixes this, however, the `findEndOfStatement()` method will return the last non-whitespace token in a statement, which for statements nested in brackets can be a token which is _part_ of the statement.

As the `findNext()` method does not look at the `$end` token, we need to `+1` the result of `findEndOfStatement()` to make sure the whole statement is considered.

Includes unit tests.
To prevent an assignment in condition, the same function call was executed twice.

By changing the code around slightly and using a `do - while` loop instead of a `while`, the duplicate function call can be removed.
… at end of statement

An `include|require[_once]` statement can only include one (1) file and the file extension of that file will normally be at the _end_ of the statement.

So far, the sniff would walk from the start of the statement to the end, while walking from the end of the statement to the start, will get us a result more quickly and more accurately.

Includes unit tests. The first would be a false negative, the second a false positive without this fix.
… per statement

An `include|require[_once]` statement can only include one (1) file, so there should only ever be one file extension in the statement and by extension, only ever be one error for any particular `include|require[_once]` statement.

Once a file extension has been found which generates an error, we can therefore stop sniffing that statement.
This is similar to the sniff breaking off checking the statement once a text string with `.php` as a file extension is found, as was already done.

Includes unit tests.
... to document the behaviour of the sniff for certain situations.
`preg_match()` will return `false` for an invalid regex, `0` for "no match" and `1` for a match as it only looks for one match - in contrast to `preg_match_all()`-.

As the regex uses an "end of text string" anchor, using `preg_match()` is the right function, but that also means we can simplify the return value check a little.
The existing text was unrelated to the actual sniff. Probably legacy copy/paste.
Copy link
Contributor

@GaryJones GaryJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply beautiful ❤️

@GaryJones GaryJones merged commit 2b057a0 into develop Sep 24, 2020
@GaryJones GaryJones deleted the feature/includingnonphpfile-various-improvements branch September 24, 2020 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants