Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve regexp/sort-alternatives rule to add support for string alternatives and v flag #587

Merged
merged 10 commits into from
Sep 21, 2023
5 changes: 5 additions & 0 deletions .changeset/rich-ways-exercise.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"eslint-plugin-regexp": minor
---

Improve `regexp/sort-alternatives` rule to add support for string alternatives and v flag
5 changes: 4 additions & 1 deletion docs/rules/sort-alternatives.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,8 @@ This rule will sort alternatives to improve readability and maintainability.

The primary target of this rule are lists of words and/or numbers. These lists are somewhat common, and sorting them makes it easy for readers to check whether a particular word or number is included.

This rule will only sort alternatives if reordering the alternatives doesn't affect the pattern.
This rule will only sort alternatives if reordering the alternatives doesn't affect the pattern.\
However, character classes containing strings are ensured to match the longest string, so they can always be sorted.

<eslint-code-block fix>

Expand All @@ -29,11 +30,13 @@ This rule will only sort alternatives if reordering the alternatives doesn't aff
/* ✓ GOOD */
var foo = /\b(1|2|3)\b/;
var foo = /\b(alpha|beta|gamma)\b/;
var foo = /[\q{blue|green|red}]/v;

/* ✗ BAD */
var foo = /\b(2|1|3)\b/;
var foo = /__(?:Foo|Bar)__/;
var foo = /\((?:TM|R|C)\)/;
var foo = /[\q{red|green|blue}]/v;
```

</eslint-code-block>
Expand Down
170 changes: 149 additions & 21 deletions lib/rules/sort-alternatives.ts
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,11 @@
Character,
CharacterClass,
CharacterSet,
ClassStringDisjunction,
Element,
Node,
ExpressionCharacterClass,
Pattern,
StringAlternative,
} from "@eslint-community/regexpp/ast"
import type { RegExpContext } from "../utils"
import {
Expand All @@ -30,8 +32,9 @@
hasSomeDescendant,
canReorder,
getLongestPrefix,
toCharSet,
getConsumedChars,
toUnicodeSet,
hasStrings,
} from "regexp-ast-analysis"
import type { CharSet, Word, ReadonlyWord } from "refa"
import { NFA, JS, transform } from "refa"
Expand All @@ -45,7 +48,10 @@

function getAllowedChars(flags: ReadonlyFlags) {
assertValidFlags(flags)
const cacheKey = (flags.ignoreCase ? "i" : "") + (flags.unicode ? "u" : "")
const cacheKey =
(flags.ignoreCase ? "i" : "") +
(flags.unicode ? "u" : "") +
(flags.unicodeSets ? "v" : "")
let result = cache.get(cacheKey)
if (result === undefined) {
result = {
Expand Down Expand Up @@ -86,7 +92,8 @@
d.type === "Backreference" ||
d.type === "CharacterSet" ||
(d.type === "Quantifier" && d.max === Infinity) ||
(d.type === "CharacterClass" && d.negate)
(d.type === "CharacterClass" && d.negate) ||
(d.type === "ExpressionCharacterClass" && d.negate)
)
},
(d) => d.type !== "Assertion",
Expand Down Expand Up @@ -156,29 +163,72 @@
return getLexicographicallySmallestFromCharSets(prefix)
}

function getLexicographicallySmallestFromAlternative(
alternative: Alternative,
parser: JS.Parser,
flags: ReadonlyFlags,
): Word | undefined
function getLexicographicallySmallestFromAlternative(
alternative: StringAlternative,
parser: JS.Parser,
flags: ReadonlyFlags,
): Word
/**
* If defined, this will return the lexicographically smallest string accepted
* by the given alternative (ignoring assertions).
*/
function getLexicographicallySmallestFromAlternative(
alternative: Alternative,
alternative: Alternative | StringAlternative,
parser: JS.Parser,
flags: ReadonlyFlags,
): Word | undefined {
const { elements } = alternative
if (isOnlyCharacters(elements)) {
if (
alternative.type === "StringAlternative" ||
hasOnlyCharacters(alternative, flags)
) {
// fast path to avoid converting simple alternatives into NFAs
const smallest: Word = []
for (const e of elements) {
// FIXME: TS Error
// @ts-expect-error -- FIXME
const cs = toCharSet(e, flags)
for (const e of alternative.elements) {
const cs = toUnicodeSet(e, flags).chars
if (cs.isEmpty) return undefined
smallest.push(cs.ranges[0].min)
}
return smallest
}

if (isOnlyCharacterElements(alternative.elements)) {
const reversedElements = [...alternative.elements].reverse()
const smallest: Word = []
for (const e of reversedElements) {
const us = toUnicodeSet(e, flags)
if (us.isEmpty) return undefined
if (us.accept.isEmpty) {
smallest.unshift(us.chars.ranges[0].min)
} else {
const words: ReadonlyWord[] = [
...(us.chars.isEmpty ? [] : [[us.chars.ranges[0].min]]),
...us.accept.words,

Check failure on line 210 in lib/rules/sort-alternatives.ts

View workflow job for this annotation

GitHub Actions / update

Property 'words' does not exist on type 'StringSet'.

Check failure on line 210 in lib/rules/sort-alternatives.ts

View workflow job for this annotation

GitHub Actions / test-and-coverage

Property 'words' does not exist on type 'StringSet'.

Check failure on line 210 in lib/rules/sort-alternatives.ts

View workflow job for this annotation

GitHub Actions / lint

Property 'words' does not exist on type 'StringSet'.

Check failure on line 210 in lib/rules/sort-alternatives.ts

View workflow job for this annotation

GitHub Actions / test (18.x)

Property 'words' does not exist on type 'StringSet'.

Check failure on line 210 in lib/rules/sort-alternatives.ts

View workflow job for this annotation

GitHub Actions / test (20.x)

Property 'words' does not exist on type 'StringSet'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in #588. Fixed with #604.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the same fix to this PR. (But there may be conflicts)

]
smallest.unshift(
...words
// Sort by connecting the following string.
// This compares `'a'+'bb'` and `'aba'+'bb'`
// if the current word set is 'a' and 'aba', and the following string is 'bb'.
// We expect `'aba'+'bb'` to become an LSA as a result.
.map((word) => ({
word,
concatenated: [...word, ...smallest],
}))
.sort((a, b) =>
compareWords(a.concatenated, b.concatenated),
)
.shift()!.word,
)
}
}
return smallest
}

try {
const result = parser.parseElement(alternative, {
assertions: "unknown",
Expand Down Expand Up @@ -212,15 +262,45 @@
}
}

/** Returns whether the given array of nodes contains only characters. */
function isOnlyCharacters(
nodes: readonly Node[],
): nodes is readonly (Character | CharacterClass | CharacterSet)[] {
/**
* Returns whether the given array of nodes contains only characters.
* But note that if the pattern has the v flag, the character class may contain strings.
*/
function isOnlyCharacterElements(
nodes: Element[],
): nodes is (
| Character
| CharacterClass
| CharacterSet
| ExpressionCharacterClass
)[] {
return nodes.every(
(e) =>
e.type === "Character" ||
e.type === "CharacterClass" ||
e.type === "CharacterSet",
e.type === "CharacterSet" ||
e.type === "ExpressionCharacterClass",
)
}

/**
* Returns whether the given alternative has contains only characters.
* The v flag in the pattern does not contains the string.
*/
function hasOnlyCharacters(
alternative: Alternative,
flags: ReadonlyFlags,
): alternative is Alternative & {
elements: readonly (
| Character
| CharacterClass
| CharacterSet
| ExpressionCharacterClass
)[]
} {
return (
isOnlyCharacterElements(alternative.elements) &&
alternative.elements.every((e) => !hasStrings(e, flags))
)
}

Expand Down Expand Up @@ -433,6 +513,28 @@
})
}

/**
* Sorts the given string alternatives.
*
* Sorting is done by comparing the lexicographically smallest strings (LSS).
*
* For more information on why we use LSS-based comparison and how it works,
* see https://github.com/ota-meshi/eslint-plugin-regexp/pull/423.
*/
function sortStringAlternatives(
alternatives: StringAlternative[],
parser: JS.Parser,
flags: ReadonlyFlags,
): void {
alternatives.sort((a, b) => {
const lssDiff = compareWords(
getLexicographicallySmallestFromAlternative(a, parser, flags),
getLexicographicallySmallestFromAlternative(b, parser, flags),
)
return lssDiff
})
}

/**
* Returns whether the given string is a valid integer.
* @param str
Expand All @@ -446,7 +548,9 @@
* This tries to sort the given alternatives by assuming that all alternatives
* are a number.
*/
function trySortNumberAlternatives(alternatives: Alternative[]): void {
function trySortNumberAlternatives(
alternatives: (Alternative | StringAlternative)[],
): void {
const runs = getRuns(alternatives, (a) => isIntegerString(a.raw))
for (const { startIndex, elements } of runs) {
elements.sort((a, b) => {
Expand Down Expand Up @@ -528,7 +632,7 @@
fixable: "code",
schema: [],
messages: {
sort: "The alternatives of this group can be sorted without affecting the regex.",
sort: "The {{alternatives}} can be sorted without affecting the regex.",
},
type: "suggestion", // "problem",
},
Expand All @@ -551,7 +655,6 @@
let chars = possibleCharsCache.get(a)
if (chars === undefined) {
chars = getConsumedChars(a, flags).chars
possibleCharsCache.set(a, chars)
}
return chars
}
Expand Down Expand Up @@ -590,14 +693,19 @@
}
}

enforceSorted(run)
enforceSorted(run, "alternatives of this group")
}

/**
* Creates a report if the sorted alternatives are different from
* the unsorted ones.
*/
function enforceSorted(run: Run<Alternative>): void {
function enforceSorted(
run: Run<Alternative | StringAlternative>,
alternatives:
| "alternatives of this group"
| "string alternatives",
): void {
const sorted = run.elements
const parent = sorted[0].parent
const unsorted = parent.alternatives.slice(
Expand All @@ -619,6 +727,7 @@
node,
loc,
messageId: "sort",
data: { alternatives },
fix: fixReplaceNode(parent, () => {
const prefix = parent.raw.slice(
0,
Expand Down Expand Up @@ -682,10 +791,29 @@
}
}

/** The handler for ClassStringDisjunction */
function onClassStringDisjunction(
parent: ClassStringDisjunction,
): void {
if (parent.alternatives.length < 2) {
return
}

const alternatives = [...parent.alternatives]
sortStringAlternatives(alternatives, parser, flags)
trySortNumberAlternatives(alternatives)
const run: Run<StringAlternative> = {
startIndex: 0,
elements: [...alternatives],
}
enforceSorted(run, "string alternatives")
}

return {
onGroupEnter: onParent,
onPatternEnter: onParent,
onCapturingGroupEnter: onParent,
onClassStringDisjunctionEnter: onClassStringDisjunction,
}
}

Expand Down
53 changes: 52 additions & 1 deletion tests/lib/rules/sort-alternatives.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ import rule from "../../../lib/rules/sort-alternatives"

const tester = new RuleTester({
parserOptions: {
ecmaVersion: 2020,
ecmaVersion: "latest",
sourceType: "module",
},
})
Expand All @@ -20,6 +20,7 @@ tester.run("sort-alternatives", rule as any, {
String.raw`/\b(x?Rec|RequestOptionsPage)\b/`,
String.raw`/\b([ft]|false|true)\b/`,
String.raw`/\b(a+b|a+c)\b/`,
String.raw` /[\q{blue|green|red}]/v`,
],
invalid: [
{
Expand Down Expand Up @@ -140,5 +141,55 @@ tester.run("sort-alternatives", rule as any, {
"The alternatives of this group can be sorted without affecting the regex.",
],
},
{
code: String.raw`/[\q{red|green|blue}]/v`,
output: String.raw`/[\q{blue|green|red}]/v`,
errors: [
"The string alternatives can be sorted without affecting the regex.",
],
},
{
code: String.raw`/(?:c|[\q{red|green|blue}]|a)/v`,
output: String.raw`/(?:a|[\q{red|green|blue}]|c)/v`,
errors: [
"The alternatives of this group can be sorted without affecting the regex.",
"The string alternatives can be sorted without affecting the regex.",
],
},
{
code: String.raw`/[\q{ac|ab|aa}]/v`,
output: String.raw`/[\q{aa|ab|ac}]/v`,
errors: [
"The string alternatives can be sorted without affecting the regex.",
],
},
{
code: String.raw`/(?:b|[a-b])/v`,
output: String.raw`/(?:[a-b]|b)/v`,
errors: [
"The alternatives of this group can be sorted without affecting the regex.",
],
},
{
code: String.raw`/\b(?:a[\q{bd}]|abc)\b/v`,
output: String.raw`/\b(?:abc|a[\q{bd}])\b/v`,
errors: [
"The alternatives of this group can be sorted without affecting the regex.",
],
},
{
code: String.raw`/\b(?:abb|[\q{a|aba}]bb)\b/v`,
output: String.raw`/\b(?:[\q{a|aba}]bb|abb)\b/v`,
errors: [
"The alternatives of this group can be sorted without affecting the regex.",
],
},
{
code: String.raw`/\b(?:c|b_|[\q{b|da}]_|b_2)\b/v`,
output: String.raw`/\b(?:b_|[\q{b|da}]_|b_2|c)\b/v`,
errors: [
"The alternatives of this group can be sorted without affecting the regex.",
],
},
],
})
Loading