Skip to content

Commit

Permalink
Fix
Browse files Browse the repository at this point in the history
  • Loading branch information
Josh-Cena committed Jan 17, 2023
1 parent 127f5de commit 220e2e7
Show file tree
Hide file tree
Showing 13 changed files with 30 additions and 31 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ The following regex syntaxes are deprecated and only available in non-[unicode](
- [Lookahead assertions](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion) are [quantifiable](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier).
- [Backreferences](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference) that do not refer to an existing capturing group become [legacy octal escapes](#escape_sequences).
- In [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class), character ranges where one boundary is a character class makes the `-` become a literal character.
- An escape sequence that's not one of the recognized kinds becomes an ["identity escape"](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Escape_character).
- An escape sequence that's not one of the recognized kinds becomes an ["identity escape"](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape).
- Escape sequences within [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class) of the form `\cX` where `X` is a number or `_` are decoded in the same way as those with ASCII letters: `\c0` is the same as `\cP` when taken modulo 32. In addition, if the form `\cX` is encountered anywhere where `X` is not one of the recognized characters, then the backslash is treated as a literal character.
- The sequence `\k` within a regex that doesn't have any [named capturing groups](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Named_capturing_group) is treated as an identity escape.
- The syntax characters `]`, `{`, and `}` may appear literally without escaping if they cannot be interpreted as the end of a character class or quantifier delimiters.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ The backreference must refer to an existent capturing group. If the number it sp
/(a)\2/u; // SyntaxError: Invalid regular expression: Invalid escape
```

In non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, invalid backreferences become a [legacy octal escape](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences) sequence. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp) and you should not rely on it.
In non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, invalid backreferences become a [legacy octal escape](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences) sequence. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp), and you should not rely on it.

```js
/(a)\2/.test("a\x02"); // true
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ Capturing groups can be used in [lookahead](/en-US/docs/Web/JavaScript/Reference
```js
/c(?=(ab))/.exec("cab"); // ['', 'ab']
/(?<=(a)(b))c/.exec("abc"); // ['', 'a', 'b']
/(?<=([ab])+)c/.exec("abc"); // ['', 'a']; because "a" is seen by the lookbehind after it's seen "b"
/(?<=([ab])+)c/.exec("abc"); // ['', 'a']; because "a" is seen by the lookbehind after the lookbehind has seen "b"
```

Capturing groups can be nested, in which case the outer group is numbered first, then the inner group, because they are ordered by their opening parentheses. If a nested group is repeated by a quantifier, then each time the group matches, the subgroups' results are all overwritten, sometimes with `undefined`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ A **character class** matches any character in or not in a custom set of charact

## Description

A character class specifies a list of characters between square bracket and matches any character in the list. The following syntaxes are available:
A character class specifies a list of characters between square brackets and matches any character in the list. The following syntaxes are available:

- A single character: matches the character itself.
- A range of characters: matches any character in the inclusive range. The range is specified by two characters separated by a dash (`-`). The first character must be smaller in character value than the second character.
Expand All @@ -35,15 +35,15 @@ Unlike other parts of the regex, character classes interpret most character lite
- The `]` character indicates the end of the character class. To use it literally, escape it as `\]`.
- The dash (`-`) character, when used between two characters, indicates a range. When it appears at the start or end of a character class, it is a literal character. It's also a literal character when it's used in the boundary of a range. For example, `[a-]` matches the characters `a` and `-`, `[!--]` matches the characters `!` to `-`, and `[--9]` matches the characters `-` to `9`. You can also escape it as `\-` if you want to use it literally anywhere.

The [lexical grammar](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#regular_expression_literals) does a very rough parse of regex literals, so that it does not end the regex literal at a `/` character that appears within a character class. This means `/[/]/` is valid without needing to escape the `/`.
The [lexical grammar](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#regular_expression_literals) does a very rough parse of regex literals, so that it does not end the regex literal at a `/` character which appears within a character class. This means `/[/]/` is valid without needing to escape the `/`.

The boundaries of a character range must not specify more than one character, which happens if you use a [character class escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class_escape). For example,
The boundaries of a character range must not specify more than one character, which happens if you use a [character class escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class_escape). For example:

```js
/[\s-9]/u; // SyntaxError: Invalid regular expression: Invalid character class
```

In non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, character ranges where one boundary is a character class makes the `-` become a literal character. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp) and you should not rely on it.
In non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, character ranges where one boundary is a character class makes the `-` become a literal character. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp), and you should not rely on it.

```js
/[\s-9]/.test("-"); // true
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,15 @@ Unlike [character escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressi
- `\d`
- : Matches any digit character. Equivalent to `[0-9]`.
- `\w`
- : Matches any word character.
- : Matches any word character, where a word character includes letters (A–Z, a–z), numbers (0–9), and underscore (_). If the [`u`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) and [`i`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase) flags are both set, it also matches other Unicode characters that get canonicalized to one of the characters above through [case folding](https://unicode.org/Public/UCD/latest/ucd/CaseFolding.txt).
- `\s`
- : Matches any [whitespace](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#white_space) or [line terminator](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#line_terminators) character.

The uppercase forms `\D`, `\W`, and `\S` negates the match or `\d`, `\w`, and `\s`, respectively. They match any character that is not in the set of characters matched by the lowercase form.

[Unicode character class escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape) start with `\p` and `\P`, but they are only supported in [unicode mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode). In non-unicode mode, they are [identity escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) for the `p` or `P` character.

Character class escapes can be used in [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class). However, using them as boundaries of character ranges is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp) and you should not rely on it.
Character class escapes can be used in [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class). However, they cannot be used as boundaries of character ranges. This is only allowed as a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp), and you should not rely on it.

## See also

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,10 @@ A **character escape** represents a character that may not be able to be conveni

## Description

The following escape characters are recognized in regular expressions:
The following character escapes are recognized in regular expressions:

- `\f`, `\n`, `\r`, `\t`, `\v`
- : Same as those in [string literals](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#escape_sequences), except `\b` represents a [word boundary](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion) in regexes unless in a [character class](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class).
- : Same as those in [string literals](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#escape_sequences), except `\b`, which represents a [word boundary](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion) in regexes unless in a [character class](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class).
- `\c` followed by a letter from `A` to `Z` or `a` to `z`
- : Represents the control character with value equal to the letter's character value modulo 32. For example, `\cJ` represents line break (`\n`), because the code point of `J` is 74, and 74 modulo 32 is 10, which is the code point of line break. Because an uppercase letter and its lowercase form differ by 32, `\cJ` and `\cj` are equivalent. You can represent control characters from 1 to 26 in this form.
- `\0`
Expand All @@ -46,15 +46,16 @@ The following escape characters are recognized in regular expressions:

In non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, escape sequences that are not one of the above become _identity escapes_: they represent the character that follows the backslash. For example, `\a` represents the character `a`. This behavior limits the ability to introduce new escape sequences without causing backward compatibility issues, and is therefore forbidden in unicode mode.

In non-unicode mode, `]`, `{`, and `}` may appear literally if it's not possible to parse them as the end of a character class, or quantifier delimiters. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp) and you should not rely on it.
In non-unicode mode, `]`, `{`, and `}` may appear literally if it's not possible to parse them as the end of a character class or quantifier delimiters. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp), and you should not rely on it.

In non-unicode mode, escape sequences within [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class) of the form `\cX` where `X` is a number or `_` are decoded in the same way as those with ASCII letters: `\c0` is the same as `\cP` when taken modulo 32. In addition, if the form `\cX` is encountered anywhere where `X` is not one of the recognized characters, then the backslash is treated as a literal character. These syntaxes are also deprecated.

```js
/\c/.test("\\c"); // true
/[\c0]/.test("\x10"); // true
/[\c_]/.test("\x1f"); // true
/\c0/.test("\\c0"); // true
/[\c*]/.test("\\"); // true
/\c/.test("\\c"); // true
/\c0/.test("\\c0"); // true (the \c0 syntax is only supported in character classes)
```

## Examples
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ slug: Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion

{{JsSidebar}}

A **lookahead assertion** "looks ahead": it attempts to match the subsequent input with the given pattern, but it does not consume any of the input — if the match is successful, the current position in the input is not advanced.
A **lookahead assertion** "looks ahead": it attempts to match the subsequent input with the given pattern, but it does not consume any of the input — if the match is successful, the current position in the input stays the same.

## Syntax

Expand All @@ -23,7 +23,7 @@ A **lookahead assertion** "looks ahead": it attempts to match the subsequent inp

A regular expression generally matches from left to right. This is why lookahead and [lookbehind](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookbehind_assertion) assertions are called as such — lookahead asserts what's on the right, and lookbehind asserts what's on the left.

In order for a `(?=pattern)` assertion to succeed, the `pattern` must match at the current position, but the current position is not advanced before matching the subsequent input. The `(?!pattern)` form negates the assertion — it succeeds if the `pattern` does not match at the current position.
In order for a `(?=pattern)` assertion to succeed, the `pattern` must match the text after the current position, but the current position is not changed. The `(?!pattern)` form negates the assertion — it succeeds if the `pattern` does not match at the current position.

The `pattern` can contain [capturing groups](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Capturing_group). See the capturing groups page for more information on the behavior in this case.

Expand All @@ -41,9 +41,9 @@ The matching of the pattern above happens as follows:
3. `\1` does not match the following string, because that requires 2 `"a"`s, but only 1 is available. So the matcher backtracks, but it doesn't go into the lookahead, so the capturing group cannot be reduced to 1 `"a"`, and the entire match fails at this point.
4. `exec()` re-attempts matching at the next position — before the second `"a"`. This time, the lookahead matches `"a"`, and `a*b` matches `"b"`. The backreference `\1` matches the captured `"a"`, and the match succeeds.

If the regex is able to backtrack into the lookahead and revise the choice made in there, then the match would succeed at step 3 by `(a+)` matching the first `"a"` (instead of the first two `"a"`s) and `a*b` matching `"ab"`, without even attempting the next input position.
If the regex is able to backtrack into the lookahead and revise the choice made in there, then the match would succeed at step 3 by `(a+)` matching the first `"a"` (instead of the first two `"a"`s) and `a*b` matching `"ab"`, without even re-attempting the next input position.

Negative lookaheads can contain capturing groups as well, but backreferences only make sense within the `pattern`, because if matching continues, `pattern` would necessarily be unmatched (otherwise the assertion fails). This means outside of the `pattern`, backreferences to capturing groups in negative lookaheads always succeed. For example:
Negative lookaheads can contain capturing groups as well, but backreferences only make sense within the `pattern`, because if matching continues, `pattern` would necessarily be unmatched (otherwise the assertion fails). This means outside of the `pattern`, backreferences to those capturing groups in negative lookaheads always succeed. For example:

```js
/(.*?)a(?!(a+)b\1c)\1(.*)/.exec("baaabaac"); // ['baaabaac', 'ba', undefined, 'abaac']
Expand All @@ -58,7 +58,7 @@ The matching of the pattern above happens as follows:
5. At this position, the lookahead fails to match, because the remaining input does not follow the pattern "any number of `"a"`s, a `"b"`, the same number of `"a"`s, a `c`". This causes the assertion to succeed.
6. However, because nothing was matched within the assertion, the `\1` backreference has no value, so it matches the empty string. This causes the rest of the input to be consumed by the `(.*)` at the end.

Normally, assertions cannot be [quantified](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier). However, in non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, lookahead assertions) can be quantified. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp) and you should not rely on it.
Normally, assertions cannot be [quantified](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier). However, in non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode, lookahead assertions can be quantified. This is a [deprecated syntax for web compatibility](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#regexp), and you should not rely on it.

```js
/(?=a)?b/.test("b"); // true; the lookahead is matched 0 time
Expand Down
Loading

0 comments on commit 220e2e7

Please sign in to comment.