feat(coverage): enable regexp in test262 #4242

Boshen · 2024-07-13T03:06:25Z

@leaysgur This enables all test262 regexp tests, feel free to decide when's the right time to integrate.

It seems we still need to attach spans to the diagnostics so they point at the offending location.

Running `just c` is somewhat slow, so I always use `just example parser` for local development.

IWANABETHATGUY · 2024-08-06T10:22:12Z

crates/oxc_parser/src/js/expression.rs

        let span = self.start_span();

        // split out pattern
        let (pattern_end, flags) = self.read_regex();
        let pattern_start = self.cur_token().start + 1; // +1 to exclude `/`
        let pattern = &self.source_text[pattern_start as usize..pattern_end as usize];
+        if let Err(diagnostic) = PatternParser::new(


Eagerly reparsing each regex in place doesn't sound reasonable. Can we introduce a separate visit pass (enabled by an option) that reparses each regex and emits the diagnostics at the end?

Most downstream users of the parser, such as a formatter or a bundler, may not care whether the regex is semantically correct.
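To make the suggestion concrete, here is a minimal sketch of such an opt-in pass. Everything in it is hypothetical and not part of oxc: `PatternSpan`, `Options`, `check_regexes`, and the stand-in `validate_pattern`, which only checks for balanced parentheses (and ignores character classes) where a real regex parser would be called.

```rust
/// Byte range of a regex pattern inside the source text (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PatternSpan {
    start: usize,
    end: usize,
}

/// Hypothetical option that controls whether regex patterns are re-parsed.
struct Options {
    check_regex: bool,
}

/// Stand-in validator: only checks that parentheses are balanced.
/// A real regex parser would be called here instead.
fn validate_pattern(pattern: &str) -> Result<(), String> {
    let mut depth = 0i32;
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                chars.next(); // skip the escaped character
            }
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return Err("unmatched ')'".to_string());
                }
            }
            _ => {}
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        Err("unterminated group".to_string())
    }
}

/// Separate pass: does nothing unless enabled, and returns all diagnostics at the end.
fn check_regexes(
    source: &str,
    spans: &[PatternSpan],
    options: &Options,
) -> Vec<(PatternSpan, String)> {
    if !options.check_regex {
        return Vec::new();
    }
    spans
        .iter()
        .filter_map(|span| {
            validate_pattern(&source[span.start..span.end])
                .err()
                .map(|msg| (*span, msg))
        })
        .collect()
}
```

With this shape, a formatter or bundler would leave `check_regex` off and pay nothing, while a linter would turn it on and receive one diagnostic per invalid pattern.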

But actually, I'm also concerned about how to finish this PR.

With the current code:

- oxc_parser parses each RegExp and reports errors, but does not seem to keep the parsed result
- RegExp literals (`/abc/`) are checked, but RegExp constructor calls (`new RegExp("abc")`) are not

@Boshen What were your original plans?

I've thought about it a little. To organize my thoughts, I'll answer my own questions.

> RegExp literals (`/abc/`) are checked, but RegExp constructor calls (`new RegExp("abc")`) are not

It's OK.
In the case of `new RegExp("string")`, the code is simply parsed as a call expression, and any error in the pattern only surfaces at runtime.

On the other hand, in the case of `/string/`, the syntax must satisfy the requirements of a regular expression literal, so it should produce an error during the parsing stage, before runtime.

    console.log("START");
    const a = new RegExp("a{", "u"); // <- Invalid
    console.log("END");

This will log "START".

    console.log("START");
    const a = /a{/u; // <- Invalid
    console.log("END");

This, however, will not log anything.

One thing that concerns me, though: if an invalid literal is treated as an error at the parser stage, would that make it impossible to implement rules like eslint/no-invalid-regexp?

> How to reuse the parsed result

Perhaps parse it again from Semantic, as is done for JSDoc?

To make the #4242 tests pass. (My `RegExp` parser tells me `/as)df/` is invalid syntax. 😂)

leaysgur · 2024-08-06T13:36:51Z

FYI:

- The previous CodSpeed result came from a run where the entire source code(!) was passed to the regexp parser, not just the regular expression part
- I have not yet started on parser perf improvements

Benchmark results have been updated now.

I don't think the current approach is the best solution, but the CI is green. 😅
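For reference, the fix described above comes down to slicing out only the pattern before handing it to the regexp parser. The sketch below mirrors the offsets used in the reviewed diff; the helper name `regex_pattern` is hypothetical, and it assumes `pattern_end` is the byte offset of the closing `/`.

```rust
/// Returns the pattern text of a regex literal, without the delimiting slashes.
/// `token_start` is the offset of the opening `/`.
/// `pattern_end` is assumed to be the offset of the closing `/`.
fn regex_pattern(source_text: &str, token_start: u32, pattern_end: u32) -> &str {
    let pattern_start = token_start + 1; // +1 to exclude the opening `/`
    &source_text[pattern_start as usize..pattern_end as usize]
}
```

For the source `x = /a+b/g;`, the token starts at offset 4 and the closing slash is at offset 8, so only `a+b` reaches the regexp parser rather than the whole file.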

IWANABETHATGUY · 2024-08-06T14:21:58Z

> FYI:
>
> The previous CodSpeed result came from a run where the entire source code(!) was passed to the regexp parser, not just the regular expression part
>
> I have not yet started on parser perf improvements

Thanks for your explanation.

@Boshen

Part of #1164

## Progress updates 🗞️

Waiting for the review and advice, while thinking how to handle escaped string when `new RegExp(pat)`.

## TODOs

- [x] `RegExp(Literal = Body + Flags)#parse()` structure
- [x] Base `Reader` impl to handle both unicode(u32) and utf-16(u16) units
- [x] Global `Span` and local offset conversion
- [x] Design AST shapes
- [x] Keep `enum` size small by `Box<'a, T>`
- [x] Rework AST shapes
- [x] Split body and flags w/ validating literal
- [x] Parse `RegExpFlags`
- [x] Parse `RegExpBody` = `Pattern`
- [x] Parse `Pattern` > `Disjunction`
- [x] Parse `Disjunction` > `Alternative`
- [x] Parse `Alternative` > `Term`
- [x] Parse `Term` > `Assertion`
- [x] Parse `BoundaryAssertion`
- [x] Parse `LookaroundAssertion`
- [x] Parse `Term` > `Quantifier`
- [x] Parse `Term` > `Atom`
- [x] Parse `Atom` > `PatternCharacter`
- [x] Parse `Atom` > `.`
- [x] Parse `Atom` > `\AtomEscape`
- [x] Parse `\AtomEscape` > `DecimalEscape`
- [x] Parse `\AtomEscape` > `CharacterClassEscape`
- [x] Parse `CharacterClassEscape` > `\d, \D, \s, \S, \w, \W`
- [x] Parse `CharacterClassEscape` > `\p{UnicodePropertyValueExpression}, \P{UnicodePropertyValueExpression}`
- [x] Parse `\AtomEscape` > `CharacterEscape`
- [x] Parse `CharacterEscape` > `ControlEscape`
- [x] Parse `CharacterEscape` > `c AsciiLetter`
- [x] Parse `CharacterEscape` > `0`
- [x] Parse `CharacterEscape` > `HexEscapeSequence`
- [x] Parse `CharacterEscape` > `RegExpUnicodeEscapeSequence`
- [x] Parse `CharacterEscape` > `IdentityEscape`
- [x] Parse `\AtomEscape` > `kGroupName`
- [x] Parse `Atom` > `[CharacterClass]`
- [x] Parse `[CharacterClass]` > `ClassContents` > `[~UnicodeSetsMode] NonemptyClassRanges`
- [x] Parse `[CharacterClass]` > `ClassContents` > `[+UnicodeSetsMode] ClassSetExpression`
- [x] Parse `ClassSetExpression` > `ClassUnion`
- [x] Parse `ClassSetExpression` > `ClassIntersection`
- [x] Parse `ClassSetExpression` > `ClassSubtraction`
- [x] Parse `ClassSetExpression` > `ClassSetOperand`
- [x] Parse `ClassSetExpression` > `ClassSetRange`
- [x] Parse `ClassSetExpression` > `ClassSetCharacter`
- [x] Parse `Atom` > `(GroupSpecifier)`
- [x] Parse `Atom` > `(?:Disjunction)`
- [x] Annex B
- [x] Parse `QuantifiableAssertion`
- [x] Parse `ExtendedAtom`
- [x] Parse `ExtendedAtom` > `\ [lookahead = c]`
- [x] Parse `ExtendedAtom` > `InvalidBracedQuantifier`
- [x] Parse `ExtendedAtom` > `ExtendedPatternCharacter`
- [x] Parse `ExtendedAtom` > `\AtomEscape` > `CharacterEscape` > `LegacyOctalEscapeSequence`
- [x] Early errors
- [x] Pattern :: Disjunction(1/2)
- [x] Pattern :: Disjunction(2/2)
- [x] QuantifierPrefix :: { DecimalDigits , DecimalDigits }
- [x] ExtendedAtom :: InvalidBracedQuantifier (Annex B)
- [x] AtomEscape :: k GroupName
- [x] AtomEscape :: DecimalEscape
- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(1/2)
- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(2/2)
- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(Annex B)
- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(1/2)
- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(2/2)
- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(Annex B)
- [x] RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
- [x] RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
- [x] RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
- [x] RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(1/2)
- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(2/2)
- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(1/2)
- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(2/2)
- [x] CharacterClassEscape :: P{ UnicodePropertyValueExpression }
- [x] CharacterClass :: [^ ClassContents ]
- [x] NestedClass :: [^ ClassContents ]
- [x] ClassSetRange :: ClassSetCharacter - ClassSetCharacter
- [x] Add `Span` to `Err(OxcDiagnostic::error())` calls
- [x] Perf improvement
- [x] `Reader#peek()` should avoid `iter.next()` equivalent
- [x] ~~Use `char` everywhere and split and push 2 surrogates(pair) for `Character`?~~
- [x] ~~Try 1(+1) loop parsing for capturing groups?~~

## Follow up

- [x] @Boshen Test suite > #4242
- [x] Investigate CI errors...
- Next...
- Support ES2025 Duplicate named capturing groups?
- Support ES20XX Stage3 Modifiers?
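As an illustration of the `Reader` items in the TODO list, the sketch below shows one way a reader could expose either Unicode code points or UTF-16 code units as `u32` values, with a `peek()` that is a plain index lookup instead of advancing an iterator. It is an assumption about the general idea only: the struct layout and the `new`, `advance`, and `eat` signatures are hypothetical and are not taken from the actual implementation.

```rust
/// Hypothetical reader: yields code points in unicode mode,
/// and UTF-16 code units otherwise, both widened to `u32`.
struct Reader {
    units: Vec<u32>,
    index: usize,
}

impl Reader {
    fn new(source: &str, unicode_mode: bool) -> Self {
        let units = if unicode_mode {
            source.chars().map(|c| c as u32).collect()
        } else {
            source.encode_utf16().map(u32::from).collect()
        };
        Self { units, index: 0 }
    }

    /// Looks at the current unit without consuming it (index lookup only).
    fn peek(&self) -> Option<u32> {
        self.units.get(self.index).copied()
    }

    /// Moves to the next unit, stopping at the end of input.
    fn advance(&mut self) {
        if self.index < self.units.len() {
            self.index += 1;
        }
    }

    /// Consumes the current unit if it equals `ch`.
    fn eat(&mut self, ch: char) -> bool {
        if self.peek() == Some(ch as u32) {
            self.advance();
            true
        } else {
            false
        }
    }
}
```

In this sketch the input `a😀` is read as two units in unicode mode and as three units otherwise, because the emoji is split into a surrogate pair (`0xD83D`, `0xDE00`) in UTF-16.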

Boshen · 2024-08-20T07:05:39Z

Continue in #4998

leaysgur added 30 commits June 20, 2024 13:52

Restore current progress

47f83f0

Add bare example

7fd8c3b

Init parser

5e70a1e

Make it run

94c741b

Adjust span

8d03d2b

Omit options support

396c439

Run fmt

56f7fdc

Fix typo

dc57a2b

Fix doc

76f23ea

Merge remote-tracking branch 'origin' into regexpp

5c97aae

Remove ast_builder

dfe439e

Merge remote-tracking branch 'origin' into regexpp

29a2436

Keep enum size small

cb4d278

Merge remote-tracking branch 'origin' into regexpp

0a790fc

Validate u+v flags

912929a

Clean up

164bb37

Wip reader

6fbb659

Fix warnings

0dc517b

Fix clippy

1794de8

Reader#eat2(), eat3()

53705c7

Calculate unified span pos

ec87865

Fix test name

93d231c

Merge remote-tracking branch 'origin' into regexpp

9f131ce

Merge remote-tracking branch 'origin' into regexpp

24bc377

Split mod

0cac2c4

Align parsing names

f657cf0

Merge remote-tracking branch 'origin' into regexpp

31b2849

Make very minimum pattern pass

3c9bb61

Merge remote-tracking branch 'origin' into regexpp

ac13312

Fix

98eb951

leaysgur added 2 commits August 6, 2024 18:13

Fix regex parser usage

e3a2ad4

Update coverage

8a674d7

leaysgur mentioned this pull request Aug 6, 2024

chore(module_lexer): Fix invalid regex in test #4683

Merged

IWANABETHATGUY reviewed Aug 6, 2024


Boshen pushed a commit that referenced this pull request Aug 6, 2024

chore(module_lexer): Fix invalid regex in test (#4683)

bc611d7


leaysgur added 2 commits August 6, 2024 22:06

Merge remote-tracking branch 'origin' into regexpp

b16c6f4

Merge branch 'regexpp' into regexp-tests

8a2f7a1

leaysgur added 12 commits August 14, 2024 15:02

Refactor class_strings_disjunction

653b2e8

Fix \b value

1c2c91f

Merge remote-tracking branch 'origin' into regexpp

1ae167c

Diff

719acbc

Rename span_position > offset

f7ea71e

Check collect Vec perf

bdbffa9

Merge branch 'regexpp' into regexp-tests

0cbab8a

Fix u16 offset issue partially

43814e9

Merge remote-tracking branch 'origin' into regexpp

114627b

Perf non-unicode offset

c3c4ec2

Merge remote-tracking branch 'origin' into regexpp

bb54e3a

Merge branch 'regexpp' into regexp-tests

33c80a7

leaysgur mentioned this pull request Aug 19, 2024

feat(linter): regex parser #1164

Closed

4 tasks

Boshen force-pushed the regexpp branch 3 times, most recently from a6325ee to 368364d Compare August 20, 2024 02:19

Base automatically changed from regexpp to main August 20, 2024 02:22

Boshen closed this Aug 20, 2024

Boshen deleted the regexp-tests branch August 20, 2024 07:05
