Michael Saboff's Review #55

msaboff · 2022-03-28T05:25:42Z

Overall, it looks good.

For the Syntax Rules production ClassReservedDouble, is the long list of reserved doubled syntax characters a somewhat paranoid reservation for possible extensions without needing to add a new flag?

A possible nit question: for 22.2.2.7 Runtime Semantics: CompileAtom, Step 6 of the production Atom :: CharacterClass, it seems to me that sorting the strings by descending length might undermine a developer's intent.
Consider a CharacterClass that contains a long list of strings. The developer may have ordered equal-length strings within that character class by expected match likelihood. If the sort is not stable, sorting by length may override that intended order, possibly impacting match performance.

mathiasbynens · 2022-03-28T06:18:22Z

Thanks for your review, Michael!

> For the Syntax Rules production ClassReservedDouble, is the long list of reserved doubled syntax characters a somewhat paranoid reservation for possible extensions without needing to add a new flag?

Exactly. Some additional background on this was presented at the TC39 May 2021 meeting (and the preceding April 2021 Incubator Call):

> We are proposing to reserve additional single and double ASCII punctuation for clarity and for possible future extensions. By mostly reserving double punctuation, most single ASCII characters can continue to be used without escaping.
>
> […] Some regex engines (and UTS #18) support an operator like ‘~~’ for symmetric difference.

> A possible nit question. For 22.2.2.7 Runtime Semantics: CompileAtom, Step 6 of the production Atom :: CharacterClass, It seems to me that the sorting of Strings by descending order of string length might undermine the intent of a developer. Consider a CharacterClass that contains a long list of strings, the developer may have ordered equal length strings within that character class by the expected match likelihood. If the sort is not stable, the sorting by length may circumvent that intended order, possibly impacting match performance.

I see. https://github.com/tc39/proposal-regexp-set-notation#whats-the-match-order-for-character-classes-containing-strings addresses this for properties of strings specifically (where the strings don’t have an inherent order), but as you pointed out we could choose to preserve the order of equal-length strings in string literals (e.g. \q{foo|bar|baz}), although we don’t currently do so. @markusicu Any thoughts?

msaboff · 2022-03-28T15:04:58Z

> I see. https://github.com/tc39/proposal-regexp-set-notation#whats-the-match-order-for-character-classes-containing-strings addresses this for properties of strings specifically (where the strings don’t have an inherent order), but as you pointed out we could choose to preserve the order of equal-length strings in string literals (e.g. \q{foo|bar|baz}), although we don’t currently do so.

The Unicode property of strings case is clear: the order of same-length strings is not important. My concern is the case you point out, \q{foo|bar|baz}. It stems from the discussions of Array.sort stability that you and others were involved in. It would be sad if a similar stability concern were raised after this proposal is approved and implemented.

markusicu · 2022-03-28T15:29:41Z

> I see. https://github.com/tc39/proposal-regexp-set-notation#whats-the-match-order-for-character-classes-containing-strings addresses this for properties of strings specifically (where the strings don’t have an inherent order), but as you pointed out we could choose to preserve the order of equal-length strings in string literals (e.g. \q{foo|bar|baz}), although we don’t currently do so. @markusicu Any thoughts?

The order of same-length strings should not matter. However, I expect that implementations will implement character classes with set data structures (extending from only code points to also allowing strings), which means that they won't preserve parsing order (just like they don't for code points). Therefore I would be reluctant to suggest that the matching order for a given string length is the parsing order. Specifying a stable sort in the operation that creates a matcher object might be harmless but would be misleading if the construction of the CharSet didn't preserve the parsing order.
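To illustrate why the order of same-length strings should not affect the match result, here is a minimal sketch. It is not the spec's algorithm text: `matchClassStrings` is a hypothetical helper, the class is modelled as a plain list of strings, and length is assumed to be measured with `String.prototype.length` (UTF-16 code units). Two distinct strings of the same length cannot both match at the same position (ignoring case folding), so only the longer-before-shorter ordering changes which string is returned.

```javascript
// Hypothetical model of a character class containing strings.
// Candidates are deduplicated and tried longest first.
function matchClassStrings(strings, input, index = 0) {
  const candidates = [...new Set(strings)].sort((a, b) => b.length - a.length);
  for (const s of candidates) {
    // Return the first candidate that matches at the given position.
    if (input.startsWith(s, index)) return s;
  }
  return null;
}

// The same set of strings, written in two different source orders.
const parsedOrder = ["foo", "bar", "baz", "fo", "f"];
const shuffledOrder = ["f", "baz", "fo", "bar", "foo"];

console.log(matchClassStrings(parsedOrder, "foobar"));
console.log(matchClassStrings(shuffledOrder, "foobar"));
```

In this model both calls return the same string. The order within a length group can only influence how many candidates are tried before a match is found, which is the performance concern raised above.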

mathiasbynens · 2022-03-29T17:27:59Z

We discussed this during the 2022-03-29 TC39 meeting and agreed not to make any spec changes. @waldemarhorwat pointed out that today’s character classes (supporting only strings of size 1) don’t have an inherent order either (e.g. [xyz] vs. [zyx]). The same should go for strings of any size, so that character classes remain true mathematical sets. I will keep this issue open until I add an explicit FAQ entry (tomorrow).
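The [xyz] vs. [zyx] point can be checked with existing character classes. The sketch below uses only regular expression syntax that predates this proposal. The sample inputs are arbitrary and chosen for illustration.

```javascript
// Two classes with the same members written in a different order.
const classXyz = /[xyz]/;
const classZyx = /[zyx]/;

// Arbitrary sample inputs, for illustration only.
const samples = ["x", "y", "z", "w", "zyx", "abcz", ""];

// Both classes should find the same match at the same index for every input.
const sameResults = samples.every((s) => {
  const m1 = classXyz.exec(s);
  const m2 = classZyx.exec(s);
  if (m1 === null || m2 === null) return m1 === m2;
  return m1[0] === m2[0] && m1.index === m2.index;
});

console.log(sameResults);
```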

markusicu · 2022-03-29T17:30:19Z

Right. I also suggested that implementers should be free to use sets (implementations of mathematical sets), and that for runtime optimizations they might use tries (retrieval trees).
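As a rough illustration of the trie idea, the sketch below stores a class's strings in a trie and finds the longest string that matches at a position. `buildTrie` and `longestMatch` are hypothetical names, the trie is keyed on UTF-16 code units for simplicity, and this is not taken from any engine. A real matcher would presumably also need to fall back to shorter matches when backtracking, which this sketch omits.

```javascript
// Build a trie from the strings of a class. Insertion order is irrelevant:
// the same set of strings always yields the same trie shape.
function buildTrie(strings) {
  const root = { children: new Map(), terminal: false };
  for (const s of strings) {
    let node = root;
    for (let i = 0; i < s.length; i++) {
      const unit = s[i];
      if (!node.children.has(unit)) {
        node.children.set(unit, { children: new Map(), terminal: false });
      }
      node = node.children.get(unit);
    }
    node.terminal = true; // a string of the class ends here
  }
  return root;
}

// Walk the trie along the input and remember the last terminal node seen,
// which corresponds to the longest matching string.
function longestMatch(trie, input, index = 0) {
  let node = trie;
  let end = trie.terminal ? index : -1;
  for (let i = index; i < input.length; i++) {
    node = node.children.get(input[i]);
    if (node === undefined) break;
    if (node.terminal) end = i + 1;
  }
  return end === -1 ? null : input.slice(index, end);
}

const trie = buildTrie(["foo", "bar", "baz", "fo", "f"]);
console.log(longestMatch(trie, "foobar"));
```

Because the walk visits each input position at most once, the source order of same-length strings has no effect on either the result or the work done in this model.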

mathiasbynens · 2022-03-30T07:25:50Z

I tried to summarize the outcome in #58.

mathiasbynens · 2022-03-31T08:38:04Z

Closing now that #58 is merged. Thanks, everyone!

mathiasbynens mentioned this issue Mar 28, 2022: Advance to Stage 3 #24 (Closed)

mathiasbynens added a commit that referenced this issue Mar 30, 2022: Summarize outcome of match order discussion w.r.t. same-length strings (ed3530a)

mathiasbynens mentioned this issue Mar 30, 2022: Summarize outcome of match order discussion w.r.t. same-length strings #58 (Merged)

mathiasbynens added a commit that referenced this issue Mar 30, 2022: Summarize outcome of match order discussion w.r.t. same-length strings (bc7c500)

mathiasbynens added a commit that referenced this issue Mar 31, 2022: Summarize outcome of match order discussion w.r.t. same-length strings (#58) (e7048b2)

mathiasbynens closed this as completed Mar 31, 2022
