Skip to content
This repository has been archived by the owner on Jun 7, 2022. It is now read-only.

Meta: Stage 3 Specification Reviews #11

Closed
3 tasks done
rbuckton opened this issue Jul 12, 2019 · 17 comments
Closed
3 tasks done

Meta: Stage 3 Specification Reviews #11

rbuckton opened this issue Jul 12, 2019 · 17 comments

Comments

@rbuckton
Copy link
Collaborator

rbuckton commented Jul 12, 2019

@rbuckton
Copy link
Collaborator Author

@msaboff, @ljharb, @waldemarhorwat - Please note, I'm looking to merge #7 shortly which has a few changes to simplify the design based on feedback, and has full specification text.

@msaboff
Copy link
Collaborator

msaboff commented Jul 12, 2019

@rbuckton I'll review it in the next few days.

@rbuckton
Copy link
Collaborator Author

#7 has been merged, so the content of the explainer and full spec text at https://tc39.es/proposal-regexp-match-offsets/ are up to date.

@waldemarhorwat
Copy link
Collaborator

I did a full review and found a number of bugs in the specification. See issue #10 for the details.

@msaboff
Copy link
Collaborator

msaboff commented Jul 18, 2019

Setting aside the issues that @waldemarhorwat found, I don't find any other issues with the spec text doing what the current proposal describes. If that is the way we decide to go, count me as reviewed.

I have concerns that there is a late push to either always create the indices or add methods to retrieve them from the match result, and the related impact on the spec text. I think we need to resolve between the three choices that various people are advocating before stage 3.

From an implementers point of view I have some concern with the either always creating an indices property, or adding a captureStart() and captureEnd() methods on the result. Like V8, JavaScriptCore creates indices during matching (* most of the time), however the native C++ array of integers is a local variable that is thrown away after the equivalent of RegExpBuiltinExec has been run and the string result(s) are materialized. Obviously we'd have to persist this data in the match result.

RegExp's are performance sensitive for many applications and this is reflected in the various JavaScript benchmarks out there. We need to be careful that we don't hurt some implementations with whatever option we choose.

Lastly, I don't want to muddy the waters more, but wasn't there an earlier proposal that always returned the captured strings and made the indices optional? The current proposal is an either or kind of thing.

* JSC doesn't create indices during matching when they'll never be used, e.g. when RegExp.prototype.test() is called.

@rbuckton
Copy link
Collaborator Author

@msaboff: The version of the spec in master uses a property on an options object to indicate whether to return substrings or [startIndex, endIndex] pairs. This avoids any memory overhead for existing RegExp cases, but requires threading the options object through @@match and @@matchAll. The #14 PR goes back to something like the Stage 1 proposal in that it doesn't need the options object, but may have additional memory overhead.

I plan to discuss both in committee next week to gauge whether implementers feel this overhead is acceptable.

@msaboff
Copy link
Collaborator

msaboff commented Jul 18, 2019

@msaboff: The version of the spec in master uses a property on an options object to indicate whether to return substrings or [startIndex, endIndex] pairs. This avoids any memory overhead for existing RegExp cases, but requires threading the options object through @@match and @@matchAll. The #14 PR goes back to something like the Stage 1 proposal in that it doesn't need the options object, but may have additional memory overhead.

I'm aware that the current spec's options threading doesn't impact memory. It was the other options being considered that might have memory impact.

I plan to discuss both in committee next week to gauge whether implementers feel this overhead is acceptable.

I fully expect that along with what the whole committee thinks about the three options on the table.

@rbuckton
Copy link
Collaborator Author

three options

What is the third option? Or by a third option do you just mean choosing not to advance the proposal?

@mathiasbynens
Copy link
Member

Like V8, JavaScriptCore creates indices during matching (* most of the time), however the native C++ array of integers is a local variable that is thrown away after the equivalent of RegExpBuiltinExec has been run and the string result(s) are materialized. Obviously we'd have to persist this data in the match result.

Here's what the memory overhead would be for V8:

If done carefully, the memory overhead would be at most 2 ints per capture. Assuming 4-byte integers, that's comparable to each captured string being 8 [ASCII] characters longer.

For JSC, are you anticipating something different? I'm trying to understand if you're concerned with a JSC-specific implementation detail, or just with adding any overhead at all.

@rbuckton
Copy link
Collaborator Author

@waldemarhorwat, @ljharb, @msaboff: For each of you, can I consider the specification text both in master and in #14 to be fully reviewed? If so that means I could possibly move forward with a request for Stage 3 depending on the outcome of the discussion around #12 both here and during committee.

@msaboff
Copy link
Collaborator

msaboff commented Jul 19, 2019

Like V8, JavaScriptCore creates indices during matching (* most of the time), however the native C++ array of integers is a local variable that is thrown away after the equivalent of RegExpBuiltinExec has been run and the string result(s) are materialized. Obviously we'd have to persist this data in the match result.

Here's what the memory overhead would be for V8:

If done carefully, the memory overhead would be at most 2 ints per capture. Assuming 4-byte integers, that's comparable to each captured string being 8 [ASCII] characters longer.

For JSC, are you anticipating something different? I'm trying to understand if you're concerned with a JSC-specific implementation detail, or just with adding any overhead at all.

I think our memory overhead would be about the same.

@ljharb
Copy link
Member

ljharb commented Jul 20, 2019

As an editor: Confirm that master looks good if we go with an options object (modulo #16, #17); as does #14.

As a delegate: on master, i think that replace when functionalReplace is true and matchOptions.[[Capture]] is ‘indices’ should not throw, but should do something useful. Similarly, on #14, should replace with a function get indices info? (i think it probably should somehow)

@rbuckton
Copy link
Collaborator Author

should not throw, but should do something useful

Any thoughts on what that "something useful" might be? "abc".replace("b", "d", { capture: "indices" }) doesn't make a lot of sense, so throwing is the safest option (since the restriction can always be loosened later). "abc".replace("b", ([start, end]) => { ... }, { capture: "indices" }) make a little more sense, since you can chose what to do with those indices.

@ljharb
Copy link
Member

ljharb commented Jul 20, 2019

Right - I'd expect it to throw in the non-function case, but to provide the indices somehow in the function case.

@rbuckton
Copy link
Collaborator Author

Ah, I see. The spec does not contain any changes for RegExp.prototype[@@replace]. I will file a PR shortly.

@ljharb
Copy link
Member

ljharb commented Jul 20, 2019

It might be worth getting the committee's opinion before merging it, but it'd be great to have it lined up and reviewed :-)

@waldemarhorwat
Copy link
Collaborator

@waldemarhorwat, @ljharb, @msaboff: For each of you, can I consider the specification text both in master and in #14 to be fully reviewed? If so that means I could possibly move forward with a request for Stage 3 depending on the outcome of the discussion around #12 both here and during committee.

I just posted my review of the #14 variant. I found a few algorithm logic bugs, but they are all simple to fix. Assuming you fix them, I'm happy with that variant (and at this point very much prefer it).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants