Revisiting mutable vs immutable APIs #93

sffc · 2019-12-15T01:45:01Z

I'm a bit concerned that patterns such as the following will produce hard-to-detect bugs.

let firstSegment = new Intl.Segmenter().segment(string);
// OK
let lastSegment = firstSegment.preceding(Infinity);
// OK, but...
if (firstSegment.index === lastSegment.index) {
  // BAD! firstSegment changed when .preceding() was called
  console.log("One segment in your string!");
}

If the methods on %SegmentIterator%.prototype returned semantically new objects, this problem would be solved, but the obvious problem with such behavior is whether it is efficiently implementable.

I think maybe it is still efficient to implement. C++, and probably Rust, has the concept of an rvalue. Basically it means that if the receiver is going to be destroyed after the current function call, you can optimize by using move semantics instead of copy semantics. So, for example, if you have a call site such as,

let segment = new Intl.Segmenter().segment(string);
segment = segment.following();

since the object associated with the first segment is going out of scope, the engine can use an optimized code path to re-use the existing object. It is even easier to optimize when using tail calls:

let segment = new Intl.Segmenter().segment(string).following().following();

And for the main use case via the Iterator interface, the engine definitely will have enough information to optimize out the object cloning.

Thoughts?

The text was updated successfully, but these errors were encountered:

gibson042 · 2019-12-15T23:16:34Z

I don't want to introduce the complexity of random-access methods cloning behind the scenes and forcing implementations to include special optimizations or suffer memory churn. Map and Set, along with their Weak counterparts, also implement a chainable "fluent" interface, as do Array methods such as ES1 sort/reverse and ES6 fill (with an honorable mention for ES5/ES6 Object/Reflect functions like defineProperties and setPrototypeOf). It is a well-established paradigm that spans all versions of ECMAScript.

I wanted the random-access methods to return iteration result objects (i.e., corresponding with next), but @littledan argued very strongly that those allocations would be prohibitively expensive. Unless there's consensus around that or some other approach, I'd like to keep this one.

bathos · 2019-12-15T23:41:26Z

And for the main use case via the Iterator interface, the engine definitely will have enough information to optimize out the object cloning.

Is that really the case? I would have imagined the mutability of the properties in play would generally preclude that.

let a, b;

const { following } = Intl.Segmenter.prototype;

Intl.Segmenter.prototype.following = function() {
  if (a) {
    b = this;
  } else {
    a = this;
  }

  return following.call(this);
}

new Intl.Segmenter().segment(string).following().following();

console.log(a === b); // ???

sffc · 2019-12-17T22:35:22Z

Is that really the case? I would have imagined the mutability of the properties in play would generally preclude that.

Engines, as far as I understand, tend to use different code paths depending on whether or not the prototype are all built-ins. If you mutate the prototype, then you lose those optimizations.

Map and Set, along with their Weak counterparts, also implement a chainable "fluent" interface, as do Array methods

The difference is that Map, Set, and Array are data structures, whereas a segment is a "thing". My intuition, and I believe the intuition shared by others, is that you have an expectation that a data structure can be mutable, but the expectation is different for nouns.

littledan · 2020-01-16T11:57:23Z

I doubt it would be easy to optimize the allocations out. Engines have a lot of very specific work to optimize Array iteration; these things don't generalize automatically. The optimizations that @sffc is positing seem even more difficult.

However, we can make the decision that we're OK with an API shape which will likely cause more allocations (Temporal seems to be going in this direction) for semantic reasons. If @sffc and @gibson042 want to champion a higher-level API that will likely have more allocations, I'm OK with the proposal moving in this direction. I don't have any data that allocation would be the bottleneck, and maybe I've been emphasizing this goal too much.

I'm not sure what @sffc is proposing for what following and preceding would return. I could picture a design like this:

There are three classes: Intl.Segmenter, %SegmentsPrototype% and %SegmentIteratorPrototype%
Intl.Segmenter holds options, and has a segments method which returns a %SegmentsPrototype% instance.
%SegmentsPrototype% is logically stateless, and just the combination of an Intl.Segmenter and a String. %SegmentsPrototype% has preceding and following methods which take a mandatory position parameter and return a { index, isWordLike, segment } object. [Symbol.iterator] returns a %SegmentIteratorPrototype% instance.
%SegmentIteratorPrototype% has a next method, implementing the iteration protocol. It is the only stateful class, logically having a %SegmentsPrototype% and a current position.

sffc · 2020-01-16T13:29:22Z

@hagbard How would this decision influence OmnICU?

gibson042 · 2020-01-25T18:26:36Z

Option 1 (current)
segmenterInstance.segment(string) returns a stateful SegmentIterator instance at starting position. A SegmentIterator instance exposes current state (segment contents/index/etc.) via getters on its prototype and supports next() to advance and return an iteration result representing the new state and other (random-access) methods to move to a new state and return the receiver. Use cases for those methods consist of "move to first segment", "move to last segment", "move to previous segment", and "move to segment before/after/containing a specific index", though the specific signatures are still being debated.

Option 2 (state-confining)
segmenterInstance.segment(string) returns a stateless iterable Segments instance. A Segments instance supports random-access methods to find a segment and return its contents/index/etc., and Symbol.iterator to return a stateful SegmentIterator instance whose next() method returns objects with the same shape. Use cases for random-access methods consist of "describe first segment", "describe last segment", and "describe segment before/after/containing a specific index".

Option 2 seems better to me and less divergent from existing ECMA-262 iterators, I'm just not ready to adopt it unilaterally. This is marked as blocking Stage 3, but I'm still going to put advancement on the agenda for February. I will update slides accordingly if we can resolve this before the meeting.

hagbard · 2020-01-25T19:36:47Z

@hagbard How would this decision influence OmnICU?

I'm not fully groking this having just dropped in, but the OmnICU model I've been envisioning would be something like what @littledan was saying (the stateless "segments" prototype would be what calls into WASM, passing position and string handle, and then updates the iterator on the host side with new position and updated state derived from the OmniICU result).

This kind of thing is going to be hard to make super efficient whatever we (OmnICU) do, and maybe we want to look at higher level functionality too (e.g. "iterate to segment with property X" as higher level).

gibson042 · 2020-01-27T16:51:20Z

Resolved in favor of adopting this suggestion (summarized as Option 2) by me and @sffc and @littledan.

sffc · 2020-01-27T16:52:08Z

From January 27 phone call: adopt the option proposed by @littledan, and leave room for possible future extensions based on user demand.

sffc · 2020-01-27T16:56:48Z

Also, include a .string slot in %SegmentsPrototype% so that callers can access the original string if they are passed a Segments object.

Fixes tc39gh-93 Fixes tc39gh-92

littledan added the stage 3 blocker label Jan 16, 2020

gibson042 mentioned this issue Jan 25, 2020

Inclusive vs exclusive indices for preceding/following #92

Closed

gibson042 mentioned this issue Jan 30, 2020

Should we have random access APIs? #98

Closed

gibson042 added a commit to gibson042/proposal-intl-segmenter that referenced this issue Feb 5, 2020

Normative: Remove stateful iterator manipulation

33f44be

Fixes tc39gh-93 Fixes tc39gh-92

gibson042 closed this as completed in 9e4f879 Feb 5, 2020

sffc mentioned this issue Mar 24, 2020

Make Number.range() return re-usable value tc39/proposal-iterator.range#17

Closed

sffc mentioned this issue Jul 9, 2020

Consistency with Number.range model #120

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisiting mutable vs immutable APIs #93

Revisiting mutable vs immutable APIs #93

sffc commented Dec 15, 2019 •

edited

Loading

gibson042 commented Dec 15, 2019

bathos commented Dec 15, 2019

sffc commented Dec 17, 2019

littledan commented Jan 16, 2020

sffc commented Jan 16, 2020

gibson042 commented Jan 25, 2020 •

edited

Loading

hagbard commented Jan 25, 2020

gibson042 commented Jan 27, 2020

sffc commented Jan 27, 2020

sffc commented Jan 27, 2020

Revisiting mutable vs immutable APIs #93

Revisiting mutable vs immutable APIs #93

Comments

sffc commented Dec 15, 2019 • edited Loading

gibson042 commented Dec 15, 2019

bathos commented Dec 15, 2019

sffc commented Dec 17, 2019

littledan commented Jan 16, 2020

sffc commented Jan 16, 2020

gibson042 commented Jan 25, 2020 • edited Loading

hagbard commented Jan 25, 2020

gibson042 commented Jan 27, 2020

sffc commented Jan 27, 2020

sffc commented Jan 27, 2020

sffc commented Dec 15, 2019 •

edited

Loading

gibson042 commented Jan 25, 2020 •

edited

Loading