Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisiting mutable vs immutable APIs #93

Closed
sffc opened this issue Dec 15, 2019 · 10 comments
Closed

Revisiting mutable vs immutable APIs #93

sffc opened this issue Dec 15, 2019 · 10 comments

Comments

@sffc
Copy link

sffc commented Dec 15, 2019

I'm a bit concerned that patterns such as the following will produce hard-to-detect bugs.

let firstSegment = new Intl.Segmenter().segment(string);
// OK
let lastSegment = firstSegment.preceding(Infinity);
// OK, but...
if (firstSegment.index === lastSegment.index) {
  // BAD! firstSegment changed when .preceding() was called
  console.log("One segment in your string!");
}

If the methods on %SegmentIterator%.prototype returned semantically new objects, this problem would be solved, but the obvious problem with such behavior is whether it is efficiently implementable.

I think maybe it is still efficient to implement. C++, and probably Rust, has the concept of an rvalue. Basically it means that if the receiver is going to be destroyed after the current function call, you can optimize by using move semantics instead of copy semantics. So, for example, if you have a call site such as,

let segment = new Intl.Segmenter().segment(string);
segment = segment.following();

since the object associated with the first segment is going out of scope, the engine can use an optimized code path to re-use the existing object. It is even easier to optimize when using tail calls:

let segment = new Intl.Segmenter().segment(string).following().following();

And for the main use case via the Iterator interface, the engine definitely will have enough information to optimize out the object cloning.

Thoughts?

@gibson042
Copy link
Collaborator

I don't want to introduce the complexity of random-access methods cloning behind the scenes and forcing implementations to include special optimizations or suffer memory churn. Map and Set, along with their Weak counterparts, also implement a chainable "fluent" interface, as do Array methods such as ES1 sort/reverse and ES6 fill (with an honorable mention for ES5/ES6 Object/Reflect functions like defineProperties and setPrototypeOf). It is a well-established paradigm that spans all versions of ECMAScript.

I wanted the random-access methods to return iteration result objects (i.e., corresponding with next), but @littledan argued very strongly that those allocations would be prohibitively expensive. Unless there's consensus around that or some other approach, I'd like to keep this one.

@bathos
Copy link

bathos commented Dec 15, 2019

And for the main use case via the Iterator interface, the engine definitely will have enough information to optimize out the object cloning.

Is that really the case? I would have imagined the mutability of the properties in play would generally preclude that.

let a, b;

const { following } = Intl.Segmenter.prototype;

Intl.Segmenter.prototype.following = function() {
  if (a) {
    b = this;
  } else {
    a = this;
  }

  return following.call(this);
}

new Intl.Segmenter().segment(string).following().following();

console.log(a === b); // ???

@sffc
Copy link
Author

sffc commented Dec 17, 2019

Is that really the case? I would have imagined the mutability of the properties in play would generally preclude that.

Engines, as far as I understand, tend to use different code paths depending on whether or not the prototype are all built-ins. If you mutate the prototype, then you lose those optimizations.

Map and Set, along with their Weak counterparts, also implement a chainable "fluent" interface, as do Array methods

The difference is that Map, Set, and Array are data structures, whereas a segment is a "thing". My intuition, and I believe the intuition shared by others, is that you have an expectation that a data structure can be mutable, but the expectation is different for nouns.

@littledan
Copy link
Member

I doubt it would be easy to optimize the allocations out. Engines have a lot of very specific work to optimize Array iteration; these things don't generalize automatically. The optimizations that @sffc is positing seem even more difficult.

However, we can make the decision that we're OK with an API shape which will likely cause more allocations (Temporal seems to be going in this direction) for semantic reasons. If @sffc and @gibson042 want to champion a higher-level API that will likely have more allocations, I'm OK with the proposal moving in this direction. I don't have any data that allocation would be the bottleneck, and maybe I've been emphasizing this goal too much.

I'm not sure what @sffc is proposing for what following and preceding would return. I could picture a design like this:

  • There are three classes: Intl.Segmenter, %SegmentsPrototype% and %SegmentIteratorPrototype%
  • Intl.Segmenter holds options, and has a segments method which returns a %SegmentsPrototype% instance.
  • %SegmentsPrototype% is logically stateless, and just the combination of an Intl.Segmenter and a String. %SegmentsPrototype% has preceding and following methods which take a mandatory position parameter and return a { index, isWordLike, segment } object. [Symbol.iterator] returns a %SegmentIteratorPrototype% instance.
  • %SegmentIteratorPrototype% has a next method, implementing the iteration protocol. It is the only stateful class, logically having a %SegmentsPrototype% and a current position.

@sffc
Copy link
Author

sffc commented Jan 16, 2020

@hagbard How would this decision influence OmnICU?

@gibson042
Copy link
Collaborator

gibson042 commented Jan 25, 2020

Option 1 (current)
segmenterInstance.segment(string) returns a stateful SegmentIterator instance at starting position. A SegmentIterator instance exposes current state (segment contents/index/etc.) via getters on its prototype and supports next() to advance and return an iteration result representing the new state and other (random-access) methods to move to a new state and return the receiver. Use cases for those methods consist of "move to first segment", "move to last segment", "move to previous segment", and "move to segment before/after/containing a specific index", though the specific signatures are still being debated.

Option 2 (state-confining)
segmenterInstance.segment(string) returns a stateless iterable Segments instance. A Segments instance supports random-access methods to find a segment and return its contents/index/etc., and Symbol.iterator to return a stateful SegmentIterator instance whose next() method returns objects with the same shape. Use cases for random-access methods consist of "describe first segment", "describe last segment", and "describe segment before/after/containing a specific index".

Option 2 seems better to me and less divergent from existing ECMA-262 iterators, I'm just not ready to adopt it unilaterally. This is marked as blocking Stage 3, but I'm still going to put advancement on the agenda for February. I will update slides accordingly if we can resolve this before the meeting.

@hagbard
Copy link

hagbard commented Jan 25, 2020

@hagbard How would this decision influence OmnICU?

I'm not fully groking this having just dropped in, but the OmnICU model I've been envisioning would be something like what @littledan was saying (the stateless "segments" prototype would be what calls into WASM, passing position and string handle, and then updates the iterator on the host side with new position and updated state derived from the OmniICU result).

This kind of thing is going to be hard to make super efficient whatever we (OmnICU) do, and maybe we want to look at higher level functionality too (e.g. "iterate to segment with property X" as higher level).

@gibson042
Copy link
Collaborator

Resolved in favor of adopting this suggestion (summarized as Option 2) by me and @sffc and @littledan.

@sffc
Copy link
Author

sffc commented Jan 27, 2020

From January 27 phone call: adopt the option proposed by @littledan, and leave room for possible future extensions based on user demand.

@sffc
Copy link
Author

sffc commented Jan 27, 2020

Also, include a .string slot in %SegmentsPrototype% so that callers can access the original string if they are passed a Segments object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants