Inclusive vs exclusive indices for preceding/following #92

Closed
sffc opened this issue Dec 13, 2019 · 10 comments
Comments

@sffc

sffc commented Dec 13, 2019

I want to get the segment containing the code unit at index i. How do I do it? Basically, what I want is:

// Input string: "hello world"
//                01234567890

for (let i=0; i<input.length; i++) {
  let segmentStart = new Intl.Segmenter("en").segment(input).ZZZZZ(i).index;
  console.log(segmentStart);
}

// Expected output: 0, 0, 0, 0, 0, 5, 6, 6, 6, 6, 6

Reading the spec, it appears that both following and preceding use an exclusive index. So, I think I could get what I want by using .preceding(i+1) or .following(i-1). But, that doesn't seem as nice to me. Personally, I would find it more intuitive if .preceding() used an exclusive index, and .following() was inclusive, such that .following(i) gave me the behavior I want.

@gibson042
Collaborator

> Reading the spec, it appears that both following and preceding use an exclusive index. So, I think I could get what I want by using .preceding(i+1) or .following(i-1).

Correct.

> But, that doesn't seem as nice to me. Personally, I would find it more intuitive if .preceding() used an exclusive index, and .following() was inclusive, such that .following(i) gave me the behavior I want.

We need segments.following() to advance the iterator, and I believe it should behave identically to segments.following(segments.index) for symmetry with preceding (in terms of both behavior and English-semantics naming). If instead following were inclusive and segments.following() behaved like segments.following(segments.index + 1), we'd need to explain why preceding/following have similar names but dissimilar behavior for identical input, and there'd also be unfortunate resulting edge-case behavior (e.g., (new Intl.Segmenter("en")).segment(str).following(str.length) not throwing). Keeping both methods exclusive seems like the lesser evil.

@sffc
Author

sffc commented Dec 13, 2019

OK, thanks for the explanation. I can see some elegance in the symmetry between .following() and .following(index), but that's not how I see it. I see passing an argument and not passing an argument as two fundamentally different operations. When you pass an argument, you are asking for random access behavior, and without an argument, you are asking for it to take a single step in one direction or another, not regarding random access.

Virtually all programming languages, including JavaScript, have come to agree that the left index should be inclusive, and the right index should be exclusive, and that's the intuitive behavior that I would want to expect from this API.

Separately, I've always thought of indices as a cursor pointing between two elements of an array:

Elements:  a b c d e|f g
Indices:  0 1 2 3 4 5 6 7
                    |
Cursor @ 5:         ^

If f is the start of a segment, then I expect .following(5) to get the segment starting at f, because it is the full complete segment following the cursor at index 5.
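
To make the two readings of the index argument concrete, here is a minimal sketch over a plain array of segment start positions. The helper names `followingExclusive` and `followingInclusive` are hypothetical and exist only for illustration; this is not the draft spec's algorithm, just the difference between "starts strictly after the cursor" and "starts at or after the cursor".

```javascript
// Hypothetical helpers, not part of any Intl.Segmenter draft or release.
// `starts` holds the start index of each segment in ascending order.
function followingExclusive(starts, i) {
  // First segment that starts strictly after the cursor.
  return starts.find((start) => start > i);
}

function followingInclusive(starts, i) {
  // First segment that starts at or after the cursor.
  return starts.find((start) => start >= i);
}

// In the diagram, segments "abcde" and "fg" start at indices 0 and 5.
const starts = [0, 5];

console.log(followingInclusive(starts, 5)); // 5
console.log(followingExclusive(starts, 5)); // undefined
```

Under the inclusive reading, a cursor at 5 finds the segment starting at f; under the exclusive reading it skips that segment.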

@sffc
Author

sffc commented Dec 13, 2019

I would also be happy if we had a new method .containing(i), and removed the versions of .preceding() and .following() that take an index argument. Then, my use case would simply use the .containing(i) function, and if you really want the segment strictly before or strictly after an index, then you can do .containing(i).preceding() or .containing(i).following().

If we do this, consider renaming .preceding() to .back() and .following() to .next().

@gibson042
Collaborator

next is already claimed by the Iterator interface; it is literally the only thing that following cannot be renamed to. However, I do get your point about indices identifying positions between items, and in fact have made the same argument before but abandoned it to make it clearer that segments.preceding(n) cannot stop on a segment whose first code unit is at index n. The problem with dropping parameters, though, is that we'd lose segments.preceding(Infinity) for seeking to the last segment (#83).

We could switch to the asymmetric behavior you're after, but I don't want to do so with the current antonym pair. What would you think about this refactoring?

  • Switch from argument-exclusive following(startAfter = receiver.index) to argument-inclusive iterate(from = receiver.index + 1) (or "advance", or some other similar verb).
  • Rename preceding(startBefore = receiver.index) to iterateBack(startBefore = receiver.index) (or "advanceBack", or perhaps just "back", or "regress" or some other similar verb).

@sffc
Author

sffc commented Dec 15, 2019

Sure, in my opinion, your suggestion is better than the status quo.

Another idea:

  1. .search(index) produces the segment containing the code unit at index
  2. .first() == .search(0)
  3. .last() == .search(string.length-1)
  4. .forward() advances to the next segment
  5. .backward() regresses to the previous segment

@littledan
Member

I'm glad to see some thoughts here from a use case perspective. Lots of interesting ideas here; I guess we need to come to a conclusion before Stage 3.

@gibson042
Collaborator

If we make the random-access methods stateless as proposed by #93, then I expect to define containing(inclusiveIndex) (where containing(0) returns the first segment, containing(len-1) returns the last, and containing(outOfBounds) returns null) and probably also before(exclusiveIndex){ return this.containing(Math.min(exclusiveIndex, len) - 1) } for convenience, but nothing else since "last segment", "preceding segment", and "following segment" are all straightforward (respectively, before(Infinity) vs. before(current.index) vs. containing(current.index + current.segment.length)).
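
The stateless variant can be sketched on top of `containing()` as it exists in engines that ship `Intl.Segmenter` today. The `before` helper below is hypothetical (it was only floated in this thread), and it takes the input string as an explicit parameter because that is an assumption of this sketch rather than anything the proposal defines. Word granularity is chosen to match the "hello world" example from the opening comment.

```javascript
// Assumes an engine that ships Intl.Segmenter (for example Node.js 16+).
const input = "hello world";
const segments = new Intl.Segmenter("en", { granularity: "word" }).segment(input);

// Hypothetical helper mirroring the before(exclusiveIndex) idea:
// returns the segment containing the code unit just before exclusiveIndex.
function before(segments, input, exclusiveIndex) {
  return segments.containing(Math.min(exclusiveIndex, input.length) - 1);
}

// "Last segment": before(Infinity)
const last = before(segments, input, Infinity);
console.log(last.index); // 6

// "Preceding segment": before(current.index)
const previous = before(segments, input, last.index);
console.log(previous.index); // 5

// "Following segment": containing(current.index + current.segment.length)
const next = segments.containing(previous.index + previous.segment.length);
console.log(next.index); // 6

// Nothing precedes the first segment.
console.log(before(segments, input, 0)); // undefined
```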

If they instead remain state-mutating, then I'm thinking more and more that indicating that with verbs is important, and would want a seek([inclusiveStartIndex = receiver.index + 1]) and an argument-exclusive seekBefore([exclusiveLastIndex = receiver.index]) but probably nothing else. Move to first segment is seek(0), move to last is seekBefore(Infinity), move to previous is seekBefore(), move to next is seek(), and move to containing segment is seekBefore(index + 1).
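
For comparison, the state-mutating alternative could look roughly like the sketch below. The `SegmentCursor` class is hypothetical and is only one possible reading of the `seek`/`seekBefore` description: it is layered over the shipped `containing()` method, and it assumes the cursor starts before the first segment, with `-1` standing in for "no current segment".

```javascript
// Hypothetical state-mutating cursor; not an API from any draft or release.
class SegmentCursor {
  constructor(segmenter, input) {
    this.segments = segmenter.segment(input);
    this.length = input.length;
    this.current = undefined; // assumption: start before the first segment
  }

  // Assumption: -1 stands in for "no current segment".
  get index() {
    return this.current ? this.current.index : -1;
  }

  // Move to the first segment that starts at or after inclusiveStartIndex.
  seek(inclusiveStartIndex = this.index + 1) {
    const from = Math.max(0, inclusiveStartIndex);
    let found = this.segments.containing(from);
    if (found && found.index < from) {
      // `from` falls inside a segment, so step to the one after it.
      found = this.segments.containing(found.index + found.segment.length);
    }
    this.current = found;
    return found;
  }

  // Move to the last segment that starts before exclusiveLastIndex.
  seekBefore(exclusiveLastIndex = this.index) {
    const end = Math.min(exclusiveLastIndex, this.length);
    this.current = this.segments.containing(end - 1);
    return this.current;
  }
}

const cursor = new SegmentCursor(
  new Intl.Segmenter("en", { granularity: "word" }),
  "hello world"
);
console.log(cursor.seek(0).index);              // first segment: 0
console.log(cursor.seek().index);               // next segment: 5
console.log(cursor.seekBefore().index);         // previous segment: 0
console.log(cursor.seekBefore(Infinity).index); // last segment: 6
console.log(cursor.seekBefore(7 + 1).index);    // segment containing index 7: 6
```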

@littledan
Member

@gibson042 , @sffc and I discussed this issue in a call, and agreed that the containing method seemed like a good and sufficient option, in conjunction with a move to a stateless API.

@gibson042
Collaborator

gibson042 commented Jan 27, 2020

Resolved in favor of %SegmentsPrototype%.containing(inclusiveIndex) (without before but with get %SegmentsPrototype%.string) by me and @sffc and @littledan. Out-of-bounds input will result in undefined return values, paralleling out-of-bounds array access.

@sffc
Author

sffc commented Jan 27, 2020

We also discussed that .containing(i) should return undefined if i is out of range because:

  1. string[i] is undefined
  2. Works nicely with ?. (optional chaining)
  3. If the string is empty, we don't want an exception, per "How do you fast-forward to the end of the string?" (#83)
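
As a sketch of how the resolution addresses the original use case, the loop from the opening comment can be written with `containing()` as shipped in current engines. Word granularity is set explicitly here, which is an assumption needed to reproduce the expected output, since the default granularity is grapheme.

```javascript
// Assumes an engine that ships Intl.Segmenter (for example Node.js 16+).
const input = "hello world";
const segments = new Intl.Segmenter("en", { granularity: "word" }).segment(input);

// Start index of the segment containing each code unit.
const startIndices = [];
for (let i = 0; i < input.length; i++) {
  startIndices.push(segments.containing(i).index);
}
console.log(startIndices.join(", ")); // 0, 0, 0, 0, 0, 5, 6, 6, 6, 6, 6

// Out of range yields undefined, like input[input.length],
// so optional chaining short-circuits instead of throwing.
console.log(segments.containing(input.length)?.index); // undefined

// An empty string yields undefined rather than an exception.
console.log(new Intl.Segmenter("en").segment("").containing(0)); // undefined
```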

gibson042 added a commit to gibson042/proposal-intl-segmenter that referenced this issue Feb 5, 2020