Should take() close the iterator when remaining is 0? #71

senocular · 2020-01-19T13:59:18Z

take() returns undefined when remaining is 0 without calling back up to close the iterator. I'd expect iterator to get closed when this happens. For example

function* readLines() {
  try {
    open()
    while(1) yield read()
  } finally {
    close()
  }
}

// manual 
let count = 3
for (let line of readLines()) {
  // use line
  if (--count === 0) break; // closes
}
  
// using take()
for (let line of readLines().take(3)) {
  // use line
}
// not closed

devsnek · 2020-01-19T17:34:06Z

you can still use the iterator afterwards though.

bergus · 2020-01-19T21:42:11Z

I agree it should be closed. @senocular gave the most common use case for take(), and the iterator is normally not used afterwards. Closing resources is important here.

If people do want to keep using the iterator, they should need to specify explicitly that they don't want to close it already:

const iterator = readLines();
for (let i=0; i<3; i++)
  const line = iterator.next().value;
  // use line
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly

const iterator = readLines();
let count = 3;
for (const line of notClosing(iterator)) {
  // use line
  if (--count === 0) break;
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly

const iterator = readLines();
for (const line of notClosing(iterator.take(3))) {
  // use line
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly

So this notClosing helper is necessary anyway for using an iterator multiple times. We could integrate it with take though, as an optional second parameter closes = true that can be used in for (const of iterator.take(3, false)).

benjamingr · 2023-03-09T11:22:35Z

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:

const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc

bakkot · 2023-03-09T11:47:44Z

Hmm. I think a "get the next N items without closing the iterator" operation might be useful, but I'm not sure "take" is the appropriate name for it. Note that if the "doWorkOn" function in that example doesn't run to completion immediately you're going to get completely the wrong behavior - take isn't eager, so it only pulls from the underlying iterator when it itself is pulled from.

I see where the confusion there is coming from, though. Honestly limit might have been a better name than take...

bakkot · 2023-03-09T12:17:48Z

A second parameter as discussed above does seem reasonable though.

benjamingr · 2023-03-09T12:43:56Z

.take for taking and leaving the iterator open and .limit to only take n items and then close the iterator sounds like a good idea, good dsuggestion.

zloirock · 2023-03-09T13:19:18Z

const iter = (function * (i) { yield ++i })(0);
const iter1 = iter.take(4);
const iter2 = iter.take(4);
iter2.next().value; // do you expect 1 or 5?

rluvaton · 2023-03-09T13:34:33Z

const iter = (function * (i) { yield ++i })(0);
const iter1 = iter.take(4);
const iter2 = iter.take(4);
iter2.next().value; // do you expect 1 or 5?

I think 1 is the more correct behavior

Mr0grog · 2023-03-09T19:36:59Z

@senocular gave the most common use case for take(), and the iterator is normally not used afterwards. Closing resources is important here.

Is this is the most common use case? It is definitely very common, but wanting to do something with the first N items and something else with the rest like @benjamingr is suggesting is also something people do a lot. It’s certainly a pattern I have needed many times.

I think a "get the next N items without closing the iterator" operation might be useful, but I'm not sure "take" is the appropriate name for it.

Yeah I think I agree. @zloirock’s example highlights how you can easily get into problems using take() to solve this use case. If doWorkOn() in @benjamingr’s example doesn’t consume the iterator passed to it immediately, The second take(4) won’t produce what you expect (which is intuitively 5, 6, 7, ...). And when the actual consumption happens in a different function like @benjamingr’s example, the issue will be much harder to diagnose.

(That said, having take(n) close the iterator when it’s done won’t actually solve the problem in @zloirock’s example, right? It still won’t close the iterator until it’s been read from N times, which doesn’t happen in the example. You’d still have the problem, just in a slightly different way — iter1 might discover iter is already closed by the time it is read from.)

Anyway! A function that returns two iterators (one for the first N items and another for the all the items after) might be more appropriate for that use case. For example:

const [first4, tail] = iter.splitAt(4);
first4.next().value === 1
tail.next().value === 5

(Other names for similar operations in other languages/tools include partition and divide.)

Or maybe a second argument to .drop(n) that is a function to receive an iterator of the dropped items. That would let you chain more nicely, but might be a little weird:

iter
  .drop(4, (first4) => {
    first4.next().value === 1
  })
  .map((item) => {
    // item = 5, then 6, then 7, ...
  });

Either of those gives people a suggested solution for needing to do one thing with the first N and something else with the rest, leaving take(n) free to close the underlying iterator when it’s done (which seems like a good thing since people will often naively call iter.take(4) and forget to close the iterator).

rluvaton · 2023-03-09T19:45:17Z

What do you guys think of having an option to take?

benjamingr · 2023-03-10T18:23:40Z

I think changing take -> limit like @bakkot suggested to make the behavior explicit since take does other things on other platforms and libraries is a good idea. I also think and adding a splitAt like @Mr0grog suggested would be a good resolution. What do you think @devsnek ?

bakkot · 2023-03-11T00:21:07Z

splitAt

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished. Iterators which support splitting are a different thing and require a different interface. (Java calls it Spliterator.)

You can have a "eagerly get me the first N items" method (i.e. just call .next up to N times and wrap the results in an array), which leaves the underlying iterator in the state that it now gives you the items following that. But you can't split into two iterators, at least not without more support from the underlying iterator.

take does other things on other platforms and libraries

It doesn't, really - take on an iterator always means "give me an iterator over the first N" items. See rust, scala, etc.

It's just that only JS really has a notion of automatically closing an iterator when you stop iterating it before it ends, so only JS has to answer the question of whether the underlying thing gets closed when you exhaust the result of take.

rluvaton · 2023-03-11T16:39:04Z

closing after take is important to avoid memory leaks

Maybe @ronag and @mcollina have some input?

What do you guys think?

benjamingr · 2023-03-11T19:08:26Z

It doesn't, really - take on an iterator always means "give me an iterator over the first N" items. See rust, scala, etc.

I meant with regards to this issue (whether the "input iterator" is closed or not). Though in Rust (if I remember correctly), .take takes ownership which means you can't iterate without cloning it anyway (I'm not a rust expert).

I suspect in most other languages the bigger difference is that there are usually iterable helpers and not iterator helpers which also (somewhat) subverts the issue.

benjamingr · 2023-03-11T19:09:28Z

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished.

Ok, then a different name (though I thought splitAt would mean "split the values at" and not "give me two iterators")?

benjamingr · 2023-03-11T19:10:46Z

It's just that only JS really has a notion of automatically closing an iterator when you stop iterating it before it ends, so only JS has to answer the question of whether the underlying thing gets closed when you exhaust the result of take.

Node would still have to close streams (even if it knows they're usable) for safety since users expect .return() on the stream or iterating it with for... of to close it.

bakkot · 2023-03-12T00:08:41Z

though I thought splitAt would mean "split the values at" and not "give me two iterators"

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

That is, something like

Iterator.prototype.getN = function (N) {
  let out = [];
  for (let i = 0; i < N; ++i) {
    let { done, value } = this.next();
    if (done) break;
    out.push(value);
  }
  return out;
};

// usage
let iter = new Set([0, 1, 2, 3, 4, 5, 6]).values();

let first3 = iter.getN(3); // first3 is [0, 1, 2]
let next3 = iter.getN(3); // next3 is [3, 4, 5]
iter.next(); // gives { done: false, value: 6 }
iter.next(); // gives { done: true }

If that's not the API you're thinking of, can you write it out in more detail?

Mr0grog · 2023-03-12T01:40:22Z

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished… But you can't split into two iterators, at least not without more support from the underlying iterator.

For sure! I guess I assumed a reasonable splitAt(n) implementation would buffer up to n items behind the scenes (depending what order you read from the two resulting iterators in). That should work fine, and I don’t think it’s exceedingly complicated (but obviously not trivial).

I guess I’m wanting a way to address the use case without being eager, like your getN() example above is.

bakkot · 2023-03-12T02:11:46Z

If you start iterating the second thing it's forced to immediately fetch and buffer all the results from the first thing, so it's still pretty eager. That's unavoidable given the iterator API, unfortunately. I think being explicit about that (by returning an array) is probably clearer.

(I guess in principle you could make it such that attempting to advance the second half without having exhausted the first would throw, but that seems awkward to use.)

Mr0grog · 2023-03-12T02:33:50Z

If you start iterating the second thing it's forced to immediately fetch and buffer all the results from the first thing, so it's still pretty eager.

Sure. It gives you more options, though (e.g. only buffering as much as it has to), and you can continue to chain iterator helper methods off it, which lets you do more in a single, non-eager pass. It’s better than the worst case.

I think being explicit about that (by returning an array) is probably clearer.

That’s fair! I can buy that argument. 🙂

(I guess in principle you could make it such that attempting to advance the second half without having exhausted the first would throw, but that seems awkward to use.)

Yeah, I think that would entirely defeat the value of it, TBH. What I want here (even if I’m not going to get it) is something where I can safely iterate over the sections I want to treat differently, and not have to worry whether my program will crash or give the wrong results if I screw up the timing of things (easy to happen if consuming the iterator involves async operations somewhere).

benjamingr · 2023-03-12T14:01:00Z

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

Right - and to be explicit the ask here is to rename take -> limit (or something similar) and to (optionally in a follow up proposal but preferably now) add a non-closing take (bikeshedding the name is less important to me to be honest and getN is fine).

By the way, who I can talk to to make sure this gets visibility? I think "visisibility" is a good first step before I approach my delegates through Node/Microsoft and the relevant orgs (e.g. TSC in the Node case) and I'd rather just fix this instead of blocking or delaying the proposal which I am very much eager/happy for.

bakkot · 2023-03-12T14:12:30Z

By the way, who I can talk to to make sure this gets visibility?

@michaelficarra (who is my coworker) is the person you want, as the most active champion. Normally I could ping him at work but I'm theoretically on vacation (whoops). I'll send a message on Matrix at least.

benjamingr · 2023-03-12T14:17:36Z

Awesome, I'll do my best to hold off any changes in Node.js's implementation until we can resolve this through the spec process.

rluvaton · 2023-03-12T19:36:46Z

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

I think that getN sounds like you are not advancing the iterator while you are...

And the splitAt sounds confusing like it split the iterator.

I think take is a great name for non closing operation.

ljharb · 2023-03-12T19:42:05Z

I kind of agree; “take” to me doesn’t destroy the cookie jar, it just takes some cookies out of it. I’d expect “truncate” to imply closing the iterator, for example.

bakkot · 2023-03-12T23:57:30Z

I think that getN sounds like you are not advancing the iterator while you are...

How so? You're... getting stuff from it. Getting things from an iterator is equivalent to advancing it.

I think take is a great name for non closing operation.

take is an existing, common operation in other languages, and by far the most common use of it involves no further use of the underlying iterator. So If we have a thing called take, I feel strongly that it must (by default) close the underlying thing.

bakkot · 2023-03-13T00:33:28Z

To put it another way: iterator helpers are generally set up on the assumption that you're not going to re-use the receiver. That's because the receiver draws from the same source as the result of the helper, which means that using both of them will result in seeing skipped items. So I don't think we should have any methods which encourage using both the result of the helper and also still using the original iterator (at least not without you having to carefully opt in) - i.e., we should not have a non-closing take.

benjamingr · 2023-03-13T10:44:24Z

I think that getN sounds like you are not advancing the iterator while you are...
How so? You're... getting stuff from it. Getting things from an iterator is equivalent to advancing it.

That is also my intuition.

take is an existing, common operation in other languages, and by far the most common use of it involves no further use of the underlying iterator. So If we have a thing called take, I feel strongly that it must (by default) close the underlying thing.

I still don't think that's the common case in other languages, usages of chaining .Skip(N).Take(N) are very common in .NET for example. That said - our case is kind of different from theirs because of iterable/iterator.

I think it's best to be cautious and avoid ambiguous naming - the data point we do have right now is that with the current proposal users found the behavior confusing and unintuitive.

i.e., we should not have a non-closing take.

I'm not sure I agree given people are asking for this. I would be content with naming the existing one in a way that is less ambiguous like .limit

benjamingr · 2023-03-13T11:35:29Z

And to be explicit I agree with @bakkot that having a non-closing .take if a closing take is available in other languages in rust is problematic.

bakkot · 2023-03-13T20:49:50Z

And to be explicit I agree with @bakkot that having a non-closing .take if a closing take is available in other languages in rust is problematic.

Sorry, I didn't mean to claim that. There's not a closing take in other languages because there's not a notion of "closed" in other languages (mostly). It's just that other languages haven't had to make a decision either way, so whichever decision we're making on the "closing" part, we can't look to other languages as precedent - the basic "take gives you an iterator over the first N items" part is well established, but the "does it close" question has never had to be answered.

I just think any method ought to close (by default), because the normal way of using these things is that you don't use the receiver again after chaining a helper off of it. (Note that .Skip(N).Take(N), mentioned above, is in fact discarding the receiver.) Indeed that's kind of a foundational assumption, because the helpers and the receiver both draw from the same source, so using both of them creates a race. We don't want to encourage that.

With that said, I am OK with having an option to give you non-closing (as in .take(N, { leaveOpen: true }) or whatever) - having to find the option at least gives you a chance to be warned about the risk of races or leaks. But from discussion above it seems like limit is a better name for the "closing" version, and the option makes less sense on limit, I think.

Maybe it's still fine? .limit(N, { leaveOpen: true }) or whatever? Otoh I think I personally would want the array-returning version more often, which does not have a risk of races, and .getN(5) is a lot easier to use than .limit(N, { leaveOpen: true }).toArray(). Admittedly it is less general, though.

jridgewell · 2023-03-13T22:05:34Z

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:
const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc

FWIW, this seems more like a chunk. The race could be fixed, too, by preventing the next outer .next() call until all inner .next() calls are complete:

const iter = getIterator().chunk(4);
for (const chunk of iter) {
  for (const it of chunk) {
  }
}

littledan · 2023-03-14T20:00:27Z

Name bike shed if we are leaving it closed: cap.

(Edit: I also like limit and don't have any strong preferences here.)

bakkot · 2023-03-14T20:23:02Z

I should say that I suggested limit because that's what Java calls it. (It also uses "skip" instead of "drop", for whatever reason.)

rluvaton · 2023-03-16T19:53:39Z

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:
const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc
FWIW, this seems more like a chunk. The race could be fixed, too, by preventing the next outer .next() call until all inner .next() calls are complete:
const iter = getIterator().chunk(4);
for (const chunk of iter) {
  for (const it of chunk) {
  }
}

IMO chunk is a less generic solution to non-closing take as it requires the data to be evenly distributed which in many cases it's not

devsnek closed this as completed in 7120496 Apr 30, 2020

rluvaton mentioned this issue Mar 11, 2023

stream: Add option to Readable.take operator to not close the stream nodejs/node#46980

Closed

This comment was marked as duplicate.

Sign in to view

This comment was marked as outdated.

Sign in to view

bakkot mentioned this issue Mar 15, 2023

renaming take #270

Closed

michaelficarra mentioned this issue Mar 15, 2023

stream: allow stream to stay open after take nodejs/node#47023

Open

2 tasks

bergus mentioned this issue Feb 20, 2024

A flaw in JavaScript iteration leaves a weird behavior with take, first, some, etc. #293

Closed

Should take() close the iterator when remaining is 0? #71

Should take() close the iterator when remaining is 0? #71

Comments

senocular commented Jan 19, 2020

devsnek commented Jan 19, 2020

bergus commented Jan 19, 2020

benjamingr commented Mar 9, 2023

bakkot commented Mar 9, 2023

bakkot commented Mar 9, 2023

benjamingr commented Mar 9, 2023

zloirock commented Mar 9, 2023

rluvaton commented Mar 9, 2023

Mr0grog commented Mar 9, 2023

rluvaton commented Mar 9, 2023

benjamingr commented Mar 10, 2023

bakkot commented Mar 11, 2023

rluvaton commented Mar 11, 2023

benjamingr commented Mar 11, 2023

benjamingr commented Mar 11, 2023

benjamingr commented Mar 11, 2023

bakkot commented Mar 12, 2023 • edited Loading

Mr0grog commented Mar 12, 2023 • edited Loading

bakkot commented Mar 12, 2023

Mr0grog commented Mar 12, 2023 • edited Loading

benjamingr commented Mar 12, 2023

bakkot commented Mar 12, 2023

benjamingr commented Mar 12, 2023

rluvaton commented Mar 12, 2023

ljharb commented Mar 12, 2023

bakkot commented Mar 12, 2023

bakkot commented Mar 13, 2023

benjamingr commented Mar 13, 2023

benjamingr commented Mar 13, 2023

This comment was marked as duplicate.

This comment was marked as outdated.

bakkot commented Mar 13, 2023 • edited Loading

jridgewell commented Mar 13, 2023

littledan commented Mar 14, 2023 • edited Loading

bakkot commented Mar 14, 2023

rluvaton commented Mar 16, 2023

bakkot commented Mar 12, 2023 •

edited

Loading

Mr0grog commented Mar 12, 2023 •

edited

Loading

Mr0grog commented Mar 12, 2023 •

edited

Loading

bakkot commented Mar 13, 2023 •

edited

Loading

littledan commented Mar 14, 2023 •

edited

Loading