Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should take() close the iterator when remaining is 0? #71

Closed
senocular opened this issue Jan 19, 2020 · 36 comments
Closed

Should take() close the iterator when remaining is 0? #71

senocular opened this issue Jan 19, 2020 · 36 comments

Comments

@senocular
Copy link

take() returns undefined when remaining is 0 without calling back up to close the iterator. I'd expect iterator to get closed when this happens. For example

function* readLines() {
  try {
    open()
    while(1) yield read()
  } finally {
    close()
  }
}

// manual 
let count = 3
for (let line of readLines()) {
  // use line
  if (--count === 0) break; // closes
}
  
// using take()
for (let line of readLines().take(3)) {
  // use line
}
// not closed
  
@devsnek
Copy link
Member

devsnek commented Jan 19, 2020

you can still use the iterator afterwards though.

@bergus
Copy link

bergus commented Jan 19, 2020

I agree it should be closed. @senocular gave the most common use case for take(), and the iterator is normally not used afterwards. Closing resources is important here.

If people do want to keep using the iterator, they should need to specify explicitly that they don't want to close it already:

const iterator = readLines();
for (let i=0; i<3; i++)
  const line = iterator.next().value;
  // use line
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly
const iterator = readLines();
let count = 3;
for (const line of notClosing(iterator)) {
  // use line
  if (--count === 0) break;
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly
const iterator = readLines();
for (const line of notClosing(iterator.take(3))) {
  // use line
}
const fourthLine = iterator.next().value;
iterator.return(); // close explcitly

So this notClosing helper is necessary anyway for using an iterator multiple times. We could integrate it with take though, as an optional second parameter closes = true that can be used in for (const of iterator.take(3, false)).

@benjamingr
Copy link
Contributor

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:

const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc

@bakkot
Copy link
Collaborator

bakkot commented Mar 9, 2023

Hmm. I think a "get the next N items without closing the iterator" operation might be useful, but I'm not sure "take" is the appropriate name for it. Note that if the "doWorkOn" function in that example doesn't run to completion immediately you're going to get completely the wrong behavior - take isn't eager, so it only pulls from the underlying iterator when it itself is pulled from.

I see where the confusion there is coming from, though. Honestly limit might have been a better name than take...

@bakkot
Copy link
Collaborator

bakkot commented Mar 9, 2023

A second parameter as discussed above does seem reasonable though.

@benjamingr
Copy link
Contributor

.take for taking and leaving the iterator open and .limit to only take n items and then close the iterator sounds like a good idea, good dsuggestion.

@zloirock
Copy link
Contributor

zloirock commented Mar 9, 2023

const iter = (function * (i) { yield ++i })(0);
const iter1 = iter.take(4);
const iter2 = iter.take(4);
iter2.next().value; // do you expect 1 or 5?

@rluvaton
Copy link

rluvaton commented Mar 9, 2023

const iter = (function * (i) { yield ++i })(0);
const iter1 = iter.take(4);
const iter2 = iter.take(4);
iter2.next().value; // do you expect 1 or 5?

I think 1 is the more correct behavior

@Mr0grog
Copy link

Mr0grog commented Mar 9, 2023

@senocular gave the most common use case for take(), and the iterator is normally not used afterwards. Closing resources is important here.

Is this is the most common use case? It is definitely very common, but wanting to do something with the first N items and something else with the rest like @benjamingr is suggesting is also something people do a lot. It’s certainly a pattern I have needed many times.

I think a "get the next N items without closing the iterator" operation might be useful, but I'm not sure "take" is the appropriate name for it.

Yeah I think I agree. @zloirock’s example highlights how you can easily get into problems using take() to solve this use case. If doWorkOn() in @benjamingr’s example doesn’t consume the iterator passed to it immediately, The second take(4) won’t produce what you expect (which is intuitively 5, 6, 7, ...). And when the actual consumption happens in a different function like @benjamingr’s example, the issue will be much harder to diagnose.

(That said, having take(n) close the iterator when it’s done won’t actually solve the problem in @zloirock’s example, right? It still won’t close the iterator until it’s been read from N times, which doesn’t happen in the example. You’d still have the problem, just in a slightly different way — iter1 might discover iter is already closed by the time it is read from.)

Anyway! A function that returns two iterators (one for the first N items and another for the all the items after) might be more appropriate for that use case. For example:

const [first4, tail] = iter.splitAt(4);
first4.next().value === 1
tail.next().value === 5

(Other names for similar operations in other languages/tools include partition and divide.)

Or maybe a second argument to .drop(n) that is a function to receive an iterator of the dropped items. That would let you chain more nicely, but might be a little weird:

iter
  .drop(4, (first4) => {
    first4.next().value === 1
  })
  .map((item) => {
    // item = 5, then 6, then 7, ...
  });

Either of those gives people a suggested solution for needing to do one thing with the first N and something else with the rest, leaving take(n) free to close the underlying iterator when it’s done (which seems like a good thing since people will often naively call iter.take(4) and forget to close the iterator).

@rluvaton
Copy link

rluvaton commented Mar 9, 2023

What do you guys think of having an option to take?

@benjamingr
Copy link
Contributor

I think changing take -> limit like @bakkot suggested to make the behavior explicit since take does other things on other platforms and libraries is a good idea. I also think and adding a splitAt like @Mr0grog suggested would be a good resolution. What do you think @devsnek ?

@bakkot
Copy link
Collaborator

bakkot commented Mar 11, 2023

splitAt

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished. Iterators which support splitting are a different thing and require a different interface. (Java calls it Spliterator.)

You can have a "eagerly get me the first N items" method (i.e. just call .next up to N times and wrap the results in an array), which leaves the underlying iterator in the state that it now gives you the items following that. But you can't split into two iterators, at least not without more support from the underlying iterator.

take does other things on other platforms and libraries

It doesn't, really - take on an iterator always means "give me an iterator over the first N" items. See rust, scala, etc.

It's just that only JS really has a notion of automatically closing an iterator when you stop iterating it before it ends, so only JS has to answer the question of whether the underlying thing gets closed when you exhaust the result of take.

@rluvaton
Copy link

closing after take is important to avoid memory leaks

Maybe @ronag and @mcollina have some input?

What do you guys think?

@benjamingr
Copy link
Contributor

It doesn't, really - take on an iterator always means "give me an iterator over the first N" items. See rust, scala, etc.

I meant with regards to this issue (whether the "input iterator" is closed or not). Though in Rust (if I remember correctly), .take takes ownership which means you can't iterate without cloning it anyway (I'm not a rust expert).

I suspect in most other languages the bigger difference is that there are usually iterable helpers and not iterator helpers which also (somewhat) subverts the issue.

@benjamingr
Copy link
Contributor

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished.

Ok, then a different name (though I thought splitAt would mean "split the values at" and not "give me two iterators")?

@benjamingr
Copy link
Contributor

It's just that only JS really has a notion of automatically closing an iterator when you stop iterating it before it ends, so only JS has to answer the question of whether the underlying thing gets closed when you exhaust the result of take.

Node would still have to close streams (even if it knows they're usable) for safety since users expect .return() on the stream or iterating it with for... of to close it.

@bakkot
Copy link
Collaborator

bakkot commented Mar 12, 2023

though I thought splitAt would mean "split the values at" and not "give me two iterators"

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

That is, something like

Iterator.prototype.getN = function (N) {
  let out = [];
  for (let i = 0; i < N; ++i) {
    let { done, value } = this.next();
    if (done) break;
    out.push(value);
  }
  return out;
};

// usage
let iter = new Set([0, 1, 2, 3, 4, 5, 6]).values();

let first3 = iter.getN(3); // first3 is [0, 1, 2]
let next3 = iter.getN(3); // next3 is [3, 4, 5]
iter.next(); // gives { done: false, value: 6 }
iter.next(); // gives { done: true }

If that's not the API you're thinking of, can you write it out in more detail?

@Mr0grog
Copy link

Mr0grog commented Mar 12, 2023

splitAt fundamentally can't work because you can't advance the iterator for the second part until the iterator for the first part has finished… But you can't split into two iterators, at least not without more support from the underlying iterator.

For sure! I guess I assumed a reasonable splitAt(n) implementation would buffer up to n items behind the scenes (depending what order you read from the two resulting iterators in). That should work fine, and I don’t think it’s exceedingly complicated (but obviously not trivial).

I guess I’m wanting a way to address the use case without being eager, like your getN() example above is.

@bakkot
Copy link
Collaborator

bakkot commented Mar 12, 2023

If you start iterating the second thing it's forced to immediately fetch and buffer all the results from the first thing, so it's still pretty eager. That's unavoidable given the iterator API, unfortunately. I think being explicit about that (by returning an array) is probably clearer.

(I guess in principle you could make it such that attempting to advance the second half without having exhausted the first would throw, but that seems awkward to use.)

@Mr0grog
Copy link

Mr0grog commented Mar 12, 2023

If you start iterating the second thing it's forced to immediately fetch and buffer all the results from the first thing, so it's still pretty eager.

Sure. It gives you more options, though (e.g. only buffering as much as it has to), and you can continue to chain iterator helper methods off it, which lets you do more in a single, non-eager pass. It’s better than the worst case.

I think being explicit about that (by returning an array) is probably clearer.

That’s fair! I can buy that argument. 🙂

(I guess in principle you could make it such that attempting to advance the second half without having exhausted the first would throw, but that seems awkward to use.)

Yeah, I think that would entirely defeat the value of it, TBH. What I want here (even if I’m not going to get it) is something where I can safely iterate over the sections I want to treat differently, and not have to worry whether my program will crash or give the wrong results if I screw up the timing of things (easy to happen if consuming the iterator involves async operations somewhere).

@benjamingr
Copy link
Contributor

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

Right - and to be explicit the ask here is to rename take -> limit (or something similar) and to (optionally in a follow up proposal but preferably now) add a non-closing take (bikeshedding the name is less important to me to be honest and getN is fine).

By the way, who I can talk to to make sure this gets visibility? I think "visisibility" is a good first step before I approach my delegates through Node/Microsoft and the relevant orgs (e.g. TSC in the Node case) and I'd rather just fix this instead of blocking or delaying the proposal which I am very much eager/happy for.

@bakkot
Copy link
Collaborator

bakkot commented Mar 12, 2023

By the way, who I can talk to to make sure this gets visibility?

@michaelficarra (who is my coworker) is the person you want, as the most active champion. Normally I could ping him at work but I'm theoretically on vacation (whoops). I'll send a message on Matrix at least.

@benjamingr
Copy link
Contributor

Awesome, I'll do my best to hold off any changes in Node.js's implementation until we can resolve this through the spec process.

@rluvaton
Copy link

Ah, sure. In that case it's the "get the next N items without closing the iterator" thing I suggested above, right?

I think that getN sounds like you are not advancing the iterator while you are...

And the splitAt sounds confusing like it split the iterator.

I think take is a great name for non closing operation.

@ljharb
Copy link
Member

ljharb commented Mar 12, 2023

I kind of agree; “take” to me doesn’t destroy the cookie jar, it just takes some cookies out of it. I’d expect “truncate” to imply closing the iterator, for example.

@bakkot
Copy link
Collaborator

bakkot commented Mar 12, 2023

I think that getN sounds like you are not advancing the iterator while you are...

How so? You're... getting stuff from it. Getting things from an iterator is equivalent to advancing it.

I think take is a great name for non closing operation.

take is an existing, common operation in other languages, and by far the most common use of it involves no further use of the underlying iterator. So If we have a thing called take, I feel strongly that it must (by default) close the underlying thing.

@bakkot
Copy link
Collaborator

bakkot commented Mar 13, 2023

To put it another way: iterator helpers are generally set up on the assumption that you're not going to re-use the receiver. That's because the receiver draws from the same source as the result of the helper, which means that using both of them will result in seeing skipped items. So I don't think we should have any methods which encourage using both the result of the helper and also still using the original iterator (at least not without you having to carefully opt in) - i.e., we should not have a non-closing take.

@benjamingr
Copy link
Contributor

I think that getN sounds like you are not advancing the iterator while you are...
How so? You're... getting stuff from it. Getting things from an iterator is equivalent to advancing it.

That is also my intuition.

take is an existing, common operation in other languages, and by far the most common use of it involves no further use of the underlying iterator. So If we have a thing called take, I feel strongly that it must (by default) close the underlying thing.

I still don't think that's the common case in other languages, usages of chaining .Skip(N).Take(N) are very common in .NET for example. That said - our case is kind of different from theirs because of iterable/iterator.

I think it's best to be cautious and avoid ambiguous naming - the data point we do have right now is that with the current proposal users found the behavior confusing and unintuitive.

i.e., we should not have a non-closing take.

I'm not sure I agree given people are asking for this. I would be content with naming the existing one in a way that is less ambiguous like .limit

@benjamingr
Copy link
Contributor

And to be explicit I agree with @bakkot that having a non-closing .take if a closing take is available in other languages in rust is problematic.

@rbuckton

This comment was marked as duplicate.

@rbuckton

This comment was marked as outdated.

@bakkot
Copy link
Collaborator

bakkot commented Mar 13, 2023

And to be explicit I agree with @bakkot that having a non-closing .take if a closing take is available in other languages in rust is problematic.

Sorry, I didn't mean to claim that. There's not a closing take in other languages because there's not a notion of "closed" in other languages (mostly). It's just that other languages haven't had to make a decision either way, so whichever decision we're making on the "closing" part, we can't look to other languages as precedent - the basic "take gives you an iterator over the first N items" part is well established, but the "does it close" question has never had to be answered.

I just think any method ought to close (by default), because the normal way of using these things is that you don't use the receiver again after chaining a helper off of it. (Note that .Skip(N).Take(N), mentioned above, is in fact discarding the receiver.) Indeed that's kind of a foundational assumption, because the helpers and the receiver both draw from the same source, so using both of them creates a race. We don't want to encourage that.

With that said, I am OK with having an option to give you non-closing (as in .take(N, { leaveOpen: true }) or whatever) - having to find the option at least gives you a chance to be warned about the risk of races or leaks. But from discussion above it seems like limit is a better name for the "closing" version, and the option makes less sense on limit, I think.

Maybe it's still fine? .limit(N, { leaveOpen: true }) or whatever? Otoh I think I personally would want the array-returning version more often, which does not have a risk of races, and .getN(5) is a lot easier to use than .limit(N, { leaveOpen: true }).toArray(). Admittedly it is less general, though.

@jridgewell
Copy link
Member

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:

const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc

FWIW, this seems more like a chunk. The race could be fixed, too, by preventing the next outer .next() call until all inner .next() calls are complete:

const iter = getIterator().chunk(4);
for (const chunk of iter) {
  for (const it of chunk) {
  }
}

@littledan
Copy link
Member

littledan commented Mar 14, 2023

Name bike shed if we are leaving it closed: cap.

(Edit: I also like limit and don't have any strong preferences here.)

@bakkot
Copy link
Collaborator

bakkot commented Mar 14, 2023

I should say that I suggested limit because that's what Java calls it. (It also uses "skip" instead of "drop", for whatever reason.)

@rluvaton
Copy link

We are actually running into the reverse of this in nodejs/node#46980 and it seems like in practice the expectation is for the iterator not to be closed to enable cases like:

const iter = getIterator();
doWorkOn(iter.take(4));
doWorkOn(iter.take(4)); // next 4, etc

FWIW, this seems more like a chunk. The race could be fixed, too, by preventing the next outer .next() call until all inner .next() calls are complete:

const iter = getIterator().chunk(4);
for (const chunk of iter) {
  for (const it of chunk) {
  }
}

IMO chunk is a less generic solution to non-closing take as it requires the data to be evenly distributed which in many cases it's not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests