Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove manual unrolling from slice::Iter(Mut)::try_fold #64600

Merged
merged 2 commits into from
Sep 30, 2019

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Sep 19, 2019

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).


Final perf comparison: #64600 (comment)


For context see #64572 (comment)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 19, 2019
@scottmcm
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 19, 2019

⌛ Trying commit 38d8c8d with merge dd115ba...

bors added a commit that referenced this pull request Sep 19, 2019
[DO NOT MERGE] Experiment with removing unrolling from slice::Iter::try_fold

For context see #64572 (comment)

r? @scottmcm
@bors
Copy link
Contributor

bors commented Sep 19, 2019

☀️ Try build successful - checks-azure
Build commit: dd115ba

@rust-timer
Copy link
Collaborator

Queued dd115ba with parent eceec57, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit dd115ba, comparison URL.

@nnethercote
Copy link
Contributor

nnethercote commented Sep 19, 2019

This change gets roughly half the improvements that the commit in #64572 gets.

@bluss
Copy link
Member

bluss commented Sep 20, 2019

I think that unrolling would eventually have to go and be removed from libcore, I was just hoping the compiler would catch up and be able to unroll loops with multiple exits itself. Unrolling should ideally belong to the compiler, so it can make the decision about when to duplicate code. I haven't revisited that, so for all I know llvm could have learned this by now. [Edit: checked -- rustc nightly does not unroll such things by itself right now either. I wonder if this multiple exit improvement means that things are on the way..? No clue]

This seems like a situation where it's easy to find both good and bad cases. Things like scans through bytes for parsing with a simple predicate benefit a lot from unrolling in all/find etc.

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
@scottmcm scottmcm changed the title [DO NOT MERGE] Experiment with removing unrolling from slice::Iter::try_fold Remove manual unrolling from slice::Iter(Mut)::try_fold Sep 22, 2019
@scottmcm
Copy link
Member Author

Ok, it seems like the inclination is that we should do this so I've turned this into a "real" PR.

I do think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually, though I certainly with LLVM was better at these cases.

I'm not sure who should approve this -- does it need libs sign-off?

@scottmcm scottmcm assigned bluss and unassigned scottmcm Sep 22, 2019
@scottmcm scottmcm added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 22, 2019
@timvermeulen
Copy link
Contributor

Couldn't you just remove the try_fold and fold overrides entirely? They seem to now mirror the default implementations. And I think there's manual unrolling going on in the DoubleEndedIterator impl, too.

@rust-highfive

This comment has been minimized.

@scottmcm
Copy link
Member Author

@bors try @rust-timer queue

(I'm curious to see the new self-profile results, and want to make sure that removing the overrides still keeps the gain here -- it might mean more work to eliminate !s.)

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 24, 2019

⌛ Trying commit 6ac64ab with merge 8be3622...

bors added a commit that referenced this pull request Sep 24, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
@bors
Copy link
Contributor

bors commented Sep 24, 2019

☀️ Try build successful - checks-azure
Build commit: 8be3622 (8be3622ad74755484fd9b9e401d0ee96837be244)

@rust-timer
Copy link
Collaborator

Queued 8be3622 with parent 66bf391, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 8be3622, comparison URL.

@bjorn3
Copy link
Member

bjorn3 commented Sep 24, 2019

script-servo-opt regressed, the rest is positive or neutral.

@scottmcm
Copy link
Member Author

Oh, interesting. script-servo-opt is new, right? I can't find it in the previous perf comparison...

@scottmcm
Copy link
Member Author

scottmcm commented Sep 24, 2019

New perf link (thank you, Mark-Simulacrum!) with self-profile results for both sides:
https://perf.rust-lang.org/compare.html?start=b4ba2a3953ea9ec28f01c314be315d46673bd782&end=8be3622ad74755484fd9b9e401d0ee96837be244

It looks like nearly all of the speedup for clap-rs-debug run clean is ~2.5s in LLVM_emit_obj (and ~0.3s in LLVM_make_bitcode). Any idea why this would be so much better there? I'd hypothesized that the closure might be getting inlined 5x even in debug, but based on a quick experiment (https://rust.godbolt.org/z/HmYVvd) that doesn't seem to happen. Since this PR still ends up using try_fold (unlike #64572), it doesn't feel like it should have had a major enough impact on the amount of code that's getting sent to LLVM to account for the size of the win.

And for script-servo-opt run baseline incremental it's LLVM_emit_obj that increased by ~13.5s with this change. Curious.

@bluss
Copy link
Member

bluss commented Sep 27, 2019

This change looks good to me, but I guess we are waiting for some discussion. I'll try to ask @Geal about nom performance and unrolling.

You know how much I would like to say we can just reimplement important stuff, like an unrolling slice iterator, outside libcore, but the libcore version is still tied up with unstable features like assume — unknown what impact they have as of current rustc version.

@Geal
Copy link
Contributor

Geal commented Sep 27, 2019

@bluss no issue for me, nom does not use try_fold nor try_rfold internally

@bluss
Copy link
Member

bluss commented Sep 27, 2019

It looks like it's not impossible for rustc to unroll an "Iterator::all" like loop. It just can't do it in the simplest forms that those loops take, for example not in for elt in v { .. }

I have some old alternative slice iterator code, and it can be automatically unrolled by the compiler. The code is here (github link to iter.rs) and there are benchmarks that show the unrolling on that specific branch. I haven't managed to reduce the loop that will unroll, though — maybe it's specific to the code in the benchmark? The compiler's unroll disappears if the special case .all() method is removed from that iterator, even though it only adds indirections (similar to using try_fold).

@scottmcm
Copy link
Member Author

@bluss do you still have r+ here, or do I need to find a different reviewer for this?

@bluss
Copy link
Member

bluss commented Sep 30, 2019

@scottmcm I guess I do, but with the nominated tag I thought we were waiting for the libs team

@bluss
Copy link
Member

bluss commented Sep 30, 2019

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Sep 30, 2019

📌 Commit 6ac64ab has been approved by bluss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 30, 2019
@bors
Copy link
Contributor

bors commented Sep 30, 2019

⌛ Testing commit 6ac64ab with merge e0436d9...

bors added a commit that referenced this pull request Sep 30, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
@bors
Copy link
Contributor

bors commented Sep 30, 2019

☀️ Test successful - checks-azure
Approved by: bluss
Pushing e0436d9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 30, 2019
@bors bors merged commit 6ac64ab into rust-lang:master Sep 30, 2019
@bluss
Copy link
Member

bluss commented Oct 1, 2019

@Geal is now in the rustc 1.40.0-nightly (22bc9e1 2019-09-30) nightly. I'd be surprised if it didn't impact benches in some way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants