Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use try_fold instead of try_for_each to reduce compile time #64885

Merged
merged 1 commit into from
Oct 2, 2019

Conversation

andjo403
Copy link
Contributor

@andjo403 andjo403 commented Sep 28, 2019

as it was stated in #64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from #64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost

@Centril
Copy link
Contributor

Centril commented Sep 28, 2019

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 28, 2019

⌛ Trying commit fc87c00c5b527660779dbcea0fe4291177100616 with merge 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e...

@bors
Copy link
Contributor

bors commented Sep 29, 2019

☀️ Try build successful - checks-azure
Build commit: 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e (4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e)

@rust-timer
Copy link
Collaborator

Queued 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e with parent 488381c, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e, comparison URL.

@andjo403
Copy link
Contributor Author

some procent better than only #64600 but still some way to go until the diffs from #64572 .
do not know how god it is to compare the procent changes for all the perf runs but as there is multiple bases for the PRs it is not possible to compare directly.

@nnethercote
Copy link
Contributor

@andjo403: #64600 has now landed. Could you rebase and update the code here? I think only the first commit will be necessary now, and we can get a fair comparison. Thanks.

removes two functions to inline by combining the check functions and extra call to try_for_each
@andjo403
Copy link
Contributor Author

andjo403 commented Oct 1, 2019

@nnethercote rebased and removed the second commit

@nnethercote
Copy link
Contributor

Thanks!

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Oct 1, 2019

⌛ Trying commit 8737061 with merge 40a3c41fdfde051926f256564c247e2ce94a667e...

@bluss
Copy link
Member

bluss commented Oct 1, 2019

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.

Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.

@bors
Copy link
Contributor

bors commented Oct 1, 2019

☀️ Try build successful - checks-azure
Build commit: 40a3c41fdfde051926f256564c247e2ce94a667e (40a3c41fdfde051926f256564c247e2ce94a667e)

if f(x) { LoopState::Continue(()) }
else { LoopState::Break(()) }
}
}

self.try_for_each(check(f)) == LoopState::Continue(())
self.try_fold((), check(f)) == LoopState::Continue(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on equality check vs pattern matching here, can it have an effect or none at all?

Copy link
Contributor Author

@andjo403 andjo403 Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a quick diff in godbolt and there is less code to inline so that is something that I can do
ZN72$LT$example..LoopState$LT$C$C$B$GT$$u20$as$u20$core..cmp..PartialEq$GT$2eq17h37dbcaf2df999e09E is a lot to inline

Copy link
Member

@scottmcm scottmcm Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hope it has no effect, since LoopState<(),()> is an i1 in LLVM...

...and it is in -O, but very different in debug: https://rust.godbolt.org/z/LKOpZ7

Looks like the PartialEq::eq that gets generated is pretty bad, and it's still bad removing the generics: https://rust.godbolt.org/z/o6Nuaw Could there be a "this is a field-less enum so just compare the discriminants" path in the derive? It looks, unfortunately, like as u8 == 1 is the shortest-emitted-IR way to do these checks. And we're avoiding the derives in other places too, like

rust/src/libcore/cmp.rs

Lines 632 to 638 in 702b45e

#[stable(feature = "rust1", since = "1.0.0")]
impl Ord for Ordering {
#[inline]
fn cmp(&self, other: &Ordering) -> Ordering {
(*self as i32).cmp(&(*other as i32))
}
}

Copy link
Member

@bluss bluss Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. So we could improve here just by implementing PartialEq manually, or even adding a separate method for just discriminant comparison. But then pattern matching works well too. Like, just a method for ".is_continue()"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the pattern match avoids having a function to inline completely as long as a function is used the llvm-ir will contain a call and a function

@nnethercote
Copy link
Contributor

My rust-timer command above didn't work. Let's try doing it a different way:

@rust-timer build 40a3c41fdfde051926f256564c247e2ce94a667e

@rust-timer
Copy link
Collaborator

Queued 40a3c41fdfde051926f256564c247e2ce94a667e with parent 42ec683, future comparison URL.

@andjo403
Copy link
Contributor Author

andjo403 commented Oct 1, 2019

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.
Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.

I do not understand can you show some example?

@bluss
Copy link
Member

bluss commented Oct 1, 2019

@andjo403 I have an example of a before-after change like that, that I made as PoC. Rust -Zprint-mono-items=lazy tells me this uses 1 less generic function (Before we use check<T> and the closure in the check body, after we use only Fun::call_mut (call_once is never used).) Regrettably it's from a smaller similar iterator, not the exact code in libcore.

Code here https://gist.github.com/b94c565bc5ba37206112c150b8b1cc20

It doesn't look great - maybe a macro could improve that? In fact the code looks so bad, I'm unsure we'd want to do that. 🙂

@andjo403
Copy link
Contributor Author

andjo403 commented Oct 1, 2019

thanks @bluss for the example and yes that code was hard to understand

@bluss
Copy link
Member

bluss commented Oct 1, 2019

It is equivalent to desugaring the original closure, without the "check(f) hack", but also without capturing extraneous type parameters. So a regular closure would be the same, when #46477 is fixed.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 40a3c41fdfde051926f256564c247e2ce94a667e, comparison URL.

@Mark-Simulacrum
Copy link
Member

Crazy bots! I think I know what's wrong though, will try and fix in a bit, and silence bot for now.

@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 40a3c41fdfde051926f256564c247e2ce94a667e, comparison URL.

@nnethercote
Copy link
Contributor

The results are good: up to 7.5% win for clap, and lots of sub-1% wins. Really good for such a simple change!

@scottmcm scottmcm self-assigned this Oct 2, 2019
@scottmcm scottmcm changed the title [WIP] use try_fold instead of try_for_each to reduce compile time use try_fold instead of try_for_each to reduce compile time Oct 2, 2019
@scottmcm
Copy link
Member

scottmcm commented Oct 2, 2019

I'm happy with this as-is (we can explore other things like #64885 (comment) in a follow-up PR), so

@bors r+

@bors
Copy link
Contributor

bors commented Oct 2, 2019

📌 Commit 8737061 has been approved by scottmcm

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 2, 2019
tmandry added a commit to tmandry/rust that referenced this pull request Oct 2, 2019
use try_fold instead of try_for_each to reduce compile time

as it was stated in rust-lang#64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from rust-lang#64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost
bors added a commit that referenced this pull request Oct 2, 2019
Rollup of 11 pull requests

Successful merges:

 - #64649 (Avoid ICE on return outside of fn with literal array)
 - #64722 (Make all alt builders produce parallel-enabled compilers)
 - #64801 (Avoid `chain()` in `find_constraint_paths_between_regions()`.)
 - #64805 (Still more `ObligationForest` improvements.)
 - #64840 (SelfProfiler API refactoring and part one of event review)
 - #64885 (use try_fold instead of try_for_each to reduce compile time)
 - #64942 (Fix clippy warnings)
 - #64952 (Update cargo.)
 - #64974 (Fix zebra-striping in generic dataflow visualization)
 - #64978 (Fully clear `HandlerInner` in `Handler::reset_err_count`)
 - #64979 (Update books)

Failed merges:

 - #64959 (syntax: improve parameter without type suggestions)

r? @ghost
@bors bors merged commit 8737061 into rust-lang:master Oct 2, 2019
@andjo403 andjo403 deleted the iter branch October 2, 2019 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants