Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Use multiple codegen units on non-dist bots #44675

Closed
wants to merge 1 commit into from

Conversation

alexcrichton
Copy link
Member

This commit is yet another attempt to bring down our cycle times by
parallelizing some of the long-and-serial parts of the build, for example
optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that
any perf loss from codegen units is more than made up for with the perf gain
from using multiple codegen units.

The value of 16 codegen units here is pretty arbitrary, it's basically just a
number which hopefully means that the cores are always nice and warm.

@rust-highfive
Copy link
Collaborator

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

r? @aidanhs

@rust-highfive rust-highfive assigned aidanhs and unassigned aturon Sep 18, 2017
@aidanhs
Copy link
Member

aidanhs commented Sep 18, 2017

@bors r+ p=1

@bors
Copy link
Contributor

bors commented Sep 18, 2017

📌 Commit 1f9b02b has been approved by aidanhs

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Sep 18, 2017
ci: Use multiple codegen units on non-dist bots

This commit is yet another attempt to bring down our cycle times by
parallelizing some of the long-and-serial parts of the build, for example
optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that
any perf loss from codegen units is more than made up for with the perf gain
from using multiple codegen units.

The value of 16 codegen units here is pretty arbitrary, it's basically just a
number which hopefully means that the cores are always nice and warm.
bors added a commit that referenced this pull request Sep 18, 2017
Rollup of 10 pull requests

- Successful merges: #44364, #44466, #44537, #44640, #44651, #44657, #44661, #44668, #44671, #44675
- Failed merges:
@carols10cents carols10cents added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 18, 2017
@Mark-Simulacrum
Copy link
Member

@bors r-

There's only 2 cores usually though at least on Travis, so this would seem like too many? I'm not too sure how that works out in practice though...

bors added a commit that referenced this pull request Sep 18, 2017
Rollup of 11 pull requests

- Successful merges: #44364, #44466, #44537, #44548, #44640, #44651, #44657, #44661, #44668, #44671, #44675
- Failed merges:
@mattico
Copy link
Contributor

mattico commented Sep 18, 2017

This isn't definitive data, but it's worth noticing that this build took about 10 minutes longer than every other recent PR.

@aidanhs
Copy link
Member

aidanhs commented Sep 18, 2017

After a brief look, it looks like it might be making compilation of rustc itself a touch faster, but building+running the tests a bunch slower. At minimum it'd be good to see the effect on full builds.

@alexcrichton
Copy link
Member Author

This PR is the likely cause of https://ci.appveyor.com/project/rust-lang/rust/build/1.0.4720, so I'll look into that when I get a chance.

@Mark-Simulacrum note that the number of cores and the number of codegen units don't tend to have a correlation on one another. With the support nowadays we'll never run more than 2 jobs in parallel, and trans will even attempt to finish codegen on some translation units before it moves on to even start translating the next unit.

I selected 16 here knowing that Travis only has 2 cores to ensure that we always keep the builders hot. Otherwise if we generate 2 codegen units one of them could finish codegen instantly while the other would spend the whole time translating, not actually getting us any benefit. The hope here is that with enough codegen units we can keep the cpus nice and busy and make sure that one's not idle for too much longer while the last cgu is finishing.

@mattico thanks for looking I'll probably turn this down to like 4 or 6 to have a minimal impact for now on compile times. Most of the time this just means there's missing #[inline] annotations.

@Mark-Simulacrum
Copy link
Member

It may be worth compiling the core crates and tools (libstd, libtest, librustc) with codegen units set to ~16 and then compile tests, which generally compile very fast, without codegen units, since I imagine the added parallelism there may be hurting us.

Feel free to re-r+ whenever (from my comment, at least). I didn't know that we never run more than two codegen units at the same time -- that seems like it would hurt people with more powerful computers (3 or more CPU cores).

@michaelwoerister
Copy link
Member

Another thing to note is that rustc recently gained the ability to start LLVM already after the first CGU has been translated. So having more CGUs than cpu-cores potentially does make sense, since the second thread can start working earlier.

@aidanhs
Copy link
Member

aidanhs commented Sep 19, 2017

@Mark-Simulacrum

I didn't know that we never run more than two codegen units at the same time -- that seems like it would hurt people with more powerful computers (3 or more CPU cores).

I don't think that's what @alexcrichton is saying. Instead, I believe the idea is that compilation is split into N bits of work, and then those bits of work are executed with as many units of parallelism as you have CPU cores. The problem is that you don't know if one of the bits of work is going to take 10x longer all the others, but if you do and you've just split into 2 units then you have one core idly for ages. By splitting into more you're reducing the probability of hitting this worst case (in theory).

@alexcrichton
Copy link
Member Author

@bors: r=aidanhs p=0

Hm ok I can't reproduce this locally, so I'm going to try to see the error on AppVeyor again. Also reduced to 8 codegen units.

@Mark-Simulacrum ah yes @aidanhs is right, the intention here is to reduce the chance of having one core running lots of work and one empty with work.

@bors
Copy link
Contributor

bors commented Sep 19, 2017

📌 Commit d670a77 has been approved by aidanhs

@alexcrichton
Copy link
Member Author

Also I believe the codegen-units setting here only applie to rustc/std, not any tests.

@michaelwoerister
Copy link
Member

Could we get more CPU cores on the machines that run large test suites? Running tests scales pretty well with the number of cores.

@alexcrichton
Copy link
Member Author

@michaelwoerister I wish! Unfortunately though it's not so easy :(

@michaelwoerister
Copy link
Member

Try adding echo -f "16" > /dev/cpu/num-cpus to your docker image. Don't forget the -f. That one is needed in order to work around travis's licensing model and the laws of physics.

@bors
Copy link
Contributor

bors commented Sep 21, 2017

⌛ Testing commit d670a7775309e9c0233907b426bd76e2c356004b with merge 6e7b5cd731ae1549bf0c52fa71af137046a5ec42...

@alexcrichton
Copy link
Member Author

@bors: r=aidanhs

@bors
Copy link
Contributor

bors commented Sep 21, 2017

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Sep 21, 2017

📌 Commit d670a77 has been approved by aidanhs

@alexcrichton
Copy link
Member Author

@bors: r=aidanhs

@bors
Copy link
Contributor

bors commented Sep 21, 2017

📌 Commit e157893 has been approved by aidanhs

@bors
Copy link
Contributor

bors commented Sep 21, 2017

⌛ Testing commit e157893859643e6fdecb21a9254f735bb2f4d926 with merge a47067e228c2ce4bcf5197be5f8fada509718c08...

@bors
Copy link
Contributor

bors commented Sep 21, 2017

💔 Test failed - status-appveyor

This commit is yet another attempt to bring down our cycle times by
parallelizing some of the long-and-serial parts of the build, for example
optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that
any perf loss from codegen units is more than made up for with the perf gain
from using multiple codegen units.

The value of 8 codegen units here is pretty arbitrary, it's basically just a
number which hopefully means that the cores are always nice and warm. Also a
previous version of this commit bounced on Windows CI due to libstd being
compiled with multiple codegen units, so only the compiler is now compiled with
multiple codegen units.
@alexcrichton
Copy link
Member Author

@bors: r=aidanhs

@bors
Copy link
Contributor

bors commented Sep 21, 2017

📌 Commit 27c26da has been approved by aidanhs

let cgus = if mode == Mode::Libstd {
self.config.rust_codegen_units
} else {
self.config.rustc_codegen_units.unwrap_or(self.config.rust_codegen_units)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make rustc_codegen_units do what it says on the tin and just apply to rustc (rather than it being an 'all-but-std' flag)? That seems to be where the majority of time goes anyway when I build the compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure yeah. This I expect to be one of those "obscure options that no one ever touches", but I can change it after this PR lands or if it bounces.

@bors
Copy link
Contributor

bors commented Sep 22, 2017

⌛ Testing commit 27c26da with merge f7b98020a75f8e6701715473ffd3896bcc35a6fd...

@bors
Copy link
Contributor

bors commented Sep 22, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member

kennytm commented Sep 22, 2017

@bors retry

The two macs 3-hour timed out. Not sure if related to the current Travis incident.

It may also mean using multiple CGUs may slow down on the mac CIs though.

@bors
Copy link
Contributor

bors commented Sep 22, 2017

⌛ Testing commit 27c26da with merge 2075597...

bors added a commit that referenced this pull request Sep 22, 2017
ci: Use multiple codegen units on non-dist bots

This commit is yet another attempt to bring down our cycle times by
parallelizing some of the long-and-serial parts of the build, for example
optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that
any perf loss from codegen units is more than made up for with the perf gain
from using multiple codegen units.

The value of 16 codegen units here is pretty arbitrary, it's basically just a
number which hopefully means that the cores are always nice and warm.
@alexcrichton
Copy link
Member Author

@bors r- retry

This caused rebuilds during testing I think that I want to investigate

@alexcrichton
Copy link
Member Author

Ok looking at the logs this is not clearly a win, the bootstrap was faster and the tests took way slower. Now I don't really expect predictable performance out of the OSX builders, but at least for now it seems like this is not the win we're hoping for.

@alexcrichton alexcrichton deleted the many-cgu branch September 23, 2017 04:43
@michaelwoerister
Copy link
Member

It was worth a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants