Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Bring back ninja for dist builders #103295

Merged
merged 2 commits into from
Oct 30, 2022
Merged

Conversation

ishitatsuyuki
Copy link
Contributor

The primary reason for this is that make can result in a substantial under utilization of parallelism (noticed while testing on a workstation), mostly due to the submake structure preventing good dependency tracking and scheduling.

In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds) llvm.ninja was disabled due to lack of distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.

The primary reason for this is that make can result in a substantial
under utilization of parallelism, mostly due to the submake structure
preventing good dependency tracking and scheduling.

In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds)
llvm.ninja was disabled due to lack of distro package. This is no longer the
case with the CentOS 7 base, so bring ninja back for a performance boost.
Now that we have brought ninja back to the installed base packages, we can
use it for the initial Clang build as well.
@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Oct 20, 2022
@rust-highfive
Copy link
Collaborator

r? @pietroalbini

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2022
@Kobzol
Copy link
Contributor

Kobzol commented Oct 20, 2022

FYI I tried this recently (#96969), but that was before the Debian bump. Let's see what happens now.

@bors try

@bors
Copy link
Contributor

bors commented Oct 20, 2022

⌛ Trying commit 354e95a with merge 431cb4d2a7b345b52167aa0f2d16473c8e0fb01b...

@ishitatsuyuki
Copy link
Contributor Author

If that's the case I suppose it won't be much different this time, though one benefit of this is that it's quite handy when you test CI builds on a high-core workstation (make can only achieve ~32 load average out of 128 cores).

@bors
Copy link
Contributor

bors commented Oct 20, 2022

☀️ Try build successful - checks-actions
Build commit: 431cb4d2a7b345b52167aa0f2d16473c8e0fb01b (431cb4d2a7b345b52167aa0f2d16473c8e0fb01b)

@ishitatsuyuki
Copy link
Contributor Author

Comparing the final build of LLVM (which doesn't hit the cache due to PGO) of this PR to a recent master build (https://github.com/rust-lang-ci/rust/actions/runs/3301120227/jobs/5447282596), this PR took 15m11s while the master build took 12m58s.

It looks like the CI machines suffers a lot from noisy neighbors though, hence I don't think we'll be able to get a conclusion of significance out of the timings.

Given the simplicity of doing this (we don't need extra build-from-source scripts unlike #96969) I still think this is worth doing, mainly for convenience of testing on many-core machines.

@Kobzol
Copy link
Contributor

Kobzol commented Oct 22, 2022

@bors try

(we should do multiple builds, preferably in rapid succession, to make sure that caches are primed)

@bors
Copy link
Contributor

bors commented Oct 22, 2022

⌛ Trying commit 354e95a with merge 2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84...

@ishitatsuyuki
Copy link
Contributor Author

Sure, a successive build would hit the Docker and first-stage LLVM cache, but what are we trying to measure here? Given the variance of CI machines, we'll need hundreds of builds to get a statistically significant comparison, which I don't think is worth the effort.

The timing I measured above is from the PGOed build, whose --instr-use flag explicitly disable sccache. I doubt cache is what that matters here.

@Kobzol
Copy link
Contributor

Kobzol commented Oct 22, 2022

Yeah it's not about statistical significance, just to see the final dist build number without the broken Docker cache, so that we can make sure that it stays in the same ballpark.

@mati865
Copy link
Contributor

mati865 commented Oct 22, 2022

While comparing times you should also check CPU on particular runner (I think it's still logged). Discrepancy between slowest and fastest machines is rather big on GHA.

@bors
Copy link
Contributor

bors commented Oct 22, 2022

☀️ Try build successful - checks-actions
Build commit: 2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84 (2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84)

@ishitatsuyuki
Copy link
Contributor Author

OK, sccache is hitting now, but the Docker one doesn't. Maybe the cache key is different for try builds?

@Kobzol
Copy link
Contributor

Kobzol commented Oct 22, 2022

@bors try

From my experience it's just a matter of running the build 2-3 times in a quick succession to make all caches work.

@bors
Copy link
Contributor

bors commented Oct 22, 2022

⌛ Trying commit 354e95a with merge 0c08aa9071086a229d1b9bd610d0f21617610848...

@bors
Copy link
Contributor

bors commented Oct 22, 2022

☀️ Try build successful - checks-actions
Build commit: 0c08aa9071086a229d1b9bd610d0f21617610848 (0c08aa9071086a229d1b9bd610d0f21617610848)

@ishitatsuyuki
Copy link
Contributor Author

2h11m on Skylake, time looks fine.

@ishitatsuyuki
Copy link
Contributor Author

Any opinions on whether this should be pushed forward?

@cuviper
Copy link
Member

cuviper commented Oct 26, 2022

We use ninja on every other dist builder, so I see no reason why we shouldn't here.

@bors r+ rollup=never (because of the caching effects)

@bors
Copy link
Contributor

bors commented Oct 26, 2022

📌 Commit 354e95a has been approved by cuviper

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 26, 2022
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 26, 2022
@bors
Copy link
Contributor

bors commented Oct 27, 2022

⌛ Testing commit 354e95a with merge d3f9aa5c6e20bfc58af72d7b30457e4f5de97101...

@bors
Copy link
Contributor

bors commented Oct 27, 2022

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 27, 2022
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@cuviper
Copy link
Member

cuviper commented Oct 27, 2022

Should be faster with the docker image now primed...

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 27, 2022
@bors
Copy link
Contributor

bors commented Oct 30, 2022

⌛ Testing commit 354e95a with merge 5ab7445...

@bors
Copy link
Contributor

bors commented Oct 30, 2022

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing 5ab7445 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 30, 2022
@bors bors merged commit 5ab7445 into rust-lang:master Oct 30, 2022
@rustbot rustbot added this to the 1.67.0 milestone Oct 30, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5ab7445): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.5% [0.2%, 7.2%] 22
Regressions ❌
(secondary)
2.0% [0.2%, 6.5%] 68
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.0% [-1.0%, -1.0%] 1
All ❌✅ (primary) 1.5% [0.2%, 7.2%] 22

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.6% [3.3%, 4.2%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
7.0% [6.2%, 7.8%] 4
Regressions ❌
(secondary)
3.9% [2.1%, 7.1%] 30
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 7.0% [6.2%, 7.8%] 4

@rustbot rustbot added the perf-regression Performance regression. label Oct 30, 2022
@cuviper
Copy link
Member

cuviper commented Oct 30, 2022

Any idea what happened to perf? Perhaps this disrupted LTO/PGO?

@nnethercote
Copy link
Contributor

Could be. @Kobzol, what do you think?

@ishitatsuyuki
Copy link
Contributor Author

Note that only the check results are mainly disrupted (hence the regression is on the Rust code side). I'll take a day to see if I can find some obvious cause, but if I don't reply in 1 day please feel free to revert it until the root cause is figured out.

@Kobzol
Copy link
Contributor

Kobzol commented Nov 1, 2022

I have no idea why there should be regressions from switching to ninja. I'll file a revert to see if it helps.

@rylev
Copy link
Member

rylev commented Nov 2, 2022

Looks like the revert shows reverting the performance regressions as well. Would it make sense to revert until we can figure out why?

@ishitatsuyuki
Copy link
Contributor Author

Yes, I think we should go for the revert. I did look into the CI logs and tried to figure out if it's LTO or PGO, but I'm completely clueless and it seems unlikely I'll be able to reland this in the predictable future.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 2, 2022
Revert "ci: Bring back ninja for dist builders"

Reverts rust-lang#103295 because of the perf regression.

r? `@cuviper`
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
ci: Bring back ninja for dist builders

The primary reason for this is that make can result in a substantial under utilization of parallelism (noticed while testing on a workstation), mostly due to the submake structure preventing good dependency tracking and scheduling.

In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds) llvm.ninja was disabled due to lack of distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 20, 2024
Revert "ci: Bring back ninja for dist builders"

Reverts rust-lang/rust#103295 because of the perf regression.

r? `@cuviper`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
Revert "ci: Bring back ninja for dist builders"

Reverts rust-lang/rust#103295 because of the perf regression.

r? `@cuviper`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.