ci: Bring back ninja for dist builders #103295

ishitatsuyuki · 2022-10-20T09:19:31Z

The primary reason for this is that make can result in a substantial under utilization of parallelism (noticed while testing on a workstation), mostly due to the submake structure preventing good dependency tracking and scheduling.

In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds) llvm.ninja was disabled due to lack of distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.

The primary reason for this is that make can result in a substantial under utilization of parallelism, mostly due to the submake structure preventing good dependency tracking and scheduling. In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds) llvm.ninja was disabled due to lack of distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.

Now that we have brought ninja back to the installed base packages, we can use it for the initial Clang build as well.

rust-highfive · 2022-10-20T09:19:35Z

r? @pietroalbini

(rust-highfive has picked a reviewer for you, use r? to override)

Kobzol · 2022-10-20T13:46:35Z

FYI I tried this recently (#96969), but that was before the Debian bump. Let's see what happens now.

@bors try

bors · 2022-10-20T13:47:13Z

⌛ Trying commit 354e95a with merge 431cb4d2a7b345b52167aa0f2d16473c8e0fb01b...

ishitatsuyuki · 2022-10-20T13:55:54Z

If that's the case I suppose it won't be much different this time, though one benefit of this is that it's quite handy when you test CI builds on a high-core workstation (make can only achieve ~32 load average out of 128 cores).

bors · 2022-10-20T17:27:46Z

☀️ Try build successful - checks-actions
Build commit: 431cb4d2a7b345b52167aa0f2d16473c8e0fb01b (431cb4d2a7b345b52167aa0f2d16473c8e0fb01b)

ishitatsuyuki · 2022-10-22T03:23:36Z

Comparing the final build of LLVM (which doesn't hit the cache due to PGO) of this PR to a recent master build (https://github.com/rust-lang-ci/rust/actions/runs/3301120227/jobs/5447282596), this PR took 15m11s while the master build took 12m58s.

It looks like the CI machines suffers a lot from noisy neighbors though, hence I don't think we'll be able to get a conclusion of significance out of the timings.

Given the simplicity of doing this (we don't need extra build-from-source scripts unlike #96969) I still think this is worth doing, mainly for convenience of testing on many-core machines.

Kobzol · 2022-10-22T07:24:56Z

@bors try

(we should do multiple builds, preferably in rapid succession, to make sure that caches are primed)

bors · 2022-10-22T07:25:04Z

⌛ Trying commit 354e95a with merge 2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84...

ishitatsuyuki · 2022-10-22T07:33:39Z

Sure, a successive build would hit the Docker and first-stage LLVM cache, but what are we trying to measure here? Given the variance of CI machines, we'll need hundreds of builds to get a statistically significant comparison, which I don't think is worth the effort.

The timing I measured above is from the PGOed build, whose --instr-use flag explicitly disable sccache. I doubt cache is what that matters here.

Kobzol · 2022-10-22T07:59:20Z

Yeah it's not about statistical significance, just to see the final dist build number without the broken Docker cache, so that we can make sure that it stays in the same ballpark.

mati865 · 2022-10-22T08:04:25Z

While comparing times you should also check CPU on particular runner (I think it's still logged). Discrepancy between slowest and fastest machines is rather big on GHA.

bors · 2022-10-22T10:42:20Z

☀️ Try build successful - checks-actions
Build commit: 2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84 (2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84)

ishitatsuyuki · 2022-10-22T10:57:01Z

OK, sccache is hitting now, but the Docker one doesn't. Maybe the cache key is different for try builds?

Kobzol · 2022-10-22T10:58:43Z

@bors try

From my experience it's just a matter of running the build 2-3 times in a quick succession to make all caches work.

bors · 2022-10-22T10:58:51Z

⌛ Trying commit 354e95a with merge 0c08aa9071086a229d1b9bd610d0f21617610848...

bors · 2022-10-22T13:10:41Z

☀️ Try build successful - checks-actions
Build commit: 0c08aa9071086a229d1b9bd610d0f21617610848 (0c08aa9071086a229d1b9bd610d0f21617610848)

ishitatsuyuki · 2022-10-22T13:12:18Z

2h11m on Skylake, time looks fine.

ishitatsuyuki · 2022-10-26T03:01:29Z

Any opinions on whether this should be pushed forward?

cuviper · 2022-10-26T16:54:31Z

We use ninja on every other dist builder, so I see no reason why we shouldn't here.

@bors r+ rollup=never (because of the caching effects)

bors · 2022-10-26T16:54:34Z

📌 Commit 354e95a has been approved by cuviper

It is now in the queue for this repository.

bors · 2022-10-27T12:55:55Z

⌛ Testing commit 354e95a with merge d3f9aa5c6e20bfc58af72d7b30457e4f5de97101...

bors · 2022-10-27T17:10:55Z

💥 Test timed out

rust-log-analyzer · 2022-10-27T17:27:52Z

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

cuviper · 2022-10-27T23:37:42Z

Should be faster with the docker image now primed...

@bors retry

bors · 2022-10-30T14:13:46Z

⌛ Testing commit 354e95a with merge 5ab7445...

bors · 2022-10-30T17:44:57Z

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing 5ab7445 to master...

rust-timer · 2022-10-30T20:01:26Z

Finished benchmarking commit (5ab7445): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.5%	[0.2%, 7.2%]	22
Regressions ❌ (secondary)	2.0%	[0.2%, 6.5%]	68
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.0%	[-1.0%, -1.0%]	1
All ❌✅ (primary)	1.5%	[0.2%, 7.2%]	22

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.6%	[3.3%, 4.2%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	7.0%	[6.2%, 7.8%]	4
Regressions ❌ (secondary)	3.9%	[2.1%, 7.1%]	30
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	7.0%	[6.2%, 7.8%]	4

cuviper · 2022-10-30T20:18:44Z

Any idea what happened to perf? Perhaps this disrupted LTO/PGO?

nnethercote · 2022-10-30T23:15:21Z

Could be. @Kobzol, what do you think?

ishitatsuyuki · 2022-10-31T02:03:02Z

Note that only the check results are mainly disrupted (hence the regression is on the Rust code side). I'll take a day to see if I can find some obvious cause, but if I don't reply in 1 day please feel free to revert it until the root cause is figured out.

Kobzol · 2022-11-01T18:35:13Z

I have no idea why there should be regressions from switching to ninja. I'll file a revert to see if it helps.

rylev · 2022-11-02T12:46:38Z

Looks like the revert shows reverting the performance regressions as well. Would it make sense to revert until we can figure out why?

ishitatsuyuki · 2022-11-02T12:50:35Z

Yes, I think we should go for the revert. I did look into the CI logs and tried to figure out if it's LTO or PGO, but I'm completely clueless and it seems unlikely I'll be able to reland this in the predictable future.

Revert "ci: Bring back ninja for dist builders" Reverts rust-lang#103295 because of the perf regression. r? `@cuviper`

ci: Bring back ninja for dist builders The primary reason for this is that make can result in a substantial under utilization of parallelism (noticed while testing on a workstation), mostly due to the submake structure preventing good dependency tracking and scheduling. In f758c7b (Debian 6 doesn't have ninja, so use make for the dist builds) llvm.ninja was disabled due to lack of distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.

Revert "ci: Bring back ninja for dist builders" Reverts rust-lang/rust#103295 because of the perf regression. r? `@cuviper`

ishitatsuyuki added 2 commits October 20, 2022 18:18

ci: Use ninja in build-clang.sh

354e95a

Now that we have brought ninja back to the installed base packages, we can use it for the initial Clang build as well.

rust-highfive assigned pietroalbini Oct 20, 2022

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Oct 20, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2022

bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 26, 2022

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 26, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 27, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 27, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 30, 2022

bors merged commit 5ab7445 into rust-lang:master Oct 30, 2022

rustbot added this to the 1.67.0 milestone Oct 30, 2022

rustbot added the perf-regression Performance regression. label Oct 30, 2022

Kobzol mentioned this pull request Nov 1, 2022

Revert "ci: Bring back ninja for dist builders" #103846

Merged

ishitatsuyuki deleted the ninja branch November 2, 2022 09:49

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 2, 2022

Auto merge of rust-lang#103846 - Kobzol:revert-103295-ninja, r=cuviper

5b1304a

Revert "ci: Bring back ninja for dist builders" Reverts rust-lang#103295 because of the perf regression. r? `@cuviper`

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 20, 2024

Auto merge of #103846 - Kobzol:revert-103295-ninja, r=cuviper

749d0f1

Revert "ci: Bring back ninja for dist builders" Reverts rust-lang/rust#103295 because of the perf regression. r? `@cuviper`

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024

Auto merge of #103846 - Kobzol:revert-103295-ninja, r=cuviper

5414e81

Revert "ci: Bring back ninja for dist builders" Reverts rust-lang/rust#103295 because of the perf regression. r? `@cuviper`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: Bring back ninja for dist builders #103295

ci: Bring back ninja for dist builders #103295

ishitatsuyuki commented Oct 20, 2022

rust-highfive commented Oct 20, 2022

Kobzol commented Oct 20, 2022

bors commented Oct 20, 2022

ishitatsuyuki commented Oct 20, 2022

bors commented Oct 20, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

mati865 commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

bors commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

ishitatsuyuki commented Oct 26, 2022

cuviper commented Oct 26, 2022

bors commented Oct 26, 2022

bors commented Oct 27, 2022

bors commented Oct 27, 2022

rust-log-analyzer commented Oct 27, 2022

cuviper commented Oct 27, 2022

bors commented Oct 30, 2022

bors commented Oct 30, 2022

rust-timer commented Oct 30, 2022

cuviper commented Oct 30, 2022

nnethercote commented Oct 30, 2022

ishitatsuyuki commented Oct 31, 2022

Kobzol commented Nov 1, 2022

rylev commented Nov 2, 2022

ishitatsuyuki commented Nov 2, 2022

ci: Bring back ninja for dist builders #103295

ci: Bring back ninja for dist builders #103295

Conversation

ishitatsuyuki commented Oct 20, 2022

rust-highfive commented Oct 20, 2022

Kobzol commented Oct 20, 2022

bors commented Oct 20, 2022

ishitatsuyuki commented Oct 20, 2022

bors commented Oct 20, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

mati865 commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

Kobzol commented Oct 22, 2022

bors commented Oct 22, 2022

bors commented Oct 22, 2022

ishitatsuyuki commented Oct 22, 2022

ishitatsuyuki commented Oct 26, 2022

cuviper commented Oct 26, 2022

bors commented Oct 26, 2022

bors commented Oct 27, 2022

bors commented Oct 27, 2022

rust-log-analyzer commented Oct 27, 2022

cuviper commented Oct 27, 2022

bors commented Oct 30, 2022

bors commented Oct 30, 2022

rust-timer commented Oct 30, 2022

Overall result: ❌ regressions - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

cuviper commented Oct 30, 2022

nnethercote commented Oct 30, 2022

ishitatsuyuki commented Oct 31, 2022

Kobzol commented Nov 1, 2022

rylev commented Nov 2, 2022

ishitatsuyuki commented Nov 2, 2022