Move metadata generation before analysis #64112

nnethercote · 2019-09-03T04:00:40Z

The goal of this commit is to improve the effectiveness of pipelining, because metadata generation must complete before dependent crates can begin compiling.

r? @ghost

It's a false dependency. The result isn't used and there are no relevant side-effects.

They each have two call sites, and the sequencing is almost identical in both cases.

`Compiler::register_plugins()` calls `passes::register_plugins()`, which calls `Compiler::dep_graph_future()`. This is the only way in which a `passes` function calls a `Compiler` function. This commit moves the `dep_graph_future()` call out of `passes::register_plugins()` and into `Compiler::register_plugins()`, which is a more sensible spot for it. This will delay the loading of the dep graph slightly -- from the middle of plugin registration to the end of plugin registration -- but plugin registration is fast enough (especially compared to expansion) that the impact should be neglible.

`Compiler::compile()` is different to all the other `Compiler` methods because it lacks a `Queries` entry. It only has one call site, which is in a test that doesn't need its specific characteristics. This patch removes `Compile::compile()` and replaces the call to it in the test with a call to `Compile::codegen_and_link()`, which is similar enough for the test's purposes.

The goal of this commit is to improve the effectiveness of pipelining, because metadata generation must complete before dependent crates can begin compiling. Unfortunately it's not much of a win. The biggest speed-up I have seen is 1.07x, vs 1.79x for the original pipelining implementation. And it slightly slows down the common case when an error occurs during analysis, because metadata generation now precedes analysis. Currently three tests fail with the change, due to bounds violations. These tests are currently `git rm`'d, so that subsequent tests can run.

nnethercote · 2019-09-03T04:06:18Z

Unfortunately it's not much of a win. The biggest speed-up I have seen is 1.07x, vs 1.79x for the original pipelining implementation. And it slightly slows down the common case when an error occurs during analysis, because metadata generation now precedes analysis.

Currently three tests fail with the change, due to bounds violations. These tests are currently git rm'd, so that subsequent tests can run.

The first five commits above are from #64016, which is awaiting review.

Here are timings for the multi-crate rustc-perf benchmarks that currently compile on my Linux box. For each configuration there are three measurements:

no pipelining;
existing pipelining;
new, more aggressive pipelining.

syn
- dbg: 3.01 s -> 2.51 s (1.19x faster) -> 2.51 s (1.00x faster)
- opt: 5.86 s -> 3.26 s (1.79x faster) -> 3.03 s (1.07x faster)

regex
- dbg: 3.59 s -> 3.24 s (1.10x faster) -> 3.24 s (1.00x faster)
- opt: 6.37 s -> 4.58 s (1.39x faster) -> 4.42 s (1.03x faster)

piston-image
- dbg: 7.75 s -> 6.99 s (1.10x faster) -> 6.92 s (1.00x faster)
- opt: 14.45 s -> 10.50 s (1.37x faster) -> 10.27 s (1.02x faster)

webrender/webrender
- dbg: 36.94 s -> 32.83 s (1.12x faster) -> 31.35 s (1.04x faster)
- opt: 57.79 s -> 43.89 s (1.31x faster) -> 41.95 s (1.04x faster)

ripgrep
- dbg: 11.18 s -> 10.47 s (1.06x faster) -> 10.58 s (.98x faster)
- opt: 23.02 s -> 19.55 s (1.17x faster) -> 19.56 s (.99x faster)

cranelift-codegen/cranelift-codegen
- dbg: 17.66 s -> 17.31 s (1.02x faster) -> 16.61 s (1.04x faster)
- opt: 23.46 s -> 19.97 s (1.17x faster) -> 19.45 s (1.02x faster)

futures
- dbg: 1.18 s -> 1.15 s (1.02x faster) -> 1.24 s (.93x faster)
- opt: 1.25 s -> 1.06 s (1.17x faster) -> 1.06 s (1.00x faster)

wg-grammar
- dbg: 9.28 s -> 8.87 s (1.04x faster) -> 8.85 s (1.00x faster)
- opt: 12.03 s -> 10.98 s (1.09x faster) -> 10.70 s (1.02x faster)

webr_api/webrender_api
- dbg: 29.45 s -> 28.94 s (1.01x faster) -> 28.26 s (1.02x faster)
- opt: 44.75 s -> 43.01 s (1.04x faster) -> 41.63 s (1.03x faster)

clap-rs
- dbg: 11.64 s -> 11.52 s (1.01x faster) -> 11.54 s (.99x faster)
- opt: 25.05 s -> 24.75 s (1.01x faster) -> 24.84 s (.99x faster)

html5ever
- dbg: 6.13 s -> 6.08 s (1.00x faster) -> 6.11 s (.99x faster)
- opt: 8.05 s -> 7.95 s (1.01x faster) -> 8.02 s (.99x faster)

encoding
- dbg: 1.60 s -> 1.54 s (1.03x faster) -> 1.58 s (.97x faster)
- opt: 1.80 s -> 1.78 s (1.00x faster) -> 1.76 s (1.01x faster)

ctfe-stress-2
- dbg: 11.89 s -> 12.10 s (.98x faster) -> 12.06 s (1.00x faster)
- opt: 9.08 s -> 9.06 s (1.00x faster) -> 8.97 s (1.01x faster)

coercions
- dbg: 1.46 s -> 1.42 s (1.02x faster) -> 1.42 s (.99x faster)
- opt: 1.08 s -> 1.09 s (.99x faster) -> 1.08 s (1.00x faster)

cargo
- dbg: 33.75 s -> 33.57 s (1.00x faster) -> 33.61 s (.99x faster)
- opt: 53.38 s -> 53.48 s (.99x faster) -> 53.36 s (1.00x faster)

You can see that the additional improvements from the new, aggressive pipelining are paltry compared to the improvements from the existing pipelining.

Because of all this, my current inclination is to abandon the effort, but I'm posting it here in case somebody has an idea to improve it.

nnethercote · 2019-09-03T04:43:22Z

The small improvements make sense; analysis takes a lot less time than code generation, so the amount of additional overlap is relatively small.

Centril · 2019-09-03T04:54:50Z

My reading of the diff is that metadata generation still happens after lowering and more importantly after cfg stripping/macro expansion. Can you confirm?

nnethercote · 2019-09-03T05:42:19Z

My reading of the diff is that metadata generation still happens after lowering and more importantly after cfg stripping/macro expansion. Can you confirm?

That is correct. I couldn't push it any earlier because do_metadata() requires a TyCtxt, which is produced by Compiler::global_ctxt(), which relies on the results of Compiler::lower_to_hir(), which relies on the results of Compiler::expansion().

Centril · 2019-09-03T05:45:45Z

Thanks.

The numbers for syn still look enticing. cc @dtolnay

est31 · 2019-09-03T07:57:10Z

it slightly slows down the common case when an error occurs during analysis, because metadata generation now precedes analysis.

What about doing them in parallel? Of course, it might still be slightly slower on hardware with not enough cores available, but at least there won't be any regression on sufficiently "good" hardware.

michaelwoerister · 2019-09-04T11:07:54Z

Doesn't metadata generation needed data that is produced only very late in the process (like optimized MIR and, potentially, the list of exported monomorphizations)? The query system should take care of producing this data on demand but with what's expected to be contained in RLIBs at the moment, it seems hard to effectively move metadata generation more towards the front.

nnethercote · 2019-09-04T23:55:29Z

@alexcrichton said on Zulip:

rustc has tons of caches internally
so the time previously spent in analysis may just be shifted to metadata generation now
(maybe)
and the analysis phase, which happens later, just hits all the caches and doesn't actually do anything

This may be true. Understanding the ordering of things in rustc is surprisingly hard. I did spend enough time looking at (and refactoring!) librustc_interface/queries.rs and librustc_interface/passes.rs that I understand Compiler quite well, and I am confident that I got the high-level ordering right. (And diagnostic println! statements confirmed this.) It is still possible that there are things going on at a different level that I'm not aware of.

I am still inclined to abandon this effort. I tried what I set out to try, got a result that makes sense (very slight speedups, but much smaller than the original pipelining because analysis is much quicker than codegen), and suggestions for further work are more speculative. If somebody else wants to experiment with it further, please do so.

nnethercote added 6 commits September 2, 2019 08:46

Don't call self.parse() in Compiler::crate_name() unless necessary.

d61605c

Remove lower_to_hir() call from prepare_output().

cd0c21b

It's a false dependency. The result isn't used and there are no relevant side-effects.

Merge Compiler::{ongoing_codegen,link}.

c6117d9

They each have two call sites, and the sequencing is almost identical in both cases.

nnethercote closed this Sep 16, 2019

nnethercote mentioned this pull request Oct 20, 2019

Split libsyntax apart #65324

Merged

nnethercote deleted the earlier-metadata branch April 2, 2020 03:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move metadata generation before analysis #64112

Move metadata generation before analysis #64112

nnethercote commented Sep 3, 2019

nnethercote commented Sep 3, 2019 •

edited

Loading

nnethercote commented Sep 3, 2019

Centril commented Sep 3, 2019

nnethercote commented Sep 3, 2019

Centril commented Sep 3, 2019

est31 commented Sep 3, 2019

michaelwoerister commented Sep 4, 2019

nnethercote commented Sep 4, 2019 •

edited

Loading

Move metadata generation before analysis #64112

Move metadata generation before analysis #64112

Conversation

nnethercote commented Sep 3, 2019

nnethercote commented Sep 3, 2019 • edited Loading

nnethercote commented Sep 3, 2019

Centril commented Sep 3, 2019

nnethercote commented Sep 3, 2019

Centril commented Sep 3, 2019

est31 commented Sep 3, 2019

michaelwoerister commented Sep 4, 2019

nnethercote commented Sep 4, 2019 • edited Loading

nnethercote commented Sep 3, 2019 •

edited

Loading

nnethercote commented Sep 4, 2019 •

edited

Loading