Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIR CopyPropagation is slow #36673

Closed
chrivers opened this issue Sep 23, 2016 · 36 comments · Fixed by #77373
Closed

MIR CopyPropagation is slow #36673

chrivers opened this issue Sep 23, 2016 · 36 comments · Fixed by #77373
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@chrivers
Copy link

Hello everyone

I've tried searching for a github issue that reports this, but I couldn't find anything, so I hope this isn't a duplicate. I'm on nightly, since my project uses type macros. Overall, there is a good amount of macro expansion going on, since I build parsers for binary data based on structs.

Compilation time for a 5500 LOC project has been around 7-9 seconds on my Core i7-4770, going a bit up and down with changes in the project, and changes in nightly.

However, today this changed very drastically:

[chrivers@helios]~/git/artemis/oxide #cargo build
Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
Finished debug [unoptimized + debuginfo] target(s) in 21.94 secs

This is more than twice as slow as before! Something pretty significant must have happened.

This version is definitely affected:
rustc 1.13.0-nightly (4f9812a59 2016-09-21)

Honestly, I'm not quite sure when it worked, but I can say that the version I used yesterday, which was probably no more than 1 week old, worked much better.

@chrivers
Copy link
Author

I figured out how to install older nightly versions with rustup - I'm going to try searching for the problem

@jonas-schievink
Copy link
Contributor

Is the code available somewhere (or a smaller reproduction of the bug)? Can you try running with cargo rustc -Ztime-passes?

@chrivers
Copy link
Author

I can definitely send the code in private, but unfortunately I can't make it publicly available at this point. I'd be more than happy to send you a targz (or some other format, if you prefer)

I'll try a few different versions to see where it happens, and measure time-passes of the affected version(s)

@chrivers

This comment has been minimized.

@chrivers

This comment has been minimized.

@jonas-schievink
Copy link
Contributor

cargo rustc only passes the flags to the last crate compiled, which is a binary in your case. Try cargo rustc --lib -- -Ztime-passes.

@chrivers
Copy link
Author

Oops, you're quite right. This is that:

[chrivers@helios]~/git/artemis/oxide #cargo rustc --lib -- -Ztime-passes
   Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
time: 0.014; rss: 53MB  parsing
time: 0.000; rss: 53MB  recursion limit
time: 0.000; rss: 53MB  crate injection
time: 0.000; rss: 53MB  plugin loading
time: 0.000; rss: 53MB  plugin registration
time: 0.587; rss: 136MB expansion
time: 0.000; rss: 136MB maybe building test harness
time: 0.005; rss: 136MB maybe creating a macro crate
time: 0.000; rss: 136MB checking for inline asm in case the target doesn't support it
time: 0.009; rss: 136MB complete gated feature checking
time: 0.032; rss: 138MB collecting defs
time: 0.017; rss: 142MB external crate/lib resolution
time: 0.022; rss: 142MB early lint checks
time: 0.007; rss: 142MB AST validation
time: 0.074; rss: 159MB name resolution
time: 0.042; rss: 187MB lowering ast -> hir
time: 0.009; rss: 192MB indexing hir
time: 0.006; rss: 192MB attribute checking
time: 0.005; rss: 147MB language item collection
time: 0.008; rss: 147MB lifetime resolution
time: 0.000; rss: 147MB looking for entry point
time: 0.000; rss: 147MB looking for plugin registrar
time: 0.037; rss: 157MB region resolution
time: 0.005; rss: 157MB loop checking
time: 0.005; rss: 157MB static item recursion checking
time: 0.221; rss: 159MB compute_incremental_hashes_map
time: 0.000; rss: 159MB load_dep_graph
time: 0.020; rss: 161MB type collecting
time: 0.001; rss: 161MB variance inference
time: 0.040; rss: 167MB coherence checking
time: 0.044; rss: 168MB wf checking
time: 0.144; rss: 171MB item-types checking
time: 2.333; rss: 204MB item-bodies checking
time: 0.000; rss: 204MB drop-impl checking
time: 0.157; rss: 206MB const checking
time: 0.039; rss: 206MB privacy checking
time: 0.006; rss: 206MB stability index
time: 0.018; rss: 206MB intrinsic checking
time: 0.017; rss: 206MB effect checking
time: 0.038; rss: 206MB match checking
time: 0.149; rss: 205MB liveness checking
time: 0.116; rss: 205MB rvalue checking
time: 0.277; rss: 320MB MIR dump
  time: 0.042; rss: 331MB       SimplifyCfg
  time: 0.055; rss: 331MB       QualifyAndPromoteConstants
  time: 0.110; rss: 332MB       TypeckMir
  time: 0.003; rss: 332MB       SimplifyBranches
  time: 0.026; rss: 332MB       SimplifyCfg
time: 0.235; rss: 332MB MIR passes
time: 1.213; rss: 349MB borrow checking
time: 0.001; rss: 349MB reachability checking
time: 0.033; rss: 349MB death checking
time: 0.030; rss: 349MB stability checking
time: 0.000; rss: 349MB unused lib feature checking
warning: this feature has been stable since 1.13.0. Attribute no longer needed, #[warn(stable_features)] on by default
 --> src/lib.rs:2:12
  |
2 | #![feature(type_macros)]
  |            ^^^^^^^^^^^

time: 0.171; rss: 349MB lint checking
time: 0.000; rss: 349MB resolving dependency formats
  time: 0.002; rss: 349MB       NoLandingPads
  time: 0.017; rss: 349MB       SimplifyCfg
  time: 0.061; rss: 349MB       EraseRegions
  time: 0.008; rss: 349MB       AddCallGuards
  time: 1.188; rss: 377MB       ElaborateDrops
  time: 0.002; rss: 377MB       NoLandingPads
  time: 0.034; rss: 348MB       SimplifyCfg
  time: 0.023; rss: 348MB       InstCombine
  time: 0.007; rss: 348MB       Deaggregator
  time: 6.139; rss: 352MB       CopyPropagation
  time: 0.007; rss: 352MB       AddCallGuards
  time: 0.002; rss: 352MB       PreTrans
time: 7.490; rss: 352MB Prepare MIR codegen passes
  time: 0.049; rss: 353MB       write metadata
  time: 0.139; rss: 356MB       translation item collection
  time: 0.015; rss: 359MB       codegen unit partitioning
  time: 0.007; rss: 417MB       internalize symbols
time: 1.690; rss: 318MB translation
time: 0.000; rss: 318MB assert dep graph
time: 0.000; rss: 318MB serialize dep graph
  time: 0.196; rss: 185MB       llvm function passes [0]
  time: 0.095; rss: 189MB       llvm module passes [0]
  time: 3.877; rss: 194MB       codegen passes [0]
  time: 0.001; rss: 194MB       codegen passes [0]
time: 4.401; rss: 194MB LLVM passes
time: 0.000; rss: 194MB serialize work products
time: 0.239; rss: 112MB linking
    Finished debug [unoptimized + debuginfo] target(s) in 20.18 secs

@jonas-schievink
Copy link
Contributor

time: 6.139; rss: 352MB CopyPropagation

cc @pcwalton

@chrivers
Copy link
Author

chrivers commented Sep 26, 2016

Is there anything I can do to help?

@nikomatsakis nikomatsakis added I-nominated I-compiletime Issue: Problems and improvements with respect to compile times. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html labels Sep 26, 2016
@nikomatsakis
Copy link
Contributor

Nominated to discuss priority. Marked as regression.

@pnkfelix pnkfelix added regression-from-stable-to-beta Performance or correctness regression from stable to beta. P-high High priority and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-nominated labels Sep 29, 2016
@pnkfelix
Copy link
Member

pnkfelix commented Oct 3, 2016

@chrivers You said that you were not willing to post the code publicly, but that you are willing to send it via some private channel?

If you have some reduced test case to share, are you willing to share it with all the members of the rust compiler team? Or is there some non-disclosure agreement issues in play?

@chrivers
Copy link
Author

chrivers commented Oct 3, 2016

I haven't really been able to figure out what causes this, but I believe it could be because I'm building a large-ish structure in memory. I actually ended up creating a template compiler just for this (kind of) use case:

https://github.com/chrivers/transwarp

It's nothing super secret at all, and sharing it with the rust language team is certainly fine. Just as long as it doesn't hit google, that's no problem :)

There are some licensing terms I have to clear, before putting it on github. But nothing that prevents me from sharing it with a closed group of experts, certainly.

@pnkfelix can you email me? I can send you a tarball

@nikomatsakis
Copy link
Contributor

Nominating to discuss how to fix this in compiler team mtg.

@brson
Copy link
Contributor

brson commented Oct 6, 2016

Please assign someone.

@nikomatsakis
Copy link
Contributor

I think we should probably comment out the copy-propagation fix at least for beta:

  • I believe the pass doesn't take aliasing or borrowing into account
  • The fix for this is likely to be somewhat involved

But we should separately be moving on improving the performance/correctness of CopyPropagation, which is definitely an important thing to have.

@arielb1
Copy link
Contributor

arielb1 commented Oct 6, 2016

CopyPropagation bails out instantly on the first sign of borrowing, so that shouldn't be a problem.

@nikomatsakis
Copy link
Contributor

@arielb1 ah ok I was under the impression it did not do that yet. I have to go re-read the code.

@nikomatsakis
Copy link
Contributor

Conclusion from meeting: Proper fix then for nightly would be to try to cache work here, but for the beta branch an easy short-term fix is to disable copy-prop. I'll assign myself the beta PR.

@nikomatsakis nikomatsakis self-assigned this Oct 6, 2016
@brson
Copy link
Contributor

brson commented Nov 3, 2016

Monday is probably the cutoff for backporting to beta if this isn't in already.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 3, 2016

The test crate in question does not compile out of the box with the beta branch itself (it uses features only available on nightly).

However, good news: The point in my experiments where the speed became acceptable again was the nightly from sept 27, which is this commit: d0623cf

That commit, if I understand the output of git log (which may be a big "if") is included in the current beta branch.

@nikomatsakis
Copy link
Contributor

triage: P-medium

It is still a mystery why this is a problem, but not a burning issue at this point.

@rust-highfive rust-highfive added P-medium Medium priority and removed P-high High priority labels Nov 3, 2016
@pnkfelix
Copy link
Member

pnkfelix commented Nov 4, 2016

Here is a small test case adapted from a vast reduction of the code that @chrivers sent.

(In this reduction, I have left in some seemingly unrelated details, such as repeated calls to println! and some similar looking match constructs. I found no immediately obvious relationship between how many times one repeats these forms and how slow the compilation is; its certainly slower either way. Its probably a sign that control-flow in general is the way to expose the problem.)

// one_file.rs

fn bit() -> Option<bool> { unimplemented!() }

macro_rules! def_thing_fields {
    (  { $($field:ident,)* } )
        =>
    {
        pub struct Thing { $(pub $field: (),)* }
        pub fn go()
        {
            Thing
            {
                $(
                    $field:
                    {
                        println!("Thing");
                        println!("{}", "Thing");
                        println!("{}", stringify!($field));

                        let b = match bit() {
                            Some(x) => x,
                            None => return,
                        };

                        if b {
                            let result = match bit() {
                                Some(x) => x,
                                None => return,
                            };
                            println!("next_bit -> {:?}", result);
                        }
                    }
                ),*
            };
        }
    }
}


fn main() {

}

def_thing_fields! {
    {
        a00, a01, a02, a03, a04, a05, a06, a07, a08, a09,
        a10, a11, a12, a13, a14, a15, a16, a17, a18, a19,
        a20, a21, a22, a23, a24, a25, a26, a27, a28, a29,
        a30, a31, a32, a33, a34, a35, a36, a37, a38, a39,
        a40, a41, a42, a43, a44, a45, a46, a47, a48, a49,
        a50, a51, a52, a53, a54, a55, a56, a57, a58, a59,
        a60, a61, a62, a63, a64, a65, a66, a67, a68, a69,
        a70, a71, a72, a73, a74, a75, a76, a77, a78, a79,
        a80, a81, a82, a83, a84, a85, a86, a87, a88, a89,
        a90, a91, a92, a93, a94, a95, a96, a97, a98, a99,
    }
}

Here are timing results showing that the same compilers I tested above show an analogous change in speed for this test case:

% multirust default nightly-2016-09-22
multirust: using existing install for 'nightly-2016-09-22'
multirust: default toolchain set to 'nightly-2016-09-22'
% time rustc -O one_file.rs  -Z time-passes | grep CopyProp
  time: 1.795   CopyPropagation

real    0m2.804s
user    0m2.585s
sys 0m0.235s
% multirust default nightly-2016-09-27
multirust: using existing install for 'nightly-2016-09-27'
multirust: default toolchain set to 'nightly-2016-09-27'
% time rustc -O one_file.rs  -Z time-passes | grep CopyProp
  time: 0.000   CopyPropagation

real    0m0.970s
user    0m0.754s
sys 0m0.237s
% 

@pnkfelix
Copy link
Member

pnkfelix commented Nov 4, 2016

Ah!

Someone mentioned in the meeting that the time spent in CopyPropagation may depend on the optimization level. I had thought that passing -O would compensate for that, but I am now inferring that -O has no effect on the mir-optimize-level ? Is that correct?

Here's data that reflects this:

% rustc --version
rustc 1.14.0-nightly (5665bdf3e 2016-11-02)
% time rustc -Z time-passes -O ./one_file.rs  | grep CopyProp
  time: 0.000   CopyPropagation

real    0m1.001s
user    0m0.760s
sys 0m0.265s
% time rustc -Z time-passes -Z mir-opt-level=1 ./one_file.rs  | grep CopyProp
  time: 1.857   CopyPropagation

real    0m2.911s
user    0m2.661s
sys 0m0.277s
% 

(and with that knowledge, it seems obvious now why the problem "went away" -- pcwalton's commit 79cb2db deliberately avoids running the CopyPropogation when the mir optimization level is 0.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 4, 2016

@brson the short version of all of this that I do not think there is anything that needs to be backported to beta at this point.

@arielb1 arielb1 changed the title Significant compile-speed regression in nightly MIR CopyPropagation is slow Nov 10, 2016
@arielb1 arielb1 removed the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Nov 10, 2016
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Nov 10, 2016

We untagged this as a regression but kept the issue open since we would like to eventually make copy-prop fast again.

@bluss
Copy link
Member

bluss commented Dec 15, 2016

FYI: copy propagation has moved to -Zmir-opt-level=2 (default is 1)

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
bors added a commit that referenced this issue Jan 5, 2019
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc #36673)
pietroalbini added a commit to pietroalbini/rust that referenced this issue Jan 12, 2019
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)
Centril added a commit to Centril/rust that referenced this issue Jan 13, 2019
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)
@nikomatsakis nikomatsakis removed their assignment Apr 30, 2019
@jonas-schievink jonas-schievink added the A-mir-opt Area: MIR optimizations label Mar 29, 2020
@bors bors closed this as completed in ffeeb20 Oct 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
9 participants