Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when trying to compile a lib multiple times for different purposes. #5444

Closed
ehuss opened this issue May 1, 2018 · 12 comments · Fixed by #5460
Closed

Race condition when trying to compile a lib multiple times for different purposes. #5444

ehuss opened this issue May 1, 2018 · 12 comments · Fixed by #5460

Comments

@ehuss
Copy link
Contributor

ehuss commented May 1, 2018

As part of #5384 I created some circumstances where the same lib may be built concurrently with different settings. I thought this would be safe since they have different hashes in the filename, but I neglected to address this case when the library is hardlinked. There is a race condition in hardlink_or_copy where one compilation will stomp on another via the hardlink.

Details for the curious.
  1. One compilation creates a hardlink.
  2. The other compilation fails to create a hardlink with EEXIST.
  3. 2nd compilation copies its lib into the existing hardlink, modifying the first one's contents.

This can happen in a variety of situations (mainly with panic), but here is a small repro:

cargo new --lib stomp
cd stomp
cat >> Cargo.toml <<EOL
[profile.dev]
panic = "abort"
EOL
cat > src/main.rs <<EOL
extern crate stomp;
fn main() {}
EOL

# This will fail some of the time on some platforms with wrong panic strategy.
# Note: This requires #5384.  Earlier versions would fail 100% of the time.
# This attempts to build `lib.rs` twice at the same time, once with `panic` 
# and once without (for tests).
cargo build --all-targets -v

I'm trying to think of a fix, but I haven't come up with anything, yet. Generally I'm thinking that these "non-panic" dependencies should not be hard linked at all, but there's no obvious way to detect when a Unit is one of these. We could add a flag to Unit to detect this scenario, but that would require adding some extra de-duplication logic and generally seems a little too complex.

I don't think it is worth it to try to fix the race condition itself since that is very tricky, and would end up with a random lib in the output directory anyway.

I'll keep trying to think of a solution, but if anyone has any thoughts I'd be happy to hear.

@alexcrichton
Copy link
Member

Thanks for tracking this down! This I think is the cause of the errors I saw yesterday, right?

I may be reaching a different conclusion with this though, when I use your above example I get compliations that look like:

  • Library, panic=abort
  • Library, as a benchmark
  • LIbrary, as a test harness
  • Binary, as a benchmark, linked to library as panic=abort
  • Binary, as a test harness, linked to library as panic=abort
  • Binary, panic=abort, linked to library as panic=abort

It looks like the bug here is that the library is never compiled in a linkable format with panic=unwind so the binary, when build as a benchmark or test, doesn't have a suitable library to link against.

Now I'd definitely imagine that there's also some stomping over hard links going on here! I'm curious though what you think of the above?

@ehuss
Copy link
Contributor Author

ehuss commented May 1, 2018

It is either this or #5445 (cargo build -p wasm-bindgen-cli --release && cargo build -p add --release reliably triggers #5445 because the wasm-bindgen-macro package depends on serde).

What you're describing is the old behavior. With a build that includes #5384, lib.rs is built four times:

  1. lib with/panic=abort
  2. lib without panic (for tests/benches)
  3. lib as --test
  4. lib as --test for benchmarks

And because 1 and 2 are built at the same time, and both want to hardlink at the same time, that's what causes this problem.

Can you describe what the use cases are for hardlinking libs? I'm guessing it's for integrating with other build systems? Are there other scenarios? Why are only workspace libs elevated (and not dependencies)?

@alexcrichton
Copy link
Member

Aha right sorry my mistake, it looks like that hasn't yet made its way into the nightlies which is why I was confused!

Currently yeah it's primarily for integrating into other build systems, but we should only link up the "primary" dependency which in this case is the one with panic=abort. I also think it's fine to actually skip hardlinking rlibs entirely, I don't think any build system integration happens at the rlib layer, only at other staticlib/executable/dylib layers.

I think that'd fix the issue, right? If we stopped hardlinking rlibs?

@alexcrichton
Copy link
Member

I think this may have caused this failure: #5456 (comment)

@matklad
Copy link
Member

matklad commented May 2, 2018

I also think it's fine to actually skip hardlinking rlibs entirely

+1 for hard linking less stuff. Note that #5203 specifically went through several hoops to copy only relevant files to the --out-dir. The logic that that PR uses is "link only top-level units, requested from the command line". Perhaps we can use the same logic for the target dir as well?

@alexcrichton
Copy link
Member

So before #5458 I'm able to reproduce a spurious failure via:

$ cargo test --test testsuite profile_selection_build_all_targets

(basically reproduce #5456 (comment))

After that PR, however, I cannot reproduce a failure. Writing a test without that PR as well using the OP here it also doesn't seem to fail.

Put another way, I'm not sure if this is still observable after #5458 since panic=unwind and panic=abort have different filenames rather than being placed into the same bucket (and racing).

Now I'd definitely believe that the hard link is still nodeterministically being stomped over, but the severity of this issue is probably far less decreased after #5458. Mind confirming that though @ehuss?

@ehuss
Copy link
Contributor Author

ehuss commented May 2, 2018

If we stopped hardlinking rlibs?

In my tests, other output types have the same problem. (dylib is particularly broken since it doesn't have a hash in the filename.)

Another idea I was thinking about is only hardlinking the top-level targets (essentially the targets created in generate_targets()). Do you think that would be a problem? That would skip hardlinking implied dependencies in a few cases (like cargo build --bin foo would no longer link the lib, but if you want the lib you can just add --lib, or just run cargo build with no arguments).

EDIT: Just read your comments, I'm a little behind.

@ehuss
Copy link
Contributor Author

ehuss commented May 2, 2018

@alexcrichton I'm still able to repro the example up top with #5458. 5458 should only address #5445. It maybe happens about 10% of the time on my particular system.

@alexcrichton
Copy link
Member

@ehuss and to confirm, by reproduce you mean that it fails spuriously?

@ehuss
Copy link
Contributor Author

ehuss commented May 2, 2018

@alexcrichton yes

@matklad
Copy link
Member

matklad commented May 2, 2018

Hm, could this failure be related? I think hardlink stomping happens for binaries as well, because, IIRC, panic flag affects them as well?

@ehuss
Copy link
Contributor Author

ehuss commented May 2, 2018

Hm, could this failure be related?

I don't think so. This issue would only appear when it attempts to compile the same lib at the same time with panic set. build --all-targets is one of the only ways to trigger this. I'll see if I can repro the appveyor error, I don't see how it should be possible.

ehuss added a commit to ehuss/cargo that referenced this issue May 10, 2018
This changes it so that only top-level targets requested on the command-line will be included in the output directory.  Dependencies are no longer included.

Fixes rust-lang#5444.
bors added a commit that referenced this issue May 10, 2018
Be more conservative about which files are linked to the output dir.

This changes it so that only top-level targets requested on the command-line will be included in the output directory.  Dependencies are no longer included.

Fixes #5444.
bors added a commit that referenced this issue Nov 14, 2018
Check for duplicate output filenames.

This generates an error if it detects that different units would output the same file name. This can happen in a few situations:
- Multiple targets in a workspace use the same name.
- `--out-dir` in some cases (like `examples` because it does not include the directory).
- Bugs in cargo (like #5524, #5444)

This includes a few cleanups/consolidations:
- `export_path` added to `OutputFile` so there is one place where the `--out-dir` filename logic is located.
- `TargetKind::description()` added for a simple way to get a human-readable name of a target kind.
- The `PartialOrd` changes are a slight performance improvement (overall things are now slightly faster even though it is doing more work).

Closes #5524, closes #6293.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants