Skip to content

Commit

Permalink
Auto merge of #8087 - ehuss:freshness-interrupted2, r=alexcrichton
Browse files Browse the repository at this point in the history
Fix freshness when linking is interrupted.

Fixes a scenario where hitting Ctrl-C while linking would leave a corrupted executable, but Cargo would think it is "fresh" and fail to rebuild it.

This also includes a separate commit which adds more documentation on fingerprinting.

Fixes #7767
  • Loading branch information
bors committed Apr 10, 2020
2 parents 7d720ef + 14e86cc commit 53b1c48
Show file tree
Hide file tree
Showing 3 changed files with 288 additions and 47 deletions.
17 changes: 16 additions & 1 deletion src/cargo/core/compiler/context/compilation_files.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit};
use crate::core::{Target, TargetKind, Workspace};
use crate::util::{self, CargoResult};

/// The `Metadata` is a hash used to make unique file names for each unit in a build.
/// The `Metadata` is a hash used to make unique file names for each unit in a
/// build. It is also use for symbol mangling.
///
/// For example:
/// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
/// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
///
/// In general this must include all things that need to be distinguished in different parts of
/// the same build. This is absolutely required or we override things before
/// we get chance to use them.
///
/// It is also used for symbol mangling, because if you have two versions of
/// the same crate linked together, their symbols need to be differentiated.
///
/// We use a hash because it is an easy way to guarantee
/// that all the inputs can be converted to a valid path.
///
Expand All @@ -39,6 +45,15 @@ use crate::util::{self, CargoResult};
/// more space than needed. This makes not including something in `Metadata`
/// a form of cache invalidation.
///
/// You should also avoid anything that would interfere with reproducible
/// builds. For example, *any* absolute path should be avoided. This is one
/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
/// reproducible builds and has absolute paths in it). Also, in some cases the
/// mangled symbols need to be stable between different builds with different
/// settings. For example, profile-guided optimizations need to swap
/// `RUSTFLAGS` between runs, but needs to keep the same symbol names.
///
/// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
/// rebuild is needed.
#[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
Expand Down
233 changes: 187 additions & 46 deletions src/cargo/core/compiler/fingerprint.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,23 +5,30 @@
//! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
//! There are several mechanisms that influence a Unit's freshness:
//!
//! - The `Metadata` hash isolates each Unit on the filesystem by being
//! embedded in the filename. If something in the hash changes, then the
//! output files will be missing, and the Unit will be dirty (missing
//! outputs are considered "dirty").
//! - The `Fingerprint` is another hash, saved to the filesystem in the
//! `.fingerprint` directory, that tracks information about the inputs to a
//! Unit. If any of the inputs changes from the last compilation, then the
//! Unit is considered dirty. A missing fingerprint (such as during the
//! first build) is also considered dirty.
//! - Whether or not input files are actually present. For example a build
//! script which says it depends on a nonexistent file `foo` is always rerun.
//! - Propagation throughout the dependency graph of file modification time
//! information, used to detect changes on the filesystem. Each `Fingerprint`
//! keeps track of what files it'll be processing, and when necessary it will
//! check the `mtime` of each file (last modification time) and compare it to
//! dependencies and output to see if files have been changed or if a change
//! needs to force recompiles of downstream dependencies.
//! - The `Fingerprint` is a hash, saved to the filesystem in the
//! `.fingerprint` directory, that tracks information about the Unit. If the
//! fingerprint is missing (such as the first time the unit is being
//! compiled), then the unit is dirty. If any of the fingerprint fields
//! change (like the name of the source file), then the Unit is considered
//! dirty.
//!
//! The `Fingerprint` also tracks the fingerprints of all its dependencies,
//! so a change in a dependency will propagate the "dirty" status up.
//!
//! - Filesystem mtime tracking is also used to check if a unit is dirty.
//! See the section below on "Mtime comparison" for more details. There
//! are essentially two parts to mtime tracking:
//!
//! 1. The mtime of a Unit's output files is compared to the mtime of all
//! its dependencies' output file mtimes (see `check_filesystem`). If any
//! output is missing, or is older than a dependency's output, then the
//! unit is dirty.
//! 2. The mtime of a Unit's source files is compared to the mtime of its
//! dep-info file in the fingerprint directory (see `find_stale_file`).
//! The dep-info file is used as an anchor to know when the last build of
//! the unit was done. See the "dep-info files" section below for more
//! details. If any input files are missing, or are newer than the
//! dep-info, then the unit is dirty.
//!
//! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
//! is notoriously imprecise and problematic. Only a small part of the
Expand All @@ -33,11 +40,16 @@
//!
//! ## Fingerprints and Metadata
//!
//! The `Metadata` hash is a hash added to the output filenames to isolate
//! each unit. See the documentation in the `compilation_files` module for
//! more details. NOTE: Not all output files are isolated via filename hashes
//! (like dylibs), but the fingerprint directory always has the `Metadata`
//! hash in its directory name.
//!
//! Fingerprints and Metadata are similar, and track some of the same things.
//! The Metadata contains information that is required to keep Units separate.
//! The Fingerprint includes additional information that should cause a
//! recompile, but it is desired to reuse the same filenames. Generally the
//! items in the Metadata do not need to be in the Fingerprint. A comparison
//! recompile, but it is desired to reuse the same filenames. A comparison
//! of what is tracked:
//!
//! Value | Fingerprint | Metadata
Expand All @@ -54,8 +66,7 @@
//! __CARGO_DEFAULT_LIB_METADATA[^4] | | ✓
//! package_id | | ✓
//! authors, description, homepage, repo | ✓ |
//! Target src path | ✓ |
//! Target path relative to ws | ✓ |
//! Target src path relative to ws | ✓ |
//! Target flags (test/bench/for_host/edition) | ✓ |
//! -C incremental=… flag | ✓ |
//! mtime of sources | ✓[^3] |
Expand All @@ -64,12 +75,19 @@
//!
//! [^1]: Build script and bin dependencies are not included.
//!
//! [^3]: The mtime is only tracked for workspace members and path
//! dependencies. Git dependencies track the git revision.
//! [^3]: See below for details on mtime tracking.
//!
//! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the
//! release channel (bootstrap/stable/beta/nightly) in libstd.
//!
//! When deciding what should go in the Metadata vs the Fingerprint, consider
//! that some files (like dylibs) do not have a hash in their filename. Thus,
//! if a value changes, only the fingerprint will detect the change (consider,
//! for example, swapping between different features). Fields that are only in
//! Metadata generally aren't relevant to the fingerprint because they
//! fundamentally change the output (like target vs host changes the directory
//! where it is emitted).
//!
//! ## Fingerprint files
//!
//! Fingerprint information is stored in the
Expand All @@ -83,9 +101,7 @@
//! `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be
//! used to display this log information.
//! - A "dep-info" file which contains a list of source filenames for the
//! target. This is produced by reading the output of `rustc
//! --emit=dep-info` and packing it into a condensed format. Cargo uses this
//! to check the mtime of every file to see if any of them have changed.
//! target. See below for details.
//! - An `invoked.timestamp` file whose filesystem mtime is updated every time
//! the Unit is built. This is an experimental feature used for cleaning
//! unused artifacts.
Expand All @@ -110,6 +126,103 @@
//! all dependencies, when it is updated, by using `Arc` clones, it
//! automatically picks up the updates to its dependencies.
//!
//! ### dep-info files
//!
//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will
//! generate a "dep info" file (with the `.d` extension). This is a
//! Makefile-like syntax that includes all of the source files used to build
//! the crate. This file is used by Cargo to know which files to check to see
//! if the crate will need to be rebuilt.
//!
//! After `rustc` exits successfully, Cargo will read the dep info file and
//! translate it into a binary format that is stored in the fingerprint
//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info
//! file itself is used as the reference for comparing the source files to
//! determine if any of the source files have been modified (see below for
//! more detail).
//!
//! There is also a third dep-info file. Cargo will extend the file created by
//! rustc with some additional information and saves this into the output
//! directory. This is intended for build system integration. See the
//! `output_depinfo` module for more detail.
//!
//! #### -Zbinary-dep-depinfo
//!
//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes
//! `rustc` to include binary files (like rlibs) in the dep-info file. This is
//! primarily to support rustc development, so that Cargo can check the
//! implicit dependency to the standard library (which lives in the sysroot).
//! We want Cargo to recompile whenever the standard library rlib/dylibs
//! change, and this is a generic mechanism to make that work.
//!
//! ### Mtime comparison
//!
//! The use of modification timestamps is the most common way a unit will be
//! determined to be dirty or fresh between builds. There are many subtle
//! issues and edge cases with mtime comparisons. This gives a high-level
//! overview, but you'll need to read the code for the gritty details. Mtime
//! handling is different for different unit kinds. The different styles are
//! driven by the `Fingerprint.local` field, which is set based on the unit
//! kind.
//!
//! The status of whether or not the mtime is "stale" or "up-to-date" is
//! stored in `Fingerprint.fs_status`.
//!
//! All units will compare the mtime of its newest output file with the mtimes
//! of the outputs of all its dependencies. If any output file is missing,
//! then the unit is stale. If any dependency is newer, the unit is stale.
//!
//! #### Normal package mtime handling
//!
//! `LocalFingerprint::CheckDepinfo` is used for checking the mtime of
//! packages. It compares the mtime of the input files (the source files) to
//! the mtime of the dep-info file (which is written last after a build is
//! finished). If the dep-info is missing, the unit is stale (it has never
//! been built). The list of input files comes from the dep-info file. See the
//! section above for details on dep-info files.
//!
//! Also note that although registry and git packages use `CheckDepInfo`, none
//! of their source files are included in the dep-info (see
//! `translate_dep_info`), so for those kinds no mtime checking is done
//! (unless `-Zbinary-dep-depinfo` is used). Repository and git packages are
//! static, so there is no need to check anything.
//!
//! When a build is complete, the mtime of the dep-info file in the
//! fingerprint directory is modified to rewind it to the time when the build
//! started. This is done by creating an `invoked.timestamp` file when the
//! build starts to capture the start time. The mtime is rewound to the start
//! to handle the case where the user modifies a source file while a build is
//! running. Cargo can't know whether or not the file was included in the
//! build, so it takes a conservative approach of assuming the file was *not*
//! included, and it should be rebuilt during the next build.
//!
//! #### Rustdoc mtime handling
//!
//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively
//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is
//! used for rustdoc units. For registry packages, this is the package
//! version. For git packages, it is the git hash. For path packages, it is
//! the a string of the mtime of the newest file in the package.
//!
//! There are some known bugs with how this works, so it should be improved at
//! some point.
//!
//! #### Build script mtime handling
//!
//! Build script mtime handling runs in different modes. There is the "old
//! style" where the build script does not emit any `rerun-if` directives. In
//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the
//! "rustdoc" section above how it works.
//!
//! In the new-style, each `rerun-if` directive is translated to the
//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant
//! compares the mtime of the given filenames against the mtime of the
//! "output" file.
//!
//! Similar to normal units, the build script "output" file mtime is rewound
//! to the time just before the build script is executed to handle mid-build
//! modifications.
//!
//! ## Considerations for inclusion in a fingerprint
//!
//! Over time we've realized a few items which historically were included in
Expand Down Expand Up @@ -277,6 +390,40 @@ pub fn prepare_target<'a, 'cfg>(
return Ok(Job::new(Work::noop(), Fresh));
}

// Clear out the old fingerprint file if it exists. This protects when
// compilation is interrupted leaving a corrupt file. For example, a
// project with a lib.rs and integration test (two units):
//
// 1. Build the library and integration test.
// 2. Make a change to lib.rs (NOT the integration test).
// 3. Build the integration test, hit Ctrl-C while linking. With gcc, this
// will leave behind an incomplete executable (zero size, or partially
// written). NOTE: The library builds successfully, it is the linking
// of the integration test that we are interrupting.
// 4. Build the integration test again.
//
// Without the following line, then step 3 will leave a valid fingerprint
// on the disk. Then step 4 will think the integration test is "fresh"
// because:
//
// - There is a valid fingerprint hash on disk (written in step 1).
// - The mtime of the output file (the corrupt integration executable
// written in step 3) is newer than all of its dependencies.
// - The mtime of the integration test fingerprint dep-info file (written
// in step 1) is newer than the integration test's source files, because
// we haven't modified any of its source files.
//
// But the executable is corrupt and needs to be rebuilt. Clearing the
// fingerprint at step 3 ensures that Cargo never mistakes a partially
// written output as up-to-date.
if loc.exists() {
// Truncate instead of delete so that compare_old_fingerprint will
// still log the reason for the fingerprint failure instead of just
// reporting "failed to read fingerprint" during the next build if
// this build fails.
paths::write(&loc, b"")?;
}

let write_fingerprint = if unit.mode.is_run_custom_build() {
// For build scripts the `local` field of the fingerprint may change
// while we're executing it. For example it could be in the legacy
Expand Down Expand Up @@ -484,9 +631,8 @@ impl<'de> Deserialize<'de> for DepFingerprint {
#[derive(Debug, Serialize, Deserialize, Hash)]
enum LocalFingerprint {
/// This is a precalculated fingerprint which has an opaque string we just
/// hash as usual. This variant is primarily used for git/crates.io
/// dependencies where the source never changes so we can quickly conclude
/// that there's some string we can hash and it won't really change much.
/// hash as usual. This variant is primarily used for rustdoc where we
/// don't have a dep-info file to compare against.
///
/// This is also used for build scripts with no `rerun-if-*` statements, but
/// that's overall a mistake and causes bugs in Cargo. We shouldn't use this
Expand Down Expand Up @@ -1072,19 +1218,16 @@ fn calculate_normal<'a, 'cfg>(
.collect::<CargoResult<Vec<_>>>()?;
deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id));

// Afterwards calculate our own fingerprint information. We specially
// handle `path` packages to ensure we track files on the filesystem
// correctly, but otherwise upstream packages like from crates.io or git
// get bland fingerprints because they don't change without their
// `PackageId` changing.
// Afterwards calculate our own fingerprint information.
let target_root = target_root(cx);
let local = if use_dep_info(unit) {
let local = if unit.mode.is_doc() {
// rustdoc does not have dep-info files.
let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
vec![LocalFingerprint::Precalculated(fingerprint)]
} else {
let dep_info = dep_info_loc(cx, unit);
let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf();
vec![LocalFingerprint::CheckDepInfo { dep_info }]
} else {
let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
vec![LocalFingerprint::Precalculated(fingerprint)]
};

// Figure out what the outputs of our unit is, and we'll be storing them
Expand Down Expand Up @@ -1128,12 +1271,6 @@ fn calculate_normal<'a, 'cfg>(
})
}

/// Whether or not the fingerprint should track the dependencies from the
/// dep-info file for this unit.
fn use_dep_info(unit: &Unit<'_>) -> bool {
!unit.mode.is_doc()
}

/// Calculate a fingerprint for an "execute a build script" unit. This is an
/// internal helper of `calculate`, don't call directly.
fn calculate_run_custom_build<'a, 'cfg>(
Expand Down Expand Up @@ -1412,7 +1549,10 @@ fn compare_old_fingerprint(
let old_fingerprint_json = paths::read(&loc.with_extension("json"))?;
let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json)
.chain_err(|| internal("failed to deserialize json"))?;
debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
// Fingerprint can be empty after a failed rebuild (see comment in prepare_target).
if !old_fingerprint_short.is_empty() {
debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
}
let result = new_fingerprint.compare(&old_fingerprint);
assert!(result.is_err());
result
Expand Down Expand Up @@ -1588,7 +1728,8 @@ impl DepInfoPathType {
/// included. If it is false, then package-relative paths are skipped and
/// ignored (typically used for registry or git dependencies where we assume
/// the source never changes, and we don't want the cost of running `stat` on
/// all those files).
/// all those files). See the module-level docs for the note about
/// `-Zbinary-dep-depinfo` for more details on why this is done.
///
/// The serialized Cargo format will contain a list of files, all of which are
/// relative if they're under `root`. or absolute if they're elsewhere.
Expand Down
Loading

0 comments on commit 53b1c48

Please sign in to comment.