Auto merge of #8087 - ehuss:freshness-interrupted2, r=alexcrichton

Fix freshness when linking is interrupted. Fixes a scenario where hitting Ctrl-C while linking would leave a corrupted executable, but Cargo would think it is "fresh" and fail to rebuild it. This also includes a separate commit which adds more documentation on fingerprinting. Fixes #7767
rust-lang · Apr 10, 2020 · 53b1c48
2 parents 7d720ef + 14e86cc

Showing 3 changed files with 288 additions and 47 deletions.
diff --git a/src/cargo/core/compiler/context/compilation_files.rs b/src/cargo/core/compiler/context/compilation_files.rs
@@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit};
 use crate::core::{Target, TargetKind, Workspace};
 use crate::util::{self, CargoResult};
 
-/// The `Metadata` is a hash used to make unique file names for each unit in a build.
+/// The `Metadata` is a hash used to make unique file names for each unit in a
+/// build. It is also used for symbol mangling.
+///
 /// For example:
 /// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
 /// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
+///
 /// In general this must include all things that need to be distinguished in different parts of
 /// the same build. This is absolutely required or we overwrite things before
 /// we get a chance to use them.
 ///
+/// It is also used for symbol mangling, because if you have two versions of
+/// the same crate linked together, their symbols need to be differentiated.
+///
 /// We use a hash because it is an easy way to guarantee
 /// that all the inputs can be converted to a valid path.
 ///
@@ -39,6 +45,15 @@ use crate::util::{self, CargoResult};
 /// more space than needed. This makes not including something in `Metadata`
 /// a form of cache invalidation.
 ///
+/// You should also avoid anything that would interfere with reproducible
+/// builds. For example, *any* absolute path should be avoided. This is one
+/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
+/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
+/// reproducible builds and has absolute paths in it). Also, in some cases the
+/// mangled symbols need to be stable between different builds with different
+/// settings. For example, profile-guided optimizations need to swap
+/// `RUSTFLAGS` between runs, but need to keep the same symbol names.
+///
 /// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
 /// rebuild is needed.
 #[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
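
The idea described in the doc comment above can be sketched as follows. This is only an illustration, not Cargo's implementation: `UnitKey`, `metadata_hash`, and `output_file_name` are hypothetical names, and `DefaultHasher` stands in for whatever stable hasher Cargo actually uses (its output is not guaranteed to be stable across Rust releases). The sketch shows why hashing the distinguishing inputs yields file names that are unique per unit and always valid as path components.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the inputs that must distinguish units
/// within a single build (the real `Metadata` hashes more than this).
#[derive(Hash)]
struct UnitKey<'a> {
    name: &'a str,
    version: &'a str,
}

/// Hash the key into a fixed-width hex string. Whatever characters the
/// inputs contain, the result is always a valid path component.
fn metadata_hash(key: &UnitKey<'_>) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Embed the hash in the output file name so that two versions of the
/// same crate do not overwrite each other.
fn output_file_name(key: &UnitKey<'_>) -> String {
    format!("lib{}-{}.rlib", key.name, metadata_hash(key))
}

fn main() {
    let a1 = UnitKey { name: "a", version: "1.0.0" };
    let a2 = UnitKey { name: "a", version: "2.0.0" };
    // Same crate name, different versions: the file names differ.
    println!("{}", output_file_name(&a1));
    println!("{}", output_file_name(&a2));
}
```

Anything left out of the hashed key (for example `RUSTFLAGS`, per the comment above) does not change the file name, so a change to it has to be detected by the `Fingerprint` instead.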

diff --git a/src/cargo/core/compiler/fingerprint.rs b/src/cargo/core/compiler/fingerprint.rs
@@ -5,23 +5,30 @@
 //! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
 //! There are several mechanisms that influence a Unit's freshness:
 //!
-//! - The `Metadata` hash isolates each Unit on the filesystem by being
-//!   embedded in the filename. If something in the hash changes, then the
-//!   output files will be missing, and the Unit will be dirty (missing
-//!   outputs are considered "dirty").
-//! - The `Fingerprint` is another hash, saved to the filesystem in the
-//!   `.fingerprint` directory, that tracks information about the inputs to a
-//!   Unit. If any of the inputs changes from the last compilation, then the
-//!   Unit is considered dirty. A missing fingerprint (such as during the
-//!   first build) is also considered dirty.
-//! - Whether or not input files are actually present. For example a build
-//!   script which says it depends on a nonexistent file `foo` is always rerun.
-//! - Propagation throughout the dependency graph of file modification time
-//!   information, used to detect changes on the filesystem. Each `Fingerprint`
-//!   keeps track of what files it'll be processing, and when necessary it will
-//!   check the `mtime` of each file (last modification time) and compare it to
-//!   dependencies and output to see if files have been changed or if a change
-//!   needs to force recompiles of downstream dependencies.
+//! - The `Fingerprint` is a hash, saved to the filesystem in the
+//!   `.fingerprint` directory, that tracks information about the Unit. If the
+//!   fingerprint is missing (such as the first time the unit is being
+//!   compiled), then the unit is dirty. If any of the fingerprint fields
+//!   change (like the name of the source file), then the Unit is considered
+//!   dirty.
+//!
+//!   The `Fingerprint` also tracks the fingerprints of all its dependencies,
+//!   so a change in a dependency will propagate the "dirty" status up.
+//!
+//! - Filesystem mtime tracking is also used to check if a unit is dirty.
+//!   See the section below on "Mtime comparison" for more details. There
+//!   are essentially two parts to mtime tracking:
+//!
+//!   1. The mtime of a Unit's output files is compared to the mtime of all
+//!      its dependencies' output file mtimes (see `check_filesystem`). If any
+//!      output is missing, or is older than a dependency's output, then the
+//!      unit is dirty.
+//!   2. The mtime of a Unit's source files is compared to the mtime of its
+//!      dep-info file in the fingerprint directory (see `find_stale_file`).
+//!      The dep-info file is used as an anchor to know when the last build of
+//!      the unit was done. See the "dep-info files" section below for more
+//!      details. If any input files are missing, or are newer than the
+//!      dep-info, then the unit is dirty.
 //!
 //! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
 //! is notoriously imprecise and problematic. Only a small part of the
@@ -33,11 +40,16 @@
 //!
 //! ## Fingerprints and Metadata
 //!
+//! The `Metadata` hash is a hash added to the output filenames to isolate
+//! each unit. See the documentation in the `compilation_files` module for
+//! more details. NOTE: Not all output files are isolated via filename hashes
+//! (like dylibs), but the fingerprint directory always has the `Metadata`
+//! hash in its directory name.
+//!
 //! Fingerprints and Metadata are similar, and track some of the same things.
 //! The Metadata contains information that is required to keep Units separate.
 //! The Fingerprint includes additional information that should cause a
-//! recompile, but it is desired to reuse the same filenames. Generally the
-//! items in the Metadata do not need to be in the Fingerprint. A comparison
+//! recompile, but it is desired to reuse the same filenames. A comparison
 //! of what is tracked:
 //!
 //! Value                                      | Fingerprint | Metadata
@@ -54,8 +66,7 @@
 //! __CARGO_DEFAULT_LIB_METADATA[^4]           |             | ✓
 //! package_id                                 |             | ✓
 //! authors, description, homepage, repo       | ✓           |
-//! Target src path                            | ✓           |
-//! Target path relative to ws                 | ✓           |
+//! Target src path relative to ws             | ✓           |
 //! Target flags (test/bench/for_host/edition) | ✓           |
 //! -C incremental=… flag                      | ✓           |
 //! mtime of sources                           | ✓[^3]       |
@@ -64,12 +75,19 @@
 //!
 //! [^1]: Build script and bin dependencies are not included.
 //!
-//! [^3]: The mtime is only tracked for workspace members and path
-//!       dependencies. Git dependencies track the git revision.
+//! [^3]: See below for details on mtime tracking.
 //!
 //! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the
 //!        release channel (bootstrap/stable/beta/nightly) in libstd.
 //!
+//! When deciding what should go in the Metadata vs the Fingerprint, consider
+//! that some files (like dylibs) do not have a hash in their filename. Thus,
+//! if a value changes, only the fingerprint will detect the change (consider,
+//! for example, swapping between different features). Fields that are only in
+//! Metadata generally aren't relevant to the fingerprint because they
+//! fundamentally change the output (like target vs host changes the directory
+//! where it is emitted).
+//!
 //! ## Fingerprint files
 //!
 //! Fingerprint information is stored in the
@@ -83,9 +101,7 @@
 //!   `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be
 //!   used to display this log information.
 //! - A "dep-info" file which contains a list of source filenames for the
-//!   target. This is produced by reading the output of `rustc
-//!   --emit=dep-info` and packing it into a condensed format. Cargo uses this
-//!   to check the mtime of every file to see if any of them have changed.
+//!   target. See below for details.
 //! - An `invoked.timestamp` file whose filesystem mtime is updated every time
 //!   the Unit is built. This is an experimental feature used for cleaning
 //!   unused artifacts.
@@ -110,6 +126,103 @@
 //! all dependencies, when it is updated, by using `Arc` clones, it
 //! automatically picks up the updates to its dependencies.
 //!
+//! ### dep-info files
+//!
+//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will
+//! generate a "dep info" file (with the `.d` extension). This is a
+//! Makefile-like syntax that includes all of the source files used to build
+//! the crate. This file is used by Cargo to know which files to check to see
+//! if the crate will need to be rebuilt.
+//!
+//! After `rustc` exits successfully, Cargo will read the dep info file and
+//! translate it into a binary format that is stored in the fingerprint
+//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info
+//! file itself is used as the reference for comparing the source files to
+//! determine if any of the source files have been modified (see below for
+//! more detail).
+//!
+//! There is also a third dep-info file. Cargo will extend the file created by
+//! rustc with some additional information and save it into the output
+//! directory. This is intended for build system integration. See the
+//! `output_depinfo` module for more detail.
+//!
+//! #### -Zbinary-dep-depinfo
+//!
+//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes
+//! `rustc` to include binary files (like rlibs) in the dep-info file. This is
+//! primarily to support rustc development, so that Cargo can check the
+//! implicit dependency to the standard library (which lives in the sysroot).
+//! We want Cargo to recompile whenever the standard library rlib/dylibs
+//! change, and this is a generic mechanism to make that work.
+//!
+//! ### Mtime comparison
+//!
+//! The use of modification timestamps is the most common way a unit will be
+//! determined to be dirty or fresh between builds. There are many subtle
+//! issues and edge cases with mtime comparisons. This gives a high-level
+//! overview, but you'll need to read the code for the gritty details. Mtime
+//! handling is different for different unit kinds. The different styles are
+//! driven by the `Fingerprint.local` field, which is set based on the unit
+//! kind.
+//!
+//! The status of whether or not the mtime is "stale" or "up-to-date" is
+//! stored in `Fingerprint.fs_status`.
+//!
+//! Every unit compares the mtime of its newest output file with the mtimes
+//! of the outputs of all its dependencies. If any output file is missing,
+//! then the unit is stale. If any dependency is newer, the unit is stale.
+//!
+//! #### Normal package mtime handling
+//!
+//! `LocalFingerprint::CheckDepInfo` is used for checking the mtime of
+//! packages. It compares the mtime of the input files (the source files) to
+//! the mtime of the dep-info file (which is written last after a build is
+//! finished). If the dep-info is missing, the unit is stale (it has never
+//! been built). The list of input files comes from the dep-info file. See the
+//! section above for details on dep-info files.
+//!
+//! Also note that although registry and git packages use `CheckDepInfo`, none
+//! of their source files are included in the dep-info (see
+//! `translate_dep_info`), so for those kinds no mtime checking is done
+//! (unless `-Zbinary-dep-depinfo` is used). Registry and git packages are
+//! static, so there is no need to check anything.
+//!
+//! When a build is complete, the mtime of the dep-info file in the
+//! fingerprint directory is modified to rewind it to the time when the build
+//! started. This is done by creating an `invoked.timestamp` file when the
+//! build starts to capture the start time. The mtime is rewound to the start
+//! to handle the case where the user modifies a source file while a build is
+//! running. Cargo can't know whether or not the file was included in the
+//! build, so it takes a conservative approach of assuming the file was *not*
+//! included, and it should be rebuilt during the next build.
+//!
+//! #### Rustdoc mtime handling
+//!
+//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively
+//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is
+//! used for rustdoc units. For registry packages, this is the package
+//! version. For git packages, it is the git hash. For path packages, it is
+//! a string of the mtime of the newest file in the package.
+//!
+//! There are some known bugs with how this works, so it should be improved at
+//! some point.
+//!
+//! #### Build script mtime handling
+//!
+//! Build script mtime handling runs in different modes. There is the "old
+//! style" where the build script does not emit any `rerun-if` directives. In
+//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the
+//! "rustdoc" section above for how it works.
+//!
+//! In the new style, each `rerun-if` directive is translated to the
+//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant
+//! compares the mtime of the given filenames against the mtime of the
+//! "output" file.
+//!
+//! Similar to normal units, the build script "output" file mtime is rewound
+//! to the time just before the build script is executed to handle mid-build
+//! modifications.
+//!
 //! ## Considerations for inclusion in a fingerprint
 //!
 //! Over time we've realized a few items which historically were included in
@@ -277,6 +390,40 @@ pub fn prepare_target<'a, 'cfg>(
         return Ok(Job::new(Work::noop(), Fresh));
     }
 
+    // Clear out the old fingerprint file if it exists. This protects against
+    // an interrupted compilation leaving a corrupt file. For example, a
+    // project with a lib.rs and integration test (two units):
+    //
+    // 1. Build the library and integration test.
+    // 2. Make a change to lib.rs (NOT the integration test).
+    // 3. Build the integration test, hit Ctrl-C while linking. With gcc, this
+    //    will leave behind an incomplete executable (zero size, or partially
+    //    written). NOTE: The library builds successfully, it is the linking
+    //    of the integration test that we are interrupting.
+    // 4. Build the integration test again.
+    //
+    // Without the following truncation, step 3 will leave a valid fingerprint
+    // on the disk. Then step 4 will think the integration test is "fresh"
+    // because:
+    //
+    // - There is a valid fingerprint hash on disk (written in step 1).
+    // - The mtime of the output file (the corrupt integration executable
+    //   written in step 3) is newer than all of its dependencies.
+    // - The mtime of the integration test fingerprint dep-info file (written
+    //   in step 1) is newer than the integration test's source files, because
+    //   we haven't modified any of its source files.
+    //
+    // But the executable is corrupt and needs to be rebuilt. Clearing the
+    // fingerprint at step 3 ensures that Cargo never mistakes a partially
+    // written output as up-to-date.
+    if loc.exists() {
+        // Truncate instead of delete so that compare_old_fingerprint will
+        // still log the reason for the fingerprint failure instead of just
+        // reporting "failed to read fingerprint" during the next build if
+        // this build fails.
+        paths::write(&loc, b"")?;
+    }
+
     let write_fingerprint = if unit.mode.is_run_custom_build() {
         // For build scripts the `local` field of the fingerprint may change
         // while we're executing it. For example it could be in the legacy
@@ -484,9 +631,8 @@ impl<'de> Deserialize<'de> for DepFingerprint {
 #[derive(Debug, Serialize, Deserialize, Hash)]
 enum LocalFingerprint {
     /// This is a precalculated fingerprint which has an opaque string we just
-    /// hash as usual. This variant is primarily used for git/crates.io
-    /// dependencies where the source never changes so we can quickly conclude
-    /// that there's some string we can hash and it won't really change much.
+    /// hash as usual. This variant is primarily used for rustdoc where we
+    /// don't have a dep-info file to compare against.
     ///
     /// This is also used for build scripts with no `rerun-if-*` statements, but
     /// that's overall a mistake and causes bugs in Cargo. We shouldn't use this
@@ -1072,19 +1218,16 @@ fn calculate_normal<'a, 'cfg>(
         .collect::<CargoResult<Vec<_>>>()?;
     deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id));
 
-    // Afterwards calculate our own fingerprint information. We specially
-    // handle `path` packages to ensure we track files on the filesystem
-    // correctly, but otherwise upstream packages like from crates.io or git
-    // get bland fingerprints because they don't change without their
-    // `PackageId` changing.
+    // Afterwards calculate our own fingerprint information.
     let target_root = target_root(cx);
-    let local = if use_dep_info(unit) {
+    let local = if unit.mode.is_doc() {
+        // rustdoc does not have dep-info files.
+        let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
+        vec![LocalFingerprint::Precalculated(fingerprint)]
+    } else {
         let dep_info = dep_info_loc(cx, unit);
         let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf();
         vec![LocalFingerprint::CheckDepInfo { dep_info }]
-    } else {
-        let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
-        vec![LocalFingerprint::Precalculated(fingerprint)]
     };
 
     // Figure out what the outputs of our unit are, and we'll be storing them
@@ -1128,12 +1271,6 @@ fn calculate_normal<'a, 'cfg>(
     })
 }
 
-/// Whether or not the fingerprint should track the dependencies from the
-/// dep-info file for this unit.
-fn use_dep_info(unit: &Unit<'_>) -> bool {
-    !unit.mode.is_doc()
-}
-
 /// Calculate a fingerprint for an "execute a build script" unit.  This is an
 /// internal helper of `calculate`, don't call directly.
 fn calculate_run_custom_build<'a, 'cfg>(
@@ -1412,7 +1549,10 @@ fn compare_old_fingerprint(
     let old_fingerprint_json = paths::read(&loc.with_extension("json"))?;
     let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json)
         .chain_err(|| internal("failed to deserialize json"))?;
-    debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
+    // Fingerprint can be empty after a failed rebuild (see comment in prepare_target).
+    if !old_fingerprint_short.is_empty() {
+        debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
+    }
     let result = new_fingerprint.compare(&old_fingerprint);
     assert!(result.is_err());
     result
@@ -1588,7 +1728,8 @@ impl DepInfoPathType {
 /// included. If it is false, then package-relative paths are skipped and
 /// ignored (typically used for registry or git dependencies where we assume
 /// the source never changes, and we don't want the cost of running `stat` on
-/// all those files).
+/// all those files). See the module-level docs for the note about
+/// `-Zbinary-dep-depinfo` for more details on why this is done.
 ///
 /// The serialized Cargo format will contain a list of files, all of which are
 /// relative if they're under `root`, or absolute if they're elsewhere.
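
The interrupted-link scenario from the `prepare_target` comment can be modeled in a few lines. This is a simplified sketch, not Cargo's code: `OnDisk`, `is_fresh`, `begin_rebuild`, and `interrupted_link_scenario` are hypothetical names, and mtimes are plain integers (larger means newer) so the example is deterministic. It assumes, as the comment states, that the fingerprint hash written in step 1 is still the expected one in step 4.

```rust
/// Hypothetical, simplified view of what is on disk for one unit.
struct OnDisk {
    /// Contents of the fingerprint hash file (empty once truncated).
    fingerprint_hash: String,
    /// Mtime of the unit's output file, or `None` if it is missing.
    output_mtime: Option<u64>,
    /// Mtime of the dep-info file in the fingerprint directory.
    dep_info_mtime: Option<u64>,
}

/// The three checks from the comment: the hash matches, the output is not
/// older than any dependency output, and no source file is newer than the
/// dep-info file.
fn is_fresh(disk: &OnDisk, expected_hash: &str, dep_outputs: &[u64], sources: &[u64]) -> bool {
    if disk.fingerprint_hash != expected_hash {
        return false;
    }
    let out = match disk.output_mtime {
        Some(t) => t,
        None => return false, // a missing output is always dirty
    };
    if dep_outputs.iter().any(|&dep| dep > out) {
        return false;
    }
    let dep_info = match disk.dep_info_mtime {
        Some(t) => t,
        None => return false, // never built
    };
    !sources.iter().any(|&src| src > dep_info)
}

/// Mirrors the fix: truncate the fingerprint before starting the rebuild.
fn begin_rebuild(disk: &mut OnDisk) {
    disk.fingerprint_hash.clear();
}

/// Replays steps 1-4 for the integration test unit and reports whether
/// step 4 would consider it fresh.
fn interrupted_link_scenario(truncate: bool) -> bool {
    // Step 1: first build. The hash "abc" and dep-info (mtime 10) are written.
    let mut test_unit = OnDisk {
        fingerprint_hash: "abc".to_string(),
        output_mtime: Some(11),
        dep_info_mtime: Some(10),
    };
    // Step 2: lib.rs changes; the test's own source (mtime 5) does not.
    // Step 3: the library is rebuilt (output mtime 30), then linking the
    // test is interrupted, leaving a corrupt executable with mtime 31.
    if truncate {
        begin_rebuild(&mut test_unit);
    }
    test_unit.output_mtime = Some(31);
    // Step 4: check freshness again.
    is_fresh(&test_unit, "abc", &[30], &[5])
}

fn main() {
    println!("without truncation, fresh = {}", interrupted_link_scenario(false));
    println!("with truncation, fresh = {}", interrupted_link_scenario(true));
}
```

Without the truncation all three checks pass even though the executable is corrupt; with it, the hash comparison fails first and the unit is rebuilt.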