Commit cc53eca

Add some more exposition on fingerprints.
1 parent da54d6b commit cc53eca

1 file changed: +118 -28 lines changed
src/cargo/core/compiler/fingerprint.rs

Lines changed: 118 additions & 28 deletions
@@ -36,8 +36,7 @@
 //! Fingerprints and Metadata are similar, and track some of the same things.
 //! The Metadata contains information that is required to keep Units separate.
 //! The Fingerprint includes additional information that should cause a
-//! recompile, but it is desired to reuse the same filenames. Generally the
-//! items in the Metadata do not need to be in the Fingerprint. A comparison
+//! recompile, but it is desired to reuse the same filenames. A comparison
 //! of what is tracked:
 //!
 //! Value | Fingerprint | Metadata
@@ -54,8 +53,7 @@
 //! __CARGO_DEFAULT_LIB_METADATA[^4] | | ✓
 //! package_id | | ✓
 //! authors, description, homepage, repo | ✓ |
-//! Target src path | ✓ |
-//! Target path relative to ws | ✓ |
+//! Target src path relative to ws | ✓ |
 //! Target flags (test/bench/for_host/edition) | ✓ |
 //! -C incremental=… flag | ✓ |
 //! mtime of sources | ✓[^3] |
@@ -64,12 +62,18 @@
 //!
 //! [^1]: Build script and bin dependencies are not included.
 //!
-//! [^3]: The mtime is only tracked for workspace members and path
-//! dependencies. Git dependencies track the git revision.
+//! [^3]: See below for details on mtime tracking.
 //!
 //! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the
 //! release channel (bootstrap/stable/beta/nightly) in libstd.
 //!
+//! When deciding what should go in the Metadata vs the Fingerprint, consider
+//! that some files (like dylibs) do not have a hash in their filename. Thus,
+//! if a value changes, only the fingerprint will detect the change. Fields
+//! that are only in Metadata generally aren't relevant to the fingerprint
+//! because they fundamentally change the output (like target vs host changes
+//! the directory where it is emitted).
+//!
 //! ## Fingerprint files
 //!
 //! Fingerprint information is stored in the
@@ -83,9 +87,7 @@
 //! `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be
 //! used to display this log information.
 //! - A "dep-info" file which contains a list of source filenames for the
-//! target. This is produced by reading the output of `rustc
-//! --emit=dep-info` and packing it into a condensed format. Cargo uses this
-//! to check the mtime of every file to see if any of them have changed.
+//! target. See below for details.
 //! - An `invoked.timestamp` file whose filesystem mtime is updated every time
 //! the Unit is built. This is an experimental feature used for cleaning
 //! unused artifacts.
@@ -110,6 +112,103 @@
 //! all dependencies, when it is updated, by using `Arc` clones, it
 //! automatically picks up the updates to its dependencies.
 //!
+//! ### dep-info files
+//!
+//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will
+//! generate a "dep info" file (with the `.d` extension). This is a
+//! Makefile-like syntax that includes all of the source files used to build
+//! the crate. This file is used by Cargo to know which files to check to see
+//! if the crate will need to be rebuilt.
+//!
+//! After `rustc` exits successfully, Cargo will read the dep info file and
+//! translate it into a binary format that is stored in the fingerprint
+//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info
+//! file itself is used as the reference for comparing the source files to
+//! determine if any of the source files have been modified (see below for
+//! more detail).
+//!
+//! There is also a third dep-info file. Cargo will extend the file created by
+//! rustc with some additional information and save it into the output
+//! directory. This is intended for build system integration. See the
+//! `output_depinfo` module for more detail.
+//!
+//! #### -Zbinary-dep-depinfo
+//!
+//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes
+//! `rustc` to include binary files (like rlibs) in the dep-info file. This is
+//! primarily to support rustc development, so that Cargo can check the
+//! implicit dependency on the standard library (which lives in the sysroot).
+//! We want Cargo to recompile whenever the standard library rlib/dylibs
+//! change, and this is a generic mechanism to make that work.
+//!
+//! ### Mtime comparison
+//!
+//! The use of modification timestamps is the most common way a unit will be
+//! determined to be dirty or fresh between builds. There are many subtle
+//! issues and edge cases with mtime comparisons. This gives a high-level
+//! overview, but you'll need to read the code for the gritty details. Mtime
+//! handling is different for different unit kinds. The different styles are
+//! driven by the `Fingerprint.local` field, which is set based on the unit
+//! kind.
+//!
+//! The status of whether or not the mtime is "stale" or "up-to-date" is
+//! stored in `Fingerprint.fs_status`.
+//!
+//! Every unit compares the mtime of its newest output file with the mtimes
+//! of the outputs of all its dependencies. If any output file is missing,
+//! then the unit is stale. If any dependency is newer, the unit is stale.
+//!
+//! #### Normal package mtime handling
+//!
+//! `LocalFingerprint::CheckDepInfo` is used for checking the mtime of
+//! packages. It compares the mtime of the input files (the source files) to
+//! the mtime of the dep-info file (which is written last after a build is
+//! finished). If the dep-info is missing, the unit is stale (it has never
+//! been built). The list of input files comes from the dep-info file. See the
+//! section above for details on dep-info files.
+//!
+//! Also note that although registry and git packages use `CheckDepInfo`, none
+//! of their source files are included in the dep-info (see
+//! `translate_dep_info`), so for those kinds no mtime checking is done
+//! (unless `-Zbinary-dep-depinfo` is used). Registry and git packages are
+//! static, so there is no need to check anything.
+//!
+//! When a build is complete, the mtime of the dep-info file in the
+//! fingerprint directory is modified to rewind it to the time when the build
+//! started. This is done by creating an `invoked.timestamp` file when the
+//! build starts to capture the start time. The mtime is rewound to the start
+//! to handle the case where the user modifies a source file while a build is
+//! running. Cargo can't know whether or not the file was included in the
+//! build, so it takes a conservative approach of assuming the file was *not*
+//! included, and it should be rebuilt during the next build.
+//!
+//! #### Rustdoc mtime handling
+//!
+//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively
+//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is
+//! used for rustdoc units. For registry packages, this is the package
+//! version. For git packages, it is the git hash. For path packages, it is
+//! a string of the mtime of the newest file in the package.
+//!
+//! There are some known bugs with how this works, so it should be improved at
+//! some point.
+//!
+//! #### Build script mtime handling
+//!
+//! Build script mtime handling runs in different modes. There is the "old
+//! style" where the build script does not emit any `rerun-if` directives. In
+//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the
+//! "rustdoc" section above for how it works.
+//!
+//! In the new style, each `rerun-if` directive is translated to the
+//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant
+//! compares the mtime of the given filenames against the mtime of the
+//! "output" file.
+//!
+//! Similar to normal units, the build script "output" file mtime is rewound
+//! to the time just before the build script is executed to handle mid-build
+//! modifications.
+//!
 //! ## Considerations for inclusion in a fingerprint
 //!
 //! Over time we've realized a few items which historically were included in
@@ -484,9 +583,8 @@ impl<'de> Deserialize<'de> for DepFingerprint {
 #[derive(Debug, Serialize, Deserialize, Hash)]
 enum LocalFingerprint {
     /// This is a precalculated fingerprint which has an opaque string we just
-    /// hash as usual. This variant is primarily used for git/crates.io
-    /// dependencies where the source never changes so we can quickly conclude
-    /// that there's some string we can hash and it won't really change much.
+    /// hash as usual. This variant is primarily used for rustdoc where we
+    /// don't have a dep-info file to compare against.
     ///
     /// This is also used for build scripts with no `rerun-if-*` statements, but
     /// that's overall a mistake and causes bugs in Cargo. We shouldn't use this
@@ -1072,19 +1170,16 @@ fn calculate_normal<'a, 'cfg>(
         .collect::<CargoResult<Vec<_>>>()?;
     deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id));
 
-    // Afterwards calculate our own fingerprint information. We specially
-    // handle `path` packages to ensure we track files on the filesystem
-    // correctly, but otherwise upstream packages like from crates.io or git
-    // get bland fingerprints because they don't change without their
-    // `PackageId` changing.
+    // Afterwards calculate our own fingerprint information.
     let target_root = target_root(cx);
-    let local = if use_dep_info(unit) {
+    let local = if unit.mode.is_doc() {
+        // rustdoc does not have dep-info files.
+        let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
+        vec![LocalFingerprint::Precalculated(fingerprint)]
+    } else {
         let dep_info = dep_info_loc(cx, unit);
         let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf();
         vec![LocalFingerprint::CheckDepInfo { dep_info }]
-    } else {
-        let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
-        vec![LocalFingerprint::Precalculated(fingerprint)]
     };
 
     // Figure out what the outputs of our unit are, and we'll be storing them
@@ -1128,12 +1223,6 @@ fn calculate_normal<'a, 'cfg>(
     })
 }
 
-/// Whether or not the fingerprint should track the dependencies from the
-/// dep-info file for this unit.
-fn use_dep_info(unit: &Unit<'_>) -> bool {
-    !unit.mode.is_doc()
-}
-
 /// Calculate a fingerprint for an "execute a build script" unit. This is an
 /// internal helper of `calculate`, don't call directly.
 fn calculate_run_custom_build<'a, 'cfg>(
@@ -1588,7 +1677,8 @@ impl DepInfoPathType {
 /// included. If it is false, then package-relative paths are skipped and
 /// ignored (typically used for registry or git dependencies where we assume
 /// the source never changes, and we don't want the cost of running `stat` on
-/// all those files).
+/// all those files). See the module-level docs for the note about
+/// `-Zbinary-dep-depinfo` for more details on why this is done.
 ///
 /// The serialized Cargo format will contain a list of files, all of which are
 /// relative if they're under `root`, or absolute if they're elsewhere.
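
The mtime rewind described in the new "Normal package mtime handling" docs can be illustrated with a small standalone sketch. This is not Cargo's code: `is_stale`, `at`, and `mid_build_edit_scenario` are hypothetical names, the timestamps are invented, and the strict "newer than" comparison is an assumption made for the illustration. It only shows why using the build's start time as the reference catches a source file edited while the build was running.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper (not part of Cargo): a unit is considered stale if
/// any input file's mtime is strictly newer than the reference mtime.
/// The strict comparison is an assumption of this sketch.
fn is_stale(reference: SystemTime, inputs: &[SystemTime]) -> bool {
    inputs.iter().any(|mtime| *mtime > reference)
}

/// Hypothetical helper: a point in time `secs` seconds after the Unix epoch.
fn at(secs: u64) -> SystemTime {
    SystemTime::UNIX_EPOCH + Duration::from_secs(secs)
}

/// Models a source file edited while a build is running.
/// Returns `(stale_without_rewind, stale_with_rewind)`.
fn mid_build_edit_scenario() -> (bool, bool) {
    let build_start = at(100); // captured when the build starts
    let source_edited = at(105); // user saves a file mid-build
    let build_end = at(110); // the reference file is written last

    // One untouched source file and one edited during the build.
    let inputs = [at(50), source_edited];

    // Without the rewind, the reference is newer than the edit, so the
    // edit goes unnoticed on the next build.
    let without_rewind = is_stale(build_end, &inputs);
    // With the reference rewound to the build start, the edit is newer
    // than the reference and the unit is rebuilt.
    let with_rewind = is_stale(build_start, &inputs);

    (without_rewind, with_rewind)
}
```

In this model, `mid_build_edit_scenario()` reports the unit as fresh when the reference is the build's end time and as stale when the reference is rewound to the build's start. That matches the conservative choice the docs describe: assume the edited file was not part of the build.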
