
Commit 05898e1

doc: enhancement of job_queue mod-level odc
1 parent f3de2de commit 05898e1

File tree

1 file changed

+93
-45
lines changed


src/cargo/core/compiler/job_queue.rs

Lines changed: 93 additions & 45 deletions
@@ -1,16 +1,39 @@
-//! This module implements the job queue which determines the ordering in which
-//! rustc is spawned off. It also manages the allocation of jobserver tokens to
-//! rustc beyond the implicit token each rustc owns (i.e., the ones used for
-//! parallel LLVM work and parallel rustc threads).
+//! Management of the interaction between the main `cargo` and all spawned jobs.
+//!
+//! ## Overview
+//!
+//! This module implements a job queue. A job here represents a unit of work,
+//! which is roughly a rustc invocation, a build script run, or just a no-op.
+//! The job queue primarily handles the following things:
+//!
+//! * Spawns concurrent jobs. Depending on its [`Freshness`], a job could be
+//!   either executed on a spawned thread or run on the same thread to avoid
+//!   the threading overhead.
+//! * Controls the degree of concurrency. It allocates and manages
+//!   [`jobserver`] tokens for each spawned-off rustc and build script.
+//! * Manages the communication between the main `cargo` process and its
+//!   spawned jobs. Those [`Message`]s are sent over a [`Queue`] shared
+//!   across threads.
+//! * Schedules the execution order of each [`Job`]. Priorities are determined
+//!   when calling [`JobQueue::enqueue`] to enqueue a job. The scheduling is
+//!   relatively rudimentary and could likely be improved.
+//!
+//! A rough outline of building a queue and executing jobs is:
+//!
+//! 1. [`JobQueue::new`] to create a queue.
+//! 2. [`JobQueue::enqueue`] to add new jobs onto the queue.
+//! 3. [`JobQueue::execute`] to consume the queue and execute all jobs.
+//!
+//! The primary loop happens inside [`JobQueue::execute`], which is effectively
+//! [`DrainState::drain_the_queue`]. [`DrainState`] is, as its name suggests,
+//! the running state of the job queue as it is drained.
+//!
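The fresh/dirty split described above can be sketched outside of cargo's internals with plain std threads and channels. `Job`, `Freshness`, and the `execute` function below are simplified stand-ins for illustration, not cargo's actual types or API: fresh (up-to-date) jobs run inline on the current thread, while dirty jobs are spawned, and every job reports completion over a shared channel.

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-ins for cargo's internal types.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Freshness {
    Fresh, // up to date: run inline, avoiding thread-spawn overhead
    Dirty, // needs real work: run on its own thread
}

struct Job {
    id: u32,
    fresh: Freshness,
}

/// Execute all jobs: fresh jobs run on the current thread, dirty jobs
/// are spawned. Completions arrive over one shared channel, loosely
/// mirroring how spawned jobs message the main `cargo` thread.
fn execute(jobs: Vec<Job>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    let total = jobs.len();

    for job in jobs {
        let tx = tx.clone();
        match job.fresh {
            // Inline: a fresh job is basically a no-op.
            Freshness::Fresh => tx.send(job.id).unwrap(),
            Freshness::Dirty => {
                handles.push(thread::spawn(move || {
                    // Real work (a rustc invocation) would happen here.
                    tx.send(job.id).unwrap();
                }));
            }
        }
    }
    drop(tx);

    let mut done: Vec<u32> = rx.iter().take(total).collect();
    for h in handles {
        h.join().unwrap();
    }
    done.sort();
    done
}

fn main() {
    let jobs = vec![
        Job { id: 0, fresh: Freshness::Fresh },
        Job { id: 1, fresh: Freshness::Dirty },
        Job { id: 2, fresh: Freshness::Dirty },
    ];
    assert_eq!(execute(jobs), vec![0, 1, 2]);
}
```

The real queue additionally throttles how many dirty jobs run at once via jobserver tokens, which the next section covers.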
+//! ## Jobserver
 //!
 //! Cargo and rustc have a somewhat non-trivial jobserver relationship with each
 //! other, which is due to scaling issues with sharing a single jobserver
 //! amongst what is potentially hundreds of threads of work on many-cored
-//! systems on (at least) linux, and likely other platforms as well.
-//!
-//! The details of this algorithm are (also) written out in
-//! src/librustc_jobserver/lib.rs. What follows is a description focusing on the
-//! Cargo side of things.
+//! systems on (at least) Linux, and likely other platforms as well.
 //!
 //! Cargo wants to complete the build as quickly as possible, fully saturating
 //! all cores (as constrained by the -j=N) parameter. Cargo also must not spawn
@@ -23,31 +46,78 @@
 //! implement prioritizes spawning as many crates (i.e., rustc processes) as
 //! possible, and then filling each rustc with tokens on demand.
 //!
-//! The primary loop is in `drain_the_queue` below.
-//!
-//! We integrate with the jobserver, originating from GNU make, to make sure
+//! We integrate with the [jobserver], originating from GNU make, to make sure
 //! that build scripts which use make to build C code can cooperate with us on
 //! the number of used tokens and avoid overfilling the system we're on.
 //!
 //! The jobserver is unfortunately a very simple protocol, so we enhance it a
 //! little when we know that there is a rustc on the other end. Via the stderr
-//! pipe we have to rustc, we get messages such as "NeedsToken" and
-//! "ReleaseToken" from rustc.
+//! pipe we have to rustc, we get messages such as `NeedsToken` and
+//! `ReleaseToken` from rustc.
 //!
-//! "NeedsToken" indicates that a rustc is interested in acquiring a token, but
-//! never that it would be impossible to make progress without one (i.e., it
-//! would be incorrect for rustc to not terminate due to an unfulfilled
-//! NeedsToken request); we do not usually fulfill all NeedsToken requests for a
+//! [`NeedsToken`] indicates that a rustc is interested in acquiring a token,
+//! but never that it would be impossible to make progress without one (i.e.,
+//! it would be incorrect for rustc to not terminate due to an unfulfilled
+//! `NeedsToken` request); we do not usually fulfill all `NeedsToken` requests for a
 //! given rustc.
 //!
-//! "ReleaseToken" indicates that a rustc is done with one of its tokens and is
-//! ready for us to re-acquire ownership -- we will either release that token
+//! [`ReleaseToken`] indicates that a rustc is done with one of its tokens and
+//! is ready for us to re-acquire ownership -- we will either release that token
 //! back into the general pool or reuse it ourselves. Note that rustc will
 //! inform us that it is releasing a token even if it itself is also requesting
-//! tokens; is is up to us whether to return the token to that same rustc.
+//! tokens; it is up to us whether to return the token to that same rustc.
+//!
+//! `jobserver` also manages the allocation of tokens to rustc beyond
+//! the implicit token each rustc owns (i.e., the ones used for parallel LLVM
+//! work and parallel rustc threads).
+//!
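As a rough illustration of the counting invariant behind this protocol (not the real jobserver wire format, where tokens are bytes on a pipe shared with GNU make), cargo's bookkeeping can be modeled as a pool: a `NeedsToken`-style request is best effort and may be declined, while a `ReleaseToken`-style notification returns a token that may go to any waiting job. `TokenPool` is a hypothetical name for this sketch.

```rust
// A minimal token pool imitating cargo's jobserver bookkeeping.
struct TokenPool {
    available: usize,
}

impl TokenPool {
    fn new(limit: usize) -> Self {
        TokenPool { available: limit }
    }

    /// A `NeedsToken`-style request: best effort, may be declined.
    /// Crucially, rustc must still make progress when this returns false,
    /// so an unfulfilled request can never deadlock the build.
    fn try_acquire(&mut self) -> bool {
        if self.available > 0 {
            self.available -= 1;
            true
        } else {
            false
        }
    }

    /// A `ReleaseToken`-style notification: the token returns to the pool
    /// and may be handed to any job, not necessarily the rustc that
    /// released it.
    fn release(&mut self) {
        self.available += 1;
    }
}

fn main() {
    let mut pool = TokenPool::new(2); // e.g. a -j2 build
    assert!(pool.try_acquire());
    assert!(pool.try_acquire());
    assert!(!pool.try_acquire()); // declined, never blocked
    pool.release();
    assert!(pool.try_acquire());
}
```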
+//! ## Scheduling
+//!
+//! The current scheduling algorithm is not really polished. It is simply based
+//! on a dependency graph [`DependencyQueue`]. We continue adding nodes onto
+//! the graph until we finalize it. When the graph gets finalized, it computes
+//! the sum of the cost of each node's dependencies, including transitive ones.
+//! That sum of dependency costs becomes the cost of the given node.
+//!
+//! At the time being, the cost is just passed as a fixed placeholder in
+//! [`JobQueue::enqueue`]. In the future, we could explore more possibilities
+//! around it. For instance, we could start persisting timing information for
+//! each build somewhere. For a subsequent build, we could look into the
+//! historical data and perform a PGO-like optimization to prioritize jobs,
+//! making a build fully pipelined.
 //!
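The cost propagation can be sketched as a memoized walk over a small dependency graph. The graph layout, the function name, and the handling of shared dependencies (counted once per path here) are assumptions for illustration, not `DependencyQueue`'s actual implementation.

```rust
use std::collections::HashMap;

// Sum each node's own cost with the costs of all of its transitive
// dependencies, memoizing so each node's total is computed only once.
fn total_cost(
    node: &str,
    own: &HashMap<&str, u32>,
    deps: &HashMap<&str, Vec<&str>>,
    memo: &mut HashMap<String, u32>,
) -> u32 {
    if let Some(&c) = memo.get(node) {
        return c;
    }
    let mut cost = own[node];
    for dep in deps.get(node).into_iter().flatten() {
        cost += total_cost(dep, own, deps, memo);
    }
    memo.insert(node.to_string(), cost);
    cost
}

fn main() {
    // A diamond: bin depends on a and b, which both depend on lib.
    let own = HashMap::from([("lib", 5), ("a", 3), ("b", 2), ("bin", 1)]);
    let deps = HashMap::from([
        ("a", vec!["lib"]),
        ("b", vec!["lib"]),
        ("bin", vec!["a", "b"]),
    ]);
    let mut memo = HashMap::new();
    // bin = 1 + (a = 3 + 5) + (b = 2 + 5) = 16
    assert_eq!(total_cost("bin", &own, &deps, &mut memo), 16);
}
```

A scheduler can then prefer dequeuing the node with the largest total cost, so long critical paths start as early as possible.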
-//! The current scheduling algorithm is relatively primitive and could likely be
-//! improved.
+//! ## Message queue
+//!
+//! Each spawned thread running a process uses the message queue [`Queue`] to
+//! send messages back to the main thread (the one running `cargo`).
+//! The main thread coordinates everything, and handles printing output.
+//!
+//! It is important to be careful which messages use [`push`] vs [`push_bounded`].
+//! `push` is for priority messages (like tokens, or "finished") where the
+//! sender shouldn't block. We want to handle those so real work can proceed
+//! ASAP.
+//!
+//! `push_bounded` is only for messages being printed to stdout/stderr. Being
+//! bounded prevents a flood of messages from causing a large amount of memory
+//! to be used.
+//!
+//! `push` also avoids blocking, which helps avoid deadlocks. For example, when
+//! the diagnostic server thread is dropped, it waits for the thread to exit.
+//! But if the thread is blocked on a full queue, and there is a critical
+//! error, the drop will deadlock. This should be fixed at some point in the
+//! future. The jobserver thread has a similar problem, though it will time
+//! out after 1 second.
+//!
+//! To access the message queue, each running `Job` is given its own [`JobState`],
+//! containing everything it needs to communicate with the main thread.
+//!
+//! See [`Message`] for all available message kinds.
+//!
+//! [jobserver]: https://docs.rs/jobserver
+//! [`NeedsToken`]: Message::NeedsToken
+//! [`ReleaseToken`]: Message::ReleaseToken
+//! [`push`]: Queue::push
+//! [`push_bounded`]: Queue::push_bounded
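The two-tier push behavior can be sketched with a mutex-and-condvar queue. This toy `Queue` only models the blocking contract (it has none of the real type's wakeup integration with the main loop): `push` enqueues unconditionally so priority senders never block, while `push_bounded` makes the sender wait until a pop drains the queue below its bound.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Toy two-tier queue: `push` never blocks (priority messages such as
// tokens or "finished"); `push_bounded` blocks once `bound` unpopped
// messages are queued (stdout/stderr output), capping memory use.
struct Queue<T> {
    state: Mutex<VecDeque<T>>,
    popped: Condvar,
    bound: usize,
}

impl<T> Queue<T> {
    fn new(bound: usize) -> Self {
        Queue { state: Mutex::new(VecDeque::new()), popped: Condvar::new(), bound }
    }

    /// Priority path: enqueue unconditionally; the sender never blocks.
    fn push(&self, msg: T) {
        self.state.lock().unwrap().push_back(msg);
    }

    /// Output path: wait until the consumer drains below the bound.
    fn push_bounded(&self, msg: T) {
        let mut q = self.state.lock().unwrap();
        while q.len() >= self.bound {
            q = self.popped.wait(q).unwrap();
        }
        q.push_back(msg);
    }

    fn pop(&self) -> Option<T> {
        let msg = self.state.lock().unwrap().pop_front();
        self.popped.notify_all();
        msg
    }
}

fn main() {
    let q = Arc::new(Queue::new(1));
    q.push_bounded("line 1"); // fits: queue was empty
    q.push("token");          // priority: ignores the bound entirely

    let q2 = Arc::clone(&q);
    let sender = thread::spawn(move || {
        // Blocks until the main thread pops below the bound.
        q2.push_bounded("line 2");
    });

    let mut received = Vec::new();
    while received.len() < 3 {
        if let Some(m) = q.pop() {
            received.push(m);
        }
    }
    sender.join().unwrap();
    assert_eq!(received, vec!["line 1", "token", "line 2"]);
}
```

Note how this sketch also shows the deadlock hazard the doc mentions: a sender stuck in `push_bounded` only wakes when something pops, so a consumer that stops popping strands it.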
 
 use std::cell::{Cell, RefCell};
 use std::collections::{BTreeMap, HashMap, HashSet};
@@ -100,28 +170,6 @@ pub struct JobQueue<'cfg> {
 ///
 /// It is created from JobQueue when we have fully assembled the crate graph
 /// (i.e., all package dependencies are known).
-///
-/// # Message queue
-///
-/// Each thread running a process uses the message queue to send messages back
-/// to the main thread. The main thread coordinates everything, and handles
-/// printing output.
-///
-/// It is important to be careful which messages use `push` vs `push_bounded`.
-/// `push` is for priority messages (like tokens, or "finished") where the
-/// sender shouldn't block. We want to handle those so real work can proceed
-/// ASAP.
-///
-/// `push_bounded` is only for messages being printed to stdout/stderr. Being
-/// bounded prevents a flood of messages causing a large amount of memory
-/// being used.
-///
-/// `push` also avoids blocking which helps avoid deadlocks. For example, when
-/// the diagnostic server thread is dropped, it waits for the thread to exit.
-/// But if the thread is blocked on a full queue, and there is a critical
-/// error, the drop will deadlock. This should be fixed at some point in the
-/// future. The jobserver thread has a similar problem, though it will time
-/// out after 1 second.
 struct DrainState<'cfg> {
     // This is the length of the DependencyQueue when starting out
     total_units: usize,
