1 | | -//! This module implements the job queue which determines the ordering in which
2 | | -//! rustc is spawned off. It also manages the allocation of jobserver tokens to
3 | | -//! rustc beyond the implicit token each rustc owns (i.e., the ones used for
4 | | -//! parallel LLVM work and parallel rustc threads).
| 1 | +//! Management of the interaction between the main `cargo` and all spawned jobs. |
| 2 | +//! |
| 3 | +//! ## Overview |
| 4 | +//! |
| 5 | +//! This module implements a job queue. A job here represents a unit of work, |
| 6 | +//! which is roughly a rustc invocation, a build script run, or just a no-op.
| 7 | +//! The job queue primarily handles the following things: |
| 8 | +//! |
| 9 | +//! * Spawns concurrent jobs. Depending on its [`Freshness`], a job could be |
| 10 | +//! either executed on a spawned thread or run on the same thread to avoid
| 11 | +//! the threading overhead. |
| 12 | +//! * Controls the degree of concurrency. It allocates and manages [`jobserver`]
| 13 | +//! tokens for each spawned rustc or build script.
| 14 | +//! * Manages the communication between the main `cargo` process and its |
| 15 | +//! spawned jobs. Those [`Message`]s are sent over a [`Queue`] shared |
| 16 | +//! across threads. |
| 17 | +//! * Schedules the execution order of each [`Job`]. Priorities are determined |
| 18 | +//! when calling [`JobQueue::enqueue`] to enqueue a job. The scheduling is |
| 19 | +//! relatively rudimentary and could likely be improved. |
| 20 | +//! |
| 21 | +//! A rough outline of building a queue and executing jobs is: |
| 22 | +//! |
| 23 | +//! 1. [`JobQueue::new`] to create a new, empty queue.
| 24 | +//! 2. [`JobQueue::enqueue`] to add new jobs onto the queue.
| 25 | +//! 3. [`JobQueue::execute`] to consume the queue and execute all jobs.
| 26 | +//! |
| 27 | +//! The primary loop happens inside [`JobQueue::execute`], which is effectively
| 28 | +//! [`DrainState::drain_the_queue`]. [`DrainState`] is, as its name suggests,
| 29 | +//! the running state of the job queue as it is being drained.
| 30 | +//! |
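The three-step flow above (new, enqueue, execute) can be sketched with a toy queue. This is an illustrative sketch only; `ToyJobQueue` and its job/result types are invented for the example and do not match cargo's actual `JobQueue` signatures, which track dependencies, freshness, and jobserver tokens.

```rust
use std::collections::VecDeque;

/// A toy job queue illustrating the new -> enqueue -> execute flow.
/// (Hypothetical type, not cargo's real `JobQueue`.)
struct ToyJobQueue {
    jobs: VecDeque<Box<dyn FnOnce() -> u32>>,
}

impl ToyJobQueue {
    /// Step 1: create a new, empty queue.
    fn new() -> Self {
        ToyJobQueue { jobs: VecDeque::new() }
    }

    /// Step 2: add a new job onto the queue.
    fn enqueue(&mut self, job: Box<dyn FnOnce() -> u32>) {
        self.jobs.push_back(job);
    }

    /// Step 3: consume the queue and execute all jobs, collecting results.
    fn execute(self) -> Vec<u32> {
        self.jobs.into_iter().map(|job| job()).collect()
    }
}

fn main() {
    let mut queue = ToyJobQueue::new();
    queue.enqueue(Box::new(|| 1));
    queue.enqueue(Box::new(|| 2));
    let results = queue.execute();
    assert_eq!(results, vec![1, 2]);
}
```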
| 31 | +//! ## Jobserver |
5 | 32 | //!
6 | 33 | //! Cargo and rustc have a somewhat non-trivial jobserver relationship with each
7 | 34 | //! other, which is due to scaling issues with sharing a single jobserver
8 | 35 | //! amongst what is potentially hundreds of threads of work on many-cored
9 | | -//! systems on (at least) linux, and likely other platforms as well.
10 | | -//!
11 | | -//! The details of this algorithm are (also) written out in
12 | | -//! src/librustc_jobserver/lib.rs. What follows is a description focusing on the
13 | | -//! Cargo side of things.
| 36 | +//! systems on (at least) Linux, and likely other platforms as well.
14 | 37 | //!
15 | 38 | //! Cargo wants to complete the build as quickly as possible, fully saturating
16 | 39 | //! all cores (as constrained by the -j=N) parameter. Cargo also must not spawn
23 | 46 | //! implement prioritizes spawning as many crates (i.e., rustc processes) as
24 | 47 | //! possible, and then filling each rustc with tokens on demand.
25 | 48 | //!
26 | | -//! The primary loop is in `drain_the_queue` below.
27 | | -//!
28 | | -//! We integrate with the jobserver, originating from GNU make, to make sure
| 49 | +//! We integrate with the [jobserver], originating from GNU make, to make sure
29 | 50 | //! that build scripts which use make to build C code can cooperate with us on
30 | 51 | //! the number of used tokens and avoid overfilling the system we're on.
31 | 52 | //!
32 | 53 | //! The jobserver is unfortunately a very simple protocol, so we enhance it a
33 | 54 | //! little when we know that there is a rustc on the other end. Via the stderr
34 | | -//! pipe we have to rustc, we get messages such as "NeedsToken" and
35 | | -//! "ReleaseToken" from rustc.
| 55 | +//! pipe we have to rustc, we get messages such as `NeedsToken` and
| 56 | +//! `ReleaseToken` from rustc.
36 | 57 | //!
37 | | -//! "NeedsToken" indicates that a rustc is interested in acquiring a token, but
38 | | -//! never that it would be impossible to make progress without one (i.e., it
39 | | -//! would be incorrect for rustc to not terminate due to an unfulfilled
40 | | -//! NeedsToken request); we do not usually fulfill all NeedsToken requests for a
| 58 | +//! [`NeedsToken`] indicates that a rustc is interested in acquiring a token,
| 59 | +//! but never that it would be impossible to make progress without one (i.e.,
| 60 | +//! it would be incorrect for rustc to not terminate due to an unfulfilled
| 61 | +//! `NeedsToken` request); we do not usually fulfill all `NeedsToken` requests for a
41 | 62 | //! given rustc.
42 | 63 | //!
43 | | -//! "ReleaseToken" indicates that a rustc is done with one of its tokens and is
44 | | -//! ready for us to re-acquire ownership -- we will either release that token
| 64 | +//! [`ReleaseToken`] indicates that a rustc is done with one of its tokens and
| 65 | +//! is ready for us to re-acquire ownership -- we will either release that token
45 | 66 | //! back into the general pool or reuse it ourselves. Note that rustc will
46 | 67 | //! inform us that it is releasing a token even if it itself is also requesting
47 | | -//! tokens; is is up to us whether to return the token to that same rustc.
| 68 | +//! tokens; it is up to us whether to return the token to that same rustc.
| 69 | +//!
| 70 | +//! `jobserver` also manages the allocation of tokens to rustc beyond
| 71 | +//! the implicit token each rustc owns (i.e., the ones used for parallel LLVM
| 72 | +//! work and parallel rustc threads).
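The token accounting described in this section can be illustrated with a simple counting pool built from std primitives. This is a sketch of the general idea only; the real jobserver protocol uses OS-level pipes (via the `jobserver` crate), not a mutex, and `TokenPool` is an invented name.

```rust
use std::sync::{Condvar, Mutex};

/// A sketch of jobserver-style token accounting: a fixed pool of tokens,
/// where each worker implicitly owns one token and must acquire an extra
/// token before doing additional parallel work.
struct TokenPool {
    available: Mutex<usize>,
    cond: Condvar,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        TokenPool { available: Mutex::new(tokens), cond: Condvar::new() }
    }

    /// Block until a token is free, then take it
    /// (conceptually, a fulfilled `NeedsToken` request).
    fn acquire(&self) {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.cond.wait(n).unwrap();
        }
        *n -= 1;
    }

    /// Return a token to the pool (conceptually, a `ReleaseToken` message).
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

fn main() {
    let pool = TokenPool::new(2);
    pool.acquire();
    pool.acquire(); // both tokens now in use
    pool.release(); // one token returned to the pool
    assert_eq!(*pool.available.lock().unwrap(), 1);
}
```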
| 73 | +//! |
| 74 | +//! ## Scheduling |
| 75 | +//! |
| 76 | +//! The current scheduling algorithm is not particularly sophisticated. It is
| 77 | +//! based on a dependency graph, [`DependencyQueue`]. Nodes are added to the
| 78 | +//! graph until it is finalized. Upon finalization, the queue computes, for
| 79 | +//! each node, the sum of the costs of all of that node's dependencies,
| 80 | +//! including transitive ones. That sum becomes the cost of the node itself.
| 81 | +//!
| 82 | +//! For the time being, the cost is just a fixed placeholder passed to
| 83 | +//! [`JobQueue::enqueue`]. In the future, we could explore more possibilities
| 84 | +//! here. For instance, we could start persisting timing information for each
| 85 | +//! build somewhere. For a subsequent build, we could look at the historical
| 86 | +//! data and perform a PGO-like optimization to prioritize jobs, making a build
| 87 | +//! fully pipelined.
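The cost computation can be sketched as a memoized recursive sum over a small crate graph. The crate names and the `u32` cost type here are invented for illustration, and this naive sum counts a shared dependency once per path (a diamond contributes twice), which is not necessarily what `DependencyQueue` does.

```rust
use std::collections::HashMap;

/// Each node's total cost = its own cost + the total costs of its direct
/// dependencies, computed recursively so transitive dependencies are
/// included. Assumes the graph is acyclic.
fn total_cost(
    node: &str,
    costs: &HashMap<&str, u32>,
    deps: &HashMap<&str, Vec<&str>>,
    memo: &mut HashMap<String, u32>,
) -> u32 {
    if let Some(&c) = memo.get(node) {
        return c;
    }
    let mut total = costs[node];
    for &dep in deps.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
        total += total_cost(dep, costs, deps, memo);
    }
    memo.insert(node.to_string(), total);
    total
}

fn main() {
    // A tiny crate graph: app depends on lib_a and lib_b,
    // and lib_a depends on lib_b.
    let costs = HashMap::from([("app", 10), ("lib_a", 5), ("lib_b", 3)]);
    let deps = HashMap::from([
        ("app", vec!["lib_a", "lib_b"]),
        ("lib_a", vec!["lib_b"]),
    ]);
    let mut memo = HashMap::new();
    // lib_b = 3; lib_a = 5 + 3 = 8; app = 10 + 8 + 3 = 21
    assert_eq!(total_cost("app", &costs, &deps, &mut memo), 21);
}
```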
48 | 88 | //!
49 | | -//! The current scheduling algorithm is relatively primitive and could likely be
50 | | -//! improved.
| 89 | +//! ## Message queue |
| 90 | +//! |
| 91 | +//! Each spawned thread running a process uses the message queue [`Queue`] to |
| 92 | +//! send messages back to the main thread (the one running `cargo`). |
| 93 | +//! The main thread coordinates everything, and handles printing output. |
| 94 | +//! |
| 95 | +//! It is important to be careful which messages use [`push`] vs [`push_bounded`]. |
| 96 | +//! `push` is for priority messages (like tokens, or "finished") where the |
| 97 | +//! sender shouldn't block. We want to handle those so real work can proceed |
| 98 | +//! ASAP. |
| 99 | +//! |
| 100 | +//! `push_bounded` is only for messages being printed to stdout/stderr. Being |
| 101 | +//! bounded prevents a flood of messages causing a large amount of memory |
| 102 | +//! being used. |
| 103 | +//! |
| 104 | +//! `push` also avoids blocking which helps avoid deadlocks. For example, when |
| 105 | +//! the diagnostic server thread is dropped, it waits for the thread to exit. |
| 106 | +//! But if the thread is blocked on a full queue, and there is a critical |
| 107 | +//! error, the drop will deadlock. This should be fixed at some point in the |
| 108 | +//! future. The jobserver thread has a similar problem, though it will time |
| 109 | +//! out after 1 second. |
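The `push` vs `push_bounded` distinction is analogous to unbounded vs bounded channels in the standard library. The sketch below uses `std::sync::mpsc` purely as an analogy; cargo's `Queue` is a custom type, not an mpsc channel.

```rust
use std::sync::mpsc;

fn main() {
    // `push`-style: an unbounded channel never blocks the sender, so
    // priority messages (tokens, "finished") are always accepted.
    let (priority_tx, priority_rx) = mpsc::channel();
    for token in 0..1000 {
        priority_tx.send(token).unwrap(); // never blocks
    }

    // `push_bounded`-style: a bounded channel applies backpressure, so a
    // flood of stdout/stderr output cannot consume unbounded memory.
    let (output_tx, output_rx) = mpsc::sync_channel(4);
    for line in 0..4 {
        output_tx.send(format!("line {line}")).unwrap();
    }
    // The buffer is full: a fifth send would block, and try_send fails.
    assert!(output_tx.try_send("overflow".to_string()).is_err());

    assert_eq!(priority_rx.try_iter().count(), 1000);
    assert_eq!(output_rx.try_iter().count(), 4);
}
```

Note the deadlock scenario described above is exactly the hazard of the bounded flavor: a sender blocked on a full bounded queue cannot make progress until the receiver drains it.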
| 110 | +//! |
| 111 | +//! To access the message queue, each running `Job` is given its own [`JobState`], |
| 112 | +//! containing everything it needs to communicate with the main thread. |
| 113 | +//! |
| 114 | +//! See [`Message`] for all available message kinds. |
| 115 | +//! |
| 116 | +//! [jobserver]: https://docs.rs/jobserver |
| 117 | +//! [`NeedsToken`]: Message::NeedsToken |
| 118 | +//! [`ReleaseToken`]: Message::ReleaseToken |
| 119 | +//! [`push`]: Queue::push |
| 120 | +//! [`push_bounded`]: Queue::push_bounded |
51 | 121 |
52 | 122 | use std::cell::{Cell, RefCell};
53 | 123 | use std::collections::{BTreeMap, HashMap, HashSet};
@@ -100,28 +170,6 @@ pub struct JobQueue<'cfg> {
100 | 170 | ///
101 | 171 | /// It is created from JobQueue when we have fully assembled the crate graph
102 | 172 | /// (i.e., all package dependencies are known).
103 | | -///
104 | | -/// # Message queue
105 | | -///
106 | | -/// Each thread running a process uses the message queue to send messages back
107 | | -/// to the main thread. The main thread coordinates everything, and handles
108 | | -/// printing output.
109 | | -///
110 | | -/// It is important to be careful which messages use `push` vs `push_bounded`.
111 | | -/// `push` is for priority messages (like tokens, or "finished") where the
112 | | -/// sender shouldn't block. We want to handle those so real work can proceed
113 | | -/// ASAP.
114 | | -///
115 | | -/// `push_bounded` is only for messages being printed to stdout/stderr. Being
116 | | -/// bounded prevents a flood of messages causing a large amount of memory
117 | | -/// being used.
118 | | -///
119 | | -/// `push` also avoids blocking which helps avoid deadlocks. For example, when
120 | | -/// the diagnostic server thread is dropped, it waits for the thread to exit.
121 | | -/// But if the thread is blocked on a full queue, and there is a critical
122 | | -/// error, the drop will deadlock. This should be fixed at some point in the
123 | | -/// future. The jobserver thread has a similar problem, though it will time
124 | | -/// out after 1 second.
125 | 173 | struct DrainState<'cfg> {
126 | 174 | // This is the length of the DependencyQueue when starting out
127 | 175 | total_units: usize,