Commit f5e991b

Expand the documentation for the std::sync module
Provides an overview of why synchronization is required, as well as a short summary of what sync primitives are available.
1 parent 4415195 commit f5e991b

1 file changed: +119, -4 lines


src/libstd/sync/mod.rs

Lines changed: 119 additions & 4 deletions
@@ -10,10 +10,125 @@
 
 //! Useful synchronization primitives.
 //!
-//! This module contains useful safe and unsafe synchronization primitives.
-//! Most of the primitives in this module do not provide any sort of locking
-//! and/or blocking at all, but rather provide the necessary tools to build
-//! other types of concurrent primitives.
+//! ## The need for synchronization
+//!
+//! On an ideal single-core CPU, the timeline of events happening in a program
+//! is linear, consistent with the order of operations in the code.
+//!
+//! Consider the following code, operating on some global static variables:
+//!
+//! ```rust
+//! # static mut A: u32 = 0;
+//! # static mut B: u32 = 0;
+//! # static mut C: u32 = 0;
+//! # unsafe {
+//! A = 3;
+//! B = 4;
+//! A = A + B;
+//! C = B;
+//! println!("{} {} {}", A, B, C);
+//! C = A;
+//! # }
+//! ```
+//!
+//! It appears _as if_ some variables stored in memory are changed, an addition
+//! is performed, the result is stored in `A`, and the variable `C` is
+//! modified twice.
+//! When only a single thread is involved, the results are as expected:
+//! the line `7 4 4` gets printed.
+//!
+//! As for what happens behind the scenes, when an optimizing compiler is used
+//! the final generated machine code might look very different from the code:
+//!
+//! - the first store to `C` might be moved before the store to `A` or `B`,
+//!   _as if_ we had written `C = 4; A = 3; B = 4;`
+//!
+//! - the last store to `C` might be removed, since we never read from it again.
+//!
+//! - the assignment of `A + B` to `A` might be removed, since the sum can be
+//!   stored in a register until it gets printed, and the global variable never
+//!   gets updated.
+//!
+//! - the final result could be determined just by looking at the code at
+//!   compile time, so [constant folding] might turn the whole block into a
+//!   simple `println!("7 4 4")`.
+//!
+//! The compiler is allowed to perform any combination of these optimizations,
+//! as long as the final optimized code, when executed, produces the same
+//! results as the one without optimizations.
+//!
+//! When multiprocessing is involved (either multiple CPU cores, or multiple
+//! physical CPUs), access to global variables (which are shared between
+//! threads) could lead to nondeterministic results, **even if** compiler
+//! optimizations are disabled.
+//!
+//! Note that thanks to Rust's safety guarantees, accessing global (static)
+//! variables requires `unsafe` code, assuming we don't use any of the
+//! synchronization primitives in this module.
+//!
+//! [constant folding]: https://en.wikipedia.org/wiki/Constant_folding
+//!
+//! ## Out-of-order execution
+//!
+//! Instructions can execute in a different order from the one we define, due
+//! to various reasons:
+//!
+//! - The **compiler** reordering instructions: if the compiler can issue an
+//!   instruction at an earlier point, it will try to do so. For example, it
+//!   might hoist memory loads to the top of a code block, so that the CPU can
+//!   start [prefetching] the values from memory.
+//!
+//!   In single-threaded scenarios, this can cause issues when writing signal
+//!   handlers or certain kinds of low-level code.
+//!   Use [compiler fences] to prevent this reordering.
+//!
+//! - A **single processor** executing instructions [out-of-order]: modern CPUs
+//!   are capable of [superscalar] execution, i.e. multiple instructions might
+//!   be executing at the same time, even though the machine code describes a
+//!   sequential process.
+//!
+//!   This kind of reordering is handled transparently by the CPU.
+//!
+//! - A **multiprocessor** system, where multiple hardware threads run at the
+//!   same time. In multi-threaded scenarios, you can use two kinds of
+//!   primitives to deal with synchronization:
+//!   - [memory fences] to ensure memory accesses are made visible to other
+//!     CPUs in the right order.
+//!   - [atomic operations] to ensure simultaneous access to the same memory
+//!     location doesn't lead to undefined behavior.
+//!
+//! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching
+//! [compiler fences]: atomic::compiler_fence
+//! [out-of-order]: https://en.wikipedia.org/wiki/Out-of-order_execution
+//! [superscalar]: https://en.wikipedia.org/wiki/Superscalar_processor
+//! [memory fences]: atomic::fence
+//! [atomic operations]: atomic
+//!
+//! ## Higher-level synchronization objects
+//!
+//! Most of the low-level synchronization primitives are quite error-prone and
+//! inconvenient to use, which is why the standard library also exposes some
+//! higher-level synchronization objects.
+//!
+//! These abstractions can be built out of lower-level primitives. For
+//! efficiency, the sync objects in the standard library are usually
+//! implemented with help from the operating system's kernel, which is able to
+//! reschedule the threads while they are blocked on acquiring a lock.
+//!
+//! ## Efficiency
+//!
+//! Higher-level synchronization mechanisms are usually heavy-weight.
+//! While most atomic operations can execute instantaneously, acquiring a
+//! [`Mutex`] can involve blocking until another thread releases it.
+//! For [`RwLock`], while any number of readers may acquire it without
+//! blocking, each writer will have exclusive access.
+//!
+//! On the other hand, communication over [channels] can provide a fairly
+//! high-level interface without sacrificing performance, at the cost of
+//! somewhat more memory.
+//!
+//! The more synchronization exists between CPUs, the smaller the performance
+//! gains from multithreading will be.
+//!
+//! [channels]: mpsc
 
 #![stable(feature = "rust1", since = "1.0.0")]
