//! Useful synchronization primitives.
//!
//! ## The need for synchronization
//!
//! On an ideal single-core CPU, the timeline of events happening in a program
//! is linear, consistent with the order of operations in the code.
//!
//! Consider the following code, operating on some global static variables:
//!
//! ```rust
//! # static mut A: u32 = 0;
//! # static mut B: u32 = 0;
//! # static mut C: u32 = 0;
//! # unsafe {
//! A = 3;
//! B = 4;
//! A = A + B;
//! C = B;
//! println!("{} {} {}", A, B, C);
//! C = A;
//! # }
//! ```
//!
//! It appears _as if_ some variables stored in memory are changed, an addition
//! is performed, the result is stored in `A`, and the variable `C` is modified
//! twice. When only a single thread is involved, the results are as expected:
//! the line `7 4 4` gets printed.
//!
//! As for what happens behind the scenes, when an optimizing compiler is used
//! the final generated machine code might look very different from the code:
//!
//! - the first store to `C` might be moved before the store to `A` or `B`,
//!   _as if_ we had written `C = 4; A = 3; B = 4;`
//!
//! - the last store to `C` might be removed, since we never read from it again.
//!
//! - the assignment of `A + B` to `A` might be removed, since the sum can be
//!   stored in a register until it gets printed, and the global variable never
//!   gets updated.
//!
//! - the final result could be determined just by looking at the code at
//!   compile time, so [constant folding] might turn the whole block into a
//!   simple `println!("7 4 4")`.
//!
//! The compiler is allowed to perform any combination of these optimizations, as long
//! as the final optimized code, when executed, produces the same results as the one
//! without optimizations.
//!
//! When multiprocessing is involved (either multiple CPU cores, or multiple
//! physical CPUs), access to global variables (which are shared between threads)
//! could lead to nondeterministic results, **even if** compiler optimizations
//! are disabled.
//!
//! Note that thanks to Rust's safety guarantees, accessing global (static)
//! variables requires `unsafe` code, assuming we don't use any of the
//! synchronization primitives in this module.
//!
//! [constant folding]: https://en.wikipedia.org/wiki/Constant_folding
//!
//! ## Out-of-order execution
//!
//! Instructions can execute in a different order from the one we define, due to
//! various reasons:
//!
//! - **Compiler** reordering instructions: if the compiler can issue an
//!   instruction at an earlier point, it will try to do so. For example, it
//!   might hoist memory loads to the top of a code block, so that the CPU can
//!   start [prefetching] the values from memory.
//!
//!   In single-threaded scenarios, this can cause issues when writing signal
//!   handlers or certain kinds of low-level code.
//!   Use [compiler fences] to prevent this reordering.
//!
//! - **Single processor** executing instructions [out-of-order]: modern CPUs
//!   are capable of [superscalar] execution, i.e. multiple instructions might
//!   be executing at the same time, even though the machine code describes a
//!   sequential process.
//!
//!   This kind of reordering is handled transparently by the CPU.
//!
//! - **Multiprocessor** system, where multiple hardware threads run at the
//!   same time. In multi-threaded scenarios, you can use two kinds of
//!   primitives to deal with synchronization:
//!   - [memory fences] to ensure memory accesses are made visible to other
//!     CPUs in the right order.
//!   - [atomic operations] to ensure simultaneous access to the same memory
//!     location doesn't lead to undefined behavior.
//!
//! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching
//! [compiler fences]: atomic::compiler_fence
//! [out-of-order]: https://en.wikipedia.org/wiki/Out-of-order_execution
//! [superscalar]: https://en.wikipedia.org/wiki/Superscalar_processor
//! [memory fences]: atomic::fence
//! [atomic operations]: atomic
//!
//! ## Higher-level synchronization objects
//!
//! Most of the low-level synchronization primitives are quite error-prone and
//! inconvenient to use, which is why the standard library also exposes some
//! higher-level synchronization objects.
//!
//! These abstractions can be built out of lower-level primitives. For efficiency,
//! the sync objects in the standard library are usually implemented with help
//! from the operating system's kernel, which is able to reschedule the threads
//! while they are blocked on acquiring a lock.
//!
//! ## Efficiency
//!
//! Higher-level synchronization mechanisms are usually heavy-weight.
//! While most atomic operations can execute instantaneously, acquiring a
//! [`Mutex`] can involve blocking until another thread releases it.
//! For [`RwLock`], while any number of readers may acquire it without
//! blocking, each writer will have exclusive access.
//!
//! On the other hand, communication over [channels] can provide a fairly
//! high-level interface without sacrificing performance, at the cost of
//! somewhat more memory.
//!
//! The more synchronization exists between CPUs, the smaller the performance
//! gains from multithreading will be.
//!
//! [channels]: mpsc

#![stable(feature = "rust1", since = "1.0.0")]