Commit f5e991b

Expand the documentation for the std::sync module
Provides an overview of why synchronization is required, as well as a short summary of what sync primitives are available.
1 parent 4415195 commit f5e991b

1 file changed: +119, -4 lines


src/libstd/sync/mod.rs

Lines changed: 119 additions & 4 deletions
@@ -10,10 +10,125 @@
 
 //! Useful synchronization primitives.
 //!
-//! This module contains useful safe and unsafe synchronization primitives.
-//! Most of the primitives in this module do not provide any sort of locking
-//! and/or blocking at all, but rather provide the necessary tools to build
-//! other types of concurrent primitives.
+//! ## The need for synchronization
+//!
+//! On an ideal single-core CPU, the timeline of events happening in a program
+//! is linear, consistent with the order of operations in the code.
+//!
+//! Consider the following code, operating on some global static variables:
+//!
+//! ```rust
+//! # static mut A: u32 = 0;
+//! # static mut B: u32 = 0;
+//! # static mut C: u32 = 0;
+//! # unsafe {
+//! A = 3;
+//! B = 4;
+//! A = A + B;
+//! C = B;
+//! println!("{} {} {}", A, B, C);
+//! C = A;
+//! # }
+//! ```
+//!
+//! It appears _as if_ some variables stored in memory are changed, an addition
+//! is performed, the result is stored in `A`, and the variable `C` is
+//! modified twice.
+//! When only a single thread is involved, the results are as expected:
+//! the line `7 4 4` gets printed.
+//!
+//! As for what happens behind the scenes, when an optimizing compiler is used
+//! the final generated machine code might look very different from the code:
+//!
+//! - the first store to `C` might be moved before the store to `A` or `B`,
+//!   _as if_ we had written `C = 4; A = 3; B = 4;`
+//!
+//! - the last store to `C` might be removed, since we never read from it again.
+//!
+//! - the assignment of `A + B` to `A` might be removed, since the sum can be
+//!   stored in a register until it gets printed, and the global variable never
+//!   gets updated.
+//!
+//! - the final result could be determined just by looking at the code at
+//!   compile time, so [constant folding] might turn the whole block into a
+//!   simple `println!("7 4 4")`.
+//!
+//! The compiler is allowed to perform any combination of these optimizations,
+//! as long as the final optimized code, when executed, produces the same
+//! results as the one without optimizations.
+//!
+//! When multiprocessing is involved (either multiple CPU cores, or multiple
+//! physical CPUs), access to global variables (which are shared between
+//! threads) could lead to nondeterministic results, **even if** compiler
+//! optimizations are disabled.
+//!
+//! Note that thanks to Rust's safety guarantees, accessing global (static)
+//! variables requires `unsafe` code, assuming we don't use any of the
+//! synchronization primitives in this module.
+//!
+//! [constant folding]: https://en.wikipedia.org/wiki/Constant_folding
+//!
+//! ## Out-of-order execution
+//!
+//! Instructions can execute in a different order from the one we define, due
+//! to various reasons:
+//!
+//! - The **compiler** reordering instructions: if the compiler can issue an
+//!   instruction at an earlier point, it will try to do so. For example, it
+//!   might hoist memory loads to the top of a code block, so that the CPU can
+//!   start [prefetching] the values from memory.
+//!
+//!   In single-threaded scenarios, this can cause issues when writing signal
+//!   handlers or certain kinds of low-level code.
+//!   Use [compiler fences] to prevent this reordering.
+//!
+//! - A **single processor** executing instructions [out-of-order]: modern CPUs
+//!   are capable of [superscalar] execution, i.e. multiple instructions might
+//!   be executing at the same time, even though the machine code describes a
+//!   sequential process.
+//!
+//!   This kind of reordering is handled transparently by the CPU.
+//!
+//! - A **multiprocessor** system, where multiple hardware threads run at the
+//!   same time. In multi-threaded scenarios, you can use two kinds of
+//!   primitives to deal with synchronization:
+//!   - [memory fences] to ensure memory accesses are made visible to other
+//!     CPUs in the right order.
+//!   - [atomic operations] to ensure simultaneous access to the same memory
+//!     location doesn't lead to undefined behavior.
+//!
+//! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching
+//! [compiler fences]: atomic::compiler_fence
+//! [out-of-order]: https://en.wikipedia.org/wiki/Out-of-order_execution
+//! [superscalar]: https://en.wikipedia.org/wiki/Superscalar_processor
+//! [memory fences]: atomic::fence
+//! [atomic operations]: atomic
+//!
+//! ## Higher-level synchronization objects
+//!
+//! Most of the low-level synchronization primitives are quite error-prone and
+//! inconvenient to use, which is why the standard library also exposes some
+//! higher-level synchronization objects.
+//!
+//! These abstractions can be built out of lower-level primitives. For
+//! efficiency, the sync objects in the standard library are usually
+//! implemented with help from the operating system's kernel, which is able to
+//! reschedule the threads while they are blocked on acquiring a lock.
+//!
+//! ## Efficiency
+//!
+//! Higher-level synchronization mechanisms are usually heavy-weight.
+//! While most atomic operations can execute instantaneously, acquiring a
+//! [`Mutex`] can involve blocking until another thread releases it.
+//! For [`RwLock`], while any number of readers may acquire it without
+//! blocking, each writer will have exclusive access.
+//!
+//! On the other hand, communication over [channels] can provide a fairly
+//! high-level interface without sacrificing performance, at the cost of
+//! somewhat more memory.
+//!
+//! The more synchronization exists between CPUs, the smaller the performance
+//! gains from multithreading will be.
+//!
+//! [channels]: mpsc
 
 #![stable(feature = "rust1", since = "1.0.0")]
