describe region inference and member constraints in some detail

nikomatsakis · mark-i-m · commit 472f4e83670f · 2019-06-26T11:05:58.000-05:00
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -77,6 +77,8 @@
         - [MIR type checker](./borrow_check/type_check.md)
         - [Region inference](./borrow_check/region_inference.md)
             - [Constraint propagation](./borrow_check/region_inference/constraint_propagation.md)
+            - [Lifetime parameters](./borrow_check/region_inference/lifetime_parameters.md)
+            - [Member constraints](./borrow_check/region_inference/member_constraints.md)
             - [Placeholders and universes](./borrow_check/region_inference/placeholders_and_universes.md)
             - [Closure constraints](./borrow_check/region_inference/closure_constraints.md)
             - [Errror reporting](./borrow_check/region_inference/error_reporting.md)
diff --git a/src/borrow_check/region_inference.md b/src/borrow_check/region_inference.md
@@ -71,6 +71,8 @@ TODO: write about _how_ these regions are computed.
 
 [`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html
 
+<a name="region-variables"></a>
+
 ## Region variables
 
 The value of a region can be thought of as a **set**. This set contains all
diff --git a/src/borrow_check/region_inference/constraint_propagation.md b/src/borrow_check/region_inference/constraint_propagation.md
@@ -1,88 +1,121 @@
 # Constraint propagation
 
-The main work of the region inference is **constraint
-propagation**. This means processing the set of constraints to compute
-the final values for all the region variables.
+The main work of the region inference is **constraint propagation**,
+which is done in the [`propagate_constraints`] function.  There are
+three sorts of constraints that are used in NLL, and we'll explain how
+`propagate_constraints` works by "layering" those sorts of constraints
+on one at a time (each of them is fairly independent from the others):
 
-## Kinds of constraints
+- liveness constraints (`R live at E`), which arise from liveness;
+- outlives constraints (`R1: R2`), which arise from subtyping;
+- [member constraints][m_c] (`member R_m of [R_c...]`), which arise from impl Trait.
 
-Each kind of constraint is handled somewhat differently by the region inferencer.
+[`propagate_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.propagate_constraints
+[m_c]: ./member_constraints.html
+
+In this chapter, we'll explain the "heart" of constraint propagation,
+covering both liveness and outlives constraints.
+
+## Notation and high-level concepts
+
+Conceptually, region inference is a "fixed-point" computation. It is
+given some set of constraints `{C}` and it computes a set of values
+`Values: R -> {E}` that maps each region `R` to a set of elements
+`{E}` (see [here][riv] for more notes on region elements):
+
+- Initially, each region is mapped to an empty set, so `Values(R) =
+  {}` for all regions `R`.
+- Next, we process the constraints repeatedly until a fixed-point is reached:
+  - For each constraint C:
+    - Update `Values` as needed to satisfy the constraint
 
-### Liveness constraints
+[riv]: ../region-inference.html#region-variables
+
+As a simple example, if we have a liveness constraint `R live at E`,
+then we can apply `Values(R) = Values(R) union {E}` to make the
+constraint be satisfied. Similarly, if we have an outlives constraints
+`R1: R2`, we can apply `Values(R1) = Values(R1) union Values(R2)`.
+(Member constraints are more complex and we discuss them below.)
+
+In practice, however, we are a bit more clever. Instead of applying
+the constraints in a loop, we can analyze the constraints and figure
+out the correct order to apply them, so that we only have to apply
+each constraint once in order to find the final result.
+
+Similarly, in the implementation, the `Values` set is stored in the
+`scc_values` field, but they are indexed not by a *region* but by a
+*strongly connected component* (SCC). SCCs are an optimization that
+avoids a lot of redundant storage and computation.  They are explained
+in the section on outlives constraints.
+
+## Liveness constraints
 
 A **liveness constraint** arises when some variable whose type
 includes a region R is live at some point P. This simply means that
 the value of R must include the point P. Liveness constraints are
 computed by the MIR type checker.
 
-We represent them by keeping a (sparse) bitset for each region
-variable, which is the field [`liveness_constraints`], of type
-[`LivenessValues`]
+A liveness constraint `R live at E` is satisfied if `E` is a member of
+`Values(R)`. So to "apply" such a constraint to `Values`, we just have
+to compute `Values(R) = Values(R) union {E}`.
+
+The liveness values are computed in the type-check and passes to the
+region inference upon creation in the `liveness_constraints` argument.
+These are not represented as individual constraints like `R live at E`
+though; instead, we store a (sparse) bitset per region variable (of
+type [`LivenessValues`]). This way we only need a single bit for each
+liveness constraint.
 
 [`liveness_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#structfield.liveness_constraints
 [`LivenessValues`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/values/struct.LivenessValues.html
 
-### Outlives constraints
+One thing that is worth mentioning: All lifetime parameters are always
+considered to be live over the entire function body. This is because
+they correspond to some portion of the *caller's* execution, and that
+execution clearly includes the time spent in this function, since the
+caller is waiting for us to return.
 
-An outlives constraint `'a: 'b` indicates that the value of `'a` must
-be a **superset** of the value of `'b`. On creation, we are given a
-set of outlives constraints in the form of a
-[`ConstraintSet`]. However, to work more efficiently with outlives
-constraints, they are [converted into the form of a graph][graph-fn],
-where the nodes of the graph are region variables (`'a`, `'b`) and
-each constraint `'a: 'b` induces an edge `'a -> 'b`. This conversion
-happens in the [`RegionInferenceContext::new`] function that creates
-the inference context.
-
-[`ConstraintSet`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html
-[graph-fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html#method.graph
-[`RegionInferenceContext::new`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.new
+## Outlives constraints
 
-### Member constraints
+An outlives constraint `'a: 'b` indicates that the value of `'a` must
+be a **superset** of the value of `'b`. That is, an outlives
+constraint `R1: R2` is satisfied if `Values(R1)` is a superset of
+`Values(R2)`. So to "apply" such a constraint to `Values`, we just
+have to compute `Values(R1) = Values(R1) union Values(R2)`.
 
-A member constraint `'m member of ['c_1..'c_N]` expresses that the
-region `'m` must be *equal* to some **choice regions** `'c_i` (for
-some `i`). These constraints cannot be expressed by users, but they arise
-from `impl Trait` due to its lifetime capture rules. Consinder a function
-such as the following:
+One observation that follows from this is that if you have `R1: R2`
+and `R2: R1`, then `R1 = R2` must be true. Similarly, if you have:
 
-```rust
-fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }
 ```
-
-Here, the true return type (often called the "hidden type") is only
-permitted to capture the lifeimes `'a` or `'b`. You can kind of see
-this more clearly by desugaring that `impl Trait` return type into its
-more explicit form:
-
-```rust
-type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
-fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }
+R1: R2
+R2: R3
+R3: R4
+R4: R1
 ```
 
-Here, the idea is that the hidden type must be some type that could
-have been written in place of the `impl Trait<'x, 'y>` -- but clearly
-such a type can only reference the regions `'x` or `'y` (or
-`'static`!), as those are the only names in scope. This limitation is
-then translated into a restriction to only access `'a` or `'b` because
-we are returning `MakeReturn<'a, 'b>`, where `'x` and `'y` have been
-replaced with `'a` and `'b` respectively.
+then `R1 = R2 = R3 = R4` follows. We take advantage of this to make things
+much faster, as described shortly.
 
-## SCCs in the outlives constraint graph
+In the code, the set of outlives constraints is given to the region
+inference context on creation in a parameter of type
+[`ConstraintSet`]. The constraint set is basically just a list of `'a:
+'b` constraints.
 
-The most common sort of constraint in practice are outlives
-constraints like `'a: 'b`. Such a cosntraint means that `'a` is a
-superset of `'b`.  So what happens if we have two regions `'a` and `'b`
-that mutually outlive one another, like so?
+### The outlives constraint graph and SCCs
 
-```
-'a: 'b
-'b: 'a
-```
+In order to work more efficiently with outlives constraints, they are
+[converted into the form of a graph][graph-fn], where the nodes of the
+graph are region variables (`'a`, `'b`) and each constraint `'a: 'b`
+induces an edge `'a -> 'b`. This conversion happens in the
+[`RegionInferenceContext::new`] function that creates the inference
+context.
+
+[`ConstraintSet`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html
+[graph-fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html#method.graph
+[`RegionInferenceContext::new`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.new
 
-In this case, we can conclude that `'a` and `'b` must be equal
-sets. In fact, it doesn't have to be just two regions. We could create
-an extended "chain" of outlives constraints:
+When using a graph representation, we can detect regions that must be equal
+by looking for cycles. That is, if you have a constraint like
 
 ```
 'a: 'b
@@ -91,11 +124,8 @@ an extended "chain" of outlives constraints:
 'd: 'a
 ```
 
-Here, we know that `'a..'d` are all equal to one another.
-
-As mentioned above, an outlives constraint like `'a: 'b` can be viewed
-as an edge in a graph `'a -> 'b`. Cycles in this graph indicate regions
-that mutually outlive one another and hence must be equal. 
+then this will correspond to a cycle in the graph containing the
+elements `'a...'d`.
 
 Therefore, one of the first things that we do in propagating region
 values is to compute the **strongly connected components** (SCCs) in
@@ -138,9 +168,55 @@ superset of the value of `S1`. One crucial thing is that this graph of
 SCCs is always a DAG -- that is, it never has cycles. This is because
 all the cycles have been removed to form the SCCs themselves.
 
-## How constraint propagation works
+### Applying liveness constraints to SCCs
+
+The liveness constraints that come in from the type-checker are
+expressed in terms of regions -- that is, we have a map like
+`Liveness: R -> {E}`.  But we want our final result to be expressed
+in terms of SCCs -- we can integrate these liveness constraints very
+easily just by taking the union:
+
+```
+for each region R:
+  let S by the SCC that contains R
+  Values(S) = Values(S) union Liveness(R)
+```
+
+In the region inferencer, this step is done in [`RegionInferenceContext::new`].
+
+### Applying outlives constraints
+
+Once we have computed the DAG of SCCs, we use that to structure out
+entire computation. If we have an edge `S1 -> S2` between two SCCs,
+that means that `Values(S1) >= Values(S2)` must hold. So, to compute
+the value of `S1`, we first compute the values of each successor `S2`.
+Then we simply union all of those values together. To use a
+quasi-iterator-like notation:
+
+```
+Values(S1) =
+  s1.successors()
+    .map(|s2| Values(s2))
+    .union()
+```
+
+In the code, this work starts in the [`propagate_constraints`]
+function, which iterates over all the SCCs. For each SCC S1, we
+compute its value by first computing the value of its
+successors. Since SCCs form a DAG, we don't have to be conecrned about
+cycles, though we do need to keep a set around to track whether we
+have already processed a given SCC or not. For each successor S2, once
+we have computed S2's value, we can union those elements into the
+value for S1. (Although we have to be careful in this process to
+properly handle [higher-ranked
+placeholders](./placeholders_and_universes.html). Note that the value
+for S1 already contains the liveness constraints, since they were
+added in [`RegionInferenceContext::new`].
+
+Once that process is done, we now have the "minimal value" for S1,
+taking into account all of the liveness and outlives
+constraints. However, in order to complete the process, we must also
+consider [member constraints][m_c], which are described in [a later
+section][m_c].
 
-The main work of constraint propagation is done in the
-`propagation_constraints` function.
 
-[`propagate_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.propagate_constraints
diff --git a/src/borrow_check/region_inference/lifetime_parameters.md b/src/borrow_check/region_inference/lifetime_parameters.md
@@ -0,0 +1,125 @@
+# Universal regions
+
+"Universal regions" is the name that the code uses to refer to "named
+lifetimes" -- e.g., lifetime parameters and `'static`. The name
+derives from the fact that such lifetimes are "universally quantified"
+(i.e., we must make sure the code is true for all values of those
+lifetimes). It is worth spending a bit of discussing how lifetime
+parameters are handled during region inference. Consider this example:
+
+```rust
+fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
+  x
+}
+```
+
+This example is intended not to compile, because we are returning `x`,
+which has type `&'a u32`, but our signature promises that we will
+return a `&'b u32` value. But how are lifetimes like `'a` and `'b
+integrated into region inference, and how this error wind up being
+detected?
+
+## Universal regions and their relationships to one another
+
+Early on in region inference, one of the first things we do is to
+construct a [`UniversalRegions`] struct. This struct tracks the
+various universal regions in scope on a particular function.  We also
+create a [`UniversalRegionRelations`] struct, which tracks their
+relationships to one another. So if you have e.g. `where 'a: 'b`, then
+the [`UniversalRegionRelations`] struct would track that `'a: 'b` is
+known to hold (which could be tested with the [`outlives`] function.
+
+[`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html
+[`UniversalRegionsRelations`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html
+[`outlives`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html#method.outlives
+
+## Everything is a region variable
+
+One important aspect of how NLL region inference works is that **all
+lifetimes** are represented as numbered variables. This means that the
+only variant of [`ty::RegionKind`] that we use is the [`ReVar`]
+variant. These region variables are broken into two major categories,
+based on their index:
+
+[`ty::RegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html
+[`ReVar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html#variant.ReVar
+
+- 0..N: universal regions -- the ones we are discussing here. In this
+  case, the code must be correct with respect to any value of those
+  variables that meets the declared relationships.
+- N..M: existential regions -- inference variables where the region
+  inferencer is tasked with finding *some* suitable value.
+
+In fact, the universal regions can be further subdivided based on
+where they were brought into scope (see the [`RegionClassification`]
+type). These subdivions are not important for the topics discussed
+here, but become important when we consider [closure constraint
+propagation](./closure_constraints.html), so we discuss them there.
+
+[`RegionClassification`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/enum.RegionClassification.html#variant.Local
+
+## Universal lifetimes as the elements of a region's value
+
+As noted previously, the value that we infer for each region is a set
+`{E}`. The elements of this set can be points in the control-flow
+graph, but they can also be an element `end('a)` corresponding to each
+universal lifetime `'a`. If the value for some region `R0` includes
+`end('a`), then this implies that R0 must extend until the end of `'a`
+in the caller.
+
+## The "value" of a universal region
+
+During region inference, we compute a value for each universal region
+in the same way as we compute values for other regions. This value
+represents, effectively, the **lower bound** on that universal region
+-- the things that it must outlive. We now describe how we use this
+value to check for errors.
+
+## Liveness and universal regions
+
+All universal regions have an initial liveness constraint that
+includes the entire function body. This is because lifetime parameters
+are defined in the caller and must include the entirety of the
+function call that invokes this particular function. In addition, each
+universal region `'a` includes itself (that is, `end('a)`) in its
+liveness constraint (i.e., `'a` must extend until the end of
+itself). In the code, these liveness constraints are setup in
+[`init_free_and_bound_regions`].
+
+[`init_free_and_bound_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.init_free_and_bound_regions
+
+## Propagating outlives constraints for universal regions
+
+So, consider the first example of this section:
+
+```rust
+fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
+  x
+}
+```
+
+Here, returning `x` requires that `&'a u32 <: &'b u32`, which gives
+rise to an outlives constraint `'a: 'b`. Combined with our default liveness
+constraints we get:
+
+```
+'a live at {B, end('a)} // B represents the "bunction body"
+'b live at {B, end('b)}
+'a: 'b
+```
+
+When we process the `'a: 'b` constraint, therefore, we will add
+`end('b)` into the value for `'a`, resulting in a final value of `{B,
+end('a), end('b)}`.
+
+## Detecting errors
+
+Once we have finished constraint propagation, we then enforce a
+constraint that if some universal region `'a` includes an element
+`end('b)`, then `'a: 'b` must be declared in the function's bounds. If
+not, as in our example, that is an error. This check is done in the
+[`check_universal_regions`] function, which simply iterates over all
+universal regions, inspects their final value, and tests against the
+declared [`UniversalRegionRelations`].
+
+[`check_universal_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.check_universal_regions
diff --git a/src/borrow_check/region_inference/member_constraints.md b/src/borrow_check/region_inference/member_constraints.md