Skip to content
This repository was archived by the owner on May 28, 2025. It is now read-only.

Commit 2a541ce

Browse files
committed
Completely rework the explanation of the algorithm
1 parent 43d445c commit 2a541ce

File tree

2 files changed

+324
-285
lines changed

2 files changed

+324
-285
lines changed

compiler/rustc_mir_build/src/thir/pattern/deconstruct_pat.rs

Lines changed: 113 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,47 @@
1-
//! This module provides functions to deconstruct and reconstruct patterns into a constructor
2-
//! applied to some fields. This is used by the `_match` module to compute pattern
3-
//! usefulness/exhaustiveness.
1+
//! [`super::usefulness`] explains most of what is happening in this file. As explained there,
2+
//! values and patterns are made from constructors applied to fields. This file defines a
3+
//! `Constructor` enum, a `Fields` struct, and various operations to manipulate them and convert
4+
//! them from/to patterns.
5+
//!
6+
//! There's one idea that is not detailed in [`super::usefulness`] because the details are not
7+
//! needed there: _constructor splitting_.
8+
//!
9+
//! # Constructor splitting
10+
//!
11+
//! The idea is as follows: given a constructor `c` and a matrix, we want to specialize in turn
12+
//! with all the value constructors that are covered by `c`, and compute usefulness for each.
13+
//! Instead of listing all those constructors (which is intractable), we group those value
14+
//! constructors together as much as possible. Example:
15+
//!
16+
//! ```
17+
//! match (0, false) {
18+
//! (0 ..=100, true) => {} // `p_1`
19+
//! (50..=150, false) => {} // `p_2`
20+
//! (0 ..=200, _) => {} // `q`
21+
//! }
22+
//! ```
23+
//!
24+
//! The naive approach would try all numbers in the range `0..=200`. But we can be a lot more
25+
//! clever: `0` and `1` for example will match the exact same rows, and return equivalent
26+
//! witnesses. In fact all of `0..50` would. We can thus restrict our exploration to 4
27+
//! constructors: `0..50`, `50..=100`, `101..=150` and `151..=200`. That is enough and infinitely
28+
//! more tractable.
29+
//!
30+
//! We capture this idea in a function `split(p_1 ... p_n, c)` which returns a list of constructors
31+
//! `c'` covered by `c`. Given such a `c'`, we require that all value ctors `c''` covered by `c'`
32+
//! return an equivalent set of witnesses after specializing and computing usefulness.
33+
//! In the example above, witnesses for specializing by `c''` covered by `0..50` will only differ
34+
//! in their first element.
35+
//!
36+
//! We usually also ask that the `c'` together cover all of the original `c`. However we allow
37+
//! skipping some constructors as long as it doesn't change whether the resulting list of witnesses
38+
//! is empty of not. We use this in the wildcard `_` case.
39+
//!
40+
//! Splitting is implemented in the [`Constructor::split`] function. We don't do splitting for
41+
//! or-patterns; instead we just try the alternatives one-by-one. For details on splitting
42+
//! wildcards, see [`SplitWildcard`]; for integer ranges, see [`SplitIntRange`]; for slices, see
43+
//! [`SplitVarLenSlice`].
44+
445
use self::Constructor::*;
546
use self::SliceKind::*;
647

@@ -260,13 +301,13 @@ enum IntBorder {
260301
AfterMax,
261302
}
262303

263-
/// A range of integers that is partitioned into disjoint subranges.
264-
///
265-
/// This is fed an input of multiple ranges, and returns an output that covers the union of the
266-
/// inputs but is split so that an output range only intersects an input range by being a subrange
267-
/// of it. No output range straddles the boundary of one of the inputs. This does constructor
304+
/// A range of integers that is partitioned into disjoint subranges. This does constructor
268305
/// splitting for integer ranges as explained at the top of the file.
269306
///
307+
/// This is fed multiple ranges, and returns an output that covers the input, but is split so that
308+
/// the only intersections between an output range and a seen range are inclusions. No output range
309+
/// straddles the boundary of one of the inputs.
310+
///
270311
/// The following input:
271312
/// ```
272313
/// |-------------------------| // `self`
@@ -405,54 +446,67 @@ impl Slice {
405446
}
406447
}
407448

408-
/// The exhaustiveness-checking paper does not include any details on checking variable-length
409-
/// slice patterns. However, they may be matched by an infinite collection of fixed-length array
410-
/// patterns.
411-
///
412-
/// Checking the infinite set directly would take an infinite amount of time. However, it turns out
413-
/// that for each finite set of patterns `P`, all sufficiently large array lengths are equivalent:
414-
///
415-
/// Each slice `s` with a "sufficiently-large" length `l ≥ L` that applies to exactly the subset
416-
/// `Pₜ` of `P` can be transformed to a slice `sₘ` for each sufficiently-large length `m` that
417-
/// applies to exactly the same subset of `P`.
418-
///
419-
/// Because of that, each witness for reachability-checking of one of the sufficiently-large
420-
/// lengths can be transformed to an equally-valid witness of any other length, so we only have to
421-
/// check slices of the "minimal sufficiently-large length" and less.
422-
///
423-
/// Note that the fact that there is a *single* `sₘ` for each `m` not depending on the specific
424-
/// pattern in `P` is important: if you look at the pair of patterns `[true, ..]` `[.., false]`
425-
/// Then any slice of length ≥1 that matches one of these two patterns can be trivially turned to a
426-
/// slice of any other length ≥1 that matches them and vice-versa, but the slice of length 2
427-
/// `[false, true]` that matches neither of these patterns can't be turned to a slice from length 1
428-
/// that matches neither of these patterns, so we have to consider slices from length 2 there.
449+
/// This computes constructor splitting for variable-length slices, as explained at the top of the
450+
/// file.
429451
///
430-
/// Now, to see that that length exists and find it, observe that slice patterns are either
431-
/// "fixed-length" patterns (`[_, _, _]`) or "variable-length" patterns (`[_, .., _]`).
452+
/// A slice pattern `[x, .., y]` behaves like the infinite or-pattern `[x, y] | [x, _, y] | [x, _,
453+
/// _, y] | ...`. The corresponding value constructors are fixed-length array constructors above a
454+
/// given minimum length. We obviously can't list all of this infinity of constructors. Thankfully,
455+
/// it turns out that for each finite set of slice patterns, all sufficiently large array lengths
456+
/// are equivalent.
432457
///
433-
/// For fixed-length patterns, all slices with lengths *longer* than the pattern's length have the
434-
/// same outcome (of not matching), so as long as `L` is greater than the pattern's length we can
435-
/// pick any `sₘ` from that length and get the same result.
458+
/// Let's look at an example, where we are trying to split the last pattern:
459+
/// ```
460+
/// match x {
461+
/// [true, true, ..] => {}
462+
/// [.., false, false] => {}
463+
/// [..] => {}
464+
/// }
465+
/// ```
466+
/// Here are the results of specialization for the first few lengths:
467+
/// ```
468+
/// // length 0
469+
/// [] => {}
470+
/// // length 1
471+
/// [_] => {}
472+
/// // length 2
473+
/// [true, true] => {}
474+
/// [false, false] => {}
475+
/// [_, _] => {}
476+
/// // length 3
477+
/// [true, true, _ ] => {}
478+
/// [_, false, false] => {}
479+
/// [_, _, _ ] => {}
480+
/// // length 4
481+
/// [true, true, _, _ ] => {}
482+
/// [_, _, false, false] => {}
483+
/// [_, _, _, _ ] => {}
484+
/// // length 5
485+
/// [true, true, _, _, _ ] => {}
486+
/// [_, _, _, false, false] => {}
487+
/// [_, _, _, _, _ ] => {}
488+
/// ```
436489
///
437-
/// For variable-length patterns, the situation is more complicated, because as seen above the
438-
/// precise value of `sₘ` matters.
490+
/// If we went above length 5, we would simply be inserting more columns full of wildcards in the
491+
/// middle. This means that the set of witnesses for length `l >= 5` if equivalent to the set for
492+
/// any other `l' >= 5`: simply add or remove wildcards in the middle to convert between them.
439493
///
440-
/// However, for each variable-length pattern `p` with a prefix of length `plₚ` and suffix of
441-
/// length `slₚ`, only the first `plₚ` and the last `slₚ` elements are examined.
494+
/// This applies to any set of slice patterns: there will be a length `L` above which all length
495+
/// behave the same. This is exactly what we need for constructor splitting. Therefore a
496+
/// variable-length slice can be split into a variable-length slice of minimal length `L`, and many
497+
/// fixed-length slices of lengths `< L`.
442498
///
443-
/// Therefore, as long as `L` is positive (to avoid concerns about empty types), all elements after
444-
/// the maximum prefix length and before the maximum suffix length are not examined by any
445-
/// variable-length pattern, and therefore can be added/removed without affecting them - creating
446-
/// equivalent patterns from any sufficiently-large length.
499+
/// For each variable-length pattern `p` with a prefix of length `plₚ` and suffix of length `slₚ`,
500+
/// only the first `plₚ` and the last `slₚ` elements are examined. Therefore, as long as `L` is
501+
/// positive (to avoid concerns about empty types), all elements after the maximum prefix length
502+
/// and before the maximum suffix length are not examined by any variable-length pattern, and
503+
/// therefore can be added/removed without affecting them - creating equivalent patterns from any
504+
/// sufficiently-large length.
447505
///
448506
/// Of course, if fixed-length patterns exist, we must be sure that our length is large enough to
449507
/// miss them all, so we can pick `L = max(max(FIXED_LEN)+1, max(PREFIX_LEN) + max(SUFFIX_LEN))`
450508
///
451509
/// `max_slice` below will be made to have arity `L`.
452-
///
453-
/// For example, with the above pair of patterns, all elements but the first and last can be
454-
/// added/removed, so any witness of length ≥2 (say, `[false, false, true]`) can be turned to a
455-
/// witness from any other length ≥2.
456510
#[derive(Debug)]
457511
struct SplitVarLenSlice {
458512
/// If the type is an array, this is its size.
@@ -787,11 +841,19 @@ impl<'tcx> Constructor<'tcx> {
787841

788842
/// A wildcard constructor that we split relative to the constructors in the matrix, as explained
789843
/// at the top of the file.
790-
/// For splitting wildcards, there are two groups of constructors: there are the constructors
791-
/// actually present in the matrix (`matrix_ctors`), and the constructors not present. Two
792-
/// constructors that are not in the matrix will either both be covered (by a wildcard), or both
793-
/// not be covered by any given row. Therefore we can keep the missing constructors grouped
794-
/// together.
844+
///
845+
/// A constructor that is not present in the matrix rows will only be covered by the rows that have
846+
/// wildcards. Thus we can group all of those constructors together; we call them "missing
847+
/// constructors". Splitting a wildcard would therefore list all present constructors individually
848+
/// (or grouped if they are integers or slices), and then all missing constructors together as a
849+
/// group.
850+
///
851+
/// However we can go further: since any constructor will match the wildcard rows, and having more
852+
/// rows can only reduce the amount of usefulness witnesses, we can skip the present constructors
853+
/// and only try the missing ones.
854+
/// This will not preserve the whole list of witnesses, but will preserve whether the list is empty
855+
/// or not. In fact this is quite natural from the point of view of diagnostics too. This is done
856+
/// in `to_ctors`: in some cases we only return `Missing`.
795857
#[derive(Debug)]
796858
pub(super) struct SplitWildcard<'tcx> {
797859
/// Constructors seen in the matrix.

0 commit comments

Comments
 (0)