Update 1211-mir.md

oliver · web-flow · commit 92592d580ac7 · 2021-08-31T21:08:36.000Z
Reworded passages that imply LLVM is the only backend option, and added code formatting around `trans` and `borrowck` for clarity.
diff --git a/text/1211-mir.md b/text/1211-mir.md
@@ -12,7 +12,7 @@ well-suited to type-checking and translation.
 # Motivation
 
 The current compiler uses a single AST from the initial parse all the
-way to the final generation of LLVM. While this has some advantages,
+way to the final generation of bitcode. While this has some advantages,
 there are also a number of distinct downsides. 
 
 1. The complexity of the compiler is increased because all passes must
@@ -49,19 +49,19 @@ there are also a number of distinct downsides.
    
 3. The reliability of safety analyses is reduced because the gap
    between what is being analyzed (the AST) and what is being executed
-   (LLVM bitcode) is very wide. The MIR is very low-level and hence the
-   translation to LLVM should be straightforward.
+   (bitcode) is very wide. The MIR is very low-level and hence the
+   translation to bitcode should be straightforward.
    
 4. The reliability of safety proofs, when we have some, would be
    reduced because the formal language we are modeling is so far from
    the full compiler AST. The MIR is simple enough that it should be
    possible to (eventually) make safety proofs based on the MIR
    itself.
 
-5. Rust-specific optimizations, and optimizing trans output, are very
+5. Rust-specific optimizations, and optimizing `trans` output, are very
    challenging. There are numerous cases where it would be nice to be
-   able to do optimizations *before* translating to LLVM bitcode, or
-   to take advantage of Rust-specific knowledge of which LLVM is
+   able to do optimizations *before* translating to bitcode, or
+   to take advantage of Rust-specific knowledge of which a backend may be
    unaware. Currently, we are forced to do these optimizations as part
    of lowering to bitcode, which can get quite complex. Having an
    intermediate form improves the situation because:
@@ -72,11 +72,9 @@ there are also a number of distinct downsides.
       would be safe.
    
    c. In all cases, whatever we can do on the MIR will be helpful for other
-      targets beyond LLVM (see next bullet).
+      targets beyond existing backends (see next bullet).
       
-6. Migrating away from LLVM is nearly impossible. In the future, it
-   may be advantageous to provide a choice of backends beyond
-   LLVM. Currently though this is infeasible, since so much of the
+6. Migrating away from LLVM is nearly impossible, since so much of the
    semantics of Rust itself are embedded in the `trans` step which
    converts to LLVM IR. Under the MIR design, those semantics are
    instead described in the translation from AST to MIR, and the LLVM
@@ -155,8 +153,8 @@ using a MIR:
 - **borrow and move checking**: the borrow checker already uses a
   combination of the CFG and `ExprUseVisitor` to try and achieve a
   similarly low-level of detail.
-- **translation to LLVM IR**: the MIR is much closer than the AST to
-  the desired end-product.
+- **translation to backend IR**: the MIR is much closer than the AST to
+  the desired bitcode end-product.
   
 Some other passes would probably work equally well on the MIR or an
 AST, but they will likely find the MIR somewhat easier to work with
@@ -442,7 +440,7 @@ fact not be the correct representation.
 
 In any case, after the move and correctness checking is done, it is
 easy enough to remove these aggregate rvalues and replace them with
-assignments. This could potentially be done during LLVM lowering, or
+assignments. This could potentially be done during lowering, or
 as a pre-pass that transforms MIR statements like:
 
     x = ...x;
@@ -469,13 +467,12 @@ within a single assignment (and nowhere else):
 Going further, once type-checking is done, it is plausible to do
 further lowering within the MIR purely for optimization purposes. For
 example, we could introduce intermediate references to cache the
-results of common lvalue computations and so forth. This may well be
-better left to LLVM (or at least to the lowering pass).
+results of common lvalue computations and so forth.
 
 ### Bounds checking
 
 Because bounds checks are fallible, it's important to encode them in
-the MIR whenever we do indexing. Otherwise the trans code would have
+the MIR whenever we do indexing. Otherwise the `trans` code would have
 to figure out on its own how to do unwinding at that point. Because
 the MIR doesn't "desugar" fat pointers, we include a special rvalue
 `LEN` that extracts the length from an array value whose type matches
@@ -602,8 +599,7 @@ that instrumentation will be done as needed to prevent double
 drops. Currently, this signaling is done by zeroing out memory at
 runtime, but we are in the process of introducing stack flags for this
 purpose: the MIR offers the opportunity to reify those flags if we
-wanted, and rewrite drops to be more narrow (versus leaving that work
-for LLVM).
+wanted, and rewrite drops to be more narrow.
 
 To illustrate how drop works, let's work through a simple
 example. Imagine that we have a snippet of code like:
@@ -686,7 +682,7 @@ scoping information should go away.
 
 ### Monomorphization
 
-Currently, we do monomorphization at LLVM translation time. If we ever
+Currently, we do monomorphization at translation time. If we ever
 chose to do it at a MIR level, that would be fine, but one thing to be
 careful of is that we may be able to elide `Drop` nodes based on the
 specific types.
@@ -721,7 +717,7 @@ check.
 
 **Converting from AST to a MIR will take some compilation time.**
 Expectations are that constructing the MIR will be quite fast, and
-that follow-on code (such as trans and borrowck) will execute faster,
+that follow-on code (such as `trans` and `borrowck`) will execute faster,
 because they will operate over a simpler and more compact
 representation. However, this needs to be measured.
 
@@ -747,9 +743,8 @@ versus having the stack modeled as allocas. The current model is also
 helpful for generating debuginfo.
 
 SSA representation can be helpful for more sophisticated backend
-optimizations. However, we tend to leave those optimizations to LLVM,
-and hence it makes more sense to have the MIR be based on lvalues
-instead. There are some cases where it might make sense to do analyses
+optimizations. However, we tend to leave those optimizations to the backend,
+so it makes more sense to base the MIR on lvalues. There are some cases where it might make sense to do analyses
 on the MIR that would benefit from SSA, such as bounds check elision.
 In those cases, we could either quickly identify those temporaries
 that are not mutably borrowed (and which therefore act like SSA
@@ -760,7 +755,7 @@ both with and without SSA, honestly.)
 
 **Exclude unwinding.** Excluding unwinding from the MIR would allow us
 to elide annoying details like bounds and overflow checking. These are
-not particularly interesting to borrowck, so that is somewhat
+not particularly interesting to `borrowck`, so that is somewhat
 appealing. But that would mean that consumers of MIR would have to
 reconstruct the order of drops and so forth on unwinding paths, which
 would require them reasoning about scopes and other rather complex
@@ -786,8 +781,8 @@ which desugars to a temporary and a constant reference:
     x = tmp0(tmp1)
     
 There is no particular *harm* in such constants: it would be very easy
-to optimize them away when reducing to LLVM bitcode, and if we do not
-do so, LLVM will do it. However, we could also expand the scope of
+to optimize them away when reducing to bitcode, and if we do not
+do so, a backend may do it. However, we could also expand the scope of
 operands to include both lvalues and some simple rvalues like
 constants. The main advantage of this is that it would reduce the
 total number of statements and hence might help with memory