Commit 92592d5

Author: oliver
Update 1211-mir.md
looked at those references to comments which imply LLVM is the only option and added formatting for clarity around trans and borrowck
1 parent d373fff commit 92592d5

text/1211-mir.md

Lines changed: 21 additions & 26 deletions
@@ -12,7 +12,7 @@ well-suited to type-checking and translation.
 # Motivation
 
 The current compiler uses a single AST from the initial parse all the
-way to the final generation of LLVM. While this has some advantages,
+way to the final generation of bitcode. While this has some advantages,
 there are also a number of distinct downsides.
 
 1. The complexity of the compiler is increased because all passes must
@@ -49,19 +49,19 @@ there are also a number of distinct downsides.
 
 3. The reliability of safety analyses is reduced because the gap
 between what is being analyzed (the AST) and what is being executed
-(LLVM bitcode) is very wide. The MIR is very low-level and hence the
-translation to LLVM should be straightforward.
+(bitcode) is very wide. The MIR is very low-level and hence the
+translation to bitcode should be straightforward.
 
 4. The reliability of safety proofs, when we have some, would be
 reduced because the formal language we are modeling is so far from
 the full compiler AST. The MIR is simple enough that it should be
 possible to (eventually) make safety proofs based on the MIR
 itself.
 
-5. Rust-specific optimizations, and optimizing trans output, are very
+5. Rust-specific optimizations, and optimizing `trans` output, are very
 challenging. There are numerous cases where it would be nice to be
-able to do optimizations *before* translating to LLVM bitcode, or
-to take advantage of Rust-specific knowledge of which LLVM is
+able to do optimizations *before* translating to bitcode, or
+to take advantage of Rust-specific knowledge of which a backend may be
 unaware. Currently, we are forced to do these optimizations as part
 of lowering to bitcode, which can get quite complex. Having an
 intermediate form improves the situation because:
@@ -72,11 +72,9 @@ there are also a number of distinct downsides.
 would be safe.
 
 c. In all cases, whatever we can do on the MIR will be helpful for other
-targets beyond LLVM (see next bullet).
+targets beyond existing backends (see next bullet).
 
-6. Migrating away from LLVM is nearly impossible. In the future, it
-may be advantageous to provide a choice of backends beyond
-LLVM. Currently though this is infeasible, since so much of the
+6. Migrating away from LLVM is nearly impossible. Since so much of the
 semantics of Rust itself are embedded in the `trans` step which
 converts to LLVM IR. Under the MIR design, those semantics are
 instead described in the translation from AST to MIR, and the LLVM
@@ -155,8 +153,8 @@ using a MIR:
 - **borrow and move checking**: the borrow checker already uses a
 combination of the CFG and `ExprUseVisitor` to try and achieve a
 similarly low-level of detail.
-- **translation to LLVM IR**: the MIR is much closer than the AST to
-the desired end-product.
+- **translation to IR**: the MIR is much closer than the AST to
+the desired bitcode end-product.
 
 Some other passes would probably work equally well on the MIR or an
 AST, but they will likely find the MIR somewhat easier to work with
@@ -442,7 +440,7 @@ fact not be the correct representation.
 
 In any case, after the move and correctness checking is done, it is
 easy enough to remove these aggregate rvalues and replace them with
-assignments. This could potentially be done during LLVM lowering, or
+assignments. This could potentially be done during lowering, or
 as a pre-pass that transforms MIR statements like:
 
 x = ...x;
@@ -469,13 +467,12 @@ within a single assignment (and nowhere else):
 Going further, once type-checking is done, it is plausible to do
 further lowering within the MIR purely for optimization purposes. For
 example, we could introduce intermediate references to cache the
-results of common lvalue computations and so forth. This may well be
-better left to LLVM (or at least to the lowering pass).
+results of common lvalue computations and so forth.
 
 ### Bounds checking
 
 Because bounds checks are fallible, it's important to encode them in
-the MIR whenever we do indexing. Otherwise the trans code would have
+the MIR whenever we do indexing. Otherwise the `trans` code would have
 to figure out on its own how to do unwinding at that point. Because
 the MIR doesn't "desugar" fat pointers, we include a special rvalue
 `LEN` that extracts the length from an array value whose type matches
@@ -602,8 +599,7 @@ that instrumentation will be done as needed to prevent double
 drops. Currently, this signaling is done by zeroing out memory at
 runtime, but we are in the process of introducing stack flags for this
 purpose: the MIR offers the opportunity to reify those flags if we
-wanted, and rewrite drops to be more narrow (versus leaving that work
-for LLVM).
+wanted, and rewrite drops to be more narrow.
 
 To illustrate how drop works, let's work through a simple
 example. Imagine that we have a snippet of code like:
@@ -686,7 +682,7 @@ scoping information should go away.
 
 ### Monomorphization
 
-Currently, we do monomorphization at LLVM translation time. If we ever
+Currently, we do monomorphization at translation time. If we ever
 chose to do it at a MIR level, that would be fine, but one thing to be
 careful of is that we may be able to elide `Drop` nodes based on the
 specific types.
@@ -721,7 +717,7 @@ check.
 
 **Converting from AST to a MIR will take some compilation time.**
 Expectations are that constructing the MIR will be quite fast, and
-that follow-on code (such as trans and borrowck) will execute faster,
+that follow-on code (such as `trans` and `borrowck`) will execute faster,
 because they will operate over a simpler and more compact
 representation. However, this needs to be measured.
 
@@ -747,9 +743,8 @@ versus having the stack modeled as allocas. The current model is also
 helpful for generating debuginfo.
 
 SSA representation can be helpful for more sophisticated backend
-optimizations. However, we tend to leave those optimizations to LLVM,
-and hence it makes more sense to have the MIR be based on lvalues
-instead. There are some cases where it might make sense to do analyses
+optimizations. However, it makes more sense to have the MIR be based on
+lvalues. There are some cases where it might make sense to do analyses
 on the MIR that would benefit from SSA, such as bounds check elision.
 In those cases, we could either quickly identify those temporaries
 that are not mutably borrowed (and which therefore act like SSA
@@ -760,7 +755,7 @@ both with and without SSA, honestly.)
 
 **Exclude unwinding.** Excluding unwinding from the MIR would allow us
 to elide annoying details like bounds and overflow checking. These are
-not particularly interesting to borrowck, so that is somewhat
+not particularly interesting to `borrowck`, so that is somewhat
 appealing. But that would mean that consumers of MIR would have to
 reconstruct the order of drops and so forth on unwinding paths, which
 would require them reasoning about scopes and other rather complex
@@ -786,8 +781,8 @@ which desugars to a temporary and a constant reference:
 x = tmp0(tmp1)
 
 There is no particular *harm* in such constants: it would be very easy
-to optimize them away when reducing to LLVM bitcode, and if we do not
-do so, LLVM will do it. However, we could also expand the scope of
+to optimize them away when reducing to bitcode, and if we do not
+do so, a backend may do it. However, we could also expand the scope of
 operands to include both lvalues and some simple rvalues like
 constants. The main advantage of this is that it would reduce the
 total number of statements and hence might help with memory
