@@ -12,7 +12,7 @@ well-suited to type-checking and translation.
  # Motivation

  The current compiler uses a single AST from the initial parse all the
- way to the final generation of LLVM. While this has some advantages,
+ way to the final generation of bitcode. While this has some advantages,
  there are also a number of distinct downsides.

  1. The complexity of the compiler is increased because all passes must
@@ -49,19 +49,19 @@ there are also a number of distinct downsides.

  3. The reliability of safety analyses is reduced because the gap
  between what is being analyzed (the AST) and what is being executed
- (LLVM bitcode) is very wide. The MIR is very low-level and hence the
- translation to LLVM should be straightforward.
+ (bitcode) is very wide. The MIR is very low-level and hence the
+ translation to bitcode should be straightforward.

  4. The reliability of safety proofs, when we have some, would be
  reduced because the formal language we are modeling is so far from
  the full compiler AST. The MIR is simple enough that it should be
  possible to (eventually) make safety proofs based on the MIR
  itself.

- 5. Rust-specific optimizations, and optimizing trans output, are very
+ 5. Rust-specific optimizations, and optimizing `trans` output, are very
  challenging. There are numerous cases where it would be nice to be
- able to do optimizations *before* translating to LLVM bitcode, or
- to take advantage of Rust-specific knowledge of which LLVM is
+ able to do optimizations *before* translating to bitcode, or
+ to take advantage of Rust-specific knowledge of which a backend may be
  unaware. Currently, we are forced to do these optimizations as part
  of lowering to bitcode, which can get quite complex. Having an
  intermediate form improves the situation because:
@@ -72,11 +72,9 @@ there are also a number of distinct downsides.
  would be safe.

  c. In all cases, whatever we can do on the MIR will be helpful for other
- targets beyond LLVM (see next bullet).
+ targets beyond existing backends (see next bullet).

- 6. Migrating away from LLVM is nearly impossible. In the future, it
- may be advantageous to provide a choice of backends beyond
- LLVM. Currently though this is infeasible, since so much of the
+ 6. Migrating away from LLVM is nearly impossible. Since so much of the
  semantics of Rust itself are embedded in the `trans` step which
  converts to LLVM IR. Under the MIR design, those semantics are
  instead described in the translation from AST to MIR, and the LLVM
@@ -155,8 +153,8 @@ using a MIR:
  - **borrow and move checking**: the borrow checker already uses a
  combination of the CFG and `ExprUseVisitor` to try and achieve a
  similarly low-level of detail.
- - **translation to LLVM IR**: the MIR is much closer than the AST to
- the desired end-product.
+ - **translation to IR**: the MIR is much closer than the AST to
+ the desired bitcode end-product.

  Some other passes would probably work equally well on the MIR or an
  AST, but they will likely find the MIR somewhat easier to work with
@@ -442,7 +440,7 @@ fact not be the correct representation.

  In any case, after the move and correctness checking is done, it is
  easy enough to remove these aggregate rvalues and replace them with
- assignments. This could potentially be done during LLVM lowering, or
+ assignments. This could potentially be done during lowering, or
  as a pre-pass that transforms MIR statements like:

  x = ...x;
@@ -469,13 +467,12 @@ within a single assignment (and nowhere else):
  Going further, once type-checking is done, it is plausible to do
  further lowering within the MIR purely for optimization purposes. For
  example, we could introduce intermediate references to cache the
- results of common lvalue computations and so forth. This may well be
- better left to LLVM (or at least to the lowering pass).
+ results of common lvalue computations and so forth.

  ### Bounds checking

  Because bounds checks are fallible, it's important to encode them in
- the MIR whenever we do indexing. Otherwise the trans code would have
+ the MIR whenever we do indexing. Otherwise the `trans` code would have
  to figure out on its own how to do unwinding at that point. Because
  the MIR doesn't "desugar" fat pointers, we include a special rvalue
  `LEN` that extracts the length from an array value whose type matches
@@ -602,8 +599,7 @@ that instrumentation will be done as needed to prevent double
  drops. Currently, this signaling is done by zeroing out memory at
  runtime, but we are in the process of introducing stack flags for this
  purpose: the MIR offers the opportunity to reify those flags if we
- wanted, and rewrite drops to be more narrow (versus leaving that work
- for LLVM).
+ wanted, and rewrite drops to be more narrow.

  To illustrate how drop works, let's work through a simple
  example. Imagine that we have a snippet of code like:
@@ -686,7 +682,7 @@ scoping information should go away.

  ### Monomorphization

- Currently, we do monomorphization at LLVM translation time. If we ever
+ Currently, we do monomorphization at translation time. If we ever
  chose to do it at a MIR level, that would be fine, but one thing to be
  careful of is that we may be able to elide `Drop` nodes based on the
  specific types.
@@ -721,7 +717,7 @@ check.

  **Converting from AST to a MIR will take some compilation time.**
  Expectations are that constructing the MIR will be quite fast, and
- that follow-on code (such as trans and borrowck) will execute faster,
+ that follow-on code (such as `trans` and `borrowck`) will execute faster,
  because they will operate over a simpler and more compact
  representation. However, this needs to be measured.

@@ -747,9 +743,8 @@ versus having the stack modeled as allocas. The current model is also
  helpful for generating debuginfo.

  SSA representation can be helpful for more sophisticated backend
- optimizations. However, we tend to leave those optimizations to LLVM,
- and hence it makes more sense to have the MIR be based on lvalues
- instead. There are some cases where it might make sense to do analyses
+ optimizations. However, it makes more sense to have the MIR be based on
+ lvalues. There are some cases where it might make sense to do analyses
  on the MIR that would benefit from SSA, such as bounds check elision.
  In those cases, we could either quickly identify those temporaries
  that are not mutably borrowed (and which therefore act like SSA
@@ -760,7 +755,7 @@ both with and without SSA, honestly.)

  **Exclude unwinding.** Excluding unwinding from the MIR would allow us
  to elide annoying details like bounds and overflow checking. These are
- not particularly interesting to borrowck, so that is somewhat
+ not particularly interesting to `borrowck`, so that is somewhat
  appealing. But that would mean that consumers of MIR would have to
  reconstruct the order of drops and so forth on unwinding paths, which
  would require them reasoning about scopes and other rather complex
@@ -786,8 +781,8 @@ which desugars to a temporary and a constant reference:
  x = tmp0(tmp1)

  There is no particular *harm* in such constants: it would be very easy
- to optimize them away when reducing to LLVM bitcode, and if we do not
- do so, LLVM will do it. However, we could also expand the scope of
+ to optimize them away when reducing to bitcode, and if we do not
+ do so, a backend may do it. However, we could also expand the scope of
  operands to include both lvalues and some simple rvalues like
  constants. The main advantage of this is that it would reduce the
  total number of statements and hence might help with memory