Commit 9fad10f

tshort authored and Keno committed
Dev-doc updates for the SSAIR section (#30622)
* Dev-doc updates for the SSAIR section

  This mainly fixes typos. Note that I think the `foo` function on line 114 is wrong. It errors when run (`y` not defined), and it doesn't match the IR shown below it (there's no `bar` function for example). I don't have a fix for that.

* Fix foo example

Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>
1 parent 5ef65dc commit 9fad10f

1 file changed: +44 -38 lines changed

doc/src/devdocs/ssair.md

Lines changed: 44 additions & 38 deletions
@@ -6,7 +6,7 @@ Beginning in Julia 0.7, parts of the compiler use a new [SSA-form](https://en.wi
 intermediate representation. Historically, the compiler used to directly generate LLVM IR, from a lowered form of the Julia
 AST. This form had most syntactic abstractions removed, but still looked a lot like an abstract syntax tree.
 Over time, in order to facilitate optimizations, SSA values were introduced to this IR and the IR was
-linearized (i.e. a form where function arguments may only be SSA values or constants). However, non-ssa values
+linearized (i.e. a form where function arguments may only be SSA values or constants). However, non-SSA values
 (slots) remained in the IR due to the lack of Phi nodes in the IR (necessary for back-edges and re-merging of
 conditional control flow), negating much of the usefulfulness of the SSA form representation to perform
 middle end optimizations. Some heroic effort was put into making these optimizations work without a complete SSA
@@ -33,13 +33,13 @@ if edge has an entry of `15`, there must be a `goto`, `gotoifnot` or implicit fa
 statement `15` that targets this phi node). Values are either SSA values or constants. It is also
 possible for a value to be unassigned if the variable was not defined on this path. However, undefinedness
 checks get explicitly inserted and represented as booleans after middle end optimizations, so code generators
-may assume that any use of a phi node will have an assigned value in the corresponding slot. It is also legal
-for the mapping to be incomplete, i.e. for a phi node to have missing incoming edges. In that case, it must
+may assume that any use of a Phi node will have an assigned value in the corresponding slot. It is also legal
+for the mapping to be incomplete, i.e. for a Phi node to have missing incoming edges. In that case, it must
 be dynamically guaranteed that the corresponding value will not be used.
 
 PiNodes encode statically proven information that may be implicitly assumed in basic blocks dominated by a given
 pi node. They are conceptually equivalent to the technique introduced in the paper
-"ABCD: Eliminating Array Bounds Checks on Demand" or the predicate info nodes in LLVM. To see how they work, consider,
+[ABCD: Eliminating Array Bounds Checks on Demand](https://dl.acm.org/citation.cfm?id=358438.349342) or the predicate info nodes in LLVM. To see how they work, consider,
 e.g.
 
 ```julia
@@ -51,7 +51,7 @@ else
 end
 ```
 
-we can perform predicate insertion and turn this into:
+We can perform predicate insertion and turn this into:
 
 ```julia
 %x::Union{Int, Float64} # %x is some Union{Int, Float64} typed ssa value
@@ -96,25 +96,29 @@ hand, every catch basic block would have `n*m` phi node arguments (`n`, the numb
 in the critical region, `m` the number of live values through the catch block). To work around
 this, we use a combination of `Upsilon` and `PhiC` (the C standing for `catch`,
 written `φᶜ` in the IR pretty printer, because
-unicode subscript c is not available) nodes. There is several ways to think of these nodes, but
+unicode subscript c is not available) nodes. There are several ways to think of these nodes, but
 perhaps the easiest is to think of each `PhiC` as a load from a unique store-many, read-once slot,
 with `Upsilon` being the corresponding store operation. The `PhiC` has an operand list of all the
 upsilon nodes that store to its implicit slot. The `Upsilon` nodes however, do not record which `PhiC`
 node they store to. This is done for more natural integration with the rest of the SSA IR. E.g.
-if there are no more uses of a `PhiC` node, it is safe to delete is and the same is true of an
-`Upsilon` node. In most IR passes, `PhiC` nodes can be treated similar to `Phi` nodes. One can follow
-use-def chains through them, and they can be lifted to new `PhiC` nodes and new Upsilon nodes (in the
+if there are no more uses of a `PhiC` node, it is safe to delete it, and the same is true of an
+`Upsilon` node. In most IR passes, `PhiC` nodes can be treated like `Phi` nodes. One can follow
+use-def chains through them, and they can be lifted to new `PhiC` nodes and new `Upsilon` nodes (in the
 same places as the original `Upsilon` nodes). The result of this scheme is that the number of
-Upsilon nodes (and `PhiC` arguments) is proportional to the number of assigned values to a particular
+`Upsilon` nodes (and `PhiC` arguments) is proportional to the number of assigned values to a particular
 variable (before SSA conversion), rather than the number of statements in the critical region.
 
 To see this scheme in action, consider the function
 
 ```julia
+@noinline opaque() = invokelatest(identity, nothing) # Something opaque
 function foo()
+    local y
     x = 1
     try
         y = 2
+        opaque()
+        y = 3
         error()
     catch
     end
@@ -125,24 +129,26 @@ end
 The corresponding IR (with irrelevant types stripped) is:
 
 ```
-ir = Code
-1 ─ nothing
-2 ─ $(Expr(:enter, 5))
-3 ─ %3 = ϒ (#undef)
-│ %4 = ϒ (1)
-│ %5 = ϒ (2)
-│ Main.bar()
-│ %7 = ϒ (3)
+1 ─ nothing::Nothing
+2 ─ %2 = $(Expr(:enter, #4))
+3 ─ %3 = ϒ (false)
+│ %4 = ϒ (#undef)
+│ %5 = ϒ (1)
+│ %6 = ϒ (true)
+│ %7 = ϒ (2)
+│ invoke Main.opaque()::Any
+│ %9 = ϒ (true)
+│ %10 = ϒ (3)
+│ invoke Main.error()::Union{}
+└── $(Expr(:unreachable))::Union{}
+4 ┄ %13 = φᶜ (%3, %6, %9)::Bool
+│ %14 = φᶜ (%4, %7, %10)::Core.Compiler.MaybeUndef(Int64)
+│ %15 = φᶜ (%5)::Core.Compiler.Const(1, false)
 └── $(Expr(:leave, 1))
-4 ─ goto 6
-5 ─ %10 = φᶜ (%3, %5)
-│ %11 = φᶜ (%4, %7)
-└── $(Expr(:leave, 1))
-6 ┄ %13 = φ (4 => 2, 5 => %10)::NotInferenceDontLookHere.MaybeUndef(NotInferenceDontLookHere.Const(2, false))
-│ %14 = φ (4 => 3, 5 => %11)::Int64
-│ $(Expr(:undefcheck, :y, Core.SSAValue(13)))
-│ %16 = Core.tuple(%14, %13)
-└── return %17
+5 ─ $(Expr(:pop_exception, :(%2)))::Any
+│ $(Expr(:throw_undef_if_not, :y, :(%13)))::Any
+│ %19 = Core.tuple(%15, %14)
+└── return %19
 ```
 
 Note in particular that every value live into the critical region gets
@@ -155,34 +161,34 @@ catch blocks, and all incoming values have to come through a `φᶜ` node.
 
 The main `SSAIR` data structure is worthy of discussion. It draws inspiration from LLVM and Webkit's B3 IR.
 The core of the data structure is a flat vector of statements. Each statement is implicitly assigned
-an SSA values based on its position in the vector (i.e. the result of the statement at idx 1 can be
+an SSA value based on its position in the vector (i.e. the result of the statement at idx 1 can be
 accessed using `SSAValue(1)` etc). For each SSA value, we additionally maintain its type. Since, SSA values
 are definitionally assigned only once, this type is also the result type of the expression at the corresponding
-index. However, while this representation is rather efficient (since the assignments don't need to be explicitly)
-encoded, if of course carries the drawback that order is semantically significant, so reorderings and insertions
+index. However, while this representation is rather efficient (since the assignments don't need to be explicitly
+encoded), it of course carries the drawback that order is semantically significant, so reorderings and insertions
 change statement numbers. Additionally, we do not keep use lists (i.e. it is impossible to walk from a def to
-all its uses without explicitly computing this map - def lists however are trivial since you can lookup the
+all its uses without explicitly computing this map--def lists however are trivial since you can look up the
 corresponding statement from the index), so the LLVM-style RAUW (replace-all-uses-with) operation is unavailable.
 
 Instead, we do the following:
 
 - We keep a separate buffer of nodes to insert (including the position to insert them at, the type of the
 corresponding value and the node itself). These nodes are numbered by their occurrence in the insertion
-buffer, allowing their values to be immediately used elesewhere in the IR (i.e. if there is 12 statements in
-the original statement list, the first new statement will be accessible as `SSAValue(13)`)
+buffer, allowing their values to be immediately used elesewhere in the IR (i.e. if there are 12 statements in
+the original statement list, the first new statement will be accessible as `SSAValue(13)`).
 - RAUW style operations are performed by setting the corresponding statement index to the replacement
 value.
 - Statements are erased by setting the corresponding statement to `nothing` (this is essentially just a special-case
-convention of the above
-- if there are any uses of the statement being erased they will be set to `nothing`)
+convention of the above.
+- If there are any uses of the statement being erased, they will be set to `nothing`.
 
-There is a `compact!` function that compacts the above data structure by performing the insertion of nodes in the appropriate place, trivial copy propagation and renaming of uses to any changed SSA values. However, the clever part
+There is a `compact!` function that compacts the above data structure by performing the insertion of nodes in the appropriate place, trivial copy propagation, and renaming of uses to any changed SSA values. However, the clever part
 of this scheme is that this compaction can be done lazily as part of the subsequent pass. Most optimization passes
 need to walk over the entire list of statements, performing analysis or modifications along the way. We provide an
-`IncrementalCompact` iterator that can be used to iterate over the statement list. It will perform any necessary compaction,
+`IncrementalCompact` iterator that can be used to iterate over the statement list. It will perform any necessary compaction
 and return the new index of the node, as well as the node itself. It is legal at this point to walk def-use chains,
 as well as make any modifications or deletions to the IR (insertions are disallowed however).
 
-The idea behind this arrangement is that, since the optimization passes need to touch the corresponding memory anyway,
+The idea behind this arrangement is that, since the optimization passes need to touch the corresponding memory anyway
 and incur the corresponding memory access penalty, performing the extra housekeeping should have comparatively little
 overhead (and save the overhead of maintaining these data structures during IR modification).
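
The insertion-buffer and compaction scheme described in the final hunk can be illustrated with a small toy model. This is only a sketch under stated assumptions: `SSA`, `ToyIR`, `insert_before!`, `rename` and `compact` are hypothetical names invented for this example and are not the real `Core.Compiler` types or functions. Statements are modeled as plain tuples, and the sketch does eager compaction only, without the copy propagation or the lazy `IncrementalCompact` iteration the text mentions.

```julia
# Toy model only: hypothetical types, NOT Core.Compiler's actual IR structures.
struct SSA                 # stand-in for an SSA value reference
    id::Int
end

struct ToyIR
    stmts::Vector{Any}                  # flat statement list; index == SSA number
    pending::Vector{Tuple{Int,Any}}     # insertion buffer: (insert-before position, statement)
end
ToyIR(stmts) = ToyIR(collect(Any, stmts), Tuple{Int,Any}[])

# Queue `stmt` for insertion before position `pos`. The new statement is numbered
# after all original statements, so its value can be used right away.
function insert_before!(ir::ToyIR, pos::Int, stmt)
    push!(ir.pending, (pos, stmt))
    return SSA(length(ir.stmts) + length(ir.pending))
end

# Rewrite SSA references inside a statement according to the old => new numbering.
rename(x, newid) = x
rename(x::SSA, newid) = SSA(newid[x.id])
rename(t::Tuple, newid) = map(a -> rename(a, newid), t)

# Merge the insertion buffer into the statement list and renumber all uses.
function compact(ir::ToyIR)
    n = length(ir.stmts)
    order = Int[]                       # old SSA numbers, listed in their final order
    for pos in 1:n
        for (k, (p, _)) in enumerate(ir.pending)
            p == pos && push!(order, n + k)
        end
        push!(order, pos)
    end
    newid = Dict(old => fresh for (fresh, old) in enumerate(order))
    getstmt(old) = old <= n ? ir.stmts[old] : ir.pending[old - n][2]
    return ToyIR([rename(getstmt(old), newid) for old in order])
end

ir = ToyIR([(:call, :f), (:call, :g, SSA(1)), (:return, SSA(2))])
v = insert_before!(ir, 2, (:call, :h, SSA(1)))   # v == SSA(4), usable before compaction
ir.stmts[2] = (:call, :g, v)                     # modify statement 2 in place to use it
out = compact(ir)
```

After `compact`, the queued statement sits at index 2 and every use has been renumbered, so the statement order again matches the SSA numbering. The trade-off mirrors the one in the text: insertions stay cheap because nothing is renumbered until a pass has to walk the statement list anyway.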