* Dev-doc updates for the SSAIR section
This mainly fixes typos.
Note that I think the `foo` function on line 114 is wrong. It errors when run (`y` not defined), and it doesn't match the IR shown below it (there's no `bar` function for example). I don't have a fix for that.
* Fix foo example
Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>
doc/src/devdocs/ssair.md
Lines changed: 44 additions & 38 deletions
@@ -6,7 +6,7 @@ Beginning in Julia 0.7, parts of the compiler use a new [SSA-form](https://en.wi
6
6
intermediate representation. Historically, the compiler used to directly generate LLVM IR, from a lowered form of the Julia
7
7
AST. This form had most syntactic abstractions removed, but still looked a lot like an abstract syntax tree.
8
8
Over time, in order to facilitate optimizations, SSA values were introduced to this IR and the IR was
9
-
linearized (i.e. a form where function arguments may only be SSA values or constants). However, non-ssa values
9
+
linearized (i.e. a form where function arguments may only be SSA values or constants). However, non-SSA values
10
10
(slots) remained in the IR due to the lack of Phi nodes in the IR (necessary for back-edges and re-merging of
11
11
conditional control flow), negating much of the usefulfulness of the SSA form representation to perform
12
12
middle end optimizations. Some heroic effort was put into making these optimizations work without a complete SSA
@@ -33,13 +33,13 @@ if edge has an entry of `15`, there must be a `goto`, `gotoifnot` or implicit fa
33
33
statement `15` that targets this phi node). Values are either SSA values or constants. It is also
34
34
possible for a value to be unassigned if the variable was not defined on this path. However, undefinedness
35
35
checks get explicitly inserted and represented as booleans after middle end optimizations, so code generators
36
-
may assume that any use of a phi node will have an assigned value in the corresponding slot. It is also legal
37
-
for the mapping to be incomplete, i.e. for a phi node to have missing incoming edges. In that case, it must
36
+
may assume that any use of a Phi node will have an assigned value in the corresponding slot. It is also legal
37
+
for the mapping to be incomplete, i.e. for a Phi node to have missing incoming edges. In that case, it must
38
38
be dynamically guaranteed that the corresponding value will not be used.
39
39
40
40
PiNodes encode statically proven information that may be implicitly assumed in basic blocks dominated by a given
41
41
pi node. They are conceptually equivalent to the technique introduced in the paper
42
-
"ABCD: Eliminating Array Bounds Checks on Demand" or the predicate info nodes in LLVM. To see how they work, consider,
42
+
[ABCD: Eliminating Array Bounds Checks on Demand](https://dl.acm.org/citation.cfm?id=358438.349342) or the predicate info nodes in LLVM. To see how they work, consider,
43
43
e.g.
44
44
45
45
```julia
@@ -51,7 +51,7 @@ else
51
51
end
52
52
```
53
53
54
-
we can perform predicate insertion and turn this into:
54
+
We can perform predicate insertion and turn this into:
55
55
56
56
```julia
57
57
%x::Union{Int, Float64}# %x is some Union{Int, Float64} typed ssa value
@@ -96,25 +96,29 @@ hand, every catch basic block would have `n*m` phi node arguments (`n`, the numb
96
96
in the critical region, `m` the number of live values through the catch block). To work around
97
97
this, we use a combination of `Upsilon` and `PhiC` (the C standing for `catch`,
98
98
written `φᶜ` in the IR pretty printer, because
99
-
unicode subscript c is not available) nodes. There is several ways to think of these nodes, but
99
+
unicode subscript c is not available) nodes. There are several ways to think of these nodes, but
100
100
perhaps the easiest is to think of each `PhiC` as a load from a unique store-many, read-once slot,
101
101
with `Upsilon` being the corresponding store operation. The `PhiC` has an operand list of all the
102
102
upsilon nodes that store to its implicit slot. The `Upsilon` nodes however, do not record which `PhiC`
103
103
node they store to. This is done for more natural integration with the rest of the SSA IR. E.g.
104
-
if there are no more uses of a `PhiC` node, it is safe to delete is and the same is true of an
105
-
`Upsilon` node. In most IR passes, `PhiC` nodes can be treated similar to`Phi` nodes. One can follow
106
-
use-def chains through them, and they can be lifted to new `PhiC` nodes and new Upsilon nodes (in the
104
+
if there are no more uses of a `PhiC` node, it is safe to delete it, and the same is true of an
105
+
`Upsilon` node. In most IR passes, `PhiC` nodes can be treated like`Phi` nodes. One can follow
106
+
use-def chains through them, and they can be lifted to new `PhiC` nodes and new `Upsilon` nodes (in the
107
107
same places as the original `Upsilon` nodes). The result of this scheme is that the number of
108
-
Upsilon nodes (and `PhiC` arguments) is proportional to the number of assigned values to a particular
108
+
`Upsilon` nodes (and `PhiC` arguments) is proportional to the number of assigned values to a particular
109
109
variable (before SSA conversion), rather than the number of statements in the critical region.
110
110
111
111
To see this scheme in action, consider the function
Note in particular that every value live into the critical region gets
@@ -155,34 +161,34 @@ catch blocks, and all incoming values have to come through a `φᶜ` node.
155
161
156
162
The main `SSAIR` data structure is worthy of discussion. It draws inspiration from LLVM and Webkit's B3 IR.
157
163
The core of the data structure is a flat vector of statements. Each statement is implicitly assigned
158
-
an SSA values based on its position in the vector (i.e. the result of the statement at idx 1 can be
164
+
an SSA value based on its position in the vector (i.e. the result of the statement at idx 1 can be
159
165
accessed using `SSAValue(1)` etc). For each SSA value, we additionally maintain its type. Since, SSA values
160
166
are definitionally assigned only once, this type is also the result type of the expression at the corresponding
161
-
index. However, while this representation is rather efficient (since the assignments don't need to be explicitly)
162
-
encoded, if of course carries the drawback that order is semantically significant, so reorderings and insertions
167
+
index. However, while this representation is rather efficient (since the assignments don't need to be explicitly
168
+
encoded), it of course carries the drawback that order is semantically significant, so reorderings and insertions
163
169
change statement numbers. Additionally, we do not keep use lists (i.e. it is impossible to walk from a def to
164
-
all its uses without explicitly computing this map - def lists however are trivial since you can lookup the
170
+
all its uses without explicitly computing this map--def lists however are trivial since you can look up the
165
171
corresponding statement from the index), so the LLVM-style RAUW (replace-all-uses-with) operation is unavailable.
166
172
167
173
Instead, we do the following:
168
174
169
175
- We keep a separate buffer of nodes to insert (including the position to insert them at, the type of the
170
176
corresponding value and the node itself). These nodes are numbered by their occurrence in the insertion
171
-
buffer, allowing their values to be immediately used elesewhere in the IR (i.e. if there is 12 statements in
172
-
the original statement list, the first new statement will be accessible as `SSAValue(13)`)
177
+
buffer, allowing their values to be immediately used elesewhere in the IR (i.e. if there are 12 statements in
178
+
the original statement list, the first new statement will be accessible as `SSAValue(13)`).
173
179
- RAUW style operations are performed by setting the corresponding statement index to the replacement
174
180
value.
175
181
- Statements are erased by setting the corresponding statement to `nothing` (this is essentially just a special-case
176
-
convention of the above
177
-
-if there are any uses of the statement being erased they will be set to `nothing`)
182
+
convention of the above.
183
+
-If there are any uses of the statement being erased, they will be set to `nothing`.
178
184
179
-
There is a `compact!` function that compacts the above data structure by performing the insertion of nodes in the appropriate place, trivial copy propagation and renaming of uses to any changed SSA values. However, the clever part
185
+
There is a `compact!` function that compacts the above data structure by performing the insertion of nodes in the appropriate place, trivial copy propagation, and renaming of uses to any changed SSA values. However, the clever part
180
186
of this scheme is that this compaction can be done lazily as part of the subsequent pass. Most optimization passes
181
187
need to walk over the entire list of statements, performing analysis or modifications along the way. We provide an
182
-
`IncrementalCompact` iterator that can be used to iterate over the statement list. It will perform any necessary compaction,
188
+
`IncrementalCompact` iterator that can be used to iterate over the statement list. It will perform any necessary compaction
183
189
and return the new index of the node, as well as the node itself. It is legal at this point to walk def-use chains,
184
190
as well as make any modifications or deletions to the IR (insertions are disallowed however).
185
191
186
-
The idea behind this arrangement is that, since the optimization passes need to touch the corresponding memory anyway,
192
+
The idea behind this arrangement is that, since the optimization passes need to touch the corresponding memory anyway
187
193
and incur the corresponding memory access penalty, performing the extra housekeeping should have comparatively little
188
194
overhead (and save the overhead of maintaining these data structures during IR modification).
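
The insertion buffer, replacement, and erasure conventions described in the last hunk can be sketched with a toy model. This is only an illustration under stated assumptions: `SSARef`, `ToyCall`, `ToyIR`, `insert_before!`, `stmt_at`, `resolve` and `compact` are hypothetical names invented for this sketch, not the compiler's actual types or functions. The sketch also omits the per-value type information and compacts eagerly rather than lazily through an `IncrementalCompact`-style iterator.

```julia
# Toy model only: every name below is hypothetical, not the compiler's API.

struct SSARef            # stand-in for SSAValue: refers to a statement by index
    id::Int
end

struct ToyCall           # stand-in for a call statement
    f::Symbol
    args::Vector{Any}
end

struct ToyIR
    stmts::Vector{Any}                # statement i implicitly defines SSARef(i)
    pending::Vector{Tuple{Int,Any}}   # insertion buffer: (insert before position, statement)
end
ToyIR(stmts) = ToyIR(collect(Any, stmts), Tuple{Int,Any}[])

# Buffered statements are numbered after the original ones, so the returned
# reference can be used elsewhere in the IR right away.
function insert_before!(ir::ToyIR, pos::Int, stmt)
    push!(ir.pending, (pos, stmt))
    return SSARef(length(ir.stmts) + length(ir.pending))
end

# Look up a statement by its pre-compaction number.
function stmt_at(ir::ToyIR, id::Int)
    n = length(ir.stmts)
    return id <= n ? ir.stmts[id] : ir.pending[id - n][2]
end

# Follow replaced (forwarded) statements until reaching a real statement or a
# plain value. An erased statement is `nothing`, so its uses become `nothing`.
function resolve(ir::ToyIR, newid::Dict{Int,Int}, x)
    while x isa SSARef
        s = stmt_at(ir, x.id)
        s isa ToyCall && return SSARef(newid[x.id])
        x = s
    end
    return x
end

function compact(ir::ToyIR)
    n = length(ir.stmts)
    # Final order: buffered statements go directly before their position.
    order = Int[]
    for pos in 1:n
        for (k, (p, _)) in enumerate(ir.pending)
            p == pos && push!(order, n + k)
        end
        push!(order, pos)
    end
    # Pass 1: assign new numbers to the statements that survive.
    newid = Dict{Int,Int}()
    for old in order
        if stmt_at(ir, old) isa ToyCall
            newid[old] = length(newid) + 1
        end
    end
    # Pass 2: emit survivors with their uses renamed.
    out = Any[]
    for old in order
        s = stmt_at(ir, old)
        s isa ToyCall || continue
        push!(out, ToyCall(s.f, Any[resolve(ir, newid, a) for a in s.args]))
    end
    return out
end

ir = ToyIR([ToyCall(:+, Any[:x, 1]),
            ToyCall(:*, Any[SSARef(1), 2]),
            ToyCall(:-, Any[SSARef(2), SSARef(1)])])
r = insert_before!(ir, 3, ToyCall(:abs, Any[SSARef(2)]))  # numbered 4 = 3 + 1
ir.stmts[3] = ToyCall(:-, Any[r, SSARef(1)])              # use the new value immediately
ir.stmts[2] = SSARef(1)                                   # RAUW-style: forward statement 2 to 1
compacted = compact(ir)
```

After compaction the forwarded statement is gone, the buffered `abs` call sits before the subtraction, and all references are renumbered to match the new positions. The two-pass structure is a choice made for this sketch so that forward references can be renamed too.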