Lesson 6: Static Single Assignment #454

sampsyo · 2025-01-21T20:31:15Z

sampsyo
Jan 21, 2025
Maintainer

⚠️ Warning: Implementing the into SSA and out of SSA transformations can be trickier than it looks!

UnsignedByte · 2025-03-03T23:28:41Z

Into SSA

I decided to use the new get and set version of SSA, which also meant that some of the algorithm (mainly rename) had to be slightly modified. In particular, I needed a way to generate both unique shadow names and unique non-shadow names. For this, I used the fact that a get instruction may occur only once in a basic block for a given base variable, and generated unique shadow names based on the block they were generated in.

In my renaming, I first added v.shadow: <type> = get; for each phi node in the basic block. Then, instead of generating phi nodes, I emitted set v.shadow v; for the successors. This was also where I tracked whether a variable was undefined: I checked whether the source of the set was already in the name stack, since otherwise the variable was not defined in any of the block's dominators and an undef is necessary.

When handling undef, I decided to just collect a set of undefined variables and define them all in the entry block. I could have chosen to set them as undef right before the set instruction, but doing this in the entry block meant that at most one undef will be added to the program for each variable, which would lower the total number of executed instructions.
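As a rough illustration of the entry-block strategy described above, here is a minimal Python sketch. It assumes Bril's JSON form (a function is a dict with an `instrs` list) and that `undef` is a value operation with a `dest` and a `type`. The helper name `add_entry_undefs` and the `undefined` map are hypothetical, not taken from the implementation discussed here.

```python
def add_entry_undefs(func, undefined):
    """Define each possibly-undefined variable exactly once, at the top.

    `undefined` maps variable name -> Bril type; it is a hypothetical
    structure assumed to be collected during renaming.
    """
    undefs = [
        {"op": "undef", "dest": name, "type": undefined[name]}
        for name in sorted(undefined)
    ]
    # Placing the undefs before any leading label puts them in their own
    # entry block, so a jump back to the original first label skips them.
    func["instrs"] = undefs + func["instrs"]
    return func
```

Each variable gets at most one `undef`, and it executes at most once per call.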

There was also one more nuance - if a function argument needs a phi node in the entry block (assuming the entry block is named) this leads to a problem where we generate a get with no corresponding set. Therefore, I needed to generate an empty entry block above all other blocks to prevent this from happening if the entry block has a label.

From SSA

Because I used the get/set method, from-ssa became incredibly simple - I just removed all gets and changed all set x y to x: ty = id y.
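A minimal Python sketch of this lowering, assuming Bril's JSON form where `set` carries `args: [shadow, value]` and `get` carries a `dest` and a `type`. The function name `from_ssa` is illustrative, and the sketch assumes every `set` has a matching `get` in the same function.

```python
def from_ssa(func):
    """Drop every `get` and turn `set x y` into `x: ty = id y`."""
    # The type of each shadow variable is taken from its `get`.
    types = {
        instr["dest"]: instr["type"]
        for instr in func["instrs"]
        if instr.get("op") == "get"
    }
    out = []
    for instr in func["instrs"]:
        op = instr.get("op")  # labels have no "op" key
        if op == "get":
            continue
        if op == "set":
            shadow, value = instr["args"]
            out.append(
                {"op": "id", "dest": shadow, "type": types[shadow], "args": [value]}
            )
        else:
            out.append(instr)
    func["instrs"] = out
    return func
```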

Testing

I tested my implementation by comparing the output of each transformed program against the unoptimized execution, to make sure that converting to SSA did not affect behavior. I also used brench to test against all the benchmarks in bril, which exposed a number of bugs in my implementation. It also exposed an interesting case in dead-branch.bril: immediately jumping to loop_end leaves v4 undefined, which brilirs catches and reports as an error even though this branch is never actually run (I assume it does some well-formedness checks?). The non-SSA program therefore errors, but because the SSA conversion adds v4 = undef, brilirs no longer errors on the SSA version.

I also implemented a quick is-ssa sanity check that simply runs through each function and makes sure that no value is ever written to more than once (keeping a running set of written values).
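Such a check might look like the sketch below, assuming the program is in Bril's JSON form. The function name is illustrative, and this version deliberately ignores function arguments, which a stricter check could also count as writes.

```python
def is_ssa(prog):
    """Return True if no variable is written more than once in any function."""
    for func in prog["functions"]:
        written = set()  # running set of destinations seen so far
        for instr in func["instrs"]:
            dest = instr.get("dest")
            if dest is None:
                continue  # labels and effect operations write nothing
            if dest in written:
                return False
            written.add(dest)
    return True
```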

Summary

As you warned, implementing into and out of SSA was more complicated than I expected (mainly for into SSA). I spent most of my time fixing bugs and dealing with edge cases like phi nodes in entry blocks, and undefined values. Overall, I would say that I deserve a star as I made sure the SSA tests worked on the full suite of benchmarks, which ended up finding a lot of edge cases I would not have thought of myself.

sampsyo Mar 5, 2025
Maintainer Author

Awesome; nice (and prompt) work on this!

Mainly, I needed a way to generate both unique shadow names as well as unique non-shadow names.

Interesting—in my version, it ended up being sufficient to just rename the non-shadow names, and then reuse the "name on entry" as the name of the shadow variable. Your approach sounds cool too, and maybe it made your "out of SSA" pass simpler, if you assume that the names were disjoint.

When handling undef, I decided to just collect a set of undefined variables and define them all in the entry block. I could have chosen to set them as undef right before the set instruction, but doing this in the entry block meant that at most one undef will be added to the program for each variable, which would lower the total number of executed instructions.

Entertainingly, I made exactly the same decision partway through my implementation (when I updated my pass to use the new undef instruction). Here's where that change happened: sampsyo/bril@dbd6c1d

neel-patel-1 · 2025-03-06T17:39:40Z

Into/Out-Of SSA
I used the deprecated phi node instructions instead of the get/set methods, mainly because I started working on this before the lecture where they were introduced. I believe get/set may have led to an easier into-SSA conversion and debugging experience, but most of my challenges came from oversights that required testing on the benchmarks to uncover.

Verification && Performance
I used turnt and the is_ssa.py script to validate my to_ssa conversion on all core benchmarks. Then I used brench to measure the overhead of converting to/from SSA. Without applying the local and global optimizations from the prior lessons there is up to 3.8x overhead. With the optimizations there is sometimes a speedup over the baseline, but also sometimes a slowdown. Three of the benchmarks -- collatz, digital-root, and sum-divisors -- time out with my implementation, but I have not figured out what I am missing yet.

Challenges
I realized my dominator tree construction algorithm from lesson 5 -- used here for recursively calling rename -- did not implement the immediately dominates relation and needed to correct it.

I needed to make sure to insert instructions before the terminator of the basic block when going out of SSA.

Dealing with function arguments.
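For the second challenge above (placing out-of-SSA copies ahead of a block's terminator), a small helper along these lines can be enough. This is a sketch that assumes a block is a list of Bril JSON instruction dicts and that the terminators are the core `jmp`, `br`, and `ret`. The helper name is hypothetical.

```python
TERMINATORS = {"jmp", "br", "ret"}


def insert_before_terminator(block, new_instrs):
    """Return a copy of `block` with `new_instrs` ahead of its terminator, if any."""
    if block and block[-1].get("op") in TERMINATORS:
        return block[:-1] + list(new_instrs) + [block[-1]]
    # A block that falls through has no terminator, so appending is safe.
    return block + list(new_instrs)
```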

sampsyo Mar 14, 2025
Maintainer Author

Sounds good overall; nice work! I hope it worked to just revert to an older version of the reference interpreter to make your tests work. And it's cool that your SSA work uncovered some bugs in your previous CFG analysis stuff.

lisarli · 2025-03-06T20:00:45Z

In collaboration with @bryantpark04 and @dhan0779
source

To SSA: We implemented the dominance-frontier-based get/set version of SSA. Our implementation uses three passes to perform the full conversion: finding names of the variables which need phi-nodes in each block, computing the get/set/undef instructions needed for each block (rename from the original algorithm), and actually inserting the new instructions. We mostly followed the logic discussed in class, although one change we needed to support the get/set version was delaying the construction of the set instructions until the third pass. This was because we did not generate names for shadow variables until we processed (called rename on) the block in which the shadow variable was read, so in our second pass, we record sets by simply recording the id of the block where the corresponding get is located, and we resolve the name of the shadow variable in the third pass.

Most of the issues we ran into with to_ssa were easy to detect (for instance, we initially forgot to push arguments onto their corresponding stacks), and we were able to recognize these issues early on while testing on a couple basic programs. When running over the full suite of benchmarks, we ran into a couple more nuanced issues. In particular, is_decreasing.bril had an unreachable block created by a ret followed by a jmp, which resulted in the jmp block having no predecessors in our CFG, breaking our dominance sets computation. We fixed this by automatically detecting and removing these dead blocks when constructing the CFG.
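The dead-block cleanup can be sketched as a reachability pass over the CFG. The version below assumes the CFG is available as a map from block name to successor names. That representation and the function name are illustrative choices, not the group's actual code.

```python
def remove_unreachable(succs, entry):
    """Keep only the blocks reachable from `entry`.

    `succs` maps block name -> list of successor block names.
    """
    reachable = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(succs[block])
    # Successors of reachable blocks are reachable too, so the edge lists
    # of the surviving blocks need no further filtering.
    return {b: list(s) for b, s in succs.items() if b in reachable}
```

Running this before computing dominators avoids blocks with no predecessors other than the entry.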

From SSA: Converting out of SSA was very straightforward; we first perform a pass to get the type of every variable, then we replace every set with an id operation and remove all get instructions.

Testing: We tested to_ssa by converting each of the bril benchmarks to SSA and checking that the resulting program was actually in SSA form using the provided is_ssa.py script. We also checked the program behavior was not affected by the conversion by running with turnt over the benchmarks. Similarly, we tested from_ssa by doing a roundtrip conversion to and from SSA and checking the outputs matched those of the original programs with turnt.

Analysis: We tested the performance of our implementation on every benchmark in the bril repository using brench. Our SSA pipeline ended with the transformation into SSA form. To count static instructions, we piped the output of bril2txt into grep -v "^@\|^}$" to filter out non-instruction lines of code. Below are our results:

| run | static instructions | dynamic instructions |
| --- | --- | --- |
| baseline | 393 | 27,267,247 |
| ssa | 434 | 62,900,641 |
| roundtrip | 471 | 39,140,623 |

We found that the SSA form had a 10% static overhead and a 130% dynamic overhead over the baseline. The full round-trip including optimizations made while in SSA form resulted in a 20% static overhead and a 43% dynamic overhead over the baseline.
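The percentages follow directly from the counts reported above. As a quick check of the arithmetic, the following sketch recomputes them (the helper name `overhead_pct` is just for illustration):

```python
def overhead_pct(baseline, measured):
    """Percentage increase of `measured` over `baseline`."""
    return 100.0 * (measured - baseline) / baseline


static = {"baseline": 393, "ssa": 434, "roundtrip": 471}
dynamic = {"baseline": 27_267_247, "ssa": 62_900_641, "roundtrip": 39_140_623}

for run in ("ssa", "roundtrip"):
    s = overhead_pct(static["baseline"], static[run])
    d = overhead_pct(dynamic["baseline"], dynamic[run])
    print(f"{run}: static +{s:.1f}%, dynamic +{d:.1f}%")
```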

Conclusions: Deciding how to modify to_ssa for get/set required some thinking, and it was interesting realizing the need for undef as we tried to construct set instructions for blocks which did not have a definition for a phi variable in a later block. We believe we deserve a Michelin star since we thoroughly tested and analyzed our to_ssa and from_ssa implementations.

sampsyo Mar 14, 2025
Maintainer Author

Awesome! I'm glad this went well! Thanks for the very clear description of the ways you needed to diverge from the "classic" algorithm in class.

scober · 2025-03-06T21:55:10Z

code

Into SSA

This was tricky and took me a while to get right. I had a misguided first attempt to do this conversion in multiple, independent passes (one for adding gets, one for adding sets, one for locally renaming variables, etc.). Unsurprisingly, this did not work out, and I implemented the Cytron et al. style algorithm given on the course website. I also had a bit of a tricky time properly converting that algorithm to bril-style upsilon/phi SSA.

My funniest bug was that I accidentally created instructions like this:

{'op': set}

instead of

{'op': 'set'}

Because set is a built-in type in Python this is a valid Python dictionary. But it leads to the inscrutable error

Object of type type is not JSON serializable

when I tried to write my output to stdout. I had a bunch of other bugs too, especially related to unreachable cfg nodes, but none of them were very funny.
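The bug is easy to reproduce in isolation with only the standard library:

```python
import json

bad = {"op": set}     # the built-in type, not a string
good = {"op": "set"}

print(json.dumps(good))
try:
    json.dumps(bad)
except TypeError as err:
    # The message names the type of the offending value, which here is `type`.
    print(f"TypeError: {err}")
```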

I also decided to do the naive thing and just undef every variable at the beginning of the entry block, with the reasoning that a "real" compiler would be doing a bunch of dead code elimination passes on the resulting SSA program anyway.

Out of SSA

In upsilon/phi style, this is pretty straightforward! sets turn into ids and gets just go away!

I went back and forth on whether "non-ssa" form meant all undefs had to be removed. I tried a bunch of strategies including block-local undef removing, one based on reaching definitions, and one based on a thing I made up that I called "dominating definitions". None of them worked and when I told Adrian I was having trouble removing all the undefs from my programs he said (and I quote) "don't do that". So I decided to give up on that plan.

Testing

I tested my correctness by using the is_ssa script and by comparing the original bril program, the ssa bril program, and the de-ssa-ed bril program in turnt with a shared output file. I ran all of my tests on every bril core benchmark and a few handcrafted test cases.

On that set of programs my ssa round trip took me from 1734 total static instructions to 4271 total static instructions, a ~2.5X increase. The round trip also took me from 8555030 total dynamic instructions to 22561140 total dynamic instructions, a ~2.6X increase. That is a big increase, but it is heartening that the dynamic increase was not substantially more than the static increase, which suggests I didn't do anything pathological.

sampsyo Mar 14, 2025
Maintainer Author

That is indeed a hilarious Python-flavored bug! Clearly the right solution would be to put

set = 'set'

at the top level of the module.

And yeah, a technique for fully removing all the undefs remains beyond my feeble comprehension. I am actually not sure it's possible in general, but I also don't have a real counter-example that convincingly makes that case either. So if anybody has any insights about what makes this possible/impossible, I'm all ears…

parthsarkar17 · 2025-03-06T21:58:22Z

Code

This directory contains my SSA implementation, modules for basic block and CFG construction, and also a little statistics collection script. This is a correctness script I wrote to make sure my into-SSA actually produced SSA code and to make sure the out-of-SSA has the same behavior as the original.

Summary + How It Works

I implemented the naive implementation of SSA this week since I was a little short on time. As motivated by the in-class description, I implemented a map from each (og_var, i) pair to the unique, local version of og_var used by the block, indexed by block ID i. This unique version (let's call it og_var.i.1) will be the thing we "get" at the beginning of the basic block. So, for the entire contents of the original basic block, I assumed we could access og_var.i.1 for every og_var ever written to in the entire function.

Now, here's how I implemented sets. While going through each instruction of a block, I made sure to count the number of writes to og_var; for every write j, I made sure that all references to og_var (before the write after j) referred to og_var.i.{j+1}. At the end of the block, for every successor indexed by s, I would make sure to shadow-set the latest "version" of og_var.i.j into og_var.s.1. Then, we can repeat the same two processes for block s.

There were a couple more corner cases to accurately get the program into SSA. For one, I had to make sure to shadow-set the arguments to the function. Further, I realized that, since every basic block had m x n get instructions before its actual instructions, each shadow-variable should be shadow-set before the block is run. This isn't the case for the entry block to the program (but is the case for all other blocks because block i will shadow-set all variables for every successor s). So, I had to make sure to shadow-set all variables of the form var.1.1 to undef.
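To make the naming scheme concrete, here is a minimal Python sketch of the per-block renaming step. It is an illustration under assumptions, not the OCaml implementation described above: `variables` is taken to map every variable written in the function to its type, `instrs` is the block body without its label, and emitting the `set` instructions for successors is left to the caller, using the returned `latest` map.

```python
def rename_block(block_id, instrs, variables):
    """Rename one block: `og_var` becomes `og_var.<block_id>.<version>`."""
    version = {v: 1 for v in variables}

    def current(v):
        return f"{v}.{block_id}.{version[v]}"

    # One `get` per variable at the top of the block.
    out = [{"op": "get", "dest": current(v), "type": ty} for v, ty in variables.items()]
    for instr in instrs:
        new = dict(instr)
        if "args" in new:
            # Reads refer to the version that is current at this point.
            new["args"] = [current(a) if a in version else a for a in new["args"]]
        if new.get("dest") in version:
            version[instr["dest"]] += 1  # every write creates a fresh version
            new["dest"] = current(instr["dest"])
        out.append(new)
    # The names holding each variable's value at block exit.
    latest = {v: current(v) for v in variables}
    return out, latest
```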

Then, transforming out of SSA was really simple. I just deleted all get instructions and transformed every set x y instruction into an x : type = id y instruction. I realized I hadn't maintained the types of the variables being set, so, at this point, I had to go back and maintain a hashmap from the set x y instruction to the Bril_type.t value of y. This was a pain to go back and do functionally because my abstractions were already laid out in a way that made it inconvenient to fold in another data structure, so I resorted to using a mutable Hashtbl.t. Pick your battles wisely, or something.

Testing

For testing, I wrote a script to do the following for every program in /benchmarks:

I sent the program through my roundtrip transformation, and interpreted it. If the stdout was different than benchmark.out, I wrote the diff to the console. So, if everything was the same, nothing would be written to console
I did the same for just transforming into SSA (not coupling it with out of SSA)
After transforming into SSA, I piped the output into python bril/examples/is_ssa.py. I got a bunch of "yes"s in the console output, and no "no"s, so I was satisfied that my programs were in SSA

I had to skip two designs. For some reason long/function_call.bril did not terminate with brili even without my transformation. Further, I had to skip mixed/random_walk.bril because it had an op that my OCaml library did not yet support. I'll get around to adding this for next time, but since it was only one design, I figured it was OK to ignore for this assignment.

Performance

Here are my statistics in terms of percentage increase in dynamic instruction count:

maximum: 4708.7040619
minimum: 100.
mean: 601.814589181
stdev: 491.872928093
median: 520.192307692

So, on average, my naive SSA increased the dynamic instruction count by about 600%, i.e. to roughly 7x the baseline. This is expected: for every basic block, I had on the order of m x n set instructions, which I then turned into id instructions when transforming out of SSA.

Hardest Part

The hardest part was debugging. More specifically, for every variable var in the entire function, my original implementation would insert set var.1.1 undef (not an actual instruction) at the beginning of the entry basic block. Normally, this would be fine. However, I ran into an issue when running a benchmark program where I would get nonsense numbers. It turns out my assumption that entry would have no predecessors was wrong. There was a loop later in the program that passed control back to entry, and I would inadvertently reset many of my variables. I fixed this by inserting a dummy basic block that I knew for sure did not have any predecessors.
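One way to picture that fix is the sketch below, which assumes Bril's JSON form with a flat `instrs` list. The label name `__entry` and the helper name are hypothetical, and a real pass would need to check that the fresh label does not clash with an existing one.

```python
def add_dummy_entry(func, fresh_label="__entry"):
    """Prepend a block with no predecessors when the entry block is labeled."""
    instrs = func["instrs"]
    if instrs and "label" in instrs[0]:
        old_entry = instrs[0]["label"]
        header = [
            {"label": fresh_label},
            {"op": "jmp", "labels": [old_entry]},
        ]
        func["instrs"] = header + instrs
    # An unlabeled entry block cannot be a jump target, so nothing to do.
    return func
```

One-time initialization, such as the undef writes, can then go in the new block ahead of its `jmp`.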

Star

I'd say my work deserves a star!

sampsyo Mar 14, 2025
Maintainer Author

Very cool; nice work on all counts! Interesting find about the couple of programs that broke; given infinite time, it would be fun to try to narrow down what it was in those benchmarks that triggered a problem. In any case, well done!

zihan0822 · 2025-03-07T00:27:50Z

source

Into SSA

I ended up implementing a dominance-free version of the SSA transform with set and get, and I did not use the provided undef instruction. The main steps of my algorithm are:

Identify Reaching Definition:
I implemented a form of reaching-definitions analysis with the generic worklist solver to keep track of the nearest definition of each live-in variable. Here, nearest means that there is no other definition of the same variable on any path from the definition block to the current block. If there is a conflict between variable definitions, meaning that the variable comes from multiple places and is used in the current block, we update its source to the current block, because later on we will insert a get instruction for that variable in this block.
merge:

in[b] = U out[p] for all pred p
for v in in[b]:
    if v has conflict and v is used in b:
          in[b][v] = b

transfer: update variable source in Kill(b)

Renaming: I used the scheme ${block_name}.${variable_name}.${count}
Insert set and get:
For every block, if a live-in variable (say v) has conflicting definitions, we insert a get at the start of this block. And for all possible sources of v, say S, we insert a set at the end of S, using the last name we assigned to v in S. This is one of the invariants we maintain in the worklist algorithm. Some special care needs to be taken for function parameters.
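The merge step from the first item can be sketched in Python as follows. The representation is an assumption made for illustration: each predecessor's out-state maps a variable to the set of blocks its value may come from, and a conflict is taken to mean more than one such source. The function name and signature are hypothetical.

```python
def merge(block, pred_outs, used):
    """Merge predecessor states for `block`.

    `pred_outs` is a list of dicts mapping variable -> set of source blocks;
    `used` is the set of variables read in `block`.
    """
    merged = {}
    for out in pred_outs:
        for var, sources in out.items():
            merged.setdefault(var, set()).update(sources)
    for var in list(merged):
        if len(merged[var]) > 1 and var in used:
            # Conflicting definitions reach a use, so this block becomes the
            # single source: a `get` for `var` will be inserted here.
            merged[var] = {block}
    return merged
```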

From SSA

This is quite straight-forward, we just delete all get and change all set to id.

Test & Perf

I ran is_ssa.py against the SSA generated for all benchmarks in bril/benchmarks/core to make sure they are indeed in SSA form. I used brench to make sure the final program after the SSA round trip was still correct. I also compared my implementation with the dominance-tree-based implementation (into_ssa.py and from_ssa.py in bril/examples).

The following is the relative increase in the number of dynamic instructions executed, compared to the baseline, for the two algorithms; no DCE is involved in the round trip. There seems to be a consistent decrease in the number of set/get instructions inserted with my algorithm.

I noticed that in bril/examples/ssa_brench.toml, some extra DCE passes are inserted in the SSA round trip. I also compared the two algorithms in this setting. In this case, the gap between them is reduced a lot.

Conclusion

In hindsight, I found that my implementation, especially the first step, is in some sense similar to the recursive predecessor querying process in the paper we discussed. We both handle cyclic CFGs by inserting a new source when a conflict on a live-in variable happens. It seems that after the round trip, fewer DCE opportunities can be exploited with my algorithm than with the example implementation (I used the dce program provided in examples as well). I have not looked much into the potential reason for that. The hardest part of this lesson, I would say, is debugging. I think I deserve a Michelin Star because I tried some new stuff and tested my program thoroughly.

sampsyo Mar 14, 2025
Maintainer Author

Nice, that sounds great! It's a creative solution, which sounds like the "use-side" conversion algorithm theorized by @UnsignedByte that I discussed somewhat farther on Zulip. And it's very interesting that there is a measurable difference between the two approaches! Nice work here!

mt-xing · 2025-03-07T01:56:31Z

My implementation of SSA is here: https://github.com/mt-xing/cs6120/tree/main/l6

I implemented SSA using the dominance frontier algorithm seen in class. Coming out of SSA was trivial as I just replaced all the upsilon nodes with id and then deleted the phi nodes.

The hardest part by far was all the edge cases I ran into. For example, my dominance frontier implementation was strictly looking at successors of dominated nodes that were not dominated by the original node, without the edge case of allowing itself to be in the frontier. This caused some infinite loops on test cases that took a while to track down. Likewise, handling arguments to the function was a bit annoying as I needed to pretend that these were effectively set in a previous block and add handling for that. My implementation does conservatively set everything to undef at the start, and also throw everything into the shadow store. These can be easily removed via Dead Code Elimination in a later pass if they're not needed, and they do avoid a few other edge cases.
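The dominance-frontier edge case is easiest to see in code. The sketch below assumes dominator sets are already computed (`dom[b]` contains every block that dominates `b`, including `b` itself) along with a predecessor map. Both names are illustrative. The key point is that the frontier excludes blocks that are strictly dominated, so a block can appear in its own frontier.

```python
def dominance_frontier(dom, preds):
    """Compute the dominance frontier of every block.

    dom[b]   = set of blocks dominating b (including b)
    preds[b] = list of predecessors of b
    """
    frontier = {b: set() for b in dom}
    for b in dom:
        for p in preds.get(b, []):
            for a in dom[p]:  # a dominates a predecessor of b ...
                if a == b or a not in dom[b]:  # ... but not strictly b itself
                    frontier[a].add(b)
    return frontier


# A loop header with a self edge ends up in its own frontier.
dom = {"entry": {"entry"}, "loop": {"entry", "loop"}, "exit": {"entry", "loop", "exit"}}
preds = {"entry": [], "loop": ["entry", "loop"], "exit": ["loop"]}
print(dominance_frontier(dom, preds))
```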

My implementation does not support speculative execution. A speculate can, in theory, end up jumping to the target of any guard, and we can't determine which one at compile time. I worked around this control-flow complexity by modifying my CFG so that every block containing a speculate has every block targeted by a guard as a successor. However, I noticed that when interpreting my SSA output, the execution would still print the wrong value, as if the rollback after guard did not reset the shadow environment. I'm not sure why this is happening, but since this is just an extension, I'm happy that my implementation works correctly against all the core test cases.

I tested my implementation against the entirety of the bril benchmarks and tests folders, as I have with all my previous assignments. The engineering work I put into the earlier test harness is really paying off here, as it helped me catch all of the bugs I mentioned above. My harness automatically runs my code in the interpreter and compares it to running the original through the interpreter. It also collects statistics as it goes. For these runs, I disabled the checks that require my "optimization" result in fewer instructions executed, since we expect SSA to increase the instruction count.

Here are the statistics reported for dynamic instruction count changes, running against the entire bril benchmarks and tests, as well as a small number of my hand-crafted test cases:

=========================================
Optimization Report for SSA:
Average Reduction: -676402.976331361
Minimum Reduction: -26500070
1st Quartile Reduction: -2557
Median Reduction: -233
3rd Quartile Reduction: -7
Maximum Reduction: 0
=========================================

Note the numbers are negative in terms of reduction (ie: my SSA pass added instructions, as expected).

Looks like my implementation added 233 dynamic instructions as its median, although there are definitely some outliers that were very bad. This is my SSA pass alone, without running the dead code elimination that would clean up a lot of unnecessary undefs and sets at the beginning of each function.

I do believe this is worthy of a Michelin star, as it uses the dominance frontier based algorithm, and is thoroughly tested against all of the core Bril, as well as every extension minus speculative execution.

sampsyo Mar 14, 2025
Maintainer Author

Nice work!! This seems great!

Sounds like you were not the only one who revealed bugs in their previous dominance-related utilities by observing the consequences in their SSA conversion.

My implementation does not support speculative execution.

This is an extremely reasonable decision. I didn't anticipate that anyone would attempt to handle this in the SSA conversion!

samuelbreckenridge · 2025-03-07T03:02:26Z

Group (@ngernest, @katherinewu312, @samuelbreckenridge)

Code

For our conversion into SSA we use the basic approach of introducing a unique copy of each variable for every basic
block. We do not ever construct explicit phi nodes but rather iterate over basic blocks, adding the necessary get and
set instructions to each block. To handle variables that are undefined at certain basic blocks we explicitly set all
variables to undef at the beginning of the function. The trickiest part of this implementation was figuring out
how to handle function arguments correctly. We first tried to avoid renaming function arguments at all but found
this prevented us from fully reaching SSA form if the function argument variable name was reused as a dest in the
function body. Instead we copy the function argument into a renamed version at the beginning of the function.
However, we found a nasty bug with this approach that caused our SSA conversion of the core/orders.bril benchmark to
enter an infinite loop: the function argument variables were being written to, but control flow
would later pass back to the entry block, where we would incorrectly copy the original function arguments back into the
variables. To fix this, we needed to add a dummy entry block for the function argument copies. To test our conversion
to SSA, we use Turnt to convert to SSA and then evaluate correctness of execution and whether the converted programs
are actually in SSA. We ran these checks on both handpicked test cases based on bugs we observed and all of the Bril
benchmarks, all of which pass.

From here, the out-of-SSA conversion was straightforward: we simply deleted all 'get' instructions and replaced all 'set x y' instructions with the instruction 'x: type = id y'. Before making these deletions and substitutions, we first iterated through the SSA program to build a dictionary mapping each dest (shadow) variable name in a 'get' instruction to its type. We tested our out-of-SSA implementation by performing SSA roundtrip tests. We first tested on the examples located in the bril/examples/ssa_roundtrip directory to verify that our outputs match the outputs of the reference implementation. Generalizing, we then performed roundtrip tests on all core benchmarks in bril using brench, counting dynamic instructions to measure the overhead and making sure all these tests passed as well.

The scatter plot below illustrates the overhead (in terms of % increase in instruction count) after taking the program on a round-trip through SSA & back. The mean overhead from doing a crude roundtrip (no optimizations) is 413.91%, whereas if we perform TDCE (using our L3 code) after converting back to SSA, the mean overhead is reduced to 121.23%, with 8 of the 50 benchmarks having 0 overhead after the round-trip! We suspect that if we perform more optimizations (e.g. constant propagation), we could reduce the SSA round-trip overhead even more.

sampsyo Mar 14, 2025
Maintainer Author

Really nice work! And your plot is also aesthetically pleasing on top of it all. :)

Instead we copy the function argument into a renamed version at the beginning of the function.

This seems like a perfectly reasonable approach! FWIW, another would be to set things up so the arguments act like assignments, in the sense that subsequent assignments to the same variables must get renamed.

mb64 · 2025-03-07T04:03:56Z

To SSA. (code) For converting to SSA, I implemented the dominance frontier method (Cytron et al). Here were the challenges I noted:

I talked in last week's implementation post about how reusable my dataflow framework was; that was wrong, and this week, I ended up replacing it with something else. (I needed live vars, and backwards analyses didn't fit into my framework well.)
There are some edge cases about strict dominance vs dominance in the definition of dominance frontier that I got wrong last week, and had to fix for my dominance frontier code.
If parameters are modified, and there's a jump back to the top of the function, you need an empty header basic block before that one, to put the set half of the phi's for the parameters. To deal with this, I conservatively did renamings + copying at the start of every function, to enforce that the parameters are disjoint from the variables assigned in the body of the function.

From SSA. (code) I had hoped to do a more interesting out-of-SSA transformation, with some coalescing, but I didn't have time so I did the basic one.

Testing and efficiency. I tested (on all core benchmarks) that to-SSA preserves behavior and produces SSA, and that from-SSA after to-SSA preserves behavior. It did introduce extra copies, and I measured the overhead at 1.37144x more (dynamic) instructions on average, and 0.74x geomean speedup. Then I implemented copy propagation on SSA (code), which brought it down to 1.05304x more (dynamic) instructions on average, and 1.01x geomean speedup. I think that merits a star.
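For readers curious what copy propagation on SSA can look like, here is a minimal Python sketch. It is not the linked implementation. It assumes the function is in strict SSA form in Bril JSON, so each `id` destination has exactly one definition and copy chains cannot form cycles. It leaves the now-dead `id` instructions for a later DCE pass.

```python
def copy_propagate(func):
    """Replace uses of `x` with `y` wherever the function defines `x = id y`."""
    # In SSA each destination is defined once, so a flat map is enough.
    copies = {
        instr["dest"]: instr["args"][0]
        for instr in func["instrs"]
        if instr.get("op") == "id"
    }

    def resolve(name):
        # Follow chains such as c = id b; b = id a (assumed acyclic).
        while name in copies:
            name = copies[name]
        return name

    for instr in func["instrs"]:
        if "args" in instr:
            instr["args"] = [resolve(a) for a in instr["args"]]
    return func
```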

sampsyo Mar 14, 2025
Maintainer Author

Cool cool; this all sounds great! And yeah, it sounds like several people were in the same boat w/r/t finding slight inconsistencies in the definitions of dominance-related stuff that led to bugs uncovered in last week's work.

Really cool result with the post-SSA copy-propagation pass. It makes sense that this could eliminate much of the overhead.

ethanuppal · 2025-03-07T04:58:55Z

Code: https://github.com/ethanuppal/cs6120/tree/main/lesson6/ssa
CI Passing: https://github.com/ethanuppal/cs6120/actions/runs/13746792269/job/38442710333

Relevant CI code so you can see what I tested:

      - name: Snapshot test SSA
        run: |
          cd lesson6/ssa/bril_to_ssa_copied
          turnt *.bril --diff
          turnt *.bril --diff
          cargo build --package tdce --bin tdce
          cargo build --package ssa --bin ssa
          brench brench.toml *.bril | python3 check_brench.py --allow-slower

      - name: Test SSA
        run: |
          cargo build --package tdce --bin tdce
          cargo build --package ssa --bin ssa
          cd lesson6/ssa/bril_to_ssa_copied
          brench brench.toml ../../../bril/benchmarks/core/*.bril | python3 check_brench.py --allow-slower

Essentially, I manually checked $\varphi$ insertion and correctness on the to_ssa example cases (I turnt twice to make sure the output ordering is stable), then I brenchd every benchmark:

extract = 'total_dyn_inst: (\d+)'
timeout = 200

[runs.baseline]
pipeline = ["bril2json", "brili -p {args}"]

[runs.into_ssa]
pipeline = [
  "bril2json",
  "../../../target/debug/ssa --into-ssa",
  "bril2json",
  "brili -p {args}",
]

[runs.through_ssa]
pipeline = [
  "bril2json",
  "../../../target/debug/ssa --into-ssa",
  "bril2json",
  "../../../target/debug/ssa --from-ssa",
  "bril2json",
  "brili -p {args}",
]

[runs.ssa_then_tdce]
pipeline = [
  "bril2json",
  "../../../target/debug/ssa --into-ssa",
  "bril2json",
  "../../../target/debug/tdce",
  "bril2json",
  "brili -p {args}",
]

As you can see, I tested into SSA, out of SSA, and SSA then TDCE.

I was very rushed on this because I was delayed by external health matters, and thus it does not represent the quality I'd like to have in my work.

I implemented the new get/set method in Rust, reusing the dominance tree/frontier code I wrote for lesson 5. I spent some time trying to figure out how to adapt the pseudocode presented in lecture to get/set. I ended up solving the biggest issue (undefined variables) with a dumb postprocessing step that defined them all in the entry block. I wish the Rust representation allowed optional types because keeping types around for everything was pretty annoying. To leave SSA, I made sets identities and removed gets.
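As a rough illustration of the last step (sets become identities, gets disappear), here is a minimal sketch. The `Instr` enum is a hypothetical, simplified stand-in and not the bril-rs representation; it also assumes every `set` targets a variable that has a matching `get` in the same function.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified instruction type (not the bril-rs representation).
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    /// `dest: ty = get;`
    Get { dest: String, ty: String },
    /// `set dest src;`
    Set { dest: String, src: String },
    /// `dest: ty = id src;`
    Id { dest: String, ty: String, src: String },
    /// Any other instruction, kept opaque here.
    Other(String),
}

/// Leave SSA: drop every `get` and turn every `set` into an `id` copy.
pub fn from_ssa(instrs: &[Instr]) -> Vec<Instr> {
    // A `set` carries no type, so recover each variable's type from its `get`.
    let types: HashMap<&str, &str> = instrs
        .iter()
        .filter_map(|i| match i {
            Instr::Get { dest, ty } => Some((dest.as_str(), ty.as_str())),
            _ => None,
        })
        .collect();

    instrs
        .iter()
        .filter_map(|i| match i {
            // `get`s are simply removed.
            Instr::Get { .. } => None,
            // `set x y` becomes `x: ty = id y`. A `set` with no matching `get`
            // is dropped in this sketch, since nothing would read it.
            Instr::Set { dest, src } => types.get(dest.as_str()).map(|ty| Instr::Id {
                dest: dest.clone(),
                ty: ty.to_string(),
                src: src.clone(),
            }),
            other => Some(other.clone()),
        })
        .collect()
}
```

This simple version turns each `set` into a plain copy in place, so it does not address the swap problem mentioned further down in this thread.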

Total time spent: 6 hours

I believe this merits a Michelin star because I implemented and tested the assigned tasks.

If I could keep going past 11:59 PM, I would use an insertion-set like approach in modifying basic block bodies to strip a factor of $O(n)$ off most such modifications and spend more time on creating a minimal output.

Edit: I decided to do it:

pub fn insert_phis(
    cfg: &mut FunctionCfg,
    phi_insertion_points: PhiInsertionPoints,
) {
    // Group the `get` instructions to insert by the block that receives them.
    let mut phis_to_insert = SecondaryMap::new();
    for (variable, (ty, places_to_insert)) in phi_insertion_points.0 {
        for place_to_insert in places_to_insert {
            phis_to_insert
                .entry(place_to_insert)
                .unwrap()
                .or_insert_with(Vec::default)
                .push(Instruction::Value {
                    args: vec![],
                    dest: variable.clone(),
                    funcs: vec![],
                    labels: vec![],
                    op: ValueOps::Get,
                    pos: None,
                    op_type: ty.clone(),
                });
        }
    }
    // Prepend each block's `get`s with a single splice instead of one insert per instruction.
    for (block_idx, phis) in phis_to_insert {
        cfg.vertices[block_idx].instructions.splice(0..0, phis);
    }
}

Edit: I decided to rewrite the entire thing and now I can say I am proud of it.

Updated time: 12 hours.

3 replies

ethanuppal Mar 13, 2025

Thinking about this more, there are some other ways the Rust representation could be improved. The strings should definitely be Cows because you don't need to clone when parsing the JSON (Bril's text format does not support escape codes) and the fact that each thing requires an empty Vec is just an unnecessary allocation.

sampsyo Mar 14, 2025
Maintainer Author

Really nice work!! Thanks for the clear explanation.

I ended up solving the biggest issue (undefined variables) with a dumb postprocessing step that defined them all in the entry block.

I think this is probably the most sensible option, FWIW. I'm sure there are other ways to more lazily insert undefs only along the necessary paths, but it seems maybe subtle to get right.

And yes, it would be interesting to think about what a more efficient Rust implementation of a Bril program would look like.

ethanuppal Mar 15, 2025

New CI Passing: https://github.com/ethanuppal/cs6120/actions/runs/13875959147/job/38828273528

I don't know why I only tested SSA on core benchmarks --- I changed the core to ** in my CI script. Must have been a typo.
Fixed LVN to support get and to not try to local-value-rewrite (it could be easily extended to do this, though, but I chose to just not do it).
Fixed the swap problem in my out-of-SSA conversion (I explicitly allocate variables from the shadow environment)

I also added LVN to the optimized-within-SSA brench pipeline and then I started seeing performance improvements -- just running TDCE didn't yield any.

KabirSamsi · 2025-03-09T04:36:59Z

KabirSamsi
Mar 9, 2025

Code
Partner: @noschiff

Overview

For our implementation, we opted to proceed with the older version of SSA with $\phi$-nodes, but decided to stick with the better version utilizing dominance relations and the dominance frontier. We split up our tasks into efforts for implementing both enterSSA and leaveSSA functions, tests for support, and also took advantage of the situation to further tidy up the environment of helper functionality which we have been building up with each incremental assignment.

Implementation

Reorganization of Setup

This push saw a significant refactor in our setup, which greatly facilitated our work and hopefully will continue to for future projects here.

The past two or three assignments have been building up off of the same CFG/Basic Block types and code, so we finally refactored these out. We ultimately ended up setting up an entire Graph class, which we then equipped with operations allowing us to work with CFGs, Dominator Trees and Dominance Frontiers. Subsequently, we refactored much of the basic block code, and then reworked our actual setup for this project.

Doing this, along with implementing intoSSA, also revealed some old bugs in our basic block code ... so a circular, but interesting way to realize that fix!

Into SSA

We adapted the two algorithms that had been mentioned in the lesson, and implemented a series of helper functions for the various used functionalities that had to be reconciled to work with our current setup. In the end, implementation was not difficult, but the two hardest parts were firstly being able to parse the high-level pseudocode into an implementation that worked with our setup; and then dealing with a handful of different edge cases. At a number of times, I was not totally convinced that our setup was even correct (several times I was right in this), but eventually we were able to just take the most straightforward approach with our code.

An interesting bug at first was dealing with function arguments, rather than just variable name declarations. Another one we ran into later on was that we were being a bit reckless with 'old variable names' and 'new variable names' – both of these were resolved fairly straightforwardly with time.

Leaving SSA

We adapted the general fromSSA algorithm here to develop functionality to transform a Bril program in SSA form to standard form. This was fairly straightforward – though we were able to make use of some higher-order functions to clean up our code a bit.

Testing

We have primarily run tests against the benchmarks outlined in the to_ssa and from_ssa directories. As we incrementally developed the process (especially for to_ssa), we first benchmarked our phi-node-listing algorithm implementation against the same files, exposing a number of issues, before we were able to come to a satisfactory approach.

We are currently still in the process of fine-tuning certain aspects of our implementation against all benchmarks with Brench, with our final goal being to conclusively say that we cover all edge cases.

Takeaways

This was a terrific exercise, both in developing our own mental understanding of the SSA philosophy/model, and also in thinking of a new paradigm to approach compiler translations. While we didn't fully implement the alternative get/set node SSA translation system, we had a great time discussing it and theorizing it, and it will be fun to explore more on SSA in the future!

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Awesome! It is cool (although not altogether unexpected) that this task would force you to reconsider some of the program-representation basics about how you formed CFGs and such.

gerardogtn · 2025-03-10T19:11:59Z

gerardogtn
Mar 10, 2025

Code
Partner: @devv64 (i convinced dev to try out some kotlin!)

Summary

We implemented to ssa and from ssa using phi nodes. Our to ssa implementation used the dominance frontier version of the algorithm, and the from ssa used the basic algorithm that we saw in class. We tested our implementation with all the benchmark files in the bril repository and they were all passing.

How it works?

To SSA

To implement to ssa we used the dominance frontier algorithm that we saw in class. It works in the following way:

Identify blocks that will need phi nodes inserted into them, and for which variables they will need a phi node.
Perform the dominance based frontier algorithm to add phi nodes.
- We had to use a bit of bookkeeping to keep track of variables in the algorithm:
  - phiLocs: For each phi variable, the position in the block where the instruction is located.
  - varCount: For each variable a count of how many times we've seen the variable locally, for creation of the new variable names.
  - blockToLabel/labelToBlock: As the implementation of cfg used nats and not labels, utility maps to translate to labels. We could've made our lives a bit simpler if we just renamed blocks to new names, but we aimed to keep the original names as much as possible.
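The first step above (deciding which blocks need a phi node for which variable) can be sketched as a worklist over the dominance frontier. This is a minimal illustration, not the authors' Kotlin code: blocks are plain indices, and the `defs` and `frontier` maps are hypothetical inputs assumed to be computed elsewhere.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `defs[v]` = blocks that assign variable `v`.
/// `frontier[b]` = dominance frontier of block `b`.
/// Returns, for each variable, the set of blocks that need a phi node for it.
pub fn phi_blocks(
    defs: &BTreeMap<String, BTreeSet<usize>>,
    frontier: &BTreeMap<usize, BTreeSet<usize>>,
) -> BTreeMap<String, BTreeSet<usize>> {
    let mut result = BTreeMap::new();
    for (var, def_blocks) in defs {
        let mut has_phi: BTreeSet<usize> = BTreeSet::new();
        let mut worklist: Vec<usize> = def_blocks.iter().copied().collect();
        while let Some(block) = worklist.pop() {
            for &target in frontier.get(&block).into_iter().flatten() {
                // A new phi is itself a definition of `var`, so the block that
                // receives it goes back on the worklist (once).
                if has_phi.insert(target) {
                    worklist.push(target);
                }
            }
        }
        result.insert(var.clone(), has_phi);
    }
    result
}
```

For a diamond-shaped CFG where `x` is assigned in both arms, the only block returned for `x` is the join block.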

From SSA

Our from ssa implementation utilized the basic algorithm. This means finding phi operations, propagating the assignments to the ends of the predecessor blocks, and deleting the phi operations. There were some complications with concurrent modifications and placing the id instructions in the proper spots, but it was relatively straightforward compared to the to ssa algorithm.
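A minimal sketch of that basic algorithm, under stated assumptions: the `Instr` and `Block` types are hypothetical simplifications (not the authors' Kotlin types), each phi argument is a `(predecessor label, variable)` pair, and the well-known swap and lost-copy problems are ignored. Collecting the copies before mutating any block sidesteps the concurrent-modification issue, and the copies are placed before the predecessor's terminator.

```rust
/// Hypothetical, simplified instruction type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    /// `dest: ty = phi ...`, with one (predecessor label, variable) pair per incoming edge.
    Phi { dest: String, ty: String, args: Vec<(String, String)> },
    /// `dest: ty = id src;`
    Id { dest: String, ty: String, src: String },
    /// `jmp`, `br`, or `ret`, kept opaque.
    Terminator(String),
    /// Any other instruction, kept opaque.
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub label: String,
    pub instrs: Vec<Instr>,
}

/// Replace every phi with `id` copies at the ends of its predecessor blocks.
pub fn remove_phis(blocks: &mut [Block]) {
    // First pass: collect (predecessor label, copy) pairs without mutating anything.
    let mut copies: Vec<(String, Instr)> = Vec::new();
    for block in blocks.iter() {
        for instr in &block.instrs {
            if let Instr::Phi { dest, ty, args } = instr {
                for (pred, src) in args {
                    let copy = Instr::Id {
                        dest: dest.clone(),
                        ty: ty.clone(),
                        src: src.clone(),
                    };
                    copies.push((pred.clone(), copy));
                }
            }
        }
    }
    // Second pass: delete the phis and splice the copies into each predecessor.
    for block in blocks.iter_mut() {
        block.instrs.retain(|i| !matches!(i, Instr::Phi { .. }));
        // Copies go at the end of the block, but before its terminator if it has one.
        let at = match block.instrs.last() {
            Some(Instr::Terminator(_)) => block.instrs.len() - 1,
            _ => block.instrs.len(),
        };
        let mine: Vec<Instr> = copies
            .iter()
            .filter(|(pred, _)| *pred == block.label)
            .map(|(_, copy)| copy.clone())
            .collect();
        block.instrs.splice(at..at, mine);
    }
}
```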

Hardest part

Debugging was definitely the hardest part of the assignment to make sure that everything was working okay after the to ssa transformation. Here are a few of the categories where we had bugs:

Wrong input for the toSsa algorithm: A couple of times we passed the parameters to the toSsa function in the incorrect order (like swapping immediate dominance and the dominance frontier, which yielded weird outputs, and the compiler didn't catch the issue as the parameters had the same type); another bug was related to the dominance frontier using strictly dominates instead of dominates relation, which led to missing info for the dominance frontier.
Not using function arguments in the stack: In a few files like ackermann.bril we had to make sure that phi nodes would default to the name of a function argument and not to undefined.
Cfg: In files like orders.bril we needed to modify the cfg code to insert an entry node if there were back edges to the original, in other places we had to change the cfg to make sure that a return statement would not add an edge to the next consecutive block, or the predecessors not including itself as a node (as in mem/adler32.bril).
Inserting instructions at the right place: For both phi nodes and id nodes (in from ssa) make sure that nodes were inserted at the right location.

⭐❓

We had a lot of fun working on this task, we think we deserve a star due to all the effort paid and for arriving at a solution that works across most benchmarks.

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Awesome; all looking good!

another bug was related to the dominance frontier using strictly dominates instead of dominates relation

Maybe surprisingly, you were not the only group who ran into exactly this issue! Seems interesting that this is such an easy inversion to do.
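To make the distinction concrete: block A's dominance frontier contains B when A dominates a predecessor of B (non-strictly, so A may be that predecessor) but does not strictly dominate B. The sketch below computes this from precomputed dominator sets; the `dom` and `preds` maps keyed by block index are hypothetical inputs, not any group's actual data structures.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `dom[b]` = every block that dominates `b`, including `b` itself.
/// `preds[b]` = CFG predecessors of `b`.
pub fn dominance_frontier(
    dom: &BTreeMap<usize, BTreeSet<usize>>,
    preds: &BTreeMap<usize, Vec<usize>>,
) -> BTreeMap<usize, BTreeSet<usize>> {
    // Start with an empty frontier for every known block.
    let mut frontier: BTreeMap<usize, BTreeSet<usize>> =
        dom.keys().map(|&b| (b, BTreeSet::new())).collect();
    for (&b, b_preds) in preds {
        for p in b_preds {
            // Every non-strict dominator `a` of the predecessor `p` is a candidate,
            // which includes `p` itself...
            for &a in dom.get(p).into_iter().flatten() {
                // ...unless `a` strictly dominates `b`.
                let strictly_dominates_b =
                    a != b && dom.get(&b).map_or(false, |d| d.contains(&a));
                if !strictly_dominates_b {
                    frontier.entry(a).or_default().insert(b);
                }
            }
        }
    }
    frontier
}
```

On a loop with header 1 and body 2 (edges 0→1, 1→2, 2→1, 1→3), this puts block 1 in the frontier of both 1 and 2. If the candidate test used strict dominance of the predecessor instead, block 2 would lose its frontier entry, which is the kind of missing information described above.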

aw578 · 2025-03-11T02:22:39Z

aw578
Mar 11, 2025

code

For this assignment, I wrote naive and dominance-tree based implementations of to_ssa using the set and get functions.

To test it, I also wrote a shell script to check that the to_ssa implementations were in SSA form and that the outputs after converting to SSA and back to normal all match the original output. I tested against the core benchmarks, as usual. I figured that the benchmarks caught enough bugs in my implementations to be reliable here.

The hardest part of this assignment was probably just working through all the edge cases and the logic around them. The big things here were handling function arguments and phi nodes appearing before variables' first definitions. I handled function arguments by adding fake init statements at the beginning of the first block, then restoring them to point at the original function arguments after my passes. One subtle error was that there were a bunch of other sets and phi functions in block headers, which needed to go after the init statements. Phi nodes can also appear before variables' first definitions due to how dominance frontiers work. I wasn't really sure how to solve this until I saw the idea of just initializing all the shadow variables to undefined here. This is incredibly janky and I still don't feel great about it but it seems to work. Reading through my outputs and drawing out the graphs to debug why they were incorrect was very helpful for debugging, even if it took forever.

I think my work deserves a Michelin star for actually getting to_ssa correct.

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Looks good; nice work overall.

I handled function arguments by adding fake init statements at the beginning of the first block, then restoring them to point at the original function arguments after my passes.

This is a creative way to do it! This way, you do not need to treat arguments as "fake" definitions; you have actual definition instructions for every variable. Interesting!

ananyagoenka · 2025-03-11T20:31:10Z

ananyagoenka
Mar 11, 2025

Source Code

My SSA implementation converted Bril programs into SSA form using the get/set approach discussed in class, then converted them back into regular form by removing SSA instructions and normalizing variable names. Specifically, during the forward transformation into SSA, my algorithm renamed each variable uniquely within basic blocks, added get instructions at block entries for variables that were live-in, and placed corresponding set instructions at block exits for successor blocks. The reverse transformation removed these SSA-specific instructions (get, set, and undef) and simplified variable names by removing their SSA suffixes.

One of the trickiest pieces of the SSA transformation was dealing with function parameters. At first, I neglected to treat arguments as if they were already “defined” at the function’s beginning, which caused spurious undef instructions for parameters. Once I explicitly pushed each parameter onto its variable stack at the start, that issue went away.
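A small sketch of that fix, assuming the renaming pass keeps one stack of SSA names per source variable. The function names and the `HashMap<String, Vec<String>>` layout are hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;

/// Build the initial rename stacks: each parameter starts out defined under its own name.
/// `params` is a hypothetical list of the function's argument names.
pub fn initial_stacks(params: &[&str]) -> HashMap<String, Vec<String>> {
    params
        .iter()
        .map(|p| (p.to_string(), vec![p.to_string()]))
        .collect()
}

/// Current SSA name for `var`, or `None` if no definition reaches this point
/// (the case that would call for an `undef`).
pub fn current_name<'a>(
    stacks: &'a HashMap<String, Vec<String>>,
    var: &str,
) -> Option<&'a str> {
    stacks.get(var)?.last().map(String::as_str)
}
```

With the stacks seeded this way, a lookup of a parameter at the top of the function finds a name instead of falling through to the undefined case.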

To ensure correctness, I ran my resulting code through the is_ssa.py checker, which confirmed no variable was redefined, meaning the SSA form was valid. Additionally, I verified behavioral correctness by performing a complete round-trip transformation—original program → to SSA → from SSA—and comparing outputs to the original executions. I also measured the dynamic instruction counts for these round-trips, observing about a 210% overhead, which aligns with expectations given the naive insertion of SSA bookkeeping instructions.

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Sounds good overall!

The reverse transformation removed these SSA-specific instructions (get, set, and undef) and simplified variable names by removing their SSA suffixes.

That's interesting—it occurs to me to mention that a couple of these transformations aren't actually necessary for "exiting" SSA. Namely, you don't need to remove undef, and it's OK to leave the mangled variable names. In fact, I would be sorta surprised if it worked to simply remove the undef instructions—did this cause any crashes when trying to run the round-tripped programs?

smd21 · 2025-03-12T05:48:20Z

smd21
Mar 12, 2025

code
This week, I decided to bite the bullet and reimplement things in Rust. I completely redid my old typing system and reworked my algorithms to use less memory. Unfortunately, this took way longer than I expected (partly due to my poor understanding of Rust). I honestly didn’t realize how much code I would need to port, so I ended up starting SSA a day or two later than I wanted.

For my SSA implementation, I decided to use the phi/upsilon instructions and the dominance frontier algorithm from class. This took a while, as I had to completely rewrite some of the dominance algorithms to handle the data structures I was using in Rust. For handling undefs, I used @UnsignedByte 's method of collecting all the variables that are undefined along certain paths and adding a set instruction for each to a "pre-entry" block. I collected undefined variables through dataflow analysis. I used symmetric difference for my merge function and out_b = in_b + defs_b for my transfer function. I'm still figuring out function arguments for ssa_into. Renaming everything also took me quite a while since I kept having to update my Block and context types to easily access the data I needed. Since I used the phi/upsilon variation, implementing ssa_out was pretty trivial. I just deleted all my set instructions and turned my get instructions into ids.

Testing
Since I functionally rewrote almost everything, I had to both test my SSA implementation and all of my old code. I tested on some small cases that I could manually verify, but haven't been able to run on the benchmarks and collect data yet due to an exam today. I probably need to do more work testing edge cases, but I was able to make sure that everything I implemented is working (mostly?) correctly.

Michelin Star
Before starting this task, I really wasn't that comfortable with Rust and wanted to get better. Switching everything to Rust definitely achieved this goal. It was really challenging at first, but I learned a lot and am pretty proud of actually sticking with moving to Rust (even though it was a lot of work). For that I think this deserves a Michelin star. I know I'm still not done, but I'm pretty proud of the work I've put in and how much I accomplished the past few days.

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Wow, that is quite ambitious!! I hope you learned a lot of Rust in the process, even if it was sorta slow and painful. I think you should indeed be proud of the overhaul you've done here!

InnovativeInventor · 2025-03-13T03:55:30Z

InnovativeInventor
Mar 13, 2025

I implemented a slight improvement on the naive SSA algorithm presented in class. ¹ I used the set/get instructions in Bril to represent SSA. Along the way, I discovered and fixed a bug in the Bril (TypeScript) reference interpreter (PR #412). This bug also appears to be present in the Rust reference interpreter (which seems to be fixed by others in PR #413).

Testing

I have tested converting in and out of SSA over the entire benchmark suite of Bril. It appears to correctly produce the same output over the entire benchmark suite/inputs.

Performance

Below are two plots measuring baseline (baseline) compared to the overhead of converting to SSA (to_ssa), converting back from SSA (from_ssa), and converting to and from followed by LVN + simple DCE (to_from_ssa_opt).

The first plot is a violin plot, showing the distribution of the overhead across the benchmarks, broken down by passes performed and normalized to the baseline (no passes run):

The second plot is a more detailed bar plot showing the specific (normalized) overhead produced by each benchmark:

The Jupyter notebook to produce these plots is publicly available. ²

1 reply

sampsyo Mar 14, 2025
Maintainer Author

Neato; all looking good!

You mentioned this:

I implemented a slight improvement on the naive SSA algorithm presented in class.

Care to say anything more about what that "slight improvement" was?

tean-lai · 2025-03-16T03:59:53Z

tean-lai
Mar 16, 2025

Code
I had a rough time debugging with this one, I implemented the Pizlo-form SSA with set/get as shown in class. The worst part was definitely finding the different kinds of edge cases there could be, like dealing with the original function arguments. Also took quite a while to realize I was throwing all my gets before the label at the start of a basic block, or after a branch at the end of a basic block. I handled the undef situation pretty poorly, I set an undef for every get instruction, unless it was defined in the function argument. This definitely incurs a lot of unnecessary overhead, but at a certain point my programming biased way more on correctness than efficiency.

The overall functionality of to_ssa is: (fresh names were "var" -> "var.3" if there were 3 instances of var already)

First pass: find all variable definitions and blocks defined from
Second pass: based on variable definitions, find out where phi_nodes need to be placed based on dominance frontier, and give it a fresh name.
Third pass: rename variables by dfs on dominance tree. when processing a block, I first prepend the block with all the necessary get instructions, based on the phi_nodes computed before. after processing every instruction, I append set instructions for every relevant definition in its dominance frontier. The stack is initialized with function arguments mapped to themselves.
Add undefs for every variable defined, unless it came in the function argument. Then I set every phi-node to these initial vars. Definitely not optimal. This part also set the function arguments to all the phi-nodes that share the same name. This helped with the case where phi nodes needed to be placed in the very beginning, but shared a name with a function argument.
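The fresh-name scheme mentioned above ("var" becomes "var.3" when three instances already exist) can be sketched with a per-variable counter. `FreshNames` is a hypothetical helper, not code from the linked repository.

```rust
use std::collections::HashMap;

/// Hands out `var.0`, `var.1`, ... independently for each base variable name.
#[derive(Default)]
pub struct FreshNames {
    counts: HashMap<String, usize>,
}

impl FreshNames {
    /// Return the next unused name for `var` and bump its counter.
    pub fn fresh(&mut self, var: &str) -> String {
        let n = self.counts.entry(var.to_string()).or_insert(0);
        let name = format!("{}.{}", var, n);
        *n += 1;
        name
    }
}
```

Because the counter is keyed by the base name, names for different variables never collide as long as no source variable already contains the same `.N` suffix, which this sketch does not check.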

from_ssa:
Thankfully, so much easier: delete sets, turn gets to ids. So all the pain was on debugging to_ssa.

Testing

I tested my implementation by running it through all the core benchmarks and seeing the outputs were the same. It's the same except for just two benchmarks: primes-between and totient, and I cannot figure out why. Everything else has the same output, so I'm fairly happy with the results. I might revisit this in the future for the two cases, but I'm going to rest with this for now. Also handwrote some direct tests for this one. Ended up adding a ton of assert statements throughout, and more assert statements for previous implementations like for dominance trees and stuff; I was starting to get really paranoid.

Retrospective

Don't think I deserve a star on this one, it came quite late. But I definitely learned a lot. Ended up spending >10 hours, but I think I could look back on this experience fondly. Definitely became more interested in different kinds of SSA construction now though.

1 reply

sampsyo Apr 29, 2025
Maintainer Author

Cool; sounds good! It's too bad that the correctness didn't quite come together for just a couple of benchmarks… it would be interesting to try to distill those ones into unit tests someday. :)

I handled the undef situation pretty poorly, I set an undef for every get instruction, unless it was defined in the function argument.

That's true that this is a lot of undefs; FWIW, a simple way to reduce the number would be to do it once per variable at the very top of the function (instead of once per get).
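One way to picture that suggestion: collect the variables that might be read before being set, deduplicate them, and emit a single `undef` per variable for the top of the function. The helper below is a hypothetical sketch that produces Bril text lines; it leaves out the `set` instructions that would tie each `undef` to its shadow variable.

```rust
use std::collections::BTreeSet;

/// Given (name, type) pairs for variables whose `get` may run before any `set`
/// (possibly with repeats, one per `get`), emit one `undef` line per variable.
pub fn entry_undefs(maybe_undefined: &[(&str, &str)]) -> Vec<String> {
    // The set removes duplicates and gives a stable, sorted output order.
    let unique: BTreeSet<(&str, &str)> = maybe_undefined.iter().copied().collect();
    unique
        .into_iter()
        .map(|(name, ty)| format!("{}: {} = undef;", name, ty))
        .collect()
}
```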

calciiium · 2025-05-09T04:19:38Z

calciiium
May 9, 2025

code
I implemented both into SSA and out of SSA functionalities, in which the into SSA uses the dominance frontiers based version. I checked using the python script is_ssa.py and all the outputs are in valid SSA forms. I also did testing using core benchmarks by converting the bril programs to SSA and back to non-SSA form.

I encountered two major difficulties. The first is that some complicated benchmark tests failed, and the problem eventually traced back to l5's dominator/dominance tree, implying l5 has some serious hidden bugs. The second one is that I'm not sure if it's caused by an incorrect setup, but my interpreter brili doesn't seem to accept 'undef' as a value (or that's intentionally designed?) To get around this, I substitute all undef int type variables with const 0, and undef bool type variables with const true. In this way, the implementation passes the is_ssa tests and part of the benchmark tests.

The overhead for core benchmarks is huge (2x+). I think it can probably be improved by running other optimizations, such as LVN and DCE, as some get and set are not used during the execution.

1 reply

sampsyo May 18, 2025
Maintainer Author

my interpreter brili doesn't seem to accept 'undef' as a value (or that's intentionally designed?)

No, that's not intentional. You can see the support for undef in the source code:
https://github.com/sampsyo/bril/blob/175580e85cb6e4f78e82cdc8c26bfacfb0ef6c22/brili.ts#L757-L760

It would be helpful to know exactly what error message you got.

The overhead for core benchmarks is huge (2x+)

By this, do you mean the change in dynamic instruction count after doing an SSA "round trip"? Details here would be helpful.

Jonahcb · 2025-05-17T21:34:46Z

Jonahcb
May 17, 2025

Code

I implemented functions for converting in and out of SSA using the naive algorithm and the 'sets' and 'gets'. Initially, I started by implementing it with phis but I felt I knew the concept of phis well enough from LLVM so I wanted to learn 'sets' and 'gets'. I prefer phis more as it makes more sense to my brain.

I encountered many issues with the entry block, so I added initial undef and set instructions into it. But this didn't work because I kept inserting it after a terminator. I tried to insert it before the terminator but it wasn't working so I just wholly placed all these instructions before the entry block and this solved the issue.

I also had (and still have) issues with arguments to the functions.

For testing, I manually tested on some handcrafted programs because I was short on time, but it has a few bugs with arguments to functions but other than that it works well.

1 reply

sampsyo May 18, 2025
Maintainer Author

Looks good! Your approach to inserting undefs before the entry block makes sense.
