Replies: 19 comments 21 replies
-
Into SSAI decided to use the new In my renaming, I modified it slightly by first adding When handling There was also one more nuance - if a function argument needs a phi node in the entry block (assuming the entry block is named) this leads to a problem where we generate a From SSABecause I used the get/set method, TestingI tested my implementations by checking the outputs of the program execution against the unoptimized execution, to make sure that changing to SSA did not affect behavior. I also tested using brench against all the benchmarks in I also implemented a quick SummaryAs you warned, implementing into and out of SSA was more complicated than I expected (mainly for into SSA). I spent most of my time fixing bugs and dealing with edge cases like phi nodes in entry blocks, and undefined values. Overall, I would say that I deserve a star as I made sure the SSA tests worked on the full suite of benchmarks, which ended up finding a lot of edge cases I would not have thought of myself. |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
In collaboration with @bryantpark04 and @dhan0779 To SSA: We implemented the dominance-frontier-based Most of the issues we ran into with From SSA: Converting out of SSA was very straightforward; we first perform a pass to get the type of every variable, then we replace every Testing: We tested Analysis: We tested the performance of our implementation on every benchmark in the bril repository using brench. Our SSA pipeline ended with the transformation into SSA form. To count static instructions, we piped the output of
We found that the SSA form had a 10% static overhead and a 130% dynamic overhead over the baseline. The full round-trip including optimizations made while in SSA form resulted in a 20% static overhead and a 43% dynamic overhead over the baseline. Conclusions: Deciding how to modify |
Beta Was this translation helpful? Give feedback.
-
Into SSA This was tricky and took me a while to get right. I had a misguided first attempt to do this conversion in multiple, independent passes (one for adding My funniest bug was that I accidentally created instructions like this:
instead of
Because
when I tried to write my output to stdout. I had a bunch of other bugs too, especially related to unreachable cfg nodes, but none of them were very funny. I also decided to do the naive thing and just undef every variable at the beginning of the entry block, with the reasoning that a "real" compiler would be doing a bunch of dead code elimination passes on the resulting SSA program anyway. Out of SSA In upsilon/phi style, this is pretty straightforward! I went back and forth on whether "non-ssa" form meant all Testing I tested my correctness by using the is_ssa script and by comparing the original bril program, the ssa bril program, and the de-ssa-ed bril program in turnt with a shared output file. I ran all of my tests on every bril core benchmark and a few handcrafted test cases. On that set of programs my ssa round trip took me from 1734 total static instructions to 4271 total static instructions, a ~2.5X increase. The round trip also took me from 8555030 total dynamic instructions to 22561140 total dynamic instructions, a ~2.6X increase. That is a big increase, but it is heartening that the dynamic increase was not substantially more than the static increase, which suggests I didn't do anything pathological. |
Beta Was this translation helpful? Give feedback.
-
CodeThis directory contains my SSA implementation, modules for basic block and CFG construction, and also a little statistics collection script. This is a correctness script I wrote to make sure my into-SSA actually produced SSA code and to make sure the out-of-SSA has the same behavior as the original. Summary + How It WorksI implemented the naive implementation of SSA this week since I was a little short on time. As motivated by the in-class description, I implemented a map from each Now, here's how I implemented There were a couple more corner cases to accurately get the program into SSA. For one, I had to make sure to shadow-set the arguments to the function. Further, I realized that, since every basic block had Then, transforming out of SSA was really simple. I just deleted all TestingFor testing, I wrote a script to do the following for every program in
I had to skip two designs. For some reason PerformanceHere are my statistics in terms of percentage increase in dynamic instruction count:
So, on average, my naive SSA made the dynamic instruction count 600% or 7x worse. This is expected: for every basic block, I had on the order of Hardest PartThe hardest part was debugging. More specifically, for every variable StarI'd say my work deserves a star! |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
My implementation of SSA is here: https://github.com/mt-xing/cs6120/tree/main/l6 I implemented SSA using the dominance frontier algorithm seen in class. Coming out of SSA was trivial as I just replaced all the upsilon nodes with The hardest part by far was all the edge cases I ran into. For example, my dominance frontier implementation was strictly looking at successors of dominated nodes that were not dominated by the original node, without the edge case of allowing itself to be in the frontier. This caused some infinite loops on test cases that took a while to track down. Likewise, handling arguments to the function was a bit annoying as I needed to pretend that these were effectively set in a previous block and add handling for that. My implementation does conservatively set everything to undef at the start, and also throw everything into the shadow store. These can be easily removed via Dead Code Elimination in a later pass if they're not needed, and they do avoid a few other edge cases. My implementation does not support speculative execution. The control flow complexities that come from I tested my implementation against the entirety of the bril benchmarks and tests folders, as I have with all my previous assignments. The engineering work I put into the earlier test harness is really paying off here, as it helped me catch all of the bugs I mentioned above. My harness automatically runs my code in the interpreter and compares it to running the original through the interpreter. It also collects statistics as it goes. For these runs, I disabled the checks that require my "optimization" result in fewer instructions executed, since we expect SSA to increase the instruction count. Here are the statistics reported for dynamic instruction count changes, running against the entire bril benchmarks and tests, as well as a small number of my hand-crafted test cases:
Note the numbers are negative in terms of reduction (ie: my SSA pass added instructions, as expected). Looks like my implementation added 233 dynamic instructions as its median, although there are definitely some outliers that were very bad. This is my SSA pass alone, without running the dead code elimination that would clean up a lot of unnecessary undefs and sets at the beginning of each function. I do believe this is worthy of a Michelin star, as it uses the dominance frontier based algorithm, and is thoroughly tested against all of the core Bril, as well as every extension minus speculative execution. |
Beta Was this translation helpful? Give feedback.
-
Group (@ngernest, @katherinewu312, @samuelbreckenridge) For our conversion into SSA we use the basic approach of introducing a unique copy of each variable for every basic From here, the out-of-SSA conversion was straightforward: we simply deleted all 'get' instructions and replaced all 'set x y' instructions with the instruction 'x: type = id y'. Before such deletions and substitutions we made, we first made sure to iterate through the SSA program to obtain a dictionary mapping dest (shadow) variable names to their type for those variables in the 'get' instructions. We tested our out-of-SSA implementation by performing SSA roundtrip tests. We first tested on the examples located in the The scatter plot below illustrates the overhead (in terms of % increase in instruction count) after taking the program on a round-trip through SSA & back. The mean overhead from doing a crude roundtrip (no optimizations) is 413.91%, whereas if we perform TDCE (using our L3 code) after converting back to SSA, the mean overhead is reduced to 121.23%, with 8 of the 50 benchmarks having 0 overhead after the round-trip! We suspect that if we perform more optimizations (e.g. constant propagation), we could reduce the SSA round-trip overhead even more. |
Beta Was this translation helpful? Give feedback.
-
To SSA. (code) For converting to SSA, I implemented the dominance frontier method (Cytron et al). Here were the challenges I noted:
From SSA. (code) I had hoped to do a more interesting out-of-SSA transformation, with some coalescing, but I didn't have time so I did the basic one. Testing and efficiency. I tested (on all core benchmarks) that to-SSA preserves behavior and produces SSA, and that from-SSA after to-SSA preserves behavior. It did introduce extra copies, and I measured the overhead at 1.37144x more (dynamic) instructions on average, and 0.74x geomean speedup. Then I implemented copy propagation on SSA (code), which brought it down to 1.05304x more (dynamic) instructions on average, and 1.01x geomean speedup. I think that merits a star. |
Beta Was this translation helpful? Give feedback.
-
Code: https://github.com/ethanuppal/cs6120/tree/main/lesson6/ssa Relevant CI code so you can see what I tested: - name: Snapshot test SSA
run: |
cd lesson6/ssa/bril_to_ssa_copied
turnt *.bril --diff
turnt *.bril --diff
cargo build --package tdce --bin tdce
cargo build --package ssa --bin ssa
brench brench.toml *.bril | python3 check_brench.py --allow-slower
- name: Test SSA
run: |
cargo build --package tdce --bin tdce
cargo build --package ssa --bin ssa
cd lesson6/ssa/bril_to_ssa_copied
brench brench.toml ../../../bril/benchmarks/core/*.bril | python3 check_brench.py --allow-slower Essentially, I manually checked extract = 'total_dyn_inst: (\d+)'
timeout = 200
[runs.baseline]
pipeline = ["bril2json", "brili -p {args}"]
[runs.into_ssa]
pipeline = [
"bril2json",
"../../../target/debug/ssa --into-ssa",
"bril2json",
"brili -p {args}",
]
[runs.through_ssa]
pipeline = [
"bril2json",
"../../../target/debug/ssa --into-ssa",
"bril2json",
"../../../target/debug/ssa --from-ssa",
"bril2json",
"brili -p {args}",
]
[runs.ssa_then_tdce]
pipeline = [
"bril2json",
"../../../target/debug/ssa --into-ssa",
"bril2json",
"../../../target/debug/tdce",
"bril2json",
"brili -p {args}",
] As you can see, I tested into SSA, out of SSA, and SSA then TDCE. I was very rushed on this because I was delayed by external health matters, and thus it does not represent the quality I'd like to have in my work. I implemented the new Total time spent: 6 hours I believe this merits a Michelin star because I implemented and tested the assigned tasks. If I could keep going past 11:59 PM, I would use an insertion-set like approach in modifying basic block bodies to strip a factor of Edit: I decided to do it: pub fn insert_phis(
cfg: &mut FunctionCfg,
phi_insertion_points: PhiInsertionPoints,
) {
let mut phis_to_insert = SecondaryMap::new();
for (variable, (ty, places_to_insert)) in phi_insertion_points.0 {
for place_to_insert in places_to_insert {
phis_to_insert
.entry(place_to_insert)
.unwrap()
.or_insert_with(Vec::default)
.push(Instruction::Value {
args: vec![],
dest: variable.clone(),
funcs: vec![],
labels: vec![],
op: ValueOps::Get,
pos: None,
op_type: ty.clone(),
});
}
}
for (block_idx, phis) in phis_to_insert {
cfg.vertices[block_idx].instructions.splice(0..0, phis);
}
} Edit: I decided to rewrite the entire thing and now I can say I am proud of it. Updated time: 12 hours. |
Beta Was this translation helpful? Give feedback.
-
OverviewFor our implementation, we opted to proceed with the older version of SSA with ImplementationReorganization of Setup This push saw a significant refactor in our setup, which greatly facilitated our work and hopefully will continue to for future projectshere. The past two or three assignments have been building up off of the same CFG/Basic Block types and code, so we finally refactored these out. We ultimately ended up setting up an entire Doing this, along with implementing intoSSA, also revealed some old bugs in our basic block code ... so a circular, but interesting way to realize that fix! Into SSA We adapted the two algorithms that had been mentioned in the lesson, and implemented a series of helper functions for the various used functionalities that had to be reconciled to work with our current setup. In the end, implementation was not difficult, but the two hardest parts were firstly being able to parse the high-level pseudocode into an implementation that worked with our setup; and then dealing with a handful of different edge cases. At a number of times, I was not totally convinced that our setup was even correct (several times I was right in this), but eventually we were able to just take the most straightforward approach with our code. An interesting bug at first was dealing with function arguments, rather than just variable name declarations. Another one we ran into later on was that we were being a bit reckless with 'old variable names' and 'new variable names' – both of these were resolved fairly straightforwardly with time. Leaving SSA We adapted the general TestingWe have primarily run tests against the benchmarks outlined in the We are currently still in the process of fine-tuning certain aspects of our implementation against all benchmarks with Brench, with our final goal being to conclusively say that we cover all edge cases. TakeawaysThis was a terrific exercise, both in developing our own mental understanding of the SSA philosophy/model, and also in thinking of a new paradigm to approach compiler translations. While we didn't fully implement the alternative get/set node SSA translation system, we had a great time discussing it and theorizing it, and it will be fun to explore more on SSA in the future! |
Beta Was this translation helpful? Give feedback.
-
Code SummaryWe implemented to ssa and from ssa using phi nodes. Our to ssa implementation used the dominance frontier version of the algorithm, and the to ssa used the basic algorithm that we saw in class. We tested our implementation with all the benchmark files in the bril repository and they were all passing. How it works?To SSATo implement to ssa we used the dominance frontier algorithm that we saw in class it works in the following way:
From SSAOur from ssa implementation utilized the basic algorithm. This means finds phi operations, propagating the assignments to the ends of the predecessor blocks and deleting the phi operations. There were some complications with concurrent modifications and placing the id instructions in the proper spots, but it was relatively straightforward compares to the to ssa algorithm. Hardest partDebugging was definitely the hardest part of the assignment to make sure that everything was working okay after the to ssa transformation. Here are a few of the categories where we had bugs:
⭐❓We had a lot of fun working on this task, we think we deserve a start due to all the effort paid and for arriving at a solution that works across most benchmarks. |
Beta Was this translation helpful? Give feedback.
-
For this assignment, I wrote naive and dominance-tree based implementations of to_ssa using the set and get functions. To test it, I also wrote a shell script to check that the to_ssa implementations were in SSA form and that the outputs after converting to SSA and back to normal all match the original output. I tested against the core benchmarks, as usual. I figured that the benchmarks caught enough bugs in my implementations to be reliable here. The hardest part of this assignment was probably just working through all the edge cases and the logic around them. The big things here were handling function arguments and phi nodes appearing before variables' first definitions. I handled function arguments by adding fake init statements at the beginning of the first block, then restoring them to point at the original function arguments after my passes. One subtle error was that were a bunch of other sets and phi functions in block headers, which needed to go after the init statements. Phi nodes can also appear before variables' first definitions due to how dominance frontiers work. I wasn't really sure how to solve this until I saw the idea of just initializing all the shadow variables to undefined here. This is incredibly janky and I still don't feel great about it but it seems to work. Reading through my outputs and drawing out the graphs to debug why they were incorrect was very helpful for debugging, even if it took forever. I think my work deserves a Michelin star for actually getting to_ssa correct. |
Beta Was this translation helpful? Give feedback.
-
My SSA implementation converted Bril programs into SSA form using the get/set approach discussed in class, then converted them back into regular form by removing SSA instructions and normalizing variable names. Specifically, during the forward transformation into SSA, my algorithm renamed each variable uniquely within basic blocks, added get instructions at block entries for variables that were live-in, and placed corresponding set instructions at block exits for successor blocks. The reverse transformation removed these SSA-specific instructions (get, set, and undef) and simplified variable names by removing their SSA suffixes. One of the trickiest pieces of the SSA transformation was dealing with function parameters. At first, I neglected to treat arguments as if they were already “defined” at the function’s beginning, which caused spurious undef instructions for parameters. Once I explicitly pushed each parameter onto its variable stack at the start, that issue went away. To ensure correctness, I ran my resulting code through the is_ssa.py checker, which confirmed no variable was redefined, meaning the SSA form was valid. Additionally, I verified behavioral correctness by performing a complete round-trip transformation—original program → to SSA → from SSA—and comparing outputs to the original executions. I also measured the dynamic instruction counts for these round-trips, observing about a 210% overhead, which aligns with expectations given the naive insertion of SSA bookkeeping instructions. |
Beta Was this translation helpful? Give feedback.
-
For my SSA implementation, I decided to use the phi/upsilon instructions and the dominance frontier algorithm from class. This took a while, as I had to completely rewrite some of the dominance algorithms to handle the data structures I was using in Rust. For handling undefs, I used @UnsignedByte 's method of collecting all the variables that are undefined along certain paths and adding a set instruction for each to a "pre-entry" block. I collected undefined variables through dataflow analysis. I used symmetric difference for my merge function and out_b = in_b + defs_b for my transfer function. I'm still figuring out function arguments for ssa_into. Renaming everything also took me quite a while since I kept having to update my Block and context types to easily access the data I needed. Since I used the phi/upsilon variation, implementing ssa_out was pretty trivial. I just deleted all my set instructions and turned my get instructions into ids. Testing Michelin Star |
Beta Was this translation helpful? Give feedback.
-
I implemented a slight improvement on the naive SSA algorithm presented in class. 1 I used the TestingI have tested converting in and out of SSA over the entire benchmark suite of Bril. It appears to correctly produce the same output over the entire benchmark suite/inputs. PerformanceBelow are two plots measuring baseline ( The first plot is a violin plot, showing the distribution of the overhead across the benchmarks, broken down by passes performed and normalized to the baseline (no passes run): The second plot is a more detailed bar plot showing the specific (normalized) overhead produced by each benchmark: The Jupyter notebook to produce these plots is publicly available. 2 Footnotes |
Beta Was this translation helpful? Give feedback.
-
Code The overall functionality of to_ssa is: (fresh names were "var" -> "var.3" if there were 3 instances of var already)
from_ssa: TestingI tested my implementation by running it through all the core benchmarks and seeing the outputs were the same. It's the same except for just two benchmarks: RetrospectiveDon't think I deserve a star on this one, it came quite late. But I definitely learned a lot. Ended up spending >10 hours, but I think I could look back on this experience fondly. Definitely became more interested in different kinds of SSA construction now though. |
Beta Was this translation helpful? Give feedback.
-
code I encoutered two major difficulties. The first is that I found out some complicated benchmark tests failed and the problem eventually traced back to l5's dominator/dominance tree, implying l5 has some serious hidden bugs. The second one is that I'm not sure if it's caused by an incorrect setup, but my interpreter brili doesn't seem to accept 'undef' as an value (or that's intentionally designed?) To get around with this, I substitute all undef int type variable to const 0, and undef bool type variable to const true. In this way, the implementation can successfully passed is_ssa tests, and a partial part of benchmark tests. The overhead for core benchmarks is huge (2x+). I think it can probably be improved by running other optimization, such as LCV and DCE, as some get and set are not used during the execution. |
Beta Was this translation helpful? Give feedback.
-
I implemented functions for converting in and out of SSA using the naive algorithm and the 'sets' and 'gets'. Initially, I started by implementing it with phis but I felt I know the concept of phis well enough from LLVM so I wanted to learn 'sets' and 'gets'. I prefer phis more as it makes more sense to my brain. I encountered many issues with the entry block, so I added initial undef and set instructions into it. But this didn't work because I kept inserting it after a terminator. I tried to insert it before the terminator but it wasn't working so I just wholly placed all these instructions before the entry block and this solve the issue. I also had (and still have) issues with arguments to the functions. For testing, I manually tested on some handcrafted programs because I was short on time, but it has a few bugs with arguments to functions but other than that it works well. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Beta Was this translation helpful? Give feedback.
All reactions