Simple and Efficient SSA Construction #488
Replies: 18 comments 26 replies
-
I thought the paper generally did a good job of explaining the proposed SSA construction, but I had a hard time understanding what distinguished the construction from previous approaches, mostly because I was unfamiliar with the Cytron et al. algorithm. My current understanding is that a main difference is that the Cytron et al. algorithm operates on an already-built CFG, whereas the proposed approach avoids the inefficiency of first producing a non-SSA CFG and then converting it, by constructing the CFG and the SSA form simultaneously. The construction of the CFG was not very explicit in the paper's algorithm (in particular in the pseudocode), so it took me a few passes to realize that phi node and CFG construction were simultaneous. I'm still a bit confused about how we can know when to seal a block without a completed CFG. For instance, in the Figure 3 example the authors write "Since the jump from the body exit needs to be added later, we cannot seal the while header yet", but how can we know about all jumps from other blocks without having constructed the CFG? My confusion makes me think that maintaining a clear delineation between CFG and SSA construction could be an advantage of the Cytron et al. approach in terms of interpretability, and maybe as a consequence extensibility. Discussion question: In section 3.1 the paper discusses making IR optimizations during the SSA construction and mentions that their algorithm allows conservative (local) optimizations. Why only conservative optimizations? Is this also true of other algorithms? It seems like in general many optimizations are possible both on-the-fly and after SSA construction; how should we choose when to apply an optimization?
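To make the sealing question concrete for myself, here is a minimal Python sketch of the bookkeeping I believe the paper is describing (my own simplified rendering of the readVariable/sealBlock pseudocode in Section 2, with trivial-phi removal omitted): a block that is not yet sealed gets a placeholder phi recorded in `incomplete_phis`, and its operands are filled in only once the front end promises no more predecessors will be added.

```python
# Minimal sketch, not the paper's exact code: sealing lets construction proceed
# while the CFG is still growing. Reads in an unsealed block get placeholder
# phis; seal_block() fills in their operands once all predecessors exist.

class Phi:
    def __init__(self, block, var):
        self.block, self.var, self.operands = block, var, []

class Block:
    def __init__(self, name):
        self.name = name
        self.preds = []            # filled in as the front end emits jumps
        self.sealed = False
        self.incomplete_phis = {}  # var -> placeholder Phi awaiting operands
        self.defs = {}             # var -> value known at the end of this block

def write_variable(var, block, value):
    block.defs[var] = value

def read_variable(var, block):
    if var in block.defs:                      # local value numbering
        return block.defs[var]
    if not block.sealed:                       # CFG still growing: defer operands
        val = Phi(block, var)
        block.incomplete_phis[var] = val
    elif len(block.preds) == 1:                # single predecessor: no phi needed
        val = read_variable(var, block.preds[0])
    else:                                      # sealed join: recurse into preds
        phi = Phi(block, var)
        block.defs[var] = phi                  # break potential cycles first
        phi.operands = [read_variable(var, p) for p in block.preds]
        val = phi
    block.defs[var] = val
    return val

def seal_block(block):
    # Called by the front end once every jump into `block` has been emitted.
    for var, phi in block.incomplete_phis.items():
        phi.operands = [read_variable(var, p) for p in block.preds]
    block.incomplete_phis.clear()
    block.sealed = True

entry, merge = Block("entry"), Block("merge")
write_variable("x", entry, "x0")
use = read_variable("x", merge)   # merge not sealed yet: placeholder phi recorded
merge.preds.append(entry)         # the front end later emits the jump entry -> merge
seal_block(merge)                 # now the phi's operands can be filled in
print(use.operands)               # ['x0'] -- a trivial phi the full algorithm would remove
```

The trivial phi in the demo is exactly the kind the full algorithm would then clean up with its trivial-phi removal step.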
-
This paper provides a seemingly strictly superior method of constructing SSA compared to the dominator-tree approach we saw in class. Of note, the paper's benchmarks seem to show marginally better final compiled code in terms of the number of instructions. In terms of critiques, I had some trouble understanding what the paper was trying to say in some of its more technical aspects. For instance, the paper describes its algorithm as not needing a CFG to be constructed ahead of time, but then I struggled to understand how it could perform basic CFG operations like finding predecessors. It seems like the pseudocode given in places like section 2.2 already assumes the CFG was computed. There were likewise some tables I struggled to understand at first, namely Table 1, where it took me multiple minutes to figure out which numbers were from the LLVM implementation and which ones were from the paper's. The label on the table says it compares the two, but there are three groupings of columns, not two, and the headings were not intuitive to me. Maybe just a skill issue on my end, but I felt there was opportunity to describe core aspects of the paper more accessibly.
-
In Braun et al.'s paper, an algorithm is presented to build SSA form directly from an AST representation, efficiently and minimally. The algorithm is contrasted with Cytron's popular algorithm, which converts a non-SSA CFG into an SSA CFG; the goal is to avoid the detour AST -> non-SSA CFG -> SSA CFG by skipping the middle transformation. As others in this thread have mentioned (Michael, Neel, Samuel), I found the presentation of the algorithm confusing, as it felt to me that information from the CFG would be needed to determine whether a block was sealed or not; but the clarification that the semantics of the language constructs let us determine the number of expected predecessors helped clarify the idea. Finally, I appreciated the description of minimality in SSA form. In class we've talked about SSA conversion as first inserting phi nodes and then pruning them using the dominance frontier, so I found the use of a recursive algorithm on strongly connected components a clever alternative. Discussion questions: What's the tradeoff in scalability/maintainability between Braun's and Cytron's algorithms? Although Cytron relies on a middle step to build the SSA CFG, the separation into layers can make it easier to understand and easier to maintain from an engineering perspective. Are the benefits strong enough to prefer this over Cytron's approach? Similarly, I wonder if the use of on-the-fly optimizations while building the SSA CFG could lead to subtle bugs being introduced in the optimizations, which could make them difficult to debug or even verify for correctness. Once again, are the benefits of building the SSA in one pass worth it in this context? How would the dominance-frontier vs. strongly-connected-components approaches to pruning phi nodes compare? I.e., what would be the difference between Cytron's pruning of an SSA CFG using the dominance frontier vs. using strongly connected components, both in terms of number of phi nodes and dynamic instructions, and in complexity of the code?
-
I understood this paper as essentially computing SSA form from an AST without needing to do the intermediate analyses that we had to (basic blocks, CFGs, domination) - which led to a faster conversion to SSA that also had comparable performance in terms of minimality to other algorithms. I think the mentioned goal at the end of JIT compilers really helped me see why this could be quite useful, as for most of the paper I was honestly quite confused as to why so many different optimizations were being merged into SSA conversion. I think the core algorithm and its main benefit (getting rid of unnecessary intermediate IRs) is quite useful. My main critique is the paper's categorization of this algorithm as "simple" - I think the paper overall seemed to me more of an exercise in optimizing for a specific use case rather than necessarily creating a better "general" algorithm. The whole section on merging constant propagation and other optimizations into SSA construction, for example, seemed like a recipe for very spaghettified AST conversion code (and AST conversion is already hard enough as-is). Especially when targeting something meant to be very extensible (with target-specific optimizations, etc.) like LLVM, many of the optimizations like constant propagation will likely be run again on LLVM IR anyway, so it seems somewhat not worth the developer cost to me. Additionally, I felt that the algorithm felt somewhat frankensteined together - mainly with the multiple passes to get rid of trivial phi nodes, and the optional additions to generate minimal SSA form for irreducible CFGs. I know that the paper proved the algorithm to be correct, but I sort of felt that the algorithm was just not very clean, and it made me wonder if this was truly the best way to be doing SSA. I don't know if I believe the paper's claim that this general algorithm is comparable to Cytron's and could be helpful in JIT compilation - maybe I missed something, but the paper doesn't seem to show raw compile time comparisons between Cytron and their proposed algorithm, and I don't know if AST-IR conversion is even the bottleneck in most JIT compilers anyway. The added speedups to on-the-fly optimizations also seemed quite small; saving only ~1 second out of ~2 minutes seems like it might fail a cost-benefit test. Discussion questions: When is it worth it to design what amounts (to me) to an "inlined" compiler implementation, where we have minimal scalability but maximum performance, and are there more scalable/easier-to-understand methods to design these compiler passes? One of the big wins of this paper is skipping the necessity of generating CFGs/dominators/etc., while maintaining comparable performance. I wonder if this is actually as big of a win in practice - how often could the "work" spent generating CFGs/dominators be reused in other passes, and is that factored into this comparison?
-
Critique: The paper did an excellent job presenting the algorithm. It does come across as a simple algorithm, as they claim. However, they should have tuned their algorithm to the level that the Cytron implementation is tuned before comparing, instead of speculating that if they did tune it, we would see significant improvements over Cytron. Question: I am curious about what threshold of improvement (over a current implementation) a new compiler pass algorithm must achieve before industry and academia will adopt it. For example, is the 0.28% reduction in instructions for Marker compared with Cytron enough to justify modifying current compilers? Is a 1.49 s faster overall compilation time on the SPEC CINT2000 benchmark suite significant enough?
-
I like that this paper presented an alternative method of constructing SSA to what we learned in class, it was fun to read and think about. I never really considered that there is overhead we want to avoid when we construct a non-SSA IR and then convert it into SSA. So, at first I bought the argument that we might want to use this on-the-fly algorithm for JIT compilers where the overhead is perhaps a significant bottleneck. However, having finished the paper, I'm kind of disappointed there were no experiments done to expand on this point. The authors only mention it in passing in the conclusion. I'm also a bit skeptical about when we would want to use this algorithm. For me, it was harder to understand than the algorithm we discussed in class. One of the main advantages the authors cite is that they can avoid computation of dominator sets/frontiers. I don't think I buy this argument; while it is true that this algorithm avoids it, I imagine that any compiler will end up computing the dominance relation at some point -- any loop optimization would probably need it for the purpose of identifying loops? So I imagine that any cost of computing the dominator relation (which can't be that high) would be amortized across the many analyses/optimizations that use it. The other advantage is eliminating overhead, which I can see vaguely as an argument but again, I don't really buy it without some sort of experiment to determine if this overhead is actually critical. I think the algorithm was easy enough to understand, but I wish the tables were also easy to understand, especially Table 1. It wasn't clear to me what each column meant -- after reading the section a few times, I think "Before SSA Constr" means the default non-SSA LLVM code, "After SSA Constr" means the code generated by LLVM's SSA-converting pass, and "Marker" is the novel algorithm. The table is also just cluttered. Also I'm not sure what "Time" and "IR Time" are in Table 3. Discussion: In the general case, what are some reasons compiler users/developers might prefer either the "traditional" approach (AST -> non-SSA CFG -> SSA CFG -> optimized SSA CFG), or the one described in the paper (AST -> optimized SSA CFG)? To me, it seems there is quite a large gap between an AST and an optimized SSA CFG, so going there in one step feels like it can be full of non-obvious bugs. I wonder if the slight performance benefits are worth the cost in developing/maintaining it. And a user definitely doesn't want a compiler that's prone to bugs. |
-
I found this paper interesting because their approach seemed to operate very differently from how we've been approaching compilers in this class, skipping the steps of constructing a CFG and dominance frontiers in favor of directly translating an AST to SSA form. It seems like the main idea was to use information encoded in the AST as a substitute for information encoded in a CFG and dominance frontiers; one example is that jumps come from language constructs with specific semantics, so we can infer when to declare a block as sealed. I wish that the paper had actually compared overall compile times between the on-the-fly-optimized Marker algorithm and a tuned version of the Cytron et al. algorithm; the only compile-time comparison I saw was between the optimized and unoptimized Marker algorithm. One question I had while reading: although this approach to constructing the SSA form of a program skips CFG and dominance-frontier construction, could those structures still be useful for other types of optimizations or program analysis, either by making optimizations/analyses possible or just by making them easier to reason about? I think Cytron et al.'s algorithm is still easier to reason about, since there feels to be a stronger separation of concerns between the different program forms.
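As a concrete illustration of that "jumps come from language constructs" point, here is a hypothetical sketch of how a front end lowering a `while` statement would know exactly when each block can be sealed, because the construct itself is the only source of the jumps involved. The helpers (`Block`, `new_block`, `add_edge`, `seal_block`) are minimal stand-ins of my own, not the paper's API, and `seal_block` here only flips a flag rather than filling in incomplete phis.

```python
# Hypothetical front-end sketch: the while construct alone determines all edges
# into its header, body, and exit, so each block can be sealed at a known point
# without consulting a separately built CFG.

class Block:
    def __init__(self, name):
        self.name, self.preds, self.sealed = name, [], False

def new_block(name):    return Block(name)
def add_edge(src, dst): dst.preds.append(src)
def seal_block(block):  block.sealed = True   # real version also fills incomplete phis

def lower_while(current_block):
    header = new_block("while.header")
    body   = new_block("while.body")
    exit_  = new_block("while.exit")

    add_edge(current_block, header)
    # header is NOT sealed yet: the back edge from the body end is still to come.

    # ... lower the loop condition into `header`, reading variables lazily ...
    add_edge(header, body)
    add_edge(header, exit_)
    seal_block(body)              # only the header ever jumps into the body

    # ... lower the loop body into `body`, possibly creating more blocks ...
    body_end = body
    add_edge(body_end, header)    # emit the back edge
    seal_block(header)            # every predecessor of the header now exists
    seal_block(exit_)             # likewise for the exit block
    return exit_

exit_block = lower_while(new_block("entry"))
print(exit_block.sealed, [p.name for p in exit_block.preds])  # True ['while.header']
```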
-
I found this paper to be quite interesting overall; skipping a couple of passes in a compiler does seem like it could be useful, but that would depend on how many passes are required by modern compilers. My impression is it would only be truly useful if it was substantially faster than currently-used algorithms, because it doesn't seem to provide anything extra for analyses or reasoning. I think this paper would have heavily benefited from a bigger evaluation section. Others have pointed out that there is a lack of evaluation on compilation speed, which I find a little disappointing: what other benefits are there to having such a construction algorithm if not the fact that it's faster, and if it is faster, wouldn't you want to show it? Time comparisons seem to be a bad measurement to exclude if the claimed benefits of this paper include reducing overhead. Based on the evaluation data they do have, it seems that you could argue this algorithm is not worse than Cytron et al.'s, but I find it hard to be convinced that it's necessarily faster. Overall, I do like the idea of cutting out a non-SSA CFG, because I see little benefit in constructing it directly if an SSA CFG is more useful in every way. Cutting out dominators, on the other hand, I'm not so sure about. (Discussion question) Are there properties of dominators that we might care about that are not encoded in an SSA CFG? I'm also having a hard time finding an industry-level compiler that uses this algorithm, and I wonder why this might be the case. Does anyone know if any compiler actually uses it? I can't tell if the algorithm is too new to be considered or if it is not that useful over other methods. But I also hear in general that LLVM-based compilers are often quite slow, so maybe there is some merit to doing optimizations on-the-fly and skipping compiler passes when you can.
-
Critique: I thought this paper was really awesome. The highlight of the paper for me was their discussion of strongly connected components and how they used these to replace phi nodes and reduce their number. Intuitively, this makes sense: if you have an "inner" subset of an SCC, in which all phi nodes only reference other phi nodes in this "inner" set, and you have exactly one other phi node in the SCC that isn't part of "inner", then all information for the "inner" nodes must come from the one outer node. Really cool! I also think it was cool how they derived all these strong conditions in Section 4. First, it was neat how they got "pruned SSA form" for free (which might seem trivial but probably can't be said about our naive construction from class). Questions + Confusion: There were some things I was confused about. Out of the two drawbacks they mentioned in the introduction, the second one (how Cytron et al. needed several other algorithms to construct minimal SSA form) was a bit unclear to me. More importantly, I was also confused by their sentence in Section 3, "This implies a definition of minimality that is independent of the source program and more strict than the definition by Cytron et al. [10]." Even after reading their proofs in Section 4, I wasn't sure how their definition of minimality was stronger than the existing one. They seem to have set up the definition of "necessary" phi nodes, but AFAIK didn't compare these definitions. Maybe I missed it?
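To convince myself of that inner/outer argument, I sketched it in Python below. This is my own reading of Section 3.2, not the paper's code: within one SCC of phi functions, collect the operands that come from outside the SCC, and if there is exactly one such value, every phi in the SCC is redundant and can be replaced by it. Computing the SCCs themselves (e.g. with Tarjan's algorithm) is assumed to happen elsewhere and is not shown.

```python
# Rough sketch of the SCC-based redundancy argument (my reading, not the paper's
# exact algorithm). Phis whose operands all stay inside the SCC are "inner";
# operands from outside are collected. One outside value => whole SCC collapses.

class Phi:
    def __init__(self, name):
        self.name, self.operands = name, []

def process_scc(scc, replace):
    if len(scc) == 1:
        return                      # single phis are handled by the trivial-phi check
    inner, outer_ops = [], set()
    for phi in scc:
        is_inner = True
        for op in phi.operands:
            if op not in scc:
                outer_ops.add(op)
                is_inner = False
        if is_inner:
            inner.append(phi)
    if len(outer_ops) == 1:
        value = outer_ops.pop()
        for phi in scc:
            replace(phi, value)     # the whole SCC collapses to the one outside value
    elif len(outer_ops) > 1:
        # Only the inner phis can still be redundant; recursing on their SCCs
        # is assumed to happen in the caller and is omitted here.
        pass

# Tiny worked example: x1 = phi(x0, x2); x2 = phi(x1, x1) -- one outside value x0.
x0 = "x0"
x1, x2 = Phi("x1"), Phi("x2")
x1.operands = [x0, x2]
x2.operands = [x1, x1]
process_scc({x1, x2}, replace=lambda phi, v: print(f"{phi.name} -> {v}"))
# prints (in some order): x1 -> x0, x2 -> x0
```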
-
Critique: At the end of section 1 (introduction), the authors mention that their SSA algorithm is the only algorithm to construct a minimal and pruned SSA program on reducible CFGs without relying on other analyses. However, in section 3 they then describe multiple different optimizations which, to me, seem like non-SSA analyses (i.e. common subexpression elimination, copy propagation, etc.). I'm a bit confused about what, then, distinguishes their algorithm from the Cytron algorithm; what exactly is meant by their algorithm "not relying on other analyses"? Question: I'm still a bit confused about the undefined value, which is assigned in the case where the phi node contains only itself as an operand (p. 5). Why is this phi node necessary? Why can't we just drop it?
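My best guess at the answer, sketched below (this paraphrases the trivial-phi removal routine as I remember it; the class is a stand-in, not the paper's code): a phi whose only operand is itself means no real definition reaches that read, so the phi is replaced by an explicit undefined value rather than silently dropped, because any users of the phi still need some operand to point at.

```python
# Sketch of the trivial-phi check. A phi is trivial if it references at most one
# value besides itself; if it references ONLY itself, no real definition reaches
# it, so it becomes an explicit Undef marker that its users can refer to.

UNDEF = "undef"

class Phi:
    def __init__(self, name):
        self.name, self.operands, self.users = name, [], []

def try_remove_trivial_phi(phi):
    same = None
    for op in phi.operands:
        if op is same or op is phi:
            continue                  # self-reference or repeat of the same value
        if same is not None:
            return phi                # merges two distinct values: not trivial
        same = op
    if same is None:
        same = UNDEF                  # the phi referenced only itself
    for user in phi.users:            # users assumed to be phis in this sketch
        user.operands = [same if op is phi else op for op in user.operands]
        if isinstance(user, Phi) and user is not phi:
            try_remove_trivial_phi(user)   # users may have become trivial too
    return same

# The self-only case from page 5: a phi whose sole operand is itself.
p = Phi("p")
p.operands = [p]
print(try_remove_trivial_phi(p))   # -> 'undef'
```

So the phi itself does disappear; what survives is an undefined-value marker standing in for a read that no definition dominates.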
-
I found this paper really interesting because it takes a fresh approach to SSA construction that avoids some of the traditional inefficiencies we talked about in class. The main idea is to construct SSA form directly from an AST or bytecode, rather than first building a CFG in non-SSA form. That immediately seemed like a smart idea because it skips unnecessary intermediate transformations, which can be expensive in compilers that are already SSA-based. The contrast with Cytron et al.’s classic SSA construction algorithm was particularly insightful. Cytron’s approach is widely used and ensures minimal φ-function placement, but it comes with a lot of overhead—requiring dominance frontiers and liveness analysis. This paper’s method flips that by working backward in a lazy manner: instead of precomputing where to insert φ-functions, it only places them when a variable’s definition is needed. That makes the process more dynamic and, in a way, feels more natural than the eager approach Cytron’s algorithm takes. One thing that clicked for me pretty easily was the way the authors handle variable definitions. The local and global value numbering parts made a lot of sense—local value numbering works fine within a single block, and if a definition isn’t found, they search backward through predecessors to determine where to place φ-functions. The lazy placement of φ-functions was a neat trick, especially with the memoization step to avoid redundant computations. That said, some parts of the paper took a bit longer to sink in. For example, the handling of incomplete CFGs felt a little abstract at first. The idea of “sealing” blocks—where a block is marked as final so no new predecessors can be added—wasn’t something I had seen before in SSA construction. It makes sense after thinking about it, though: when processing an AST directly, some control flow edges don’t exist yet, so you need a way to handle missing predecessors and later fill in the blanks when the structure is complete. Another part that stood out was the emphasis on minimizing unnecessary φ-functions. The algorithm doesn’t just avoid placing them blindly but also actively removes redundant ones, even in irreducible control flow graphs (which Cytron’s algorithm doesn’t do as cleanly). I also liked how the paper connected SSA form to functional programming ideas like CPS. The connection between φ-functions and function parameters in CPS wasn’t something I had thought about before, but it makes sense that they’re just two ways of encoding control flow dependencies. One thing I kept wondering was how this method compares to more modern compiler optimizations. The benchmarks showed that it’s competitive with LLVM’s implementation of Cytron’s algorithm, but I’d be curious to see how it scales in large, industry-scale compilers. Would just-in-time compilers benefit from this, or are there cases where the eager approach might still be better? Also, since this algorithm relies on backward search for definitions, I wonder if it ever leads to worst-case performance scenarios in deeply nested control structures. |
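The phi/CPS connection clicked for me with a small, purely illustrative Python analogy (nothing here is from the paper): treat the join block as a local function and the phi as its parameter, with each predecessor edge passing its own operand as the argument.

```python
# Illustrative analogy only: a phi at a join block plays the same role as the
# parameter of the join's continuation; each predecessor "edge" passes its value.

def branchy(cond):
    # SSA view:                  CPS / block-argument view:
    #   then: x1 = 1                call join(1)
    #   else: x2 = 2                call join(2)
    #   join: x3 = phi(x1, x2)      def join(x3): ...
    def join(x3):                  # the phi becomes the parameter x3
        return x3 * 10
    if cond:
        return join(1)             # the then->join edge passes its operand
    else:
        return join(2)             # the else->join edge passes its operand

print(branchy(True), branchy(False))   # 10 20
```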
-
This paper was a good read, as it introduced a way to create SSA form that seems lightweight yet intuitive. Instead of the algorithms we described in class, which rely on having already created a dominance graph or which create many extraneous definitions, this one simply uses local and global value numbering, lazily working backwards from reads to find all writes that could have been the most recent write and using a phi node to switch between those values. I like the introduction of LVN and GVN into this because it easily handles situations where there would be multiple writes on different branches but all containing the same value. This prevents additional phi nodes from being created in those cases in a way that a naive implementation wouldn't. I am a bit confused regarding the lack of a non-SSA CFG. The paper touts that their algorithm doesn't require creating a non-SSA CFG, but their algorithm requires knowing the predecessors of a block, and if you have those, then you might as well have constructed a non-SSA CFG (or at least, the transpose of one). Is there some level of nuance here that I missed? Discussion question:
-
Overall, I found this paper to be interesting. It presents an efficient and straightforward algorithm for SSA construction, one that works directly from the source language without passing through a non-SSA CFG (unlike Cytron's algorithm) and that is well suited to performing several standard optimizations. I found that the authors substantiated their claim well by providing proofs that the SSA construction algorithm constructs pruned SSA form for all programs and minimal SSA form for programs with reducible control flow. For the on-the-fly optimizations section, I think it would be better if the authors were more explicit about the caching technique that is used to speed up the triviality checks. The caching mechanism assumes that if the two witnesses remain distinct, the phi function remains non-trivial; however, it does not address the possibility that optimizations elsewhere in the program could later make these operands identical and thus make the phi function trivial. With this, the mechanism might require invalidating and recomputing witnesses under such circumstances, which is not explicitly discussed. Also, with regard to the variant that only optimizes on the unique operand and does not guarantee minimal SSA form for all programs, I think a brief discussion of the specific classes of programs where this simplification is particularly effective or problematic could be useful, or of how often this approach fails to produce minimal SSA form in practice. My discussion questions relate more to the design of SSA construction algorithms. In the paper, the authors state that one reason the Cytron algorithm is still very popular is that it guarantees a form of minimality on the number of placed phi functions. They also state that there is a range of construction algorithms which aim for simplicity instead, which is essentially the goal of their algorithm. I was wondering what the trade-offs between simplicity and minimality in SSA construction algorithms are. How can compilers dynamically decide when to prioritize simplicity over minimality, and are there specific scenarios where the cost of achieving one outweighs the benefits of the other?
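To make that invalidation concern concrete, here is a hedged sketch of how I picture the witness caching working, based only on the description above (the representation is mine, not the paper's): cache two distinct operands as proof of non-triviality, and fall back to a full re-scan whenever the cached pair no longer appears distinct in the current operand list, which is exactly the kind of re-check that later operand rewrites would require.

```python
# Hedged sketch of witness caching for triviality checks (my own representation).
# Two distinct cached operands prove non-triviality; if later rewrites invalidate
# the cached pair, the full operand scan runs again.

class Phi:
    def __init__(self, operands):
        self.operands = operands
        self.witnesses = None          # cached pair proving non-triviality

def is_nontrivial(phi):
    w = phi.witnesses
    if w is not None:
        a, b = w
        if a is not b and a in phi.operands and b in phi.operands:
            return True                # cached proof still valid: skip the scan
    seen = []
    for op in phi.operands:
        if op is phi:
            continue                   # ignore self-references
        if not any(op is s for s in seen):
            seen.append(op)
        if len(seen) == 2:
            phi.witnesses = (seen[0], seen[1])
            return True
    phi.witnesses = None
    return False                       # at most one value besides the phi itself

x, y = object(), object()
p = Phi([x, y]); p.operands.append(p)
print(is_nontrivial(p))                # True; witnesses (x, y) cached
p.operands = [x, x, p]                 # an optimization later made operands identical
print(is_nontrivial(p))                # False; stale cache detected, full re-scan
```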
-
Critique: I found the discussion on on-the-fly optimizations to be pretty interesting, as it seems like this is one of the areas where the authors' implementation offers an advantage over Cytron's formulation. However, like many others, I am not fully convinced by the performance evaluation of this algorithm since they only briefly state the non-optimized algorithm is "as fast as Cytron et al.'s algorithm in practice," without any formal comparison. They mention that using on-the-fly optimizations could further improve this and that it offers a few seconds of improvement in total compilation time, but it's hard to tell whether this slight improvement is worth the more complicated algorithm design. I imagined that removing extra passes and exchanging them for this more "integrated" algorithm should come with a significant performance improvement to make the less clean design worth it. Discussion Question: The authors describe their algorithm as simple and efficient. While they provide a complexity and minimality argument to argue for efficiency, I'm not entirely convinced about its simplicity. It does avoid auxiliary computations like finding dominance frontiers, but it seems they still recursively trace through program flow in a way that could be seen as indirectly performing the same computations. In practice, what is the "simplicity" advantage of this SSA construction algorithm? How well does this algorithm align with the practical concerns such as ease of implementation and maintainability? |
-
Critique. This was a great paper, indeed presenting a fascinating insight into how to work with SSA, and in fact combining principles from across the several topics that we have covered throughout this course. While we have principally been concerned with working from a basic block -> CFG -> analysis approach, this one seems to skip several steps and focus more on directly reaching SSA. At the same time though, the intelligent usage of LVN, along with the introduction of GVN, makes for a fascinating parallel. This definitely further opened my eyes to the power of SSA. As a final point, I think the paper did a great job of clearly outlining both the algorithm and any relevant information needed for the process. Discussion Question. A big part of benchmarking previous algorithms (i.e., Cytron et al.) against one such as this seems to revolve around a point I brought up earlier – distinguishing between thinking of compilers in terms of SSA, or in terms of the more primitive constructs from earlier. Setting aside efficiency, which poses the nicer mental model?
-
This was one of the most mind-blowing papers I've read. I was super impressed by how they computed phi nodes lazily etc. and optimized the result to be provably minimal. I thought their presentation was extremely lucid, and most of my confusions usually stemmed from me misreading or glossing over a particular core definition rather than any failure of content or style. I still don't fully understand the proof for Lemma 1 because they don't establish the nontriviality of
-
Like others have mentioned, it would be nice if there was some comparison of the interpretability/maintainability of their algorithm vs. Cytron's. It seems unclear from section 2 when the CFG (in SSA form) is actually constructed -- the pseudocode presented in sections 2.1 & 2.2 seems to assume that a half-built CFG is already present, since the functions they present take blocks as arguments and refer to the block's predecessors. (After reading some of the other comments here, I realized that the SSA CFG is built on the fly.) However, one might argue that 1.) the fact that their algorithm minimizes the no. of φ functions (since trivial ones are optimized away and the authors bake in various optimizations that only require local analysis, like CSE, per section 3.1), and 2.) the fact that their definition of minimality is more strict than Cytron's (section 3.2), together suggest that their algorithm produces more "readable" SSA code with fewer φ functions. Discussion question: how does one balance the trade-off between the interpretability of an algorithm that produces optimized code, versus the interpretability of the code that was generated? As the authors point out in section 1, Cytron's algorithm produces minimal SSA code, but the fact that it requires computing dominator trees/frontiers and performing liveness analysis over a non-SSA CFG makes the algorithm hard to use within SSA-centric compiler pipelines.
-
Discussion Leads: @InnovativeInventor @devv64 @neel-patel-1