Replies: 19 comments 22 replies
-
Tracing
For tracing I modified the reference interpreter. The tracer runs on every executed instruction, and automatically converts branches to guards and handles function scopes and returns. I used a pretty messy method of doing this - I keep a stack of all the function names, and simply prepend the stack to all names inside (e.g. __trace_f_r below). Returns and function arguments are generated by creating copies into those renamed variables. All guards lead to a shared .__trace_failed label. For example:

@main(x: int) {
speculate;
one: int = const 1;
hundred: int = const 100;
y: int = add x one;
cond: bool = lt x hundred;
guard cond .__trace_failed; # Make sure x < 100
__trace_f_r: int = sub y one; # Inlined @f
z: int = id __trace_f_r;
commit;
jmp .__trace_succeeded; # Jump on success
.__trace_failed:
one: int = const 1; # Redo operations on failure
hundred: int = const 100;
y: int = add x one;
cond: bool = lt x hundred;
br cond .then .else;
.then:
z: int = call @f y;
jmp .exit;
.else:
z: int = call @g y;
jmp .exit;
.exit:
.__trace_succeeded:
print z; # Print was not traced
}
@f(a: int): int {
one: int = const 1;
r: int = sub a one;
ret r;
}

Testing
My implementation succeeds on all the benchmarks in the bril repository. I also tested specific cases to make sure the speculation works whether or not the speculated code succeeds, by modifying the inputs so that the guards either hold or fail.

Summary
I think my main takeaway from this was mostly how messy implementing traces felt.
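The stack-based renaming this reply describes can be sketched in a few lines of Python. This is a guess at the scheme, not the author's code; the `__trace_` prefix is borrowed from the `__trace_f_r` variable in the example above:

```python
def rename(var, call_stack):
    """Prefix a variable name with the current call stack of function
    names, so variables from different (possibly recursive) inlined
    calls don't collide with each other or with the caller's variables."""
    if not call_stack:
        return var  # variables in the entry function keep their names
    return "__trace_" + "_".join(call_stack) + "_" + var

# Hypothetical trace state: inside @main we traced a call to @f,
# so @f's local `r` becomes `__trace_f_r`, matching the example above.
print(rename("r", ["f"]))  # -> __trace_f_r
print(rename("x", []))     # -> x
```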
-
Code: https://github.com/mt-xing/cs6120/tree/main/l11

This was another unambitious week as I get increasingly swamped by end-of-semester stuff. My tracing interpreter is a modified version of the reference interpreter that records the executed instructions. My code will invoke the modified interpreter with the arguments it is given and read the trace. It then reconstructs a new program, with the entire path of the trace as the first thing executed, wrapped in a speculate block. If a guard fails, it jumps to a label at the bottom of the new program, with the full text of the original function pasted there. Each guard is composed of the same condition as a branch encountered in the original trace. If the branch in the trace was false, an extra instruction is emitted to first negate the condition.

I tested my implementation against the entirety of the bril core benchmarks for correctness. On average, my implementation causes an additional 8.036 dynamic instructions to be executed, although the minimum was a decrease of 7 dynamic instructions - so there is at least one program for which my "optimization" actually optimized something. We'll just ignore the max of 289 additional dynamic instructions, and the median of +3. I also hand-crafted test cases, deliberately causing the trace to take a different path than when I execute the final code. These too exhibited the same behavior as the reference interpreter on the original program, although they do not improve the performance at all, for obvious reasons.

The hardest part of this assignment was figuring out conceptually what information exactly I needed from my tracing interpreter to be able to reconstruct not just the trace itself but also all the jumps and labels needed to resume execution when the trace either ends or needs to abort. The implementation ended up being relatively easy once I conceptually figured out what I needed, although admittedly my extremely unambitious implementation probably helped a lot.
Still, this is a working tracing interpreter, and there's even a non-zero number of programs that it truly does optimize, so I'll consider that a successful implementation.
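The branch-to-guard rewrite described in this reply might look roughly like the following sketch over Bril's JSON instruction form. The `fresh` helper and the `trace_failed` label name are illustrative assumptions, not the author's actual code:

```python
def branch_to_guard(br_instr, taken_then, fail_label, fresh):
    """Convert a traced `br` into a `guard` (Bril JSON form).
    If the traced execution took the false branch, first emit a `not`
    so the guard condition is true along the trace."""
    cond = br_instr["args"][0]
    out = []
    if not taken_then:
        neg = fresh("cond")  # fresh() returns an unused variable name
        out.append({"op": "not", "dest": neg, "type": "bool", "args": [cond]})
        cond = neg
    out.append({"op": "guard", "args": [cond], "labels": [fail_label]})
    return out

# Example: the trace took the false branch of `br c .then .else`,
# so we negate `c` and guard on the negation.
ctr = iter(range(1000))
fresh = lambda base: f"__{base}_{next(ctr)}"
instrs = branch_to_guard({"op": "br", "args": ["c"],
                          "labels": ["then", "else"]},
                         taken_then=False,
                         fail_label="trace_failed", fresh=fresh)
```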
-
Code
Here and here are my modifications to the Bril typescript interpreter, and here is my trace-injector code.

How it works

Getting the Trace
I modified the Typescript interpreter to write out every instruction it executes (in JSON form).

Filtering the Trace
Since I was a little short on time this week, I made the simplifying assumption that all of my input programs would be standalone, call-less programs. I then replaced the old code with the stitched-in trace.

Testing + Results
I tested correctness on two designs that met my assumptions. All in all, this was a really cool exercise! Given more time, I would have liked to improve the trace code and support interprocedural traces. That way, my code would hopefully be able to bail out of the trace earlier, saving me some of the performance decreases I just mentioned.

Hardest Part
This is a little dull, but I was having some trouble getting the trace to be correctly emitted by the interpreter. I kept running into JSON formatting errors, even though I thought I was emitting stuff correctly. It always turned out to be something minor, and I eventually got it working, but it took a lot of printing and debugging. Other than that, I spent a while thinking about interprocedural tracing, which got a little convoluted. Eventually, I got a good idea of what I thought I should do, but ultimately decided to stick with the simplification for this week.

Star
I think my work deserves a star!
-
For this week's assignment, I modified the interpreter to add tracing by printing out every instruction and label read, starting at the top of main. For branch statements, I also recorded which path the branch took. I then took the trace up to the first instruction with a side effect or backedge. When I encountered a branch, I added a not statement (depending on which branch was taken) and then a guard. Failed executions went to a label I added at the top of the original main function, and successful executions went to a label I inserted before the original location of the last instruction in the trace. The one edge case I found was that I needed to ignore the last instruction if it was a branch.

I tested the code against the core benchmarks, with both the default inputs and the default inputs with 1 subtracted from every numeric input. Unfortunately, stopping the code at the first backedge kinda meant that I was unable to really improve performance, as the number of instructions executed for every program besides pythagorean_triple increased. I don't think anything stood out as particularly difficult this week, but there were a lot of errors when trying to match the instructions output by the interpreter to the original instructions in the Bril program.

I used Copilot to generate the initial code from a description and sample trace, then passed in the documentation for the Bril speculate extension. I still had to fix a few issues and figure out all the edge cases myself, but it did fine for a first draft. I think I deserve a Michelin star this week for fully supporting and testing against the core benchmarks!
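The trace-truncation rule this reply describes (cut at the first side effect or backedge) could be sketched like this. The set of side-effecting ops and the seen-label backedge check are assumptions, and the "ignore a trailing branch" edge case is omitted for brevity:

```python
SIDE_EFFECTS = {"print", "call", "store", "free"}  # assumed side-effecting ops

def truncate_trace(trace):
    """Keep the prefix of a dynamic trace up to (not including) the first
    side-effecting instruction or the first backedge, detected here as a
    label that has already been seen during this trace."""
    seen_labels = set()
    out = []
    for item in trace:
        if "label" in item:
            if item["label"] in seen_labels:
                break  # backedge: we've been at this label before
            seen_labels.add(item["label"])
            continue  # labels themselves aren't kept in the trace
        if item.get("op") in SIDE_EFFECTS:
            break
        out.append(item)
    return out

# The trace stops just before the print.
t = truncate_trace([{"label": "loop"}, {"op": "add", "dest": "x"},
                    {"op": "print"}, {"op": "mul", "dest": "y"}])
```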
-
Group (@ngernest, @katherinewu312, @samuelbreckenridge)

Our modified reference interpreter records traces from the beginning of the main function. Here is essentially a high-level overview of how we stitch the trace back into the program:

@main(num: int) {
# code 1...
# hottest trace...
# code 2...
}
@main(num: int) {
# code 1...
speculate;
# hottest trace...
  # branches become guards that jump to .hotpathfail on failure
commit;
jmp .hotpathsuccess;
.hotpathfail:
# hottest trace...
.hotpathsuccess:
# code 2...
}

We tested our implementation on several core bril benchmarks, checking correctness and dynamic instruction counts:

default total_dyn_inst: 24
default total_dyn_inst: 13
default total_dyn_inst: 22
-
I tried to do the simplest version of tracing I could. For me, that meant inserting the full execution of the program as a speculative trace at the top of main and then bailing to the normal program if any guards failed. In some ways, that is maybe too simple, because one way to implement it is to just guard on the inputs and recreate any observed prints with constants (because bril is fully deterministic AFAIK). I didn't want to do that since I felt like it was against the spirit of the assignment, so I tried to do some "real" tracing on the honor system.

The trickiest part ended up being handling function calls. I realized that when tracing in an interpreter, you can use the interpreter to maintain proper lexical scopes for all of your variables, but ahead-of-time tracing requires some trickery to handle calling functions with colliding variable names (recursive functions, for example). I ended up basically prepending the traced call stack to each variable name to avoid collisions.

Testing-wise, I did not work very hard. I hand-tested for correctness and performance on a handful of bril programs. My tracing optimization preserved correctness on the programs I tested it on, and it made everything slower. I guess the good news is that it didn't make everything much slower? Because I guard on the inputs at the top of main, when the trace fails, it fails early. And the slowdowns for executions where the trace completes are due to extra instructions I add to guard on the program inputs and to guard on false values in the trace body. So the result is that the slowdown is generally a few 10s of dynamic instructions regardless of the original runtime.
-
I thought this task was not too bad! I admittedly was not the most ambitious with it, but I found the speculative execution pretty easy to understand, and most of my grief came from the finicky details of having to move instructions around. I only did one speculation, for the first branch found in the first function; I tried for a bit originally to do nested speculations, but got quite muddled in the details of moving basic blocks inside of other basic blocks, and eventually just scaled it back to the one speculation. That much was quite feasible to do.

For testing, I ran it against a few benchmarks to check for correctness (and found a couple of bugs: I was infinite-looping at first because I also copied labels into the speculation code, so the control flow coming from non-speculation code was spontaneously jumping into speculation code when it wasn't supposed to. Oops! Deleting the labels inside the speculation code was enough to fix this). I also used the total_dyn_inst flag to check performance and saw an increase of ~4 new instructions (which makes sense, since I statically added 4 new instructions). I checked wall-clock time too, and found a very slight decrease on my predicted path (e.g. 0.11 secs to 0.08 secs for the predicted input in check-primes), but an increase on non-predicted paths (e.g. 0.08 to 0.09 and 0.24 to 0.27 on various-size non-predicted inputs to the check-primes benchmark in core). This matches expectations: I'm doing a bit more unnecessary work on non-predicted inputs, but get a performance increase (slight, but likely bigger if I did more speculations) on predicted paths.
-
I implemented a pretty simple form of tracing. I modified the interpreter to print a JSON containing every instruction executed during a run of the program. Then, I pass that to my program, which prunes it down to the first 20 instructions from the start of main (this can be any number, I just chose 20). I also create CFGs for all the functions in the input program. Then, I further prune the trace by removing calls (the trace is nice because you kinda get inlining "for free") and empty returns. To make the inlining work, whenever I encounter a call I push the destination to a stack and then pop off the stack when I hit the corresponding return, replacing the return with an id instruction that copies the returned value into that destination.

I also needed to find how to stitch this trace back into the rest of the program, which was a bit tricky. I had a function to compute where this point is, more specifically where we should jump after a successful trace. Jumping out of a trace where a guard fails is easy: you just jump back to the start of the original code.

I didn't do terribly thorough testing since it was a busy week for me. I tested it on some hand-written programs that aren't super contrived, which I thought was pretty ok. For example, this program
gets transformed to this:
I added all those random const instructions just so the trace wouldn't just be the entire program. Overall I thought this went pretty well. It made me think about all the changes I'd make to this infrastructure if I set out to write a JIT compiler. I think mostly I would want to add a lot more instrumentation to the interpreter, so that you could make smarter choices about when to start/stop tracing.
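The call-inlining trick this reply describes (push call destinations onto a stack, turn the matching return into a copy) can be sketched over Bril's JSON instructions. The exact field handling is illustrative, not the author's code:

```python
def inline_calls(trace):
    """Rewrite a dynamic trace so calls disappear: remember each call's
    destination on a stack, and turn the matching `ret` into an `id` copy
    into that destination. Empty returns are simply dropped."""
    dests = []
    out = []
    for instr in trace:
        op = instr.get("op")
        if op == "call":
            dests.append((instr.get("dest"), instr.get("type")))
        elif op == "ret":
            dest, ty = dests.pop()
            args = instr.get("args", [])
            if dest is not None and args:
                out.append({"op": "id", "dest": dest, "type": ty,
                            "args": args})
        else:
            out.append(instr)
    return out

# A traced call to @f(y) whose body is `r = a - one; ret r`:
t = inline_calls([
    {"op": "call", "dest": "z", "type": "int", "funcs": ["f"], "args": ["y"]},
    {"op": "sub", "dest": "r", "type": "int", "args": ["a", "one"]},
    {"op": "ret", "args": ["r"]},
])
```

Note that this sketch ignores variable-name collisions between caller and callee; other replies in this thread handle that by prefixing the call stack onto variable names.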
-
Here is the code: link

I am recording a trace for a specified function (-f parameter for brili) until the first stopping point is reached. Here is the comparison of performance. For the first set of arguments, the modified code performs a bit slower because of the additional guard instructions. For the rest of the arguments, the modified code performs slower because it cannot reuse the trace.

I routinely use Copilot when coding. I used it this time as well. I did not use any other GenAI tools.
-
Offline Trace-Based Optimizer
This task was a collaboration between @arnavm30 and @neel-patel-1. We implemented an ahead-of-time, interprocedural, trace-based optimizer which uses instruction counts for hotness detection.
One interesting aspect of our implementation that highlights a challenge with trace insertion in an ahead-of-time, trace-based optimizer is shown in the example below, when the optimizer is applied to code containing a loop. Before optimization, hot_loop.bril's loop body executes five times:
on the second iteration, the loop header and body become hot, so a trace begins to be recorded mid-loop.
Such a trace would cause the program to produce incorrect results. To address this, we pass traces through a post-processing step.
We run our optimizer on three of our own examples and one core bril benchmark.
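Instruction-count-based hotness detection of the kind this group describes can be sketched with a simple per-block counter. The threshold here is an arbitrary stand-in, not their actual setting:

```python
HOT_THRESHOLD = 20  # arbitrary: how many executions make a block "hot"

class HotnessCounter:
    """Count how often each basic block executes; report the moment a
    block crosses the hotness threshold (exactly once per block)."""
    def __init__(self, threshold=HOT_THRESHOLD):
        self.counts = {}
        self.threshold = threshold

    def visit(self, label):
        self.counts[label] = self.counts.get(label, 0) + 1
        return self.counts[label] == self.threshold  # True exactly once

# Simulate a loop whose header and body each run three times with
# a threshold of 3: both become hot on their third execution.
counter = HotnessCounter(threshold=3)
hot_events = [lbl for _ in range(3) for lbl in ["header", "body"]
              if counter.visit(lbl)]
```

As the reply above points out, a block becoming hot mid-loop means recording starts mid-iteration, which is exactly why the recorded trace needs post-processing before it can be stitched back in.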
-
In collaboration with @bryantpark04 and @dhan0779

Constructing Traces: We construct traces by modifying the reference interpreter to record each executed instruction.

Performance: We checked the correctness of our stitched programs by ensuring the outputs matched the original program for various inputs. To evaluate performance, we used a handwritten test program.

Conclusions: This assignment was pretty enjoyable, as it was cool to work with JIT concepts even in an AOT manner. The main challenge we faced was determining how to stitch the trace back into the original program, particularly how to track the location at which to rejoin the original code after the trace commits.
-
I extended the reference Bril interpreter (brili.ts) to support a runtime trace (--trace N) and an injection tool (apply_trace.ts) that turns the first N dynamic operations of the main loop into a speculative "fast path." At runtime, the interpreter records those first N dynamic operations as the trace.
I ran it on a handful of core benchmarks - Armstrong numbers, factorial, power, and so on - both with and without tracing applied. For example, on armstrong.json (input 407) the plain interpreter did 133 dynamic instructions, and the traced version (with my unoptimized 4-iteration trace) did 149; on fact.json (input 10) I saw 119 instructions untraced vs. about 120 traced. I also confirmed correctness by comparing program outputs (true / 3628800) on multiple inputs.

Keeping the trace buffers, the tracing flag, and the speculative state (specparent) correctly threaded through recursive calls was tricky. My first attempt reset or dropped trace state on each call. The final solution simply carries the single tracing boolean and the growing traceInstrs array in the shared State object (and only forks it when you actually enter a speculate), so nested calls no longer break the recording. This is admittedly a pretty rudimentary implementation.
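The fix described above - carrying the tracing flag and instruction buffer in one shared state object rather than re-creating them per call - can be sketched in Python. Field names like `tracing` and `trace` are illustrative stand-ins for the actual brili.ts fields:

```python
class State:
    """Shared interpreter state: one tracing flag and one trace buffer,
    threaded through all calls instead of being re-created per call."""
    def __init__(self, limit):
        self.tracing = True
        self.limit = limit
        self.trace = []  # grows across nested calls

def eval_instr(state, instr):
    if state.tracing:
        state.trace.append(instr)
        if len(state.trace) >= state.limit:
            state.tracing = False  # stop after N dynamic instructions
    # ... actual interpretation of `instr` would go here ...

def eval_call(state, instrs):
    # Nested calls share the SAME state object, so recording continues
    # seamlessly across function boundaries.
    for i in instrs:
        eval_instr(state, i)

s = State(limit=3)
eval_call(s, [{"op": "const"}, {"op": "call"}])
eval_call(s, [{"op": "ret"}, {"op": "print"}])  # a "nested" call
```

After these two calls the buffer holds exactly three instructions and tracing has switched itself off, which is the behavior the reply describes.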
-
I modified the Bril interpreter to record each instruction that is evaluated (this skips over labels). I hardcoded the interpreter to stop tracing after recording 20 instructions. The interpreter starts recording immediately. When a "br" instruction is evaluated in the interpreter, I check whether the "cond" is true or false. If it is false, I insert an artificial instruction in my trace to flip it. Then I substitute a guard to a "hotpathfailed" label on "cond" for the "br" instruction. Tracing stops at the first "ret" or "call".

My optimizer takes in the original program and the trace and forms a new program by inserting a "speculate", then the entire trace, then "commit", then a "hotpathfailed" label, and then the original program. I didn't actually run any optimizations on the trace (although I left a stub file to do that) because my original code for optimizations is in Python, and I did this assignment in C++.

For testing, I ran my tracing optimizer on a handful of benchmarks. I quickly realized that it is ineffective for the majority of benchmarks because they usually have a single instruction in 'main' before calling another function that does the heavy lifting (but my tracing will stop itself at that first function call). I found a few good benchmarks though. 'Fizz-buzz' is perfect because the entire program is crammed into 'main'. My optimizer increased the number of dynamic instructions by 23. But this is without running any optimizations on the trace, so it makes sense that it's pretty bad.

I found this assignment interesting. I liked that I had to think about how much logic to put into the interpreter and how much to put into my optimizer. My gut feeling was that all logic should be in the optimizer file, but this wasn't possible, which is interesting.
This assignment also made me realize the importance of understanding how compilers work, because I couldn't do very much without understanding how the interpreter works (and I don't, so that is why my implementation is very basic). I used ChatGPT for documentation (because I'm lazy, but I want good documentation when I come back to my code). I did not use AI for anything else because I want to learn the material and learn C++.
-
For this assignment I worked on using JIT compilation for loop unrolling. The main idea was to detect hot loops in a program, flatten them into a series of instructions, and duplicate them four times; if the speculation doesn't hold at the end, we just jump back into the original execution of the program. For simple loops (no nesting, no branching inside the loop) that execute a significant number of times (tens or hundreds of times) we can get significant performance wins (10% on computing the 500th fibonacci number), but on the majority of programs the difference is negligible or even slightly negative.

In more detail, I modified the brili interpreter to have a tracing mode that keeps track of the frequency of execution of different basic blocks; once it detects a basic block as being hot (executed 20 times) it enters tracing mode and tracks the blocks that form the hot path. At the end of the execution all the hot loops are printed in JSON format. A turnt command stores all the hot paths in a program in a JSON file.

Once we have access to traces, we can call a Kotlin program that stitches the traces and the original program into an unrolled program. To do so, we decide the best loop to unroll for each bril function in the source with a couple of heuristics: (1) avoid loops that have side effects, (2) prefer shorter loops over longer ones, and (3) avoid loops that have too many instructions (keeping only those with fewer than 20 instructions). The main idea of (1) was to avoid speculating on instructions that have side effects, and of (2) and (3) to avoid having a large program due to unrolling large blocks. I worked on this part by first writing a version by hand of what I would expect my optimization to look like (you can find it here) and then iterating on my solution until I approached a better and better approximation of what I wanted.
To decide how many times to unroll a loop, I just selected four as my hyperparameter of choice and didn't play around with tuning the best number of unrolls. How well does this work? For simple loops that execute a large number of times, like fibonacci, I saw a decrease of around 15% in dynamic instructions (3,509 vs 3,032). But for the majority of benchmarks in the core directory there was no benefit, or even a negative impact on performance. Here's a summary of results:
I'm super late with this assignment but had a lot of fun working on unrolling and stitching code.
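The unrolling transformation described above might be sketched as follows. This is only a structural sketch over Bril's JSON form, not the author's Kotlin implementation; it assumes a straight-line loop body that recomputes the condition variable, so guarding it before each copy is meaningful:

```python
UNROLL = 4  # the hyperparameter of choice mentioned above

def unroll_hot_loop(cond_var, body, fail_label):
    """Build a speculative, 4x-unrolled version of a simple loop body:
    guard the loop condition before each copy so execution bails out to
    the original loop as soon as speculation would go wrong."""
    out = [{"op": "speculate"}]
    for _ in range(UNROLL):
        out.append({"op": "guard", "args": [cond_var],
                    "labels": [fail_label]})
        out.extend(dict(i) for i in body)  # copy the straight-line body
    out.append({"op": "commit"})
    return out

# Hypothetical one-instruction loop body `i = i + one`, guarded on `cond`,
# bailing out to a label in front of the original loop.
unrolled = unroll_hot_loop(
    "cond",
    [{"op": "add", "dest": "i", "type": "int", "args": ["i", "one"]}],
    "orig_loop",
)
```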
-
code

Tracing
Once I am done tracing, I write the list of traced instructions to standard error, which is sent to a file via piping. (The bash scripting took me a while to figure out, since I'm pretty bad at it.)

Optimizations + stitching the trace back in

Testing
Testing was pretty challenging, since if something was broken it was really hard to figure out what was going wrong and how to fix it. I spent a lot of time just combing through my output file and doing the program by hand to see why things weren't behaving properly. Even though I didn't accomplish everything I had hoped, I was still able to get a much better understanding of the bril compiler and of making dynamic compilers in general.
-
code
I ran several different inputs on several different programs from the core benchmarks.
There isn't much of a difference, but we can still see a slight increase in the traced case for the other inputs.
-
I wrote a very simple tracing implementation that is effectively a branch predictor that gets to "cheat" by seeing which branches were taken during the profiled/traced execution (on the first run through a basic block). My code only supports a handful of operations. I did this by: (1) instrumenting the reference interpreter to spit out each instruction that happened and the branches taken (when a br was executed), and (2) stitching the resulting trace back into the program.

I tested my code against my BBS PRNG implementation, with varying traced inputs (during traced execution) and varying run-time inputs (during program execution). It seems to behave correctly for my BBS implementation.
-
Sorry about how late this is! Unfortunately both of us have been too busy to give this as much focus as our past work.

Implementation
For this assignment, we wrote a straightforward tracer using our TypeScript Bril infrastructure. We are able to support a couple of different operations in Bril. We began by setting up the reference interpreter; subsequently, we implemented a trace that makes a pass through the program, along with helper functions, and then more holistic functions that stitch the trace back in.

Testing
As with past assignments, we utilized the core directory's benchmarks. We seem to be able to preserve correctness for the more basic cases, but the main limitation is dealing with function calls, i.e. references to other function names in Bril.
-
Here's the thread for the dynamic compilers task, which involves doing some speculative transformations on Bril IR!