Replies: 11 comments 15 replies
-
This paper felt like the tracing-JIT counterpart to the SELF paper we read last week. Rather than doing type specialization per function, as in the method JIT used to implement SELF, this paper does type specialization for a JavaScript tracing JIT. Given that, it's surprising that this paper came out 20 years after the SELF paper. I wonder if there's a reason why type specialization for tracing JITs took so much longer to develop. Is it harder to actually see the benefit of type specialization in tracing JITs than in method JITs? Or were tracing JITs as a whole developed much later than method JITs? I also found the section on blacklisting quite interesting. I think it's particularly telling that the authors chose not to support arbitrary exceptions at all in JavaScript. It makes sense to me that supporting arbitrary exceptions in a tracing JIT would be challenging, and blacklisting these paths is an interesting way to cut the overhead of repeatedly attempting (and failing) to trace them. I wonder how challenging it would be to support arbitrary exceptions? Is it just a matter of implementation difficulty, or is there a significant performance impact from supporting these exceptions (requiring some backup of the state before throwing an exception)?
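For anyone who hasn't read the paper yet, here is a toy Python sketch of the two mechanisms discussed above: type guards that side-exit back to the interpreter, and a blacklist that gives up on paths whose recordings keep aborting. All the names and the threshold are invented for illustration; this is not TraceMonkey's actual machinery.

```python
class SideExit(Exception):
    """Raised when a guard fails; control returns to the interpreter."""

class TraceAbort(Exception):
    """Raised when recording hits something untraceable (e.g., a throw)."""

BLACKLIST_THRESHOLD = 3   # assumed: give up after three failed recordings
abort_counts = {}         # loop-header pc -> number of aborted recordings
blacklist = set()         # loop headers we will never try to trace again

def compiled_trace(x, y):
    """A trace specialized on (int, int); the guard side-exits on mismatch."""
    if not (isinstance(x, int) and isinstance(y, int)):
        raise SideExit("type guard failed")
    return x + y          # specialized fast path, no dynamic dispatch

def try_to_record(pc, record_fn):
    """Attempt to record a trace at a loop header, blacklisting headers
    whose recordings repeatedly abort (e.g., paths that throw)."""
    if pc in blacklist:
        return None
    try:
        return record_fn()
    except TraceAbort:
        abort_counts[pc] = abort_counts.get(pc, 0) + 1
        if abort_counts[pc] >= BLACKLIST_THRESHOLD:
            blacklist.add(pc)
        return None
```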
-
This paper was interesting to read because of the detailed discussion of the challenges of implementing a tracing JIT. One would think that just adding type assumptions to a block of code and compiling it would be enough, but it turns out to be much more nuanced than that (nested loops, exceptions, preemption, FFI, etc.). The FFI aspect was especially interesting because of the possibility of FFI functions calling back into the interpreter. I wonder if newer VMs handle that any better than just giving up and not tracing any blocks that use the FFI.
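To make the "give up on FFI" behavior concrete, here is a minimal sketch of a recorder aborting when it reaches a native call. The `Op` class and the set of native functions are invented for illustration, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Op:
    kind: str
    target: str = ""

class TraceAbort(Exception):
    pass

# Assumed examples; the real engine checks for calls that leave JavaScript.
NATIVE_FUNCTIONS = {"document.write", "Math.random"}

def record_op(trace, op):
    """Append one interpreted operation to the trace being recorded,
    aborting when control would enter native code the recorder cannot
    follow (native code may even call back into the interpreter)."""
    if op.kind == "call" and op.target in NATIVE_FUNCTIONS:
        raise TraceAbort(f"FFI call to {op.target}")
    trace.append(op)

# Usage: ordinary ops record fine; an FFI call aborts the recording.
trace = []
record_op(trace, Op(kind="add"))
try:
    record_op(trace, Op(kind="call", target="Math.random"))
except TraceAbort as e:
    print("aborted recording:", e)   # engine falls back to interpreting
```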
-
Reading this paper prompted me to search for more about the evolution of JIT compilers in web browsers (Firefox specifically). While the authors point out that TraceMonkey achieves better performance than method-based JITs because it operates at the granularity of loops, it looks like it was later replaced by a JIT that combines elements of both strategies (JägerMonkey). There appear to be some inherent issues with trace-based JITs; this article gives some background on why they replaced TraceMonkey with JägerMonkey:
The full article can be found here: https://hacks.mozilla.org/2010/03/improving-javascript-performance-with-jagermonkey/
-
My most burning question is basically just gossip: what is it like to write a paper with fifteen other people? I wonder if Adrian has any historical context on the group that developed TraceMonkey and its successors. Often in this course we've looked at a paper and then wondered wistfully what the authors went on to do with the project. That question is relatively easy to answer in this case; see here and here. I found it interesting that they built on TraceMonkey by going back to method-level JIT compilation. The quote below is from the second link. Thoughts?
-
I found it interesting how this paper builds on the idea of type specialization, used in SELF, and applies it to loop traces. I agree with the authors' assumption that loops are the most frequently executed parts of programs, which makes type specialization there especially useful, but I was also curious whether type specialization could be used in other places, such as memory allocation. For instance, if you knew what types objects had on a certain trace, you could allocate those objects using minimal amounts of memory, rather than having to allocate larger, generic chunks when the exact type is unknown. I was also curious why bitops-3bit-bits-in-byte and bitops-bitwise-and perform so much better on TraceMonkey than on the V8 and SFX JIT compilers. The performance on these two benchmarks seems much better than usual, and it would be interesting if it could be explained, similar to how the authors explain why some of the other benchmarks behave the way they do. Either way, I feel these two benchmarks indicate the room for improvement possible even without other techniques used in the V8 and SFX compilers, like call threading.
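As a toy illustration of the allocation idea (my own sketch, not something the paper does): once a trace's guards have proven every element is a double, the values can be stored unboxed instead of as pointers to boxed objects:

```python
import array

def allocate_for_trace(values, proven_type):
    """If the trace's type guards prove every element is a double, store
    them unboxed (8 bytes each); otherwise fall back to generic boxed
    storage (a pointer per element, each pointing at a tagged object)."""
    if proven_type is float:
        return array.array("d", values)   # compact, unboxed doubles
    return list(values)                   # generic: boxed, type-tagged

# Usage: a trace that proved its inputs are doubles gets the compact form.
xs = allocate_for_trace([1.0, 2.5, 3.25], proven_type=float)
```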
-
I thought the solution to the nested loop problem was particularly cool, and particularly important to the success of the idea of a tracing JIT (a toy sketch of the nesting idea follows below). The tracing here leads to the compilation of hot loops. I wonder whether there are any other particularly useful "hot" things compilers could choose to compile as well. For example, consider a call to a logger, …
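Here is a toy Python sketch of the nested-loop solution as I understood it: the hot inner loop gets its own trace first, and the outer loop's trace then calls into it rather than re-recording every inner path (all names are mine, and the type guards are elided):

```python
def inner_trace(row):
    """Trace for the hot inner loop, compiled first (type guards elided)."""
    total = 0
    for x in row:
        total += x
    return total

def outer_trace(matrix):
    """Trace for the outer loop: instead of inlining every possible
    inner-loop path, it calls the inner loop's already-compiled trace."""
    total = 0
    for row in matrix:
        total += inner_trace(row)   # nested trace call, not re-recording
    return total

print(outer_trace([[1, 2], [3, 4]]))   # -> 10
```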
-
I would say this paper is more readable than the previous OOPSLA papers (probably because I am not a big fan of those long, old-fashioned ones). It uses type information to specialize different traces and compiles the traces hierarchically, which lets it tackle complex nested loops. The key idea is intuitive and easy to understand. At first I thought it would be easy to implement, but the authors show there were lots of details to consider, including handling exceptions, blacklisting with nesting, calling external functions, etc. They resolved those issues systematically and produced a workable, efficient implementation for JavaScript. My question is similar to those above: these techniques seem not that novel and could perhaps have been invented in the last century, when tracing JIT compilers first appeared, so why was this paper published as late as 2009? Was it the first to propose this kind of type-specialized trace-based JIT technique? Also, the paper mentions that multicore compilation would speed up the compilation process but leaves it for future work. Wasn't multicore hardware already prevalent at that time? Running the runtime and the JIT compiler concurrently in two threads does not seem like it should be a big challenge.
-
First of all, what is this cursed amalgam of Intel and AT&T syntax for x86 in Fig. 3? I would definitely agree with the above comments that this paper was a lot more readable than the SELF one, which was nice. One of the most interesting things to me in this paper was how they managed to build a tracing JIT that ended up with more compiled code than just simple linear traces, via the "trace trees." This left me wondering what other structures can be compiled like this. It seems that in this case they take advantage of the lack of merge points, which gives them SSA without φ functions, but are there ways around this?
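To make the "SSA without φ" point concrete, here is a minimal sketch (my own hypothetical structure, not the paper's LIR) of a trace tree: straight-line instruction sequences that only ever fork at failed guards and never merge back, so every value along any path has exactly one definition and no φ functions are needed:

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    """One segment of a trace tree. Paths fork at guards but never
    merge back, so each value has a single definition on every path."""
    ops: list                                      # straight-line SSA ops
    branches: dict = field(default_factory=dict)   # guard outcome -> child

# Two specialized continuations sharing a prefix, forked by a type guard.
int_path = TraceNode(ops=["t1 = add_int x, 1"])
dbl_path = TraceNode(ops=["t1 = add_dbl x, 1.0"])
root = TraceNode(ops=["guard_type x"],
                 branches={"int": int_path, "double": dbl_path})
```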
-
This paper was an interesting read and a nice introduction to the more nitty-gritty aspects of tracing. I'm curious what future work (if any) was done to address the overhead due to short, nested loops, as these seem fairly common in the benchmarks. What exactly causes the additional overhead in these loops, and can it be addressed by somehow inlining the inner loop into the outer loop's trace? (The paper mentions a large overhead associated with compiling multiple versions of an outer loop for each inner-loop trace, but it seems unlikely that the types of variables used in inner loops would change much over the course of executing the outer loop.) Edit: I realize that if we were to trace every execution of the inner and outer loops, we would have just as many possible combinations whether we started with the outer loop or the inner loop, but perhaps we could get away with keeping track of only a small fraction of these possible executions (e.g., the last two traces of the outer loop, inlining the most common traces of the inner loop); a sketch of that idea follows below.
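Here is a rough Python sketch of that last suggestion (again, my own invention, not something the paper implements): count which type-specialized variants of the inner loop actually run, and only inline the hottest few into the outer trace, letting rarer variants side-exit to the interpreter:

```python
from collections import Counter

MAX_INLINED = 2               # assumed: inline only the two hottest variants
inner_variant_counts = Counter()

def variants_to_inline(observed_variant):
    """Record which type-specialized inner-loop variant just ran, and
    return the few hottest ones worth inlining into the outer trace;
    anything rarer would side-exit to the interpreter instead of
    forcing a recompile of the outer loop."""
    inner_variant_counts[observed_variant] += 1
    return [v for v, _ in inner_variant_counts.most_common(MAX_INLINED)]

# Usage: after observing types (int, int) on the inner loop a few times,
# that variant makes the inlining cut.
for _ in range(5):
    hot = variants_to_inline(("int", "int"))
print(hot)   # -> [('int', 'int')]
```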
-
The heuristic of marking a loop path as hot when the trace monitor comes across its backedge at least twice seems reasonable enough, but it's still a little arbitrary. I guess in order to keep the JIT's optimization capabilities general, it's very difficult to come up with a good criterion for hot paths that makes sense for all programs. However, I wonder how advantageous it would be to extend the program with optional annotations on …
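For reference, the heuristic itself fits in a few lines; here is a toy sketch of it (the `start_recording` hook is hypothetical, but the threshold of two backedge crossings is the one the comment above describes):

```python
HOT_THRESHOLD = 2     # heuristic: record after two backedge crossings
backedge_counts = {}

def on_backedge(pc):
    """Called by the interpreter each time it jumps backward to the loop
    header at `pc`; once the counter reaches the threshold, the monitor
    starts recording a trace instead of just interpreting."""
    backedge_counts[pc] = backedge_counts.get(pc, 0) + 1
    if backedge_counts[pc] >= HOT_THRESHOLD:
        start_recording(pc)   # hypothetical hook into the trace recorder
```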
-
This paper was an interesting read because I had never really thought about how much slower compiled code might be for dynamically typed languages. So I was pleasantly surprised to see speedups of around 2-20x on many benchmarks with TraceMonkey. On a very unrelated note, something I'm learning this semester is that loops can be a great target for optimizations in so many ways. TraceMonkey finds hot paths in loops and executes type-specialized traces for them, resulting in massive speedups; this reminded me of a runtime verification paper I read recently that showed how you can transform loops to only monitor events produced by the first few iterations, which results in a large overhead reduction!
-
Here is the link to the paper