The Rust compiler can use various *backends* for generating executable code. The main one is of course the LLVM backend, but there are other backends, such as [GCC][gcc backend], [.NET](#rust-to-net-compiler---add-support-for-compiling--running-cargo-tests) or [Cranelift][clif backend]. Cranelift is a code generator for various hardware targets, essentially something similar to LLVM. The Cranelift backend uses Cranelift to compile Rust code into executable code, with the goal of improving compilation performance, especially for debug (unoptimized) builds. Even though this backend can already be faster than the LLVM backend, we found that it was being slowed down by the register allocator used by Cranelift.

Register allocation is a well-known compiler task where the compiler decides which registers should hold the variables and temporary expressions of a program. Usually, the goal of register allocation is to assign registers in a way that maximizes the runtime performance of the compiled program. For unoptimized builds, however, we often care more about compilation speed.

Demilade thus proposed to implement a new Cranelift register allocator called `fastalloc`, with the goal of making it as fast as possible, at the cost of the quality of the generated code. He was very well-prepared. In fact, he had a prototype implementation ready even before his GSoC project started! However, register allocation is a complex problem, and it took several more months to finish the implementation and optimize it as much as possible. Demilade also made extensive use of fuzzing to make sure that his allocator is robust even in the presence of various edge cases.

Once the allocator was ready, Demilade benchmarked the Cranelift backend with both the original and his new register allocator using our compiler [benchmark suite][rustc-perf]. And the performance results look awesome! With his faster register allocator, the Rust compiler executes up to 18% fewer instructions across several benchmarks, including complex ones like performing a debug build of Cargo itself. Note that this is an *end-to-end* improvement in the work needed to compile a whole crate, which is really impressive. If you would like to examine the results in more detail or even run the benchmark yourself, check out Demilade's [final report](https://d-sonuga.netlify.app/gsoc/regalloc-iii/), which includes detailed instructions on how to reproduce the benchmark.

Apart from having the potential to speed up compilation of Rust code, the new register allocator can also be useful for other use-cases, as it can be used in Cranelift on its own (outside the Cranelift codegen backend). What can we say other than that we are very happy with Demilade's work! Note that the new register allocator is not yet available in the Cranelift codegen backend out-of-the-box, but we expect that it will eventually become the default choice for debug builds and that it will thus make compilation of Rust crates using the Cranelift backend faster in the future.

This project was relatively loosely defined, with the overarching goal of improving the user interface of the [Rust compiler benchmark suite][rustc-perf]. Eitaro tackled this challenge from various angles at once. He improved the visualization of runtime benchmarks, which were previously a second-class citizen in the benchmark suite, by adding them to our [dashboard](https://perf.rust-lang.org/dashboard.html) and by implementing [historical charts](https://github.com/rust-lang/rustc-perf/pull/1922) of runtime benchmark results, which help us figure out how a given benchmark behaves over a longer time span.

Another improvement that he worked on was embedding a profiler trace visualizer directly within the `rustc-perf` website. This was a challenging task, which required him to evaluate several visualizers and figure out how to include them within the source code of the benchmark suite in a non-disruptive way. In the end, he managed to integrate [Perfetto](https://ui.perfetto.dev/) within the suite website, and also performed various [optimizations](https://github.com/rust-lang/rustc-perf/pull/1968) to improve the performance of loading compilation profiles.