Commit c08a3d2

devlog: parallel self-hosted codegen
1 parent 6f06223 commit c08a3d2

1 file changed: +98 −0

content/en-US/devlog/2025/index.smd

@@ -28,6 +28,104 @@
// an example of a correct date string.
---

# [Parallel Self-Hosted Code Generation]($section.id('2025-06-14'))
Author: Matthew Lugg

Less than a week ago, we finally turned on the x86_64 backend by default for Debug builds on Linux
and macOS. Today, we've got a big performance improvement to it: we've parallelized the compiler
pipeline even more!

These benefits do not affect the LLVM backend, because it uses a lot more shared state; in fact, it
is still limited to one thread, whereas every other backend was able to use two threads even before
this change. But for the self-hosted backends, machine code generation is essentially an isolated
task, so we can run it in parallel with everything else, and even run multiple code generation jobs
in parallel with one another. The generated machine code then all gets glued together on the linker
thread at the end. This means we end up with one thread performing semantic analysis, arbitrarily
many threads performing code generation, and one thread performing linking. Parallelizing this phase
is particularly beneficial because instruction selection for x86_64 is incredibly complex due to the
architecture's huge variety of extensions and instructions.

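Conceptually, the new shape of the pipeline is a classic fan-out/fan-in. Here's a minimal,
hypothetical Zig sketch of that shape -- not the compiler's actual code: the `Mir` and `codegenOne`
names are made up, and the real pipeline uses a thread pool and its own data structures rather than
one thread per function -- but the isolation property it illustrates is the same: each codegen job
writes only its own result, and only the "linker" step at the end touches all of them.

```zig
const std = @import("std");

// Stand-in for the machine code a real backend would produce for one function.
const Mir = struct {
    func_index: usize,
    byte_count: usize,
};

// Pretend "codegen" for one function. It is completely independent of every
// other function, so it can run on any thread with no shared state.
fn codegenOne(func_index: usize, out: *Mir) void {
    var bytes: usize = 0;
    var i: usize = 0;
    while (i < 1_000_000) : (i += 1) {
        bytes +%= (func_index + 1) *% i; // fake instruction selection work
    }
    out.* = .{ .func_index = func_index, .byte_count = bytes % 4096 };
}

pub fn main() !void {
    const num_funcs = 8;

    // One result slot per function; each slot is written by exactly one
    // worker, so the results themselves need no locking.
    var results: [num_funcs]Mir = undefined;
    var threads: [num_funcs]std.Thread = undefined;

    // "Semantic analysis" (here, just the main thread) hands each function
    // off to its own codegen job.
    for (&threads, 0..) |*t, i| {
        t.* = try std.Thread.spawn(.{}, codegenOne, .{ i, &results[i] });
    }

    // The "linker" waits for every codegen job and glues the pieces together.
    var total_bytes: usize = 0;
    for (&threads, &results) |*t, *r| {
        t.join();
        total_bytes += r.byte_count;
    }
    std.debug.print("linked {d} functions, {d} fake bytes\n", .{ num_funcs, total_bytes });
}
```

The important property is that the result slots are disjoint, so codegen jobs never contend with
one another; all synchronization happens at the hand-off and join points.
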
This work culminated in a [pull request](https://github.com/ziglang/zig/pull/24124), created by
Jacob and myself, which was merged a couple of days ago. It was a fair amount of work, because a lot
of internal details of the compiler pipeline needed reworking to completely isolate machine code
generation from the linker. But it was all worth it in the end for the performance gains! Using the
self-hosted x86_64 backend, we saw anywhere from 5% to 50% improvements in wall-clock time
for compiling Zig projects. For example, Andrew reports being able to build the Zig compiler itself
(excluding linking LLVM, which would add a couple of seconds to the time) in 10 seconds or less:

```
Benchmark 1 (32 runs): [... long command to build compiler with old compiler ...]
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          13.8s  ± 71.4ms    13.7s  … 13.8s          0 ( 0%)        0%
  peak_rss           1.08GB ± 18.3MB    1.06GB … 1.10GB         0 ( 0%)        0%
  cpu_cycles          109G  ± 71.2M      109G  …  109G          0 ( 0%)        0%
  instructions        240G  ± 48.3M      240G  …  240G          0 ( 0%)        0%
  cache_references   6.42G  ± 7.31M     6.41G  … 6.42G          0 ( 0%)        0%
  cache_misses        450M  ± 1.02M      449M  …  451M          0 ( 0%)        0%
  branch_misses       422M  ± 783K       421M  …  423M          0 ( 0%)        0%
Benchmark 2 (34 runs): [... long command to build compiler with new compiler ...]
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.00s ± 32.2ms    9.96s  … 10.0s          1 ( 3%)        ⚡- 27.4% ±  0.9%
  peak_rss           1.35GB ± 18.6MB    1.34GB … 1.37GB         2 ( 6%)        💩+ 25.7% ±  3.9%
  cpu_cycles         95.1G  ± 371M      94.8G  … 95.5G          0 ( 0%)        ⚡- 12.8% ±  0.6%
  instructions        191G  ± 7.30M      191G  …  191G          0 ( 0%)        ⚡- 20.6% ±  0.0%
  cache_references   5.93G  ± 33.3M     5.90G  … 5.97G          4 (12%)        ⚡-  7.5% ±  0.9%
  cache_misses        417M  ± 4.55M      412M  …  421M          2 ( 6%)        ⚡-  7.2% ±  1.7%
  branch_misses       391M  ± 549K       391M  …  392M          2 ( 6%)        ⚡-  7.3% ±  0.4%
```
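(For reference, the delta column compares against Benchmark 1: for wall_time, (13.8s - 10.0s) / 13.8s
≈ 27.5%, which lines up with the reported 27.4% once you account for rounding of the displayed means.)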

As another data point, I measured a 30% improvement in the time taken to build a simple "Hello World":

```
Benchmark 1 (15 runs): /home/mlugg/zig/old-master/build/stage3/bin/zig build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           355ms ± 4.04ms     349ms …  361ms         0 ( 0%)        0%
  peak_rss            138MB ± 359KB      138MB …  139MB         0 ( 0%)        0%
  cpu_cycles          1.61G ± 16.4M      1.59G … 1.65G          0 ( 0%)        0%
  instructions        3.20G ± 57.8K      3.20G … 3.20G          0 ( 0%)        0%
  cache_references     113M ± 450K        112M …  113M          0 ( 0%)        0%
  cache_misses        10.5M ± 122K       10.4M … 10.8M          0 ( 0%)        0%
  branch_misses       9.73M ± 39.2K      9.67M … 9.79M          0 ( 0%)        0%
Benchmark 2 (21 runs): /home/mlugg/zig/master/build/stage3/bin/zig build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           244ms ± 4.35ms     236ms …  257ms         1 ( 5%)        ⚡- 31.5% ±  0.8%
  peak_rss            148MB ± 909KB      146MB …  149MB         2 (10%)        💩+  7.3% ±  0.4%
  cpu_cycles          1.47G ± 12.5M      1.45G … 1.49G          0 ( 0%)        ⚡-  8.7% ±  0.6%
  instructions        2.50G ± 169K       2.50G … 2.50G          1 ( 5%)        ⚡- 22.1% ±  0.0%
  cache_references     106M ± 855K        105M …  108M          1 ( 5%)        ⚡-  5.6% ±  0.4%
  cache_misses        9.67M ± 145K       9.35M … 10.0M          2 (10%)        ⚡-  8.3% ±  0.9%
  branch_misses       9.23M ± 78.5K      9.09M … 9.39M          0 ( 0%)        ⚡-  5.1% ±  0.5%
```

By the way, I'm a real sucker for some good `std.Progress` output, so I can't help but mention how
much I enjoy just *watching* the compiler now, and seeing all the work that it's doing:

```=html
<script src="https://asciinema.org/a/bgDEbDt4AkZWORDX1YBMuKBD3.js" id="asciicast-bgDEbDt4AkZWORDX1YBMuKBD3" async="true"></script>
```

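If you've never used `std.Progress` from your own code, here's a rough sketch of the kind of program
that produces output like that. Treat the exact option and method names (`start`, `Node.start`,
`completeOne`, `end`) as assumptions from memory -- the API has shifted between Zig versions:

```zig
const std = @import("std");

pub fn main() void {
    // Root progress node; std.Progress takes care of drawing the terminal UI.
    // (Option and method names here are assumptions and may differ by version.)
    const root = std.Progress.start(.{
        .root_name = "compiling",
        .estimated_total_items = 20,
    });
    defer root.end();

    var i: usize = 0;
    while (i < 20) : (i += 1) {
        // One child node per fake "function" being worked on.
        const node = root.start("codegen", 0);
        defer node.end();

        // Pretend to do some work so the output is visible for a moment.
        var sum: u64 = 0;
        var j: u64 = 0;
        while (j < 50_000_000) : (j += 1) sum +%= j;
        std.mem.doNotOptimizeAway(sum);

        root.completeOne();
    }
}
```
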
Even with these numbers, we're still far from done in the area of compiler performance. Future
improvements to our self-hosted linkers, as well as in the code which emits a function into the
final binary, could help to speed up linking, which is now sometimes the bottleneck of compilation
speed (you can actually see this bottleneck in the asciinema above). We also want to
[improve the quality of the machine code we emit](https://github.com/ziglang/zig/issues/24144),
which not only makes Debug binaries perform better, but (perhaps counterintuitively) should further
speed up linking. Other performance work on our radar includes decreasing the amount of work the
compiler does at the very end of compilation (its "flush" phase) to eliminate another big chunk of
overhead, and (in the more distant future) parallelizing semantic analysis.

Perhaps most significantly of all, incremental compilation -- which has been a long-term investment
of the Zig project for many years -- is getting pretty close to being turned on by default in some
cases, which will allow small changes to
[rebuild in milliseconds](https://www.youtube.com/clip/Ugkxjn7L0hEfN1XLfH1soaUdCksG3FvJkXIS).
By the way, remember that you can try out incremental compilation and start reaping its benefits
*right now*, as long as you're okay with possible compiler bugs! Check out
[the tracking issue](https://github.com/ziglang/zig/issues/21165) if you want to learn more about
that.

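If you want to give it a spin, the invocation I'd reach for today (an assumption on my part -- see
the tracking issue for the current state of the flags) pairs the build system's file watcher with
the incremental feature flag:

```
zig build --watch -fincremental
```
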
That's enough rambling -- I hope y'all are as excited about these improvements as we are. Zig's
compilation speed is the best it's ever been, and hopefully the worst it'll ever be ;)

# [Self-Hosted x86 Backend is Now Default in Debug Mode]($section.id('2025-06-08'))
Author: Andrew Kelley
