// an example of a correct date string.
---

# [Parallel Self-Hosted Code Generation]($section.id('2025-06-14'))
Author: Matthew Lugg

Less than a week ago, we finally turned on the x86_64 backend by default for Debug builds on Linux
and macOS. Today, we've got a big performance improvement to it: we've parallelized the compiler
pipeline even more!

These benefits do not affect the LLVM backend, because it uses a lot more shared state; in fact, it
is still limited to one thread, whereas every other backend was able to use two threads even before
this change. But for the self-hosted backends, machine code generation is essentially an isolated
task, so we can run it in parallel with everything else, and even run multiple code generation jobs
in parallel with one another. The generated machine code then all gets glued together on the linker
thread at the end. This means we end up with one thread performing semantic analysis, arbitrarily
many threads performing code generation, and one thread performing linking. Parallelizing this phase
is particularly beneficial because instruction selection for x86_64 is incredibly complex due to the
architecture's huge variety of extensions and instructions.

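
To make that thread layout concrete, here is a heavily simplified sketch of the idea. This is *not*
the compiler's actual code: every name in it (`Job`, `Artifact`, `codegenWorker`, and so on) is a
hypothetical stand-in, the jobs are known up front instead of streaming out of semantic analysis,
and the "machine code" is faked. It only illustrates the shape of the pipeline: one producer,
several code generation workers, and a single linker thread consuming the results in order.

```zig
const std = @import("std");

// Hypothetical stand-ins for the compiler's real data structures.
const Job = struct { name: []const u8 };
const Artifact = struct { name: []const u8, size: usize };

// In the real pipeline these jobs stream out of semantic analysis; here they are fixed up front.
const jobs = [_]Job{
    .{ .name = "main" },
    .{ .name = "parseArgs" },
    .{ .name = "render" },
    .{ .name = "flush" },
};

var next_job = std.atomic.Value(usize).init(0);
var results: [jobs.len]Artifact = undefined;
var ready = [1]std.atomic.Value(bool){std.atomic.Value(bool).init(false)} ** jobs.len;

// Each worker grabs the next unclaimed job and "generates machine code" for it.
fn codegenWorker() void {
    while (true) {
        const i = next_job.fetchAdd(1, .monotonic);
        if (i >= jobs.len) return;
        results[i] = .{ .name = jobs[i].name, .size = jobs[i].name.len * 16 };
        ready[i].store(true, .release);
    }
}

// The linker thread waits for each artifact and glues it into the output in order.
fn linkerThread() void {
    var total: usize = 0;
    for (0..jobs.len) |i| {
        while (!ready[i].load(.acquire)) std.Thread.yield() catch {};
        total += results[i].size;
    }
    std.debug.print("linked {d} functions, {d} bytes of (fake) machine code\n", .{ jobs.len, total });
}

pub fn main() !void {
    var workers: [3]std.Thread = undefined;
    for (&workers) |*w| w.* = try std.Thread.spawn(.{}, codegenWorker, .{});
    const linker = try std.Thread.spawn(.{}, linkerThread, .{});
    for (workers) |w| w.join();
    linker.join();
}
```

In the real compiler the plumbing is far more involved, but the ordering guarantee is the same:
code generation can finish in any order, while linking happens on its own dedicated thread.
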
This work culminated in a [pull request](https://github.com/ziglang/zig/pull/24124), created by
myself and Jacob, which was merged a couple of days ago. It was a fair amount of work, because a lot
of internal details of the compiler pipeline needed reworking to completely isolate machine code
generation from the linker. But it was all worth it in the end for the performance gains! Using the
self-hosted x86_64 backend, we saw improvements of anywhere from 5% to 50% in wall-clock time when
compiling Zig projects. For example, Andrew reports being able to build the Zig compiler itself
(excluding linking LLVM, which would add a couple of seconds to the time) in 10 seconds or less:

```
Benchmark 1 (32 runs): [... long command to build compiler with old compiler ...]
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          13.8s  ± 71.4ms    13.7s  … 13.8s          0 ( 0%)        0%
  peak_rss           1.08GB ± 18.3MB    1.06GB … 1.10GB         0 ( 0%)        0%
  cpu_cycles          109G  ± 71.2M      109G  …  109G          0 ( 0%)        0%
  instructions        240G  ± 48.3M      240G  …  240G          0 ( 0%)        0%
  cache_references   6.42G  ± 7.31M     6.41G  … 6.42G          0 ( 0%)        0%
  cache_misses        450M  ± 1.02M      449M  …  451M          0 ( 0%)        0%
  branch_misses       422M  ±  783K      421M  …  423M          0 ( 0%)        0%
Benchmark 2 (34 runs): [... long command to build compiler with new compiler ...]
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.00s ± 32.2ms    9.96s  … 10.0s          1 ( 3%)        ⚡- 27.4% ±  0.9%
  peak_rss           1.35GB ± 18.6MB    1.34GB … 1.37GB         2 ( 6%)        💩+ 25.7% ±  3.9%
  cpu_cycles         95.1G  ±  371M     94.8G  … 95.5G          0 ( 0%)        ⚡- 12.8% ±  0.6%
  instructions        191G  ± 7.30M      191G  …  191G          0 ( 0%)        ⚡- 20.6% ±  0.0%
  cache_references   5.93G  ± 33.3M     5.90G  … 5.97G          4 (12%)        ⚡-  7.5% ±  0.9%
  cache_misses        417M  ± 4.55M      412M  …  421M          2 ( 6%)        ⚡-  7.2% ±  1.7%
  branch_misses       391M  ±  549K      391M  …  392M          2 ( 6%)        ⚡-  7.3% ±  0.4%
```

As another data point, I measured a 30% improvement in the time taken to build a simple "Hello World":

```
Benchmark 1 (15 runs): /home/mlugg/zig/old-master/build/stage3/bin/zig build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           355ms ± 4.04ms     349ms …  361ms         0 ( 0%)        0%
  peak_rss            138MB ±  359KB     138MB …  139MB         0 ( 0%)        0%
  cpu_cycles         1.61G  ± 16.4M     1.59G  … 1.65G          0 ( 0%)        0%
  instructions       3.20G  ± 57.8K     3.20G  … 3.20G          0 ( 0%)        0%
  cache_references    113M  ±  450K      112M  …  113M          0 ( 0%)        0%
  cache_misses       10.5M  ±  122K     10.4M  … 10.8M          0 ( 0%)        0%
  branch_misses      9.73M  ± 39.2K     9.67M  … 9.79M          0 ( 0%)        0%
Benchmark 2 (21 runs): /home/mlugg/zig/master/build/stage3/bin/zig build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           244ms ± 4.35ms     236ms …  257ms         1 ( 5%)        ⚡- 31.5% ±  0.8%
  peak_rss            148MB ±  909KB     146MB …  149MB         2 (10%)        💩+  7.3% ±  0.4%
  cpu_cycles         1.47G  ± 12.5M     1.45G  … 1.49G          0 ( 0%)        ⚡-  8.7% ±  0.6%
  instructions       2.50G  ±  169K     2.50G  … 2.50G          1 ( 5%)        ⚡- 22.1% ±  0.0%
  cache_references    106M  ±  855K      105M  …  108M          1 ( 5%)        ⚡-  5.6% ±  0.4%
  cache_misses       9.67M  ±  145K     9.35M  … 10.0M          2 (10%)        ⚡-  8.3% ±  0.9%
  branch_misses      9.23M  ± 78.5K     9.09M  … 9.39M          0 ( 0%)        ⚡-  5.1% ±  0.5%
```

By the way, I'm a real sucker for some good `std.Progress` output, so I can't help but mention how
much I enjoy just *watching* the compiler now, and seeing all the work that it's doing:

```=html
<script src="https://asciinema.org/a/bgDEbDt4AkZWORDX1YBMuKBD3.js" id="asciicast-bgDEbDt4AkZWORDX1YBMuKBD3" async="true"></script>
```

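
If you want to play with that API yourself, here is a tiny, self-contained example of driving
`std.Progress` from an ordinary program. This is only an illustration of the standard library API on
a recent Zig version, unrelated to the compiler's internals, and the step names and timings are
made up:

```zig
const std = @import("std");

pub fn main() void {
    // One root progress node for the whole "build"; terminal output is handled by std.Progress.
    const root = std.Progress.start(.{
        .root_name = "building",
        .estimated_total_items = 3,
    });
    defer root.end();

    const steps = [_][]const u8{ "semantic analysis", "code generation", "linking" };
    for (steps) |step| {
        // One child node per step, with 100 fake work items each.
        const node = root.start(step, 100);
        defer node.end();
        for (0..100) |_| {
            std.Thread.sleep(5 * std.time.ns_per_ms); // pretend to do some work
            node.completeOne();
        }
    }
}
```

Run it in a terminal and you should see the same kind of live progress tree that the compiler shows.
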
Even with these numbers, we're still far from done in the area of compiler performance. Future
improvements to our self-hosted linkers, as well as to the code which emits a function into the
final binary, could help to speed up linking, which is now sometimes the bottleneck of compilation
speed (you can actually see this bottleneck in the asciinema above). We also want to
[improve the quality of the machine code we emit](https://github.com/ziglang/zig/issues/24144),
which not only makes Debug binaries perform better, but (perhaps counterintuitively) should further
speed up linking. Other performance work on our radar includes decreasing the amount of work the
compiler does at the very end of compilation (its "flush" phase) to eliminate another big chunk of
overhead, and (in the more distant future) parallelizing semantic analysis.

Perhaps most significantly of all, incremental compilation -- which has been a long-term investment
of the Zig project for many years -- is getting pretty close to being turned on by default in some
cases, which will allow small changes to
[rebuild in milliseconds](https://www.youtube.com/clip/Ugkxjn7L0hEfN1XLfH1soaUdCksG3FvJkXIS).
By the way, remember that you can try out incremental compilation and start reaping its benefits
*right now*, as long as you're okay with possible compiler bugs! Check out
[the tracking issue](https://github.com/ziglang/zig/issues/21165) if you want to learn more about
that.

That's enough rambling -- I hope y'all are as excited about these improvements as we are. Zig's
compilation speed is the best it's ever been, and hopefully the worst it'll ever be again ;)

# [Self-Hosted x86 Backend is Now Default in Debug Mode]($section.id('2025-06-08'))
Author: Andrew Kelley
