Realistic performance expectations for Apple Silicon vs x86 #3061
Hello, I'd appreciate some help working out why ripgrep might be performing poorly on my work-issued MacBook Pro compared to my personal PC running Ubuntu. The Linux PC is built around an old AMD 5700X (8 cores with SMT and AVX2) with 32GB of DDR4 RAM and a good Gen4 NVMe SSD. The MacBook Pro is the latest-model M4 Pro with 10 P-cores and 4 E-cores, 24GB of RAM and a 512GB SSD.

As I would expect from benchmarks published online, the MacBook Pro is substantially faster than the 5700X at most tasks, usually on the order of 20-40%. But when it comes to ripgrep the PC is leaving the Mac for dead: the Mac is consistently 5x slower. For example, if I grab a clone of the Linux kernel and check out commit

I know ARM/NEON are very different to x86/AVX2, but I would've expected the Mac to at least be in the same ballpark as this modestly-specced PC. The impression I get is that Linux's filesystem cache is making most of the difference here, and possibly the Mac's SSD is a little slower than the PC's SSD, leaving the M4 CPU a little starved of input.

I'd love to hear from anyone else who has a similar MacBook whether they get similar results, as I haven't been able to find any good direct comparisons of NEON vs AVX2 text processing online, and I'd really like to get the Mac performing the same searches in <0.5s.
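One rough way to test the filesystem-cache hypothesis without involving ripgrep at all is to time a plain read of every file in the checkout several times in a row. The sketch below is only an illustration: `read_tree` and `time_passes` are hypothetical helper names, and it assumes that the first pass is at least partly cold while later passes are served from the OS cache.

```python
import os
import time


def read_tree(root):
    """Read every regular file under root; return (file_count, total_bytes)."""
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    total += len(fh.read())
                files += 1
            except OSError:
                continue  # unreadable file: ignore it for timing purposes
    return files, total


def time_passes(root, passes=3):
    """Time repeated full reads of the tree; returns a list of seconds."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        read_tree(root)
        timings.append(time.perf_counter() - start)
    return timings
```

Running `time_passes` on the kernel checkout on both machines gives a crude baseline. If the later (warm) passes on the Mac are already much slower than on the PC, the gap is more likely in file traversal and I/O than in the NEON vs AVX2 search code.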
Replies: 2 comments 2 replies
MacBook
PC
These are my results on an i9-12900K (with the same checkout of Linux as you):
And now on my M2 mac mini:
Note that I removed the

I have no real idea how to compare these two. There are tons of variables. There's clock speed. There's the instruction set architecture difference. There's the operating system itself. You mentioned most of these yourself, so you're aware of them too. I don't really think I have anything more to add. I'm not really a hardware expert, and I think the kind of analysis you're looking for would need to very carefully control for a bunch of variables, which I'm not well-equipped to do myself. Consider, for example, that NEON only gives you 16-byte (128-bit) vectors, but AVX2 on Intel gives you 32-byte (256-bit) vectors (and yes, ripgrep uses them for simple searches like this).
I think it's pretty tricky to assume that this will carry over to other workloads. For the kind of search you're benchmarking here, ripgrep is going to be using a highly optimized SIMD routine on the CPU. How NEON and AVX2 compare at a very precise level would be interesting to investigate. One thing you could do is isolate this down to a much simpler set of programs. Maybe one that uses
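As a rough illustration of what "a much simpler program" could look like, the sketch below times a single literal search over a buffer held entirely in memory, which removes the filesystem and directory traversal from the picture. It is an assumption-laden stand-in: `make_haystack` and `scan_throughput` are hypothetical names, and the search is done by CPython's own `bytes.find`, not by ripgrep's SIMD routines, so the numbers are only useful for comparing the two machines against each other.

```python
import time


def make_haystack(size_bytes, needle=b"needle-xyz"):
    """Build size_bytes of filler text with a single needle appended at the end."""
    line = b"the quick brown fox jumps over the lazy dog\n"
    body = line * (size_bytes // len(line) + 1)
    return body[:size_bytes] + needle


def scan_throughput(haystack, needle, repeats=5):
    """Return (match_offset, best_gb_per_s) for finding needle in haystack."""
    best = float("inf")
    offset = -1
    for _ in range(repeats):
        start = time.perf_counter()
        offset = haystack.find(needle)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return offset, (len(haystack) / 1e9) / best
```

Because the needle sits at the very end, each call has to scan the whole buffer. Using a haystack of a few hundred megabytes and taking the best of several runs keeps timer noise small. A larger gap here than in published general-purpose benchmarks would point toward the CPU side, while similar numbers would point back toward I/O and the operating system.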