Realistic performance expectations for Apple Silicon vs x86 #3061

phodge · 2025-06-03T09:47:10Z

Hello, I'd appreciate some help working out why ripgrep might be performing poorly on my work-issued MacBook Pro compared to my personal PC running Ubuntu.

The Linux PC is built around an old AMD 5700X (8 cores with SMT and AVX2) with 32GB of DDR4 RAM and a good Gen4 NVMe SSD.

The MacBook Pro is the latest model M4 Pro with 10 P-cores and 4 E-cores, 24GB of RAM and a 512GB SSD. As I would expect from benchmarks published online, the MacBook Pro is substantially faster than the 5700X at most tasks, usually in the order of 20-40%. But when it comes to ripgrep the PC is leaving the Mac for dead - the Mac is consistently 5x slower (literally 500% slower).

For example, if I grab a clone of the Linux kernel, check out commit 546b1c9e93c2bb8cf5ed24e0be1c86bb089b3253 and then search for spaghetti with rg -win spaghetti, the PC finishes the search in about 1.3s on the first pass and then 0.25s on subsequent attempts once the disk cache is warm. On the work-issued Mac the same search initially takes 1.4s and is then slower on subsequent runs, more like 2.5s.

I know ARM/NEON are very different to x86/AVX2 but I would've expected the Mac to at least be in the same ballpark as this modestly-specced PC. The impression I get is that Linux's filesystem cache is making most of the difference here, and possibly the Mac's SSD is a little bit slower than the PC's SSD and the M4 CPU is a little starved of input.

I'd love to hear from anyone else who has a similar MacBook whether they get similar results, as I haven't been able to find any good direct comparisons of NEON vs AVX2 text processing online, and I'd really like to get the Mac performing the same searches in <0.5s.

Answered by BurntSushi


phodge · 2025-06-03T09:56:46Z

Author

Macbook

$ rg --version
ripgrep 14.1.1

features:+pcre2
simd(compile):+NEON
simd(runtime):+NEON

PCRE2 10.43 is available (JIT is available)
$ time rg -win spaghetti 1>/dev/null
rg -win spaghetti > /dev/null  0.53s user 5.97s system 429% cpu 1.513 total
$ time rg -win spaghetti 1>/dev/null
rg -win spaghetti > /dev/null  0.58s user 24.13s system 1049% cpu 2.355 total

PC

$ rg --version
ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
$ time rg -win spaghetti 1>/dev/null
rg -win spaghetti > /dev/null  0.84s user 4.03s system 356% cpu 1.367 total
$ time rg -win spaghetti 1>/dev/null
rg -win spaghetti > /dev/null  0.77s user 2.12s system 1128% cpu 0.256 total
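One way to read the two warm-cache runs above is to split CPU time into user and system shares. The sketch below only redoes that arithmetic on the numbers already posted; the `Timing` struct and its method names are made up for illustration. User time is similar on both machines (0.58s vs 0.77s), while system time is roughly eleven times larger on the Mac (24.13s vs 2.12s), which hints that the gap may lie in the kernel and I/O path rather than in the search routine itself.

```rust
/// One `time` measurement: user CPU, system CPU and wall-clock seconds.
/// (Hypothetical helper type, not part of ripgrep or any library.)
struct Timing {
    label: &'static str,
    user: f64,
    system: f64,
    wall: f64,
}

impl Timing {
    /// Fraction of total CPU time that was spent in the kernel.
    fn system_share(&self) -> f64 {
        self.system / (self.user + self.system)
    }

    /// Average number of cores kept busy (the "% cpu" figure divided by 100).
    fn cores_busy(&self) -> f64 {
        (self.user + self.system) / self.wall
    }
}

fn main() {
    // Second (warm-cache) runs, copied from the `time` outputs above.
    let runs = [
        Timing { label: "MacBook", user: 0.58, system: 24.13, wall: 2.355 },
        Timing { label: "PC", user: 0.77, system: 2.12, wall: 0.256 },
    ];
    for r in &runs {
        println!(
            "{:8} system share {:.1}%, cores busy {:.1}",
            r.label,
            100.0 * r.system_share(),
            r.cores_busy()
        );
    }
}
```

Both machines keep a similar number of cores busy, so the difference is in how much kernel work each search costs, not in how well the search is parallelised.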

1 reply

BurntSushi Jun 3, 2025
Maintainer

The first thing you should do is use the same version on both... And I don't support older versions of ripgrep, so I suggest upgrading the version on your PC.

BurntSushi · 2025-06-03T14:20:04Z

Maintainer

These are my results on an i9-12900K (with the same checkout of Linux as you):

$ lscpu | rg 'Model name'
Model name:                           12th Gen Intel(R) Core(TM) i9-12900K

$ hyperfine 'rg spaghetti ./ | cat'
Benchmark 1: rg spaghetti ./ | cat
  Time (mean ± σ):      82.8 ms ±   1.9 ms    [User: 301.2 ms, System: 594.5 ms]
  Range (min … max):    79.0 ms …  87.0 ms    34 runs

And now on my M2 mac mini:

$ sysctl -a | rg -F 'machdep.cpu'
machdep.cpu.cores_per_package: 8
machdep.cpu.core_count: 8
machdep.cpu.logical_per_package: 8
machdep.cpu.thread_count: 8
machdep.cpu.brand_string: Apple M2

$ hyperfine 'rg spaghetti ./ | cat'
Benchmark 1: rg spaghetti ./ | cat
  Time (mean ± σ):     386.8 ms ±   1.7 ms    [User: 418.1 ms, System: 2511.9 ms]
  Range (min … max):   384.1 ms … 389.2 ms    10 runs

Note that I removed the -i flag here because it complicates the benchmarking model. Ideally your model should be as simple as possible. With the -i flag, ripgrep will use a more complicated SIMD algorithm (probably). But without it, ripgrep will just use a memmem routine.

I have no real idea how to compare these two. There are tons of variables. There's clock speed. There's the instruction set architecture difference. There's the operating system itself. You mentioned most of these yourself, so you're aware of them too. I don't really think I have anything more to add. I'm not really a hardware expert, and I think the kind of analysis you're looking for would need to very carefully control for a bunch of variables that I'm not well-equipped to do myself.

Consider, for example, that NEON only gives you 16-byte (128-bit) vectors, but AVX2 on Intel gives you 32-byte (256-bit) vectors (and yes, ripgrep uses them for simple searches like this).

> As I would expect from benchmarks published online, the Macbook Pro is substantially faster than the 5700X at most tasks, usually in the order of 20-40%.

I think it's pretty tricky to assume that this will carry over to other workloads. For the kind of search you're benchmarking here, ripgrep is going to be using a highly optimized SIMD routine on the CPU. How NEON and AVX2 compare at a very precise level would be interesting to investigate.

One thing you could do is isolate this down to a much simpler set of programs. Maybe one that uses memmem from the memchr crate (which is what ripgrep will ultimately use for my search above). And then another that uses a simple scalar algorithm. That would give you two comparison points. If the SIMD comparison reveals a much bigger gap, then maybe you have your answer that NEON sucks. But if they're roughly the same, then you can switch your investigation over to the scalar benchmark which is a much simpler model.
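A minimal sketch of the scalar half of that comparison, using only the standard library. The function name `count_scalar` and the command-line shape are assumptions made for illustration. For the SIMD half, one would add the memchr crate as a dependency and swap the call for `memchr::memmem::find_iter(&haystack, needle).count()`. Note that `find_iter` reports non-overlapping matches while the loop below counts every offset, which makes no difference for a needle like spaghetti.

```rust
use std::env;
use std::fs;
use std::time::Instant;

/// Naive scalar substring count: compares the needle at every offset.
/// Overlapping matches are counted too.
fn count_scalar(haystack: &[u8], needle: &[u8]) -> usize {
    if needle.is_empty() || needle.len() > haystack.len() {
        return 0;
    }
    haystack
        .windows(needle.len())
        .filter(|window| *window == needle)
        .count()
}

fn main() {
    // Hypothetical usage: ./scalar_bench <file> <needle>
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: scalar_bench <file> <needle>");
        return;
    }

    // Read the whole file up front so that only the search is timed.
    let haystack = fs::read(&args[1]).expect("failed to read file");
    let needle = args[2].as_bytes();

    let start = Instant::now();
    let count = count_scalar(&haystack, needle);
    let elapsed = start.elapsed();
    println!("{} matches in {:?}", count, elapsed);
}
```

Building with optimizations (for example `rustc -O`) and running the same binary against one large file on each machine keeps directory traversal and file-open costs out of the measurement, so only the search loop is compared.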

1 reply

phodge Jun 19, 2025
Author

Hi Andrew, thank you for the detailed response. I also have access to an older 12-core M2 MacBook Pro, which is the original machine provided by my employer, and I ran the same benchmark on it:

$ hyperfine 'rg spaghetti ./ | cat'
Benchmark 1: rg spaghetti ./ | cat
  Time (mean ± σ):      2.964 s ±  0.545 s    [User: 0.487 s, System: 28.284 s]
  Range (min … max):    1.895 s …  3.550 s    10 runs

Despite this being a 12-core CPU, my performance is much worse than what you're seeing on your 8-core Mac mini, but roughly on par with my newer work-issued 14-core M4 MacBook Pro.

Taken with the other benchmarks above, to me this suggests that M2-series Apple Silicon should be in the same vicinity as AMD Zen 3 (assuming sufficient RAM and a decent SSD), but there is something about my work-issued Macs that is causing them to fall far behind the expected Mac experience. Two suspects immediately come to mind:

- My work-issued machines have full disk encryption turned on.
- My work-issued machines have Crowdstrike and other security-related programs that may be scanning files as they are read.

Given the Mac mini benchmarks you've posted being in the same ballpark as my 5700X, I think I'm happy to consider this "Answered" in the very general sense in which I originally asked. Hopefully at some point I'll be able to come back to this and determine whether full disk encryption is a bottleneck on macOS.
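One way to probe the two suspects above without involving ripgrep at all is to time a program that only opens and reads files. The sketch below is an assumption-laden illustration, not ripgrep's traversal: `read_tree` is a hypothetical helper, it is single-threaded, and it ignores .gitignore rules, so its timings are only comparable with the same program run on another machine. If a managed Mac is far slower than an unmanaged one here, the overhead is in the file-access path rather than in the search.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::Instant;

/// Recursively reads every regular file under `dir` and returns
/// (files read, total bytes). No searching is done, so the elapsed time
/// is dominated by directory traversal, open and read calls.
fn read_tree(dir: &Path) -> io::Result<(u64, u64)> {
    let mut files = 0;
    let mut bytes = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `file_type` does not follow symlinks, so links are skipped.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            let (f, b) = read_tree(&entry.path())?;
            files += f;
            bytes += b;
        } else if file_type.is_file() {
            // Skip unreadable files instead of aborting the whole walk.
            if let Ok(data) = fs::read(entry.path()) {
                files += 1;
                bytes += data.len() as u64;
            }
        }
    }
    Ok((files, bytes))
}

fn main() -> io::Result<()> {
    // Directory to walk; defaults to the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let start = Instant::now();
    let (files, bytes) = read_tree(Path::new(&root))?;
    println!("read {} files, {} bytes in {:?}", files, bytes, start.elapsed());
    Ok(())
}
```

Running it twice in a row on the kernel checkout gives a cold and a warm number, mirroring the first and subsequent rg runs reported earlier in the thread.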
