Can anyone explain why prompt evaluation on Macs is so slow? #1945

shouyiwang · 2023-06-20T00:13:59Z

Using llama.cpp on a Mac is painful because prompt evaluation is incredibly slow.
Compare these three outputs:

CPU (Ryzen 5600) only on Linux with 13b q6k:

llama_print_timings: prompt eval time =  1146.71 ms /   111 tokens (   10.33 ms per token,    96.80 tokens per second)
llama_print_timings:        eval time = 35998.84 ms /   127 runs   (  283.46 ms per token,     3.53 tokens per second)

GPU (4090) on Linux with 13b q6k:

llama_print_timings: prompt eval time =   225.93 ms /   111 tokens (    2.04 ms per token,   491.30 tokens per second)
llama_print_timings:        eval time =  1895.24 ms /   127 runs   (   14.92 ms per token,    67.01 tokens per second)

M1 with 13b q4km (GPU or CPU speed is almost the same on M1):

llama_print_timings: prompt eval time = 13767.84 ms /   111 tokens (  124.03 ms per token,     8.06 tokens per second)
llama_print_timings:        eval time = 22384.10 ms /   127 runs   (  176.25 ms per token,     5.67 tokens per second)

On Linux, prompt evaluation on the CPU is 27.4 times faster per token than token generation, and on the GPU it is 7.3 times faster. On the Mac, however, it is only 1.4 times faster. This discrepancy doesn't make sense.
Additionally, the Ryzen 5600 on Linux is 12 times faster than the M1's CPU at prompt evaluation, which is also puzzling. From what I understand, these two CPUs perform similarly in various benchmarks.
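For anyone who wants to check the ratios, here is a minimal Python sketch that recomputes them from the `llama_print_timings` lines pasted above. The regex, the `LOGS` dictionary, and the helper name `tokens_per_second` are my own illustrative choices, not part of llama.cpp. The rates are derived from the total time and token count, so they can differ slightly in the last digit from the rounded figures printed in the logs.

```python
import re

# Timing lines copied verbatim from the three runs in this post.
LOGS = {
    "ryzen_5600_cpu": """
llama_print_timings: prompt eval time =  1146.71 ms /   111 tokens (   10.33 ms per token,    96.80 tokens per second)
llama_print_timings:        eval time = 35998.84 ms /   127 runs   (  283.46 ms per token,     3.53 tokens per second)
""",
    "rtx_4090_gpu": """
llama_print_timings: prompt eval time =   225.93 ms /   111 tokens (    2.04 ms per token,   491.30 tokens per second)
llama_print_timings:        eval time =  1895.24 ms /   127 runs   (   14.92 ms per token,    67.01 tokens per second)
""",
    "m1": """
llama_print_timings: prompt eval time = 13767.84 ms /   111 tokens (  124.03 ms per token,     8.06 tokens per second)
llama_print_timings:        eval time = 22384.10 ms /   127 runs   (  176.25 ms per token,     5.67 tokens per second)
""",
}

# Hypothetical pattern for this log format: captures the phase name,
# the total time in milliseconds, and the number of tokens (or runs).
PATTERN = re.compile(
    r"llama_print_timings:\s+(prompt eval|eval) time =\s*([\d.]+) ms /\s*(\d+) (?:tokens|runs)"
)


def tokens_per_second(log: str) -> dict:
    """Return {'prompt eval': rate, 'eval': rate} in tokens per second."""
    rates = {}
    for phase, total_ms, count in PATTERN.findall(log):
        rates[phase] = int(count) / (float(total_ms) / 1000.0)
    return rates


RATES = {name: tokens_per_second(log) for name, log in LOGS.items()}

# How much faster prompt evaluation is than generation on each machine.
SPEEDUP = {name: r["prompt eval"] / r["eval"] for name, r in RATES.items()}

# Prompt evaluation on the Ryzen 5600 relative to the M1.
RYZEN_VS_M1 = RATES["ryzen_5600_cpu"]["prompt eval"] / RATES["m1"]["prompt eval"]

for name, ratio in SPEEDUP.items():
    print(f"{name}: prompt eval is {ratio:.1f}x faster than generation")
print(f"Ryzen 5600 vs M1 prompt eval: {RYZEN_VS_M1:.1f}x")
```

Note that the M1 run uses a different quantization (q4km) than the two Linux runs (q6k), so the cross-machine ratio is only a rough comparison.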

Why?
