Massive slowdown on Linux #8582
-
@MrJackSpade Hm, interesting. I don't see this with the current version of llama.cpp on a Ryzen 3700X. Which Linux distribution are you using? Is it possible that Linux is missing some drivers for your CPU?
-
I'm using the same commit of llama.cpp on a Linux machine and a Windows machine.
Both machines have DDR4 memory. I've tested the memory speeds: the Windows machine gets ~35,000 MiB/s and the Linux machine gets ~30,000 MiB/s.
For some reason, though, the Windows machine is running ~4x faster than the Linux machine using the same(ish) settings. The Linux machine used to run Windows, and as far as I remember it ran at about the same speed as the current Windows machine, which makes sense because the memory speeds are about the same.
Both have CUDA support compiled in for cuBLAS; however, both are running with 0 layers offloaded to the GPU.
The model I'm using to test is
L3-8B-Celeste-v1.Q8_0.gguf
This is the Linux machine
This is the Windows machine
The machine specs themselves are pretty different: the Windows machine has a 3090 and a 5900X, while the Linux machine is a laptop with a 3080M and an i7-11800H. But since it's pure CPU inference, my understanding is that RAM speed should be what really determines the inference speed, right?
I swear to god I remember getting approx the same speed on pure CPU before moving one of the machines over to Linux. Maybe not the exact same, but not a 4x difference!
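For reference, the bandwidth reasoning above can be put into rough numbers. This is only a back-of-envelope sketch, not anything from llama.cpp itself: it assumes token generation is purely memory-bound (every token streams all weights from RAM once), reads the quoted memory figures as MiB/s, and assumes the Q8_0 file is roughly 8.5 GB (substitute the real file size). The function name `max_tokens_per_second` is made up for illustration.

```python
MIB = 1024 ** 2  # bytes per MiB


def max_tokens_per_second(bandwidth_mib_s: float, model_bytes: float) -> float:
    """Upper bound on decode speed if each token reads all weights from RAM once."""
    return bandwidth_mib_s * MIB / model_bytes


# Assumption: an 8B model at Q8_0 is roughly 8.5 GB; replace with the actual file size.
MODEL_BYTES = 8.5e9

# Memory speeds quoted in the post, interpreted as MiB/s.
windows = max_tokens_per_second(35_000, MODEL_BYTES)
linux = max_tokens_per_second(30_000, MODEL_BYTES)
ratio = windows / linux

print(f"Windows upper bound: {windows:.2f} tok/s")
print(f"Linux upper bound:   {linux:.2f} tok/s")
print(f"Expected ratio from bandwidth alone: {ratio:.2f}x")
```

Under these assumptions the bandwidth gap only accounts for a ratio of 35,000 / 30,000, about 1.17x, so a 4x difference would have to come from something else (thread count, CPU clocks, build flags, and so on).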