Replies: 1 comment
-
Without reproduction instructions this qualifies only as user error. Provide the exact commands you ran and more information about your system in order to get support.
-
My previous build of llama.cpp was from April. Today I recompiled from master (b3130) and ran the same model (miqu-1-70b.q4_k_m.gguf); there is a significant drop in inference speed to ~0.5 tk/s, and generation stalls after one or two lines of text. Before, it was running at 2-4 tk/s, so that's roughly a 4x-8x slowdown.
I reverted to an older version (b2766), recompiled, and the normal speed came back.
Is it accepted that newer features slow inference down, or does this qualify as a bug?
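Since reverting to b2766 restores normal speed, one way to pin down the offending change is to bisect the commit range between the two tags. Below is a rough sketch of that workflow; it assumes the repository still builds with the plain Makefile and uses the bundled llama-bench tool to measure generation speed. The model path is illustrative, and build flags (e.g. for GPU offload) should match whatever your April build used.

```shell
# Sketch: bisect the llama.cpp regression between the known-good b2766
# and the slow b3130. Adjust build flags and the model path to your setup.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git bisect start b3130 b2766        # mark the bad tag first, then the good one

# git now checks out a commit midway between the two; at each step:
make clean && make llama-bench -j
./llama-bench -m /path/to/miqu-1-70b.q4_k_m.gguf -n 64

# Compare the reported tokens/s against the ~2-4 tk/s baseline, then mark
# the commit and let git pick the next candidate:
git bisect good    # speed is normal at this commit
# or:
git bisect bad     # speed has regressed at this commit

# Repeat build/benchmark/mark until git prints the first bad commit, then:
git bisect reset
```

Reporting the first bad commit (plus your hardware, OS, and exact build command) would turn this from "user error" into an actionable bug report.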