llama-bench and llama-cli inference differs #13273
afsara-ben asked this question in Q&A (Unanswered)
Replies: 0 comments
When I run `llama-bench` I get an eval rate of 88 t/s, but `llama-cli` with the same prompt length and `-n` gives me 85 t/s. Is there a reason for the difference?
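For context, a sketch of the kind of side-by-side comparison being described (the model path and the exact prompt/token counts are placeholders, not from the original question):

```shell
# Hypothetical comparison: same model, same prompt length, same generation length.
# llama-bench: -p is a synthetic prompt *length* in tokens, -n is tokens to generate.
./llama-bench -m model.gguf -p 512 -n 128

# llama-cli: -p is the literal prompt *text*, -n is tokens to generate.
# Timings are printed at the end of the run.
./llama-cli -m model.gguf -p "my 512-token prompt ..." -n 128
```

Note that the two tools measure differently: llama-bench runs repeated, warmed-up decode loops with no sampling or interactive overhead, while llama-cli includes tokenization, sampling, and output printing in a single run, which can plausibly account for a few t/s of difference.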