llama-bench and llama-cli inference differs #13273
afsara-ben asked this question in Q&A (Unanswered)
Replies: 0 comments
When I run `llama-bench` I get an eval rate of 88 t/s, but `llama-cli` with the same prompt length and `-n` gives me 85 t/s. Is there a reason for the difference?
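For context, a sketch of the kind of side-by-side comparison being described (the model path and the exact prompt/token counts are placeholders, not from the original question):

```shell
# Hypothetical comparison: same model, same prompt length, same generation length.
# llama-bench: -p is a synthetic prompt *length* in tokens, -n is tokens to generate.
./llama-bench -m model.gguf -p 512 -n 128

# llama-cli: -p is the literal prompt *text*, -n is tokens to generate.
# Timings are printed at the end of the run.
./llama-cli -m model.gguf -p "my 512-token prompt ..." -n 128
```

Note that the two tools measure differently: llama-bench runs repeated, warmed-up decode loops with no sampling or interactive overhead, while llama-cli includes tokenization, sampling, and output printing in a single run, which can plausibly account for a few t/s of difference.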