Curious about the benchmark in llama.cpp #14751
Unanswered
YaoCheng-Chang asked this question in Q&A · 0 replies
Since I am trying to estimate hardware efficiency when running LLMs in llama.cpp, I would like to understand clearly what each of these timings means.

Can I define the prefill time as the sum of the sample time, load time, and prompt eval time?
My understanding is that these timings are measured before the first token is generated. However, I am not sure whether I am then losing the time spent generating the first token, since that may be counted in the eval time instead.
If anyone has any insight, please let me know. Thanks!
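To make the question concrete, here is a small sketch of the accounting I have in mind. It parses a timing block in the format printed by llama.cpp's `llama_print_timings` (the numbers below are made up for illustration, not real measurements) and computes a prefill estimate under my assumption that prefill = load time + prompt eval time, with the first generated token's decode step landing in eval time instead:

```python
import re

# Hypothetical timing block in the shape printed by llama_print_timings
# (the millisecond values are invented for this example).
timings_text = """
llama_print_timings:        load time =     500.00 ms
llama_print_timings:      sample time =      20.00 ms /   128 runs
llama_print_timings: prompt eval time =     300.00 ms /    64 tokens
llama_print_timings:        eval time =    2500.00 ms /   127 runs
llama_print_timings:       total time =    3350.00 ms
"""

def parse_timings(text):
    """Extract the millisecond value for each timing label."""
    timings = {}
    for label, ms in re.findall(r"llama_print_timings:\s*(.+?) =\s*([\d.]+) ms", text):
        timings[label.strip()] = float(ms)
    return timings

t = parse_timings(timings_text)

# Assumption being asked about: prefill (time before the first generated
# token appears) is load time + prompt eval time, while the first token's
# decode step is counted inside eval time rather than prompt eval time.
prefill_ms = t["load time"] + t["prompt eval time"]
print(f"prefill estimate: {prefill_ms:.2f} ms")
```

Is this the right way to attribute the timings, or does the first token's generation belong to the prefill figure as well?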