Curious about the benchmark in llama.cpp #14751
Unanswered
YaoCheng-Chang asked this question in Q&A · 0 replies
Since I am trying to estimate hardware efficiency when running LLMs in llama.cpp, I would like to understand clearly what each of these timings means.

Can I define the prefill time as the sum of the sample time, load time, and prompt eval time?
My understanding is that these timings are measured before the first token is generated. However, I am not sure whether I am then losing the time spent generating the first token, since that may be counted in the eval time instead.
If anyone has any insight, please let me know. Thanks!
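To make the question concrete, here is a small sketch of the accounting I have in mind. It parses a timing block in the format printed by llama.cpp's `llama_print_timings` (the numbers below are made up for illustration, not real measurements) and computes a prefill estimate under my assumption that prefill = load time + prompt eval time, with the first generated token's decode step landing in eval time instead:

```python
import re

# Hypothetical timing block in the shape printed by llama_print_timings
# (the millisecond values are invented for this example).
timings_text = """
llama_print_timings:        load time =     500.00 ms
llama_print_timings:      sample time =      20.00 ms /   128 runs
llama_print_timings: prompt eval time =     300.00 ms /    64 tokens
llama_print_timings:        eval time =    2500.00 ms /   127 runs
llama_print_timings:       total time =    3350.00 ms
"""

def parse_timings(text):
    """Extract the millisecond value for each timing label."""
    timings = {}
    for label, ms in re.findall(r"llama_print_timings:\s*(.+?) =\s*([\d.]+) ms", text):
        timings[label.strip()] = float(ms)
    return timings

t = parse_timings(timings_text)

# Assumption being asked about: prefill (time before the first generated
# token appears) is load time + prompt eval time, while the first token's
# decode step is counted inside eval time rather than prompt eval time.
prefill_ms = t["load time"] + t["prompt eval time"]
print(f"prefill estimate: {prefill_ms:.2f} ms")
```

Is this the right way to attribute the timings, or does the first token's generation belong to the prefill figure as well?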