Need help understanding llama-bench.exe outputs #9124
Hello, I am trying to understand the llama-bench.exe JSON outputs. Take this output for example: llamacpp_output.json. What are these metrics, and how are they calculated? I am trying to find/calculate:
Also, I'm quite confused that the JSON outputs don't match the `llama_print_timing` metrics printed with the `--verbose` flag: llamacpp_verbose_output.json. I'm not sure which tok/s or latency numbers to use, as I was using the `llama_print_timing` metrics before llama-bench was added.
You can do this with a … Note that the tests are performed from an empty context, but in practice the performance decreases as the size of the context increases, so the results cannot be extrapolated to every situation.
The throughput in `avg_ts` includes the prompt tokens, i.e. it's calculated as `200 tokens / 1814609460 ns`. If you prefer to count only the generated tokens, you can calculate that yourself, which would give you the 55 t/s.
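If it helps, here's a minimal sketch of that calculation done against the JSON file directly. It assumes the output is an array of result objects with `n_prompt`, `n_gen`, and `avg_ns` fields (names taken from the llama-bench source; they may differ between versions) and uses the file name from the question:

```python
import json

# Assumption: the file is llama-bench's JSON output (produced with -o json),
# an array of test-result objects. File name is the one from the question.
with open("llamacpp_output.json") as f:
    results = json.load(f)

NS_PER_S = 1e9

for r in results:
    seconds = r["avg_ns"] / NS_PER_S       # average wall time per run
    total = r["n_prompt"] + r["n_gen"]     # prompt + generated tokens

    # avg_ts counts all tokens over the total time...
    total_ts = total / seconds
    # ...whereas the generation-only rate divides just the generated tokens
    # by the same total time, so it comes out lower
    # (e.g. 100 gen tokens / ~1.81 s ≈ 55 t/s in the output above).
    gen_ts = r["n_gen"] / seconds

    print(f"avg_ts ≈ {total_ts:.2f} t/s, generation-only ≈ {gen_ts:.2f} t/s")
```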