how to measure time to first token (TTFT) and time between tokens (TBT) #13251
I cannot seem to find any metric for either of these.
Hello, I'm not sure if there's a more direct way, but here's my take on using `llama-bench` to estimate the TTFT and TBT.

To get the TTFT, we can use the `-p` test of `llama-bench`, which covers the time taken to process the prompt and the logits computation for the potential first token. For example, `-p 100` performs a prompt processing test with 100 tokens:

llama-bench -m <model_path> -p <prompt_length> -o json
# output json
{
"n_prompt": 100,
"n_gen": 0,
"avg_ns": 163327916,
"stddev_ns": 6844086,
"avg_ts": 613.081377,
"stddev_ts": 24.347449,
"samples_ns": [ 175529000, 160741208, 159486875, 160000792, 160881709 ],
"samples_ts": [ 569.706, 622.118, 627.011, 624.997, 621.575 ]
}
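As a sketch, the conversion from this JSON output to a TTFT estimate can be done in a few lines of Python (the result object is embedded here for self-containment; in practice you would read the file or pipe produced by `llama-bench -o json`):

```python
import json

# Result object from the `-p 100` prompt-processing test above.
result = json.loads("""
{
  "n_prompt": 100,
  "n_gen": 0,
  "avg_ns": 163327916,
  "avg_ts": 613.081377
}
""")

# avg_ns is the average wall time for the whole prompt-processing pass,
# so converting nanoseconds to milliseconds gives the TTFT estimate.
ttft_ms = result["avg_ns"] / 1e6
print(f"TTFT approx {ttft_ms:.2f} ms")  # 163.33 ms
```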
The `avg_ns` field reports the average time, in nanoseconds, to process the prompt. To get the TTFT, we convert this `avg_ns` value to milliseconds. Therefore, for this specific test with a 100-token prompt, the TTFT, that is, the time until the model has prepared the logits for the 101st token and is ready for sampling, is approximately 163.33 ms.

The time between tokens (TBT) measures the average time it takes to generate each subsequent token after the first. To measure the TBT, we can use the `-n` test, which runs a text-generation benchmark. For example:

llama-bench -m <model_path> -n <number of tokens to generate> -o json
# output json
{
"n_gen": 10,
"avg_ns": 389868091,
"stddev_ns": 9010942,
"avg_ts": 25.660775
}

Using the `avg_ns` field, the TBT is `avg_ns` / `n_gen` = 389,868,091 ns / 10 ≈ 38.99 ms per token. Equivalently, from `avg_ts`, 1000 / 25.66 ≈ 38.97 ms per token.
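The same arithmetic, sketched in Python with the generation-test result embedded for self-containment:

```python
import json

# Result object from the `-n 10` text-generation test above.
result = json.loads("""
{
  "n_gen": 10,
  "avg_ns": 389868091,
  "avg_ts": 25.660775
}
""")

# Two equivalent TBT estimates:
#   total generation time divided by the number of generated tokens, and
#   the reciprocal of the tokens-per-second rate.
tbt_ms_from_ns = result["avg_ns"] / result["n_gen"] / 1e6
tbt_ms_from_ts = 1000.0 / result["avg_ts"]
print(f"TBT approx {tbt_ms_from_ns:.2f} ms")  # 38.99 ms
print(f"TBT approx {tbt_ms_from_ts:.2f} ms")  # 38.97 ms
```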