
how to measure time to first token (TTFT) and time between tokens (TBT) #13251

Answered by immaixq
afsara-ben asked this question in Q&A

Hello, I'm not sure if there's a more direct way, but here's my take on using llama-bench to estimate the TTFT and TBT:

To estimate TTFT, we can use llama-bench's prompt-processing test (-p), which covers the time taken to process the prompt and compute the logits from which the first token would be sampled.

For example, -p 100 performs a prompt processing test with 100 tokens.
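Written as formulas, this is a sketch of the approach. It assumes each llama-bench test reports its mean wall time as `avg_ns`, and that TBT is estimated analogously from a token-generation (`-n`) test by dividing that test's mean time by the number of generated tokens. Here $n_p$ and $n_g$ stand for the `-p` and `-n` values:

$$
\mathrm{TTFT} \approx \bar{t}_{\mathrm{pp}}(n_p), \qquad
\mathrm{TBT} \approx \frac{\bar{t}_{\mathrm{tg}}(n_g)}{n_g}
$$

$\bar{t}_{\mathrm{pp}}(n_p)$ and $\bar{t}_{\mathrm{tg}}(n_g)$ are the `avg_ns` values of the corresponding tests. llama-bench does not tokenize real text or go through a server, so latencies seen by an actual client are likely somewhat higher.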

```sh
llama-bench -m <model_path> -p <prompt_length> -o json
```

```
# output json
{
    "n_prompt": 100,
    "n_gen": 0,
    "avg_ns": 163327916,
    "stddev_ns": 6844086,
    "avg_ts": 613.081377,
    "stddev_ts": 24.347449,
    "samples_ns": [ 175529000, 160741208, 159486875, 160000792, 160881709 ],
    "samples_ts": [ 569.706, 622.118, 627.011, 6…
```
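A minimal sketch of turning that output into TTFT/TBT numbers. It assumes `-o json` writes a JSON array of test entries with the `n_prompt`, `n_gen` and `avg_ns` fields shown above. `load_entries` and `estimate_latencies` are hypothetical helper names. An entry with `n_gen == 0` is treated as a prompt-processing test (TTFT ≈ `avg_ns`), and an entry with `n_prompt == 0` as a generation test (TBT ≈ `avg_ns / n_gen`). Note that, as far as I know, llama-bench also runs its default generation test (128 tokens) unless `-n` is given, so the array may contain more than one entry.

```python
import json


def load_entries(path):
    """Read the JSON array saved from `llama-bench ... -o json > path`."""
    with open(path) as f:
        return json.load(f)


def estimate_latencies(entries):
    """Return {label: milliseconds} estimates from llama-bench JSON entries.

    Assumption: n_gen == 0 marks a prompt-processing (-p) test and
    n_prompt == 0 marks a token-generation (-n) test; combined
    prompt+generation entries are skipped.
    """
    results = {}
    for e in entries:
        n_prompt, n_gen, avg_ns = e["n_prompt"], e["n_gen"], e["avg_ns"]
        if n_gen == 0 and n_prompt > 0:
            # The whole prompt is processed before the first token exists.
            results[f"ttft_ms@pp{n_prompt}"] = avg_ns / 1e6
        elif n_prompt == 0 and n_gen > 0:
            # Tokens are generated one at a time: average time per token.
            results[f"tbt_ms@tg{n_gen}"] = avg_ns / n_gen / 1e6
    return results


# Demo with the prompt-processing entry from the output above (n_prompt=100).
sample = [{"n_prompt": 100, "n_gen": 0, "avg_ns": 163327916}]
for label, ms in estimate_latencies(sample).items():
    print(f"{label}: {ms:.2f} ms")  # ttft_ms@pp100: 163.33 ms
```

For the sample above this gives a TTFT estimate of roughly 163 ms for a 100-token prompt. Because the generation test may start from an empty context, TBT after a long prompt could be higher than this estimate.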

Answer selected by afsara-ben