how to measure time to first token (TTFT) and time between tokens (TBT) #13251
I cannot seem to find any metric for either of these.
Hello, I'm not sure if there's a more direct way, but here's my take on using `llama-bench` to estimate the TTFT and TBT.

To get the TTFT, we can use the `-p` test of `llama-bench`, which covers the time taken to process the prompt and the logits computation for the potential first token. For example, `-p 100` performs a prompt processing test with 100 tokens:

llama-bench -m <model_path> -p <prompt_length> -o json
# output json
{
"n_prompt": 100,
"n_gen": 0,
"avg_ns": 163327916,
"stddev_ns": 6844086,
"avg_ts": 613.081377,
"stddev_ts": 24.347449,
"samples_ns": [ 175529000, 160741208, 159486875, 160000792, 160881709 ],
"samples_ts": [ 569.706, 622.118, 627.011, 624.997, 621.575 ]
}
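As a sketch, the conversion from this JSON output to a TTFT estimate can be done in a few lines of Python (the result object is embedded here for self-containment; in practice you would read the file or pipe produced by `llama-bench -o json`):

```python
import json

# Result object from the `-p 100` prompt-processing test above.
result = json.loads("""
{
  "n_prompt": 100,
  "n_gen": 0,
  "avg_ns": 163327916,
  "avg_ts": 613.081377
}
""")

# avg_ns is the average wall time for the whole prompt-processing pass,
# so converting nanoseconds to milliseconds gives the TTFT estimate.
ttft_ms = result["avg_ns"] / 1e6
print(f"TTFT approx {ttft_ms:.2f} ms")  # 163.33 ms
```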
The `avg_ns` field reports the average time, in nanoseconds, to process the prompt. To get the TTFT, we convert this `avg_ns` value to milliseconds. Therefore, for this specific test with a 100-token prompt, the TTFT, that is, the time until the model has prepared the logits for the 101st token and is ready for sampling, is approximately 163.33 ms.

The time between tokens (TBT) measures the average time it takes to generate each subsequent token after the first. To measure the TBT, we can use the `-n` test, which runs a text-generation benchmark. For example:

llama-bench -m <model_path> -n <number of tokens to generate> -o json
# output json
{
"n_gen": 10,
"avg_ns": 389868091,
"stddev_ns": 9010942,
"avg_ts": 25.660775
}

Using the `avg_ns` field, the TBT is `avg_ns` / `n_gen` = 389,868,091 ns / 10 ≈ 38.99 ms per token. Equivalently, from `avg_ts`, 1000 / 25.66 ≈ 38.97 ms per token.
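The same arithmetic, sketched in Python with the generation-test result embedded for self-containment:

```python
import json

# Result object from the `-n 10` text-generation test above.
result = json.loads("""
{
  "n_gen": 10,
  "avg_ns": 389868091,
  "avg_ts": 25.660775
}
""")

# Two equivalent TBT estimates:
#   total generation time divided by the number of generated tokens, and
#   the reciprocal of the tokens-per-second rate.
tbt_ms_from_ns = result["avg_ns"] / result["n_gen"] / 1e6
tbt_ms_from_ts = 1000.0 / result["avg_ts"]
print(f"TBT approx {tbt_ms_from_ns:.2f} ms")  # 38.99 ms
print(f"TBT approx {tbt_ms_from_ts:.2f} ms")  # 38.97 ms
```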