Why do we get different logits when the number of tokens in the batch is different? #9837

sjtu-zwh · 2024-10-11T05:37:29Z

When the batch has only one token:

  //prompt: "Hello my name is"
  llama_batch batch = llama_batch_init(ctx_params.n_ctx, 0, 1);
  llama_decode(ctx, llama_batch_get_one(prompt_tokens.data(), prompt_tokens.size() - 1, 0, 0));
  common_batch_add(batch, prompt_tokens.back(), prompt_tokens.size() - 1, {0}, true);
  llama_decode(ctx, batch);
  new_token_id = llama_sampler_sample(smpl, ctx, 0);
  // output:  new_token_id is 435, logit is 10.115658

When the batch has more than one token:

  //prompt: "Hello my name is"
  llama_batch batch = llama_batch_init(ctx_params.n_ctx, 0, 1);
  llama_decode(ctx, llama_batch_get_one(prompt_tokens.data(), prompt_tokens.size() - 1, 0, 0));
  //just an example batch
  common_batch_add(batch, prompt_tokens.back(), prompt_tokens.size() - 1, {0}, true);
  common_batch_add(batch, prompt_tokens.back(), prompt_tokens.size()    , {0}, true);
  common_batch_add(batch, prompt_tokens.back(), prompt_tokens.size() + 1, {0}, true);
  llama_decode(ctx, batch);
  new_token_id = llama_sampler_sample(smpl, ctx, 0);
  // output:  new_token_id is 435, logit is 10.103312

I ran into this while implementing speculative decoding myself. The target model should verify all tokens in the batch in parallel, but the logit for the first token in the batch differs from the one I get when verifying token by token. I don't know where the problem is. Can anyone help me? Thanks a lot!
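
For context, the verification step I have in mind looks roughly like the sketch below. It assumes greedy acceptance (a draft token is accepted only if it equals the target model's argmax at that position), which is just one possible acceptance rule. The helpers `argmax` and `count_accepted` and the plain `std::vector` layout for the logits are hypothetical names for illustration, not part of the llama.cpp API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not llama.cpp API): index of the largest logit in one row.
static int argmax(const std::vector<float> & row) {
    assert(!row.empty());
    return static_cast<int>(
        std::distance(row.begin(), std::max_element(row.begin(), row.end())));
}

// Hypothetical helper: greedy verification of draft tokens.
// logits[i] holds the target model's logits at the position that predicts draft[i].
// Returns how many leading draft tokens match the target's greedy choice.
static std::size_t count_accepted(const std::vector<std::vector<float>> & logits,
                                  const std::vector<int> & draft) {
    assert(logits.size() >= draft.size());
    std::size_t n = 0;
    while (n < draft.size() && argmax(logits[n]) == draft[n]) {
        ++n;
    }
    return n;
}
```

With this rule, the two runs above would accept the same token, since both pick token 435 even though the logit values (10.115658 vs 10.103312) differ slightly. My question is about where that difference in the raw logits comes from.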

Replies: 0 comments

Select a reply

Uh oh!

Why do we get different logits when the number of tokens in the batch is different? #9837

Uh oh!

Uh oh!

sjtu-zwh Oct 11, 2024

Replies: 0 comments

sjtu-zwh
Oct 11, 2024