Replies: 3 comments 4 replies
-
I tried v0.4.2 to v0.6.2; all of them have the same problem.
-
I changed self.disable_logprobs = True to self.disable_logprobs = False in TargetModelRunner and got:
[[SamplerOutput(outputs=[CompletionSequenceGroupOutput(samples=[SequenceOutput(parent_seq_id=0, output_token=50006, logprobs={50006: Logprob(logprob=0.0, rank=1, decoded_token=None)})], prompt_logprobs=None)], sampled_token_probs=torch.Size([1, 115584]), sampled_token_ids=[[26888]], spec_decode_worker_metrics=None)]]
Maybe there is something wrong in the _sample_with_torch function in sampler.py.
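For context on why that dump looks odd: a rank-1 token should normally have a logprob strictly below 0.0 unless the softmax has collapsed essentially all probability mass onto that one token. A standalone PyTorch sketch (not vLLM internals; the vocab size is just taken from the dump above) illustrating that relationship:

```python
# Standalone PyTorch sketch (not vLLM code): for greedy sampling the rank-1
# token is the argmax of the logits, and its log-probability is 0.0 only
# when the distribution puts essentially all mass on that single token.
import torch

torch.manual_seed(0)
logits = torch.randn(1, 115584)              # vocab size taken from the dump above

logprobs = torch.log_softmax(logits, dim=-1)
top_logprob, top_id = logprobs.max(dim=-1)   # rank-1 token and its logprob

print("rank-1 token id:", top_id.item())
print("rank-1 logprob:", top_logprob.item())          # normally < 0.0
print("probability mass on it:", top_logprob.exp().item())
```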
-
But still, I have another case that is not caused by the sop_token_id.
-
In my case, ngram speculative decoding gives a different result than running without speculative decoding, and even different results for different --ngram-prompt-lookup-min values (1 and 2).
I just run the same model with different vLLM args, like:
python -m vllm.entrypoints.api_server --host 127.0.0.1 --trust-remote-code --disable-custom-all-reduce --use-v2-block-manager --enable-prefix-caching --max-model-len 3072 --gpu-memory-utilization 0.7 --model /path/to/model/ --port 9122
and
python -m vllm.entrypoints.api_server --host 127.0.0.1 --trust-remote-code --disable-custom-all-reduce --use-v2-block-manager --enable-prefix-caching --max-model-len 3072 --gpu-memory-utilization 0.7 --model /path/to/model/ --port 9122 --speculative-model [ngram] --num-speculative-tokens 6 --ngram-prompt-lookup-max 5 --ngram-prompt-lookup-min 2
Should it give the same result when I use the same query with "temperature" == 0?
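A minimal sketch of how one could diff the two setups, assuming the demo api_server's /generate endpoint (which accepts a prompt plus sampling params) and assuming the speculative server is moved to a different port (9123 here is hypothetical; both commands above use 9122):

```python
# Minimal sketch (untested): send the same greedy query to two vLLM
# api_server instances and compare the outputs. The second port is an
# assumption -- both commands above bind 9122, so one must be changed.
import requests

PROMPT = "Write a short poem about the sea."
SERVERS = {
    "baseline": "http://127.0.0.1:9122/generate",    # no speculative decoding
    "ngram_spec": "http://127.0.0.1:9123/generate",  # --speculative-model [ngram]
}

def generate(url: str, prompt: str) -> str:
    payload = {
        "prompt": prompt,
        "temperature": 0,   # greedy decoding
        "max_tokens": 256,
    }
    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()
    # The demo api_server returns {"text": ["<prompt + completion>", ...]}
    return resp.json()["text"][0]

outputs = {name: generate(url, PROMPT) for name, url in SERVERS.items()}
for name, text in outputs.items():
    print(f"--- {name} ---\n{text}\n")
print("identical:", outputs["baseline"] == outputs["ngram_spec"])
```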