Replies: 1 comment
-
I am also having the same problem: the previously generated tokens are returned again every time a new token is generated. Can someone explain how to stream only the new part of the response from vLLM?
-
First, I'm sorry for posting so many questions; I'm new to vLLM.
I'm using vLLM for streaming inference, but I've found that each streamed output contains every token generated so far, including the previous ones. As a temporary workaround, I store the last output in a variable and, when the current step finishes, trim the new result by the previous output.
Is there a way (for example, a configuration option) to receive only the newly generated tokens?
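For reference, here is a minimal sketch of that delta-slicing workaround, assuming vLLM's `AsyncLLMEngine` interface, where each yielded `RequestOutput` carries the cumulative text so far in `outputs[0].text`. The model name, prompt, and request id are placeholders, and the exact API may differ across vLLM versions:

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Placeholder model; replace with the model you are actually serving.
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="facebook/opt-125m"))


async def stream(prompt: str, request_id: str) -> None:
    params = SamplingParams(temperature=0.8, max_tokens=128)
    previous_text = ""
    # engine.generate() yields RequestOutput objects; outputs[0].text holds the
    # *cumulative* generation so far, not only the newly produced tokens.
    async for request_output in engine.generate(prompt, params, request_id):
        full_text = request_output.outputs[0].text
        delta = full_text[len(previous_text):]  # keep only the new part
        previous_text = full_text
        print(delta, end="", flush=True)
    print()


asyncio.run(stream("Hello, my name is", "req-0"))
```

Slicing by `len(previous_text)` avoids the corner case where the previous text happens to appear more than once in the cumulative output. If you instead serve the model through vLLM's OpenAI-compatible API with `stream=True`, the server sends incremental deltas itself, so no client-side trimming should be needed.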