Replies: 1 comment
-
I am also having the same problem: the previously generated tokens are returned again every time a new token is generated. Can someone explain how to stream only the new part of the response from vLLM?
-
First, I'm sorry for posting so many questions; I'm new to vLLM.
I'm using vLLM for streaming inference, but I've found that each streamed output contains every token generated so far, including the previous ones. As a temporary workaround, I store the last output in a variable and, when the current step finishes, trim the new result by the previous output.
Is there a way (for example, a configuration option) to receive only the newly generated tokens?
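For reference, here is a minimal sketch of that delta-slicing workaround, assuming vLLM's `AsyncLLMEngine` interface, where each yielded `RequestOutput` carries the cumulative text so far in `outputs[0].text`. The model name, prompt, and request id are placeholders, and the exact API may differ across vLLM versions:

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Placeholder model; replace with the model you are actually serving.
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="facebook/opt-125m"))


async def stream(prompt: str, request_id: str) -> None:
    params = SamplingParams(temperature=0.8, max_tokens=128)
    previous_text = ""
    # engine.generate() yields RequestOutput objects; outputs[0].text holds the
    # *cumulative* generation so far, not only the newly produced tokens.
    async for request_output in engine.generate(prompt, params, request_id):
        full_text = request_output.outputs[0].text
        delta = full_text[len(previous_text):]  # keep only the new part
        previous_text = full_text
        print(delta, end="", flush=True)
    print()


asyncio.run(stream("Hello, my name is", "req-0"))
```

Slicing by `len(previous_text)` avoids the corner case where the previous text happens to appear more than once in the cumulative output. If you instead serve the model through vLLM's OpenAI-compatible API with `stream=True`, the server sends incremental deltas itself, so no client-side trimming should be needed.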