Replies: 1 comment
-
The streaming output is ordered and delivered token by token. We follow the same semantics as the OpenAI API.
-
By "in order", I mean like this:
"I" -> "I like" -> "I like vLLM" -> "I like vLLM\n"
What I want to implement is streaming inference that returns only the incremental characters, not the whole output. To achieve this, I must store the previous output, and I must be sure that it really is the most recent output, not one that arrived out of order.
Again, sorry for my poor English. :)
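
To make the goal concrete, here is a minimal sketch of what I have in mind. It assumes each streamed item is the full text generated so far and that items arrive in order. The function name `incremental_deltas` and the sample data are made up for illustration; nothing here is a vLLM API.

```python
from typing import Iterable, Iterator


def incremental_deltas(cumulative_outputs: Iterable[str]) -> Iterator[str]:
    """Yield only the newly generated text from a stream of cumulative outputs.

    Hypothetical helper: it assumes every item is the whole output so far
    and that items arrive in generation order.
    """
    previous = ""  # last output seen so far
    for current in cumulative_outputs:
        # If the stream is ordered, each output must extend the previous one.
        if not current.startswith(previous):
            raise ValueError(
                f"out-of-order output: {current!r} does not extend {previous!r}"
            )
        delta = current[len(previous):]  # only the new characters
        previous = current
        if delta:
            yield delta


# Sample cumulative outputs, matching the example above.
snapshots = ["I", "I like", "I like vLLM", "I like vLLM\n"]
deltas = list(incremental_deltas(snapshots))
print(deltas)
```

The `startswith` check is only a safety net: if the stream is ordered, it never fires, and joining all the deltas gives back the final output.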