Is the continuous batching function enabled by default in vllm? #547
Closed · SeibertronSS announced in Q&A
Replies: 3 comments 1 reply
-
Yes, this is enabled by default and cannot be turned off. Turning off continuous batching would require a rewrite of our system architecture and would bring no performance benefit, so we did not implement it.
1 reply
-
Is there a way to adjust continuous batching? I found that latency increased significantly when using api_server compared to offline inference.
0 replies
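The behaviour described in the answer above can be illustrated with a simplified sketch: under continuous batching, the engine admits new requests and retires finished ones at every decoding step, instead of waiting for the whole batch to drain. The code below is a toy illustration of that idea only, not vLLM's actual scheduler.

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Toy continuous-batching loop. `requests` maps request id -> number of
    decode steps it needs. Waiting requests join the running batch as soon as
    a slot frees up; finished ones leave immediately. Returns, per step, the
    ids that were active. Illustration only, not vLLM's code."""
    waiting = deque(requests)        # ids not yet admitted
    remaining = dict(requests)       # id -> steps still needed
    running = []                     # ids currently in the batch
    trace = []
    while waiting or running:
        # Admit waiting requests up to the batch-size cap.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        trace.append(list(running))
        # One decode step for every running request.
        for rid in list(running):
            remaining[rid] -= 1
            if remaining[rid] == 0:  # finished: leaves the batch at once
                running.remove(rid)
    return trace

# Request "a" needs 3 steps, "b" 1 step, "c" 2 steps; batch cap of 2.
# "c" is admitted the moment "b" finishes, without waiting for "a".
trace = continuous_batching({"a": 3, "b": 1, "c": 2}, max_batch=2)
print(trace)  # [['a', 'b'], ['a', 'c'], ['a', 'c']]
```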
-
How do we control the batch size and timeout for continuous batching, though?
0 replies
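On this question: there is no batching timeout to tune, because the scheduler forms a fresh batch at every decode step rather than waiting for requests to accumulate. The main knobs vLLM exposes are `max_num_seqs` (cap on concurrently running sequences) and `max_num_batched_tokens` (cap on tokens processed per step), settable as engine arguments or as `--max-num-seqs` / `--max-num-batched-tokens` when launching the api_server. The sketch below loosely mimics how such a two-cap budget limits one scheduling step; it is an illustration under those assumptions, not vLLM's real scheduler.

```python
def schedule_step(waiting, max_num_seqs, max_num_batched_tokens):
    """Pick requests for one step under two caps, loosely mimicking a
    max_num_seqs / max_num_batched_tokens budget. `waiting` is a list of
    (request_id, prompt_tokens) pairs in arrival order. Illustration only;
    vLLM's real scheduler is considerably more involved."""
    batch, token_budget = [], max_num_batched_tokens
    for rid, tokens in waiting:
        # FCFS: stop at the first request that exceeds either cap.
        if len(batch) == max_num_seqs or tokens > token_budget:
            break
        batch.append(rid)
        token_budget -= tokens
    return batch

# Three prompts of 512, 256 and 1024 tokens; a 1000-token step budget
# admits only the first two, even though max_num_seqs would allow all three.
print(schedule_step([("r1", 512), ("r2", 256), ("r3", 1024)],
                    max_num_seqs=4, max_num_batched_tokens=1000))
# ['r1', 'r2']
```

Lowering `max_num_batched_tokens` trades peak throughput for lower per-step latency, which is one lever to try when the api_server feels slower than offline inference.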
-
Is the continuous batching function enabled by default in vllm? Can this feature be turned on or off selectively?