"PLZ!"Is possible to customize the request schedule strategy "continuous batching" ? #9616
Noblezhong announced in Q&A
Hi, I am a graduate student interested in the request scheduling strategies used for LLM inference. I have surveyed several frameworks such as Orca, DeepSpeed, and TensorRT-LLM, but these frameworks don't explain how they implement their scheduling strategies (e.g. iteration-level scheduling, dynamic batching, etc.). So I turned to vLLM, which also has a similar scheduling strategy that outperforms traditional static batching. However, after reading the documentation I couldn't find a tutorial that teaches how to implement or customize it. What's more, when I searched the issues for an answer, it seems that continuous batching is enabled by default and there is no way to fall back to static batching.
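To make sure I understand the idea, here is my own rough sketch of what I think iteration-level (continuous) scheduling does. This is just toy code I wrote for illustration, not vLLM's real scheduler; the `Request` class and `fake_decode_step` are made up, and the real implementation obviously also handles KV-cache memory, preemption, etc.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    prompt: str
    max_tokens: int
    generated: List[str] = field(default_factory=list)

    def is_finished(self) -> bool:
        return len(self.generated) >= self.max_tokens


def fake_decode_step(batch: List[Request]) -> None:
    # Stand-in for one model forward pass that emits one token per running request.
    for req in batch:
        req.generated.append("<tok>")


def continuous_batching(requests: List[Request], max_batch_size: int = 4) -> None:
    waiting = deque(requests)
    running: List[Request] = []
    while waiting or running:
        # Iteration-level scheduling: admit waiting requests at every step,
        # not only after the whole batch has drained (as static batching would).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        fake_decode_step(running)
        # Finished requests leave immediately, freeing slots for the next step.
        running = [r for r in running if not r.is_finished()]


if __name__ == "__main__":
    reqs = [Request(prompt=f"p{i}", max_tokens=i + 1) for i in range(6)]
    continuous_batching(reqs, max_batch_size=3)
    print([len(r.generated) for r in reqs])  # each request got exactly max_tokens tokens
```

Is this roughly the right mental model for what vLLM's scheduler does each iteration?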
So I wonder if there is any demo or tutorial for continuous batching, or simply some guidance on how to customize this excellent strategy. Sorry, I am a freshman in both vLLM and LLM inference. orz
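In case it clarifies what I am trying to do: my guess is that the batching behavior is influenced by engine arguments such as `max_num_seqs`, and something like the snippet below is what I would use as a baseline for comparison. I am not sure whether this actually degrades continuous batching to static batching, which is part of my question.

```python
from vllm import LLM, SamplingParams

# My assumption: capping max_num_seqs limits how many sequences the scheduler
# batches together per iteration. Whether max_num_seqs=1 really behaves like
# static batching is exactly what I am unsure about.
llm = LLM(model="facebook/opt-125m", max_num_seqs=1)

outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    SamplingParams(max_tokens=16),
)
for out in outputs:
    print(out.outputs[0].text)
```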