[CB][do not merge] enable warmup for batch size 1 #287
base: main
Conversation
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀
bot:test
Naive question: can we add a failing test for what this PR fixes? Edit: I guess we had the assert earlier which prevented batch size 1 from running 🤷
Ah, interesting. I swear I had tested manually on a dev pod and saw batch size 1 working, but I don't know how that was happening when the test clearly failed 🤦. Anyway, nice fix!
```diff
@@ -317,6 +318,18 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
     prompt_len = 42
     num_decode_tokens = 2

+    # Fix for batch size 1: set input batch to fit 2 requests for warmup
+    if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
+        model_runner.input_batch = InputBatch(
```
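The hunk above is truncated; a minimal sketch of the intent, with assumed import path and constructor arguments (the real `InputBatch` signature varies across vLLM versions, and `model_runner` comes from the enclosing worker context), could look like:

```python
# Sketch only, not the exact PR code: when the user configures
# max_num_seqs == 1, the warmup still decodes two sequences, so build a
# temporary InputBatch with room for 2 requests. The import path and the
# keyword arguments below are assumptions.
from vllm.v1.worker.gpu_input_batch import InputBatch  # assumed path

if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
    model_runner.input_batch = InputBatch(
        max_num_reqs=2,  # warmup needs capacity for 2 requests
        max_model_len=model_runner.vllm_config.model_config.max_model_len,
        device=model_runner.device,
        pin_memory=model_runner.pin_memory,
        vocab_size=model_runner.vllm_config.model_config.get_vocab_size(),
    )
```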
Alternatively, could the `InputBatch` construct itself with `self.max_num_reqs = min(max_num_reqs, 2)`, since we know that it'll always need at least 2? Then we avoid reconstructing it in the worker here. That way we have a much smaller diff to back out once we can lift this bs>=2 restriction.
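A minimal sketch of that alternative, clamping inside the constructor (written with `max` rather than `min`, since `min(1, 2)` yields 1, which is exactly what the follow-up below points out):

```python
class InputBatch:
    """Sketch only; the real InputBatch derives many more attributes."""

    def __init__(self, max_num_reqs: int) -> None:
        # Warmup always needs room for at least two requests, so clamp
        # the configured capacity up to 2 inside __init__ itself.
        self.max_num_reqs = max(max_num_reqs, 2)
```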
Not sure if I follow here. It has to be >= 2 for the warmup; with `min(1, 2)` we would still fail?
Would that work if you directly set `model_runner.input_batch.max_num_reqs = 2`, instead of instantiating a new `InputBatch`?
No, because the `InputBatch` initialization consumes `max_num_reqs`; just setting `model_runner.input_batch.max_num_reqs` afterwards wouldn't change what was already constructed from it.
`max_num_reqs` occurs 17 times in the `__init__` of `InputBatch`. It is not a single attribute but is used to construct several attributes, so re-initializing is simpler...
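A tiny self-contained illustration of that point, with hypothetical attribute names standing in for the real ones:

```python
import numpy as np

class InputBatch:
    def __init__(self, max_num_reqs: int) -> None:
        self.max_num_reqs = max_num_reqs
        # max_num_reqs is baked into many derived buffers at init time.
        self.req_ids: list = [None] * max_num_reqs
        self.token_counts = np.zeros(max_num_reqs, dtype=np.int64)

batch = InputBatch(max_num_reqs=1)
batch.max_num_reqs = 2           # patches only the scalar attribute...
assert len(batch.req_ids) == 1   # ...the derived buffers stay size 1
```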
I tried to test this on the card, but it failed. It's not clear to me why, though. I will have to compare graphs tomorrow.
bot:test
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
bot:test
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
@yannicks1 in order to support this, we will need to add the following, which will enable symbolic shapes for size-1 dimensions (if marked dynamic):
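The snippet itself is not preserved above; one hedged sketch of such a change, assuming PyTorch's experimental backed-size-oblivious config is what was meant:

```python
# Assumption, not the exact snippet from this comment: backed-size-
# oblivious mode stops the compiler from specializing on 0/1-sized
# dimensions, so a size-1 dim that is marked dynamic stays symbolic.
import torch
import torch.fx.experimental._config as fx_config

fx_config.backed_size_oblivious = True

x = torch.randn(1, 8)              # batch dimension of size 1
torch._dynamo.mark_dynamic(x, 0)   # keep dim 0 dynamic despite size 1
```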
Thanks for the hint @JRosenkranz! I think, though, that your suggested changes should be tried with #312, which actually uses torch 2.7.1 and applies a real batch size of 1. This PR (#287) does padding under the hood; it is merely supposed to support batch size 1 in the meantime.
[CB] enable warmup for batch size 1
In #285 we wanted to allow batch size 1 for continuous batching. However, the warmup did not support batch size 1. With this small fix it works, on CPU at least. It does not currently compile on the card; we would need to compare the graphs next...
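For context, a minimal way to exercise the batch-size-1 path (the model name is a placeholder; `max_num_seqs` is the standard vLLM engine argument):

```python
from vllm import LLM, SamplingParams

# max_num_seqs=1 is exactly the configuration whose warmup this PR fixes.
llm = LLM(model="some/model", max_num_seqs=1)  # placeholder model name
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
```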