[CB][do not merge] enable warmup for batch size 1 #287


Draft
wants to merge 10 commits into base: main

Conversation

yannicks1 (Collaborator) commented Jul 8, 2025

[CB] enable warmup for batch size 1

In #285 we wanted to allow batch size 1 for continuous batching, but the warmup did not support batch size 1. With this small fix it works, at least on CPU. It does not currently compile on the card; I would need to compare graphs next...

joerunde and others added 3 commits July 7, 2025 15:33
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
github-actions bot commented Jul 8, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

yannicks1 added 2 commits July 8, 2025 13:17
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
yannicks1 (Collaborator, Author)

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

yannicks1 changed the title from "remove assertion batch size >= 2 (for warmup)" to "[CB] enable warmup for batch size 1" on Jul 8, 2025
yannicks1 marked this pull request as ready for review July 8, 2025 14:12
yannicks1 requested a review from joerunde July 8, 2025 14:12
prashantgupta24 (Collaborator) commented Jul 8, 2025

> the warmup did not support batch size 1

Naive question: can we add a failing test that this PR fixes?

Edit: I guess we had the assert earlier which prevented batch size 1 from running 🤷

joerunde (Collaborator) commented Jul 8, 2025

ah interesting, I swear I had tested manually on a dev pod and saw batch size 1 working, but I dunno how that was happening when the test clearly failed 🤦.

Anyway nice fix!

@@ -317,6 +318,18 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
prompt_len = 42
num_decode_tokens = 2

# Fix for batch size 1: set input batch to fit 2 requests for warmup
if model_runner.vllm_config.scheduler_config.max_num_seqs == 1:
model_runner.input_batch = InputBatch(
Collaborator:

Alternatively, could the InputBatch construct itself with:

self.max_num_reqs = min(max_num_reqs, 2)

since we know that it'll always need at least 2, and then we avoid reconstructing it in the worker here? That way we have a much smaller diff to back out once we can lift this bs>=2 restriction

Collaborator (Author):

Not sure if I follow here. It has to be >= 2 for the warmup; with min(1, 2) we would still fail?

Collaborator:

would that work if you directly set model_runner.input_batch.max_num_reqs = 2, instead of instantiating a new InputBatch?

Collaborator (Author):

No, because max_num_reqs is consumed during InputBatch initialization, so setting the attribute afterwards would have no effect on the already-constructed batch.

Collaborator (Author):

max_num_seqs occurs 17 times in the __init__ of InputBatch. It is not a single attribute but is used to construct several attributes, so re-initializing is simpler...
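To make this concrete, here is a minimal, self-contained sketch (using a hypothetical TinyInputBatch class, not the real vllm-spyre InputBatch) of why mutating max_num_reqs after construction doesn't help and the batch has to be re-instantiated:

```python
# Hypothetical illustration only; the real InputBatch has a different constructor.
import numpy as np

class TinyInputBatch:
    def __init__(self, max_num_reqs: int, max_model_len: int):
        self.max_num_reqs = max_num_reqs
        # max_num_reqs is consumed here to size several internal buffers.
        self.token_ids = np.zeros((max_num_reqs, max_model_len), dtype=np.int64)
        self.num_tokens = np.zeros(max_num_reqs, dtype=np.int32)
        self.req_ids = [None] * max_num_reqs

batch = TinyInputBatch(max_num_reqs=1, max_model_len=64)
batch.max_num_reqs = 2                  # only changes the attribute...
assert batch.token_ids.shape[0] == 1    # ...buffers are still sized for 1 request,
# which is why the warmup path re-instantiates the batch with room for 2 requests.
```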

yannicks1 (Collaborator, Author)

I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

yannicks1 self-assigned this Jul 9, 2025
Base automatically changed from cb-batch-size-1 to main July 9, 2025 19:23
yannicks1 (Collaborator, Author)

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

1 similar comment
waleedqk (Collaborator)

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
yannicks1 (Collaborator, Author)

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

yannicks1 changed the title from "[CB] enable warmup for batch size 1" to "[CB][do not merge] enable warmup for batch size 1" on Jul 18, 2025
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
JRosenkranz (Collaborator)

> I tried to test this on the card, but it failed. Not clear to me why though. Will have to compare graphs tomorrow.

@yannicks1 in order to support this, we will need to add the following, which enables symbolic shapes for size-1 dimensions (if marked dynamic):

from torch.fx.experimental import _config as config
config.backed_size_oblivious = True
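
For illustration, a minimal sketch of how that flag might be combined with marking a size-1 batch dimension as dynamic; this is not code from this PR, and the toy function f and its shapes are assumptions:

```python
# Sketch under assumptions: illustrates the hint above, not code from this PR.
# By default torch.compile tends to specialize dimensions of size 1; the flag
# below asks the shape machinery to keep such dimensions symbolic when they
# are explicitly marked dynamic.
import torch
from torch.fx.experimental import _config as config

config.backed_size_oblivious = True

def f(x):
    return x * 2 + 1

x = torch.randn(1, 8)              # example input with batch size 1
torch._dynamo.mark_dynamic(x, 0)   # mark the batch dimension as dynamic
compiled = torch.compile(f, dynamic=True)
out = compiled(x)                  # traces with a symbolic batch dimension
```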

yannicks1 marked this pull request as draft July 18, 2025 12:00
yannicks1 (Collaborator, Author)

thanks for the hint @JRosenkranz! I think, though, that your suggested changes should be tried with #312, which actually uses torch 2.7.1 and applies a real batch size of 1. This PR (#287) does padding under the hood; it is merely supposed to support --max_num_seqs 1 from a scheduling perspective...

6 participants