[CB] Add scheduling tests #329


Merged
merged 5 commits into main from add_scheduling_tests
Jul 24, 2025

Conversation

sducouedic
Collaborator

@sducouedic sducouedic commented Jul 22, 2025

This PR adds a scheduling-steps test in which new prompts join during the decode of other sequences, while there is still room left in the batch for new sequences.

Execution was tested on AIU as well (passing)
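The scenario can be sketched with a tiny step simulator (a hypothetical illustration, not the actual vLLM-Spyre test code; `simulate` and its bookkeeping are invented for this sketch):

```python
# Hypothetical sketch of the scheduling scenario, not the actual
# vLLM-Spyre test code: requests join the running batch mid-decode
# as long as a batch slot (bounded by max_num_seqs) is free.

def simulate(max_num_seqs, requests):
    """requests: list of (join_step, max_tokens).
    Returns {request_index: step at which it entered the batch}."""
    joined_at = {}
    tokens_done = {}
    step = 0
    while len(joined_at) < len(requests) or tokens_done:
        # Admit waiting requests if the batch has room.
        for i, (join_step, _) in enumerate(requests):
            if (i not in joined_at and join_step <= step
                    and len(tokens_done) < max_num_seqs):
                joined_at[i] = step
                tokens_done[i] = 0
        # Every running request decodes one token per step.
        for i in list(tokens_done):
            tokens_done[i] += 1
            if tokens_done[i] == requests[i][1]:
                del tokens_done[i]  # finished, frees a batch slot
        step += 1
    return joined_at

# Request 2 joins at step 5, while request 1 is still decoding.
joined = simulate(max_num_seqs=4, requests=[(0, 3), (0, 10), (5, 5)])
print(joined)  # prints {0: 0, 1: 0, 2: 5}
```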

Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@sducouedic sducouedic force-pushed the add_scheduling_tests branch 2 times, most recently from 0915466 to 59e7270 Compare July 23, 2025 11:03
@sducouedic
Collaborator Author

bot:test
MARKERS="spyre and not quantized and not multi and not embedding"

Collaborator

@prashantgupta24 prashantgupta24 left a comment

Excited for this!

@sducouedic
Collaborator Author

Let me break this PR into two PRs:

  1. This PR: add a simple additional test for steps
  2. Another PR: check the end output of scheduling steps tests

@sducouedic sducouedic force-pushed the add_scheduling_tests branch from ac373c0 to 6e1f33a Compare July 24, 2025 09:59
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
@sducouedic sducouedic marked this pull request as ready for review July 24, 2025 10:05
@sducouedic sducouedic requested a review from rafvasq as a code owner July 24, 2025 10:05
@sducouedic sducouedic enabled auto-merge (squash) July 24, 2025 10:12
@github-actions github-actions bot added the ready label Jul 24, 2025
Collaborator

@maxdebayser maxdebayser left a comment

Thanks, this is very easy to follow with pencil and paper. It's almost like documentation.

Configuration:
* max_num_seqs: 4
* number of prompts: 4
* 1: len = 49, max tokens = 119, step joining = 0
* 2: len = 14, max tokens = 52, step joining = 0
* 3: len = 89, max tokens = 104, step joining = 32
* 4: len = 9, max tokens = 64, step joining = 131
Collaborator

Suggestion: maybe start counting at 0 here to use the sequence IDs

@sducouedic sducouedic merged commit 2d65d56 into main Jul 24, 2025
15 of 18 checks passed
@sducouedic sducouedic deleted the add_scheduling_tests branch July 24, 2025 17:05
Comment on lines +501 to +503
    seqs_max_tokens = [119, 52, 104, 64]
    prompts_lengths = [49, 14, 89, 9]
    steps_add_reqs = [0, 0, 32, 131]
Collaborator

I wonder if we can lower the values here - the time for CB testing on CPU is on the rise. If possible, shorter max_tokens could speed the tests up while keeping the test logic the same.

Collaborator

I think the eventual goal could be to reduce the total number of steps - the fewer the steps, the faster the test. I don't think we really need 197 steps for this test case?
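As a rough sanity check on that number: under a simplified model where each request produces one token per step starting at its join step (prefill steps are ignored, so this slightly undercounts the real total), the step count is dominated by the last, late-joining request:

```python
# Back-of-the-envelope step count for the configuration in this test,
# assuming one token per step from the join step onward (prefill steps
# are ignored, so this slightly undercounts the real total).
seqs_max_tokens = [119, 52, 104, 64]
steps_add_reqs = [0, 0, 32, 131]

# Last decode step of each request (steps are 0-indexed).
last_step = [s + t - 1 for s, t in zip(steps_add_reqs, seqs_max_tokens)]
total_steps = max(last_step) + 1
print(total_steps)  # prints 195, close to the ~197 observed
```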

Collaborator

@prashantgupta24 prashantgupta24 Jul 24, 2025

Something like

    seqs_max_tokens = [3, 10, 5]
    prompts_lengths = [10, 10, 10]
    steps_add_reqs = [0, 0, 5]

where request 0 would finish first, and request 1 would still be decoding when request 2 shows up? Or am I missing something obvious?
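Under the same one-token-per-step approximation (prefill ignored; a sketch, not the test itself), the timeline for this smaller configuration does check out:

```python
# Timeline sketch for the proposed smaller configuration, assuming one
# token per step from the join step (prefill steps ignored).
seqs_max_tokens = [3, 10, 5]
prompts_lengths = [10, 10, 10]  # unused in this estimate
steps_add_reqs = [0, 0, 5]

# Last decode step of each request (steps are 0-indexed).
finish = [s + t - 1 for s, t in zip(steps_add_reqs, seqs_max_tokens)]
# Request 0 finishes at step 2, before request 2 joins at step 5,
# while request 1 (finishing at step 9) is still decoding at step 5.
print(finish)  # prints [2, 9, 9]
```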

Collaborator

If this can be made to work with smaller max_tokens, then perhaps we can open an issue to change all tests within this file to use smaller values to speed things up?
