[CB] Add scheduling tests #329
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Force-pushed from 0915466 to 59e7270
bot:test
Excited for this!
Let me break this PR into two PRs
Force-pushed from ac373c0 to 6e1f33a
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Thanks, this is very easy to follow with pencil and paper. It's almost like documentation.
Configuration:
* max_num_seqs: 4
* number of prompts: 4
* 1: len = 49, max tokens = 119, step joining = 0
Suggestion: maybe start counting at 0 here to use the sequence IDs
seqs_max_tokens = [119, 52, 104, 64]
prompts_lengths = [49, 14, 89, 9]
steps_add_reqs = [0, 0, 32, 131]
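To get a feel for how these parameters translate into engine steps, here is a hypothetical back-of-the-envelope simulator. It is not the actual test helper: it ignores prefill-step accounting and other scheduler details, so its count will differ slightly from the real step count discussed below.

```python
def count_steps(seqs_max_tokens, steps_add_reqs, max_num_seqs=4):
    """Toy continuous-batching model: each step, admit any arrived
    requests while the batch has room, then decode one token for
    every running request. Returns the total number of steps."""
    waiting = sorted(zip(steps_add_reqs, seqs_max_tokens))  # (arrival step, token budget)
    running = []  # remaining token budget of each running request
    step = 0
    while waiting or running:
        # admit arrived requests while the batch has a free slot
        while waiting and waiting[0][0] <= step and len(running) < max_num_seqs:
            running.append(waiting.pop(0)[1])
        # one decode step: every running request emits one token
        running = [r - 1 for r in running if r > 1]
        step += 1
    return step

print(count_steps([119, 52, 104, 64], [0, 0, 32, 131]))  # 195 in this toy model
```

The toy model lands near, but not exactly on, the 197 steps mentioned in the review, which is expected given the simplifications.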
I wonder if we can lower the values here - the time for CB testing on CPU is on the rise, and (if possible) shorter max_tokens could speed the tests up while keeping the test logic the same.
I think the eventual goal could be to reduce the total number of steps - the fewer the steps, the faster the test. I don't think we really need 197 steps for this test case?
Something like
seqs_max_tokens = [3, 10, 5]
prompts_lengths = [10, 10, 10]
steps_add_reqs = [0, 0, 5]
where request 0 would finish first, request 1 would be still decoding when request 2 shows up? Or am I missing something obvious?
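The claimed ordering can be sanity-checked with a small sketch under the same pencil-and-paper model used in the test description (one token per running request per step, requests admitted on arrival while the batch has room; this is illustrative only, not the actual test code):

```python
def finish_steps(seqs_max_tokens, steps_add_reqs, max_num_seqs=4):
    """Return {request id: step at which its last token is produced}
    under a toy continuous-batching model (no prefill accounting)."""
    waiting = sorted(
        (arrival, rid, budget)
        for rid, (arrival, budget) in enumerate(zip(steps_add_reqs, seqs_max_tokens))
    )
    running, finished, step = {}, {}, 0
    while waiting or running:
        # admit arrived requests while the batch has a free slot
        while waiting and waiting[0][0] <= step and len(running) < max_num_seqs:
            _, rid, budget = waiting.pop(0)
            running[rid] = budget
        # decode one token for every running request
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                finished[rid] = step
                del running[rid]
        step += 1
    return finished

done = finish_steps([3, 10, 5], [0, 0, 5])
# Request 0 finishes before request 2 arrives at step 5,
# and request 1 is still decoding at that point.
print(done)  # {0: 2, 1: 9, 2: 9}
```

So in this simplified model the proposed values do exercise the intended scenario: request 0 drains out early, and request 2 joins a batch in which request 1 is still decoding.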
If this can be made to work with smaller max_tokens values, then perhaps we can open an issue to change all tests within this file to use smaller values to speed things up?
This PR adds a scheduling-steps test where new prompts join during the decode of other sequences, while there is still room left in the batch for new sequences.
Execution was tested on AIU as well (passing).