[V1] Logits processors extensibility #19912
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
This pull request has merge conflicts that must be resolved before it can be
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
At first it wasn't obvious whether the changes from PR vllm-project/vllm#16728 would be easy to integrate, so I initially added a PR that copies the sampler files from before that PR into vllm-spyre. It turns out to be easier than I thought, because the sampler code is not compiled to the AIU; only the model forward is. The MinP processor currently keeps one tensor for the CPU and one for the device. Since only the model forward runs on the AIU, both tensors end up on the CPU, so there is an unnecessary copy from one to the other, but the result is still correct. A future upstream PR, vllm-project/vllm#19912, will generalize the logits processor to other sampling parameters.
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
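For readers unfamiliar with min-p, here is a minimal sketch of the filtering rule itself, in plain Python rather than on tensors. It is not the MinP processor's code from vLLM or vllm-spyre; the function name `apply_min_p` and the list-based representation are assumptions made for illustration. The rule is the standard one: a token is kept only if its probability is at least `min_p` times the probability of the most likely token.

```python
import math


def apply_min_p(logits, min_p):
    """Illustrative min-p filter (hypothetical helper, not vLLM's code).

    Masks every logit whose softmax probability falls below
    min_p * (probability of the most likely token).
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")

    # Numerically stable softmax over the logits.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Tokens below the scaled threshold are set to -inf so they
    # can never be sampled.
    threshold = min_p * max(probs)
    return [
        x if p >= threshold else float("-inf")
        for x, p in zip(logits, probs)
    ]


filtered = apply_min_p([2.0, 1.0, -3.0], min_p=0.1)
```

With these inputs the third token has a probability well under 10% of the top token's, so it is the only one masked.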
This PR looks great: it refactors the LogitsProcessors code and splits it into separate files. I'm currently working on a new logits processor to limit thinking tokens (PR: #20859) and I need the changes in this PR, so I hope it gets merged soon.
Purpose
Enable V1 logits processor support to be extended with custom logits processors.
New Python interface for custom logitsprocs
New CLI interface for custom logitsprocs
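To give a rough idea of what a custom logits processor can look like, here is a self-contained sketch. The description above does not spell out the new interface, so everything here is an assumption for illustration: the `LogitsProcessor` protocol, the `BanTokens` class, and the `apply_processors` helper are hypothetical names, and the actual V1 interface in this PR may differ substantially (for example, by operating on batched tensors).

```python
from typing import Protocol, Sequence


class LogitsProcessor(Protocol):
    """Hypothetical interface: a callable that maps logits to logits."""

    def __call__(
        self, output_token_ids: Sequence[int], logits: list[float]
    ) -> list[float]: ...


class BanTokens:
    """Example custom processor that forbids a fixed set of token ids."""

    def __init__(self, banned_ids: Sequence[int]) -> None:
        self.banned_ids = frozenset(banned_ids)

    def __call__(
        self, output_token_ids: Sequence[int], logits: list[float]
    ) -> list[float]:
        # Banned positions get -inf so they can never be sampled.
        return [
            float("-inf") if i in self.banned_ids else x
            for i, x in enumerate(logits)
        ]


def apply_processors(
    processors: Sequence[LogitsProcessor],
    output_token_ids: Sequence[int],
    logits: list[float],
) -> list[float]:
    """Run each processor in order, feeding its output to the next."""
    for proc in processors:
        logits = proc(output_token_ids, logits)
    return logits


result = apply_processors([BanTokens([1])], [], [0.5, 1.5, 2.5])
```

Chaining processors as plain callables keeps each one independent, which is one plausible reason to want an extensible interface in the first place.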
Test Plan
(WIP)
Test Result
WIP
(Optional) Documentation Update
WIP
Fixes #17799
Fixes #12678