[do not merge][CB] requesting only one token via index (fms api change) #253

Draft
wants to merge 13 commits into
base: main

Collaborator

@yannicks1 yannicks1 commented Jun 23, 2025

To minimize data transfer for continuous batching (CB), we only want to request the logits of the last prompt token instead of the logits for the entire prompt.
Therefore the flag only_last_token: bool will be replaced by the argument index: int in the fms forward API. When passing an index i, fms will return only the logits for the token at the i-th position. A draft implementation of this in fms can be found here.

Note: we only request the last token logits for static batching by default, as no right padding is ever required there.
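
To illustrate why an index is needed once right padding is involved, here is a minimal sketch in plain Python. It does not use the fms API: the helper names last_prompt_token_index and select_logits are hypothetical, the logits are toy values, and the model-side selection is stood in for by simple list indexing. The assumption is that the prompt occupies positions 0 to prompt_len - 1 and that any right padding follows it.

```python
def last_prompt_token_index(prompt_len: int) -> int:
    """Position of the last real prompt token in a right-padded sequence.

    Hypothetical helper: assumes the prompt occupies positions
    0 .. prompt_len - 1 and padding (if any) follows on the right.
    """
    if prompt_len < 1:
        raise ValueError("prompt must contain at least one token")
    return prompt_len - 1


def select_logits(logits, index):
    """Return only the logits row at `index`.

    Hypothetical stand-in for the selection the model would do
    when it is given an index instead of only_last_token.
    """
    return logits[index]


# Toy example: 3 real tokens right-padded to length 5, vocabulary of size 2.
PAD_ROW = [0.0, 0.0]
logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], PAD_ROW, PAD_ROW]

index = last_prompt_token_index(prompt_len=3)
last_token_logits = select_logits(logits, index)
print(index, last_token_logits)  # 2 [0.3, 0.7]
```

In this toy setup, taking the final position (logits[-1]) would return a padding row, which is why a "last token only" flag is sufficient only when no right padding is present, as in the static batching case noted above.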

changes:

  • install fms feature branch for testing
  • passing index instead of only_last_token
  • set number of right pads for decode to 0

solves #254

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
@yannicks1 yannicks1 changed the title [do not merge][CB] requesting only last token per default [do not merge][CB] requesting only one token via index (fms api change) Jun 30, 2025
yannicks1 and others added 6 commits July 2, 2025 08:16
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>