[V1] Logits processors extensibility #19912

Open: afeldman-nm wants to merge 326 commits into main

afeldman-nm (Contributor) commented Jun 20, 2025

Purpose

Enable V1 logits processors support to be extended with custom logits processors.

New Python interface for custom logitsprocs

    # Specify logitproc by entrypoint
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_entrypoints=["dummy_logitproc"],
    )

    # Specify logitproc by fully-qualified name
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_fqns=["vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor"],
    )
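
The fully-qualified names above follow a `module.path:ClassName` shape. As a minimal sketch of how such a string could be resolved to a class, assuming resolution goes through a plain module import (this PR's actual loading code is not shown here, and `resolve_fqn` is a hypothetical helper):

```python
import importlib


def resolve_fqn(fqn: str) -> type:
    """Resolve a 'package.module:ClassName' string to the object it names."""
    module_name, sep, attr_name = fqn.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:ClassName', got {fqn!r}")
    # Import the module part, then look up the class by name.
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr_name)
    except AttributeError as exc:
        raise ImportError(f"{module_name!r} has no attribute {attr_name!r}") from exc


# Demonstrated on a standard-library class so the sketch runs without vLLM.
cls = resolve_fqn("collections:OrderedDict")
print(cls.__name__)
```

Splitting on the colon keeps the module path and the attribute name unambiguous, which a dotted-only name would not.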

New CLI interface for custom logitsprocs

    # Engine CLI args for specifying logitsprocs by entrypoint
    --logits-processors-entrypoints=dummy_logitproc,<other logitproc>,...

    # Engine CLI args for specifying logitsprocs by fully-qualified name
    --logits-processors-fqns=vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor,<other logitproc>,...
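
Both flags take a comma-separated list. The following is a rough sketch of how such values could be split into lists with `argparse`; it only illustrates the argument format and is not the engine's actual argument parser. The `comma_separated` helper and the `pkg.mod:Other` name are hypothetical.

```python
import argparse


def comma_separated(value: str) -> list[str]:
    """Split 'a,b,c' into ['a', 'b', 'c'], dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


parser = argparse.ArgumentParser()
parser.add_argument("--logits-processors-entrypoints", type=comma_separated, default=[])
parser.add_argument("--logits-processors-fqns", type=comma_separated, default=[])

# "pkg.mod:Other" is a made-up second processor used only for illustration.
args = parser.parse_args([
    "--logits-processors-fqns="
    "vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor,pkg.mod:Other",
])
print(args.logits_processors_fqns)
```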

Test Plan

(WIP)

  • Configuring a contrived custom logits proc
  • E2E tests using REST API and Python interfaces
  • Wrap a V0-style logits proc to create a V1-style logits proc
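
The last item could look roughly like the sketch below. It assumes a V0-style logits proc is a per-request callable that takes the token IDs generated so far plus that request's logits and returns new logits, and that a V1-style logits proc operates on the whole batch at once. `BatchWrapper` and its methods are hypothetical names, not this PR's interface, and plain Python lists stand in for tensors so the sketch runs without dependencies.

```python
from typing import Callable, Sequence

Row = list[float]
PerRequestProc = Callable[[Sequence[int], Row], Row]


class BatchWrapper:
    """Hypothetical adapter: applies per-request callables across a batch of logits rows."""

    def __init__(self) -> None:
        # Maps batch row index -> (per-request processor, tokens generated so far).
        self._rows: dict[int, tuple[PerRequestProc, list[int]]] = {}

    def add_request(self, row: int, proc: PerRequestProc, output_ids: list[int]) -> None:
        self._rows[row] = (proc, output_ids)

    def remove_request(self, row: int) -> None:
        self._rows.pop(row, None)

    def apply(self, logits: list[Row]) -> list[Row]:
        # Rows without a registered processor pass through unchanged.
        for row, (proc, output_ids) in self._rows.items():
            logits[row] = proc(output_ids, logits[row])
        return logits


def ban_token_0(output_ids: Sequence[int], row: Row) -> Row:
    """Per-request processor that makes token 0 unselectable."""
    return [float("-inf")] + list(row[1:])


wrapper = BatchWrapper()
wrapper.add_request(1, ban_token_0, output_ids=[])
batch = wrapper.apply([[0.5, 1.0], [0.2, 0.3]])
print(batch)
```

Only row 1 has a processor registered, so row 0 comes back unchanged while row 1 has token 0 masked out.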

Test Result

WIP

(Optional) Documentation Update

WIP

Fixes #17799
Fixes #12678

Signed-off-by: Andrew Feldman <afeldman@redhat.com>

mergify bot commented Jul 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 12, 2025
abf149 added 3 commits July 14, 2025 10:14
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
@mergify mergify bot removed the needs-rebase label Jul 14, 2025
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
@aarnphm aarnphm mentioned this pull request Jul 14, 2025
4 tasks
joerunde added a commit to vllm-project/vllm-spyre that referenced this pull request Jul 14, 2025
At first it wasn't obvious whether the changes from PR
vllm-project/vllm#16728 would be easy to integrate, so I initially
added a PR that copies the sampler files, as they were before that PR,
into vllm-spyre. It turned out to be easier than I thought, because the
sampler code is not compiled to the AIU; only the model forward pass is.

Currently the MinP processor keeps one tensor for the CPU and one for
the device. Since only the model forward pass runs on the AIU, both
tensors end up on the CPU, so there is an unnecessary copy from one to
the other, but the result is still correct.

There is a future upstream PR that will generalize the Logits processor
to other sampling parameters:

vllm-project/vllm#19912

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>

mergify bot commented Jul 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 16, 2025
llsj14 (Contributor) commented Jul 16, 2025

This PR looks great as it refactors the code related to LogitsProcessors, splitting it into separate files. I'm currently working on a new logits processor to limit thinking tokens (PR: #20859), and I needed the changes in this PR, so I hope it gets merged soon.

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
@mergify mergify bot added ci/build and removed needs-rebase labels Jul 16, 2025
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Labels: ci/build, documentation, frontend, performance, speculative-decoding, v1

Projects: None yet

Development

Successfully merging this pull request may close these issues:

  • [RFC]: Logits processor extensibility
  • [Bug]: V1 engine ignores logits processors and min-p sampling

4 participants