[V1] Logits processors extensibility #19912

Open

Wants to merge 336 commits into base: main (this view shows changes from 250 of the 336 commits).

Commits
bd1ffa3
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 6, 2025
279679b
Merge branch 'logitsprocs_reorg' into logitsprocs_reorg_bugfix
afeldman-nm Jun 6, 2025
8cf4817
merge
afeldman-nm Jun 7, 2025
17c10ca
refactor
afeldman-nm Jun 7, 2025
849d829
bugfix - redundant batch update
afeldman-nm Jun 7, 2025
03a836b
min tokens test bugfix
afeldman-nm Jun 7, 2025
5ab8af1
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 7, 2025
d92a3f3
remove prints
afeldman-nm Jun 7, 2025
1f87ec8
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 8, 2025
ef2294d
rejection sampling test bugfix
afeldman-nm Jun 8, 2025
198db48
sampler test bugfix
afeldman-nm Jun 8, 2025
2f2550b
removed logitsprocs where not needed in test
afeldman-nm Jun 8, 2025
1b1f8ca
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 8, 2025
5fc130b
refactor
afeldman-nm Jun 9, 2025
7b8f299
sampling_params min-p check
afeldman-nm Jun 9, 2025
4d5ea01
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 9, 2025
d8a6761
Merge branch 'logitsprocs' into logitsprocs_valid
afeldman-nm Jun 9, 2025
0515848
small test optimization
afeldman-nm Jun 9, 2025
17e7f62
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 9, 2025
c392898
refactor
afeldman-nm Jun 10, 2025
dc4b6b8
wip tests
afeldman-nm Jun 11, 2025
fd26581
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 11, 2025
5fb16a6
refactor
afeldman-nm Jun 11, 2025
b0658c2
passing mixed batch test for min_p and none
afeldman-nm Jun 11, 2025
ac608f1
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 11, 2025
7f44262
Merge branch 'logitsprocs' into logitsprocs_reorder
afeldman-nm Jun 11, 2025
0a20965
mix batch test passes without reorder
afeldman-nm Jun 12, 2025
bdea83c
refactor
afeldman-nm Jun 12, 2025
7f4d72e
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 12, 2025
ec25ab5
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 12, 2025
5a5e38f
move-only
afeldman-nm Jun 12, 2025
19d3882
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 12, 2025
588b845
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 13, 2025
38746ae
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 13, 2025
9703c4a
fake reordering logic
afeldman-nm Jun 13, 2025
6078602
fake logitsproc invocation against fake batch
afeldman-nm Jun 13, 2025
ae5b600
almost passing
afeldman-nm Jun 13, 2025
d5679bb
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 13, 2025
84cad20
Merge branch 'logitsprocs' into logitsprocs_reorder
afeldman-nm Jun 13, 2025
89ea6dd
wip refactor
afeldman-nm Jun 13, 2025
76438fb
test fix
afeldman-nm Jun 13, 2025
e03c561
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 13, 2025
045bc01
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 16, 2025
9ac6190
latest
afeldman-nm Jun 16, 2025
c1b8e69
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 16, 2025
e83f90b
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 17, 2025
360f2c4
removed tpu hack
afeldman-nm Jun 17, 2025
eac0c82
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 17, 2025
395d472
wip tpu backward compat
afeldman-nm Jun 17, 2025
5c53a8c
typing
afeldman-nm Jun 17, 2025
e08e4f4
Merge branch 'logitsprocs' into logitsprocs_tpu
afeldman-nm Jun 17, 2025
f7969c5
wip
afeldman-nm Jun 17, 2025
1117f51
first pass at tpu/gpu separation
afeldman-nm Jun 17, 2025
e0fd74b
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 17, 2025
2a4e09c
first pass at new TPU approach
afeldman-nm Jun 17, 2025
1f4cad3
docstrings
afeldman-nm Jun 17, 2025
a6be23c
Merge branch 'main' into tpu-isolate
afeldman-nm Jun 17, 2025
b28588c
merged in GPU/TPU decoupling PR
afeldman-nm Jun 17, 2025
ca87319
bugfix
afeldman-nm Jun 17, 2025
32e4275
type checking
afeldman-nm Jun 17, 2025
9aeb49d
Merge branch 'main' into tpu-isolate
afeldman-nm Jun 18, 2025
0383e73
InputBatch fix
afeldman-nm Jun 18, 2025
9564879
Merge branch 'tpu-isolate' into logitsprocs_merge
afeldman-nm Jun 18, 2025
c02ef1b
merge
afeldman-nm Jun 18, 2025
b804423
vllm_xargs/kv_transfer_params compatibility
afeldman-nm Jun 18, 2025
17f02ee
fix
afeldman-nm Jun 18, 2025
061ac67
remove unnecessary unit test
afeldman-nm Jun 18, 2025
421c278
precedence
afeldman-nm Jun 18, 2025
f315e0e
pre-commit fix
afeldman-nm Jun 18, 2025
3d92a07
Merge branch 'main' into extra_args_merge
afeldman-nm Jun 18, 2025
873b89f
merge
afeldman-nm Jun 18, 2025
f8609ff
Merge branch 'main' into extra_args_merge
afeldman-nm Jun 18, 2025
9c5f407
Documentation changes
afeldman-nm Jun 18, 2025
0857dc4
refactor
afeldman-nm Jun 18, 2025
f9c4e19
typing
afeldman-nm Jun 18, 2025
03c6010
typing
afeldman-nm Jun 18, 2025
95e1b0d
typing
afeldman-nm Jun 18, 2025
9daeaed
Update vllm/entrypoints/openai/protocol.py
afeldman-nm Jun 18, 2025
baf90c9
feedback
afeldman-nm Jun 18, 2025
4f04198
remove swap type
afeldman-nm Jun 18, 2025
da23801
refactor
afeldman-nm Jun 18, 2025
34c9866
move/swap refactoring
afeldman-nm Jun 18, 2025
e1f0455
refactoring
afeldman-nm Jun 18, 2025
fe088ea
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 18, 2025
f506dd7
small fixes
afeldman-nm Jun 18, 2025
3257deb
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 18, 2025
c0b2068
Merge branch 'main' into extra_args_merge
afeldman-nm Jun 18, 2025
5894110
Merge branch 'extra_args' into lp_ext
afeldman-nm Jun 18, 2025
3885bc5
merge
afeldman-nm Jun 20, 2025
e06f9e9
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 24, 2025
7d89720
batch update builder
afeldman-nm Jun 24, 2025
33e0f14
comments
afeldman-nm Jun 24, 2025
26c18d6
Merge branch 'logitsprocs' into lp_ext
afeldman-nm Jun 24, 2025
2e56aec
add custom logitsprocs arg
afeldman-nm Jun 24, 2025
2213b44
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 24, 2025
36d6f69
logitsprocs+pooling bugfix
afeldman-nm Jun 24, 2025
28b6606
Merge branch 'logitsprocs' into lp_ext_merge
afeldman-nm Jun 24, 2025
e422caa
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jun 24, 2025
3cca78f
small tweaks
afeldman-nm Jun 24, 2025
4177594
refactor
afeldman-nm Jun 24, 2025
40407b7
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 24, 2025
5209ffe
Fixed min tokens bug
afeldman-nm Jun 25, 2025
301db58
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
6f41503
fixed logit bias bug
afeldman-nm Jun 25, 2025
36f161d
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
a14d3a4
Merge branch 'logitsprocs' into lp_ext_merge
afeldman-nm Jun 25, 2025
1716f07
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jun 25, 2025
b429c10
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
fbdb595
comment Re: output tokens list ref
afeldman-nm Jun 25, 2025
e3dc71e
Merge branch 'logitsprocs' into logitsprocs_merge
afeldman-nm Jun 25, 2025
aa4c519
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
3ae8a6b
Merge branch 'logitsprocs' into lp_ext_merge
afeldman-nm Jun 25, 2025
d58bf24
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jun 25, 2025
77bba48
refactor
afeldman-nm Jun 25, 2025
890a9cd
refactor
afeldman-nm Jun 25, 2025
6b3ea9f
Update vllm/v1/sample/logits_processor.py
afeldman-nm Jun 25, 2025
8a8f9c2
wip
afeldman-nm Jun 25, 2025
070d71d
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
5384732
feedback
afeldman-nm Jun 25, 2025
9aebc9f
Update vllm/v1/sample/sampler.py
afeldman-nm Jun 25, 2025
8bb6bf0
revert some changes
afeldman-nm Jun 25, 2025
0a88e16
refactor
afeldman-nm Jun 25, 2025
18721da
Merge branch 'logitsprocs' of https://github.com/neuralmagic/vllm int…
afeldman-nm Jun 25, 2025
dc0b23a
refactor
afeldman-nm Jun 25, 2025
21ad212
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
2f0de77
argmax_invariant
afeldman-nm Jun 25, 2025
8d97a7c
batch update builder impl
afeldman-nm Jun 25, 2025
2abd24d
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
d1c6607
refactor
afeldman-nm Jun 25, 2025
9fe0bc3
wip dict removal
afeldman-nm Jun 25, 2025
aa18e8f
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 25, 2025
f7a162c
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 26, 2025
de81e42
updated unit tests
afeldman-nm Jun 26, 2025
20928f0
refactor
afeldman-nm Jun 26, 2025
a0e5398
iterators
afeldman-nm Jun 26, 2025
d4704d7
refactor
afeldman-nm Jun 26, 2025
729729d
reorg
afeldman-nm Jun 27, 2025
9948fd3
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 27, 2025
bc48f38
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 27, 2025
9eeea03
feedback
afeldman-nm Jun 28, 2025
1078a24
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 28, 2025
cd766a4
feedback
afeldman-nm Jun 28, 2025
2628f98
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 30, 2025
2ecb37d
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jun 30, 2025
64ac2cf
input batch tests
afeldman-nm Jul 1, 2025
4da82cc
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
bd62df4
refactor
afeldman-nm Jul 1, 2025
8455bb6
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
a6dc218
attempted fmt fix
afeldman-nm Jul 1, 2025
a870259
wip
afeldman-nm Jul 1, 2025
072ee00
wip
afeldman-nm Jul 1, 2025
55fd6e7
fixed cancellation bug
afeldman-nm Jul 1, 2025
6d4e073
Merge branch 'logitsprocs' into lp_ext_merge
afeldman-nm Jul 1, 2025
ab3a985
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 1, 2025
b55f88e
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
348a100
Merge branch 'logitsprocs' into lp_ext
afeldman-nm Jul 1, 2025
c397e24
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 1, 2025
1217b74
wip
afeldman-nm Jul 1, 2025
402d012
Update vllm/v1/worker/gpu_model_runner.py
afeldman-nm Jul 1, 2025
7c15b43
CLI
afeldman-nm Jul 1, 2025
06fc926
pr feedback
afeldman-nm Jul 1, 2025
8d229ed
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
4d0b612
Merge branch 'logitsprocs' into lp_ext
afeldman-nm Jul 1, 2025
4b1884b
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 1, 2025
99c0c18
skeleton of example
afeldman-nm Jul 1, 2025
aabd1dd
fixes
afeldman-nm Jul 1, 2025
63b640c
wip
afeldman-nm Jul 1, 2025
45dade4
mem util
afeldman-nm Jul 1, 2025
d377a6b
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
6ae7574
memory util
afeldman-nm Jul 1, 2025
5203324
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
68aab25
Merge branch 'main' into logitsprocs_merge
afeldman-nm Jul 1, 2025
066736d
merge
afeldman-nm Jul 2, 2025
31597e9
Merge branch 'logitsprocs' into lp_ext
afeldman-nm Jul 2, 2025
957bd86
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 2, 2025
3a5564d
refactor
afeldman-nm Jul 2, 2025
663bff1
refactor
afeldman-nm Jul 3, 2025
69c2a0d
merge
afeldman-nm Jul 3, 2025
538c378
Merge branch 'main' into lp_ext
afeldman-nm Jul 3, 2025
270b184
wip
afeldman-nm Jul 3, 2025
195f651
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 3, 2025
f9df850
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 3, 2025
fc9c308
py llm plumbing
afeldman-nm Jul 3, 2025
3aa383e
wip lp example
afeldman-nm Jul 3, 2025
b420aac
wip
afeldman-nm Jul 7, 2025
a475fe9
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 7, 2025
01d640c
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 7, 2025
138dc07
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 7, 2025
699768a
Merge branch 'lp_ext' into lp_ext_py
afeldman-nm Jul 7, 2025
ee88fdf
Merge branch 'main' into lp_ext_py
afeldman-nm Jul 7, 2025
6a405ab
first pass at lp loading system
afeldman-nm Jul 7, 2025
0de1e73
wip
afeldman-nm Jul 8, 2025
ef51732
Merge branch 'main' into lp_ext_py
afeldman-nm Jul 8, 2025
c8e8671
loading logitsprocs
afeldman-nm Jul 8, 2025
52146dc
refactor
afeldman-nm Jul 8, 2025
e79f9ad
lp tests passing
afeldman-nm Jul 8, 2025
4c16135
Merge branch 'main' into lp_ext_py
afeldman-nm Jul 8, 2025
2e330e1
refactor
afeldman-nm Jul 8, 2025
7a60363
logitsprocs
afeldman-nm Jul 8, 2025
18129b4
example w/ dummy logitproc
afeldman-nm Jul 8, 2025
e73c00c
refactor
afeldman-nm Jul 8, 2025
f612fcf
Merge branch 'main' into lp_ext
afeldman-nm Jul 8, 2025
4af5159
entrypoint example
afeldman-nm Jul 8, 2025
be7177a
cli arg
afeldman-nm Jul 8, 2025
0ad8b1c
removed regex
afeldman-nm Jul 8, 2025
c21a2ec
fqn/entrypoint examples
afeldman-nm Jul 8, 2025
4730d7a
cli tests
afeldman-nm Jul 8, 2025
1617747
Merge branch 'main' into lp_ext
afeldman-nm Jul 8, 2025
1784079
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 10, 2025
f078ce7
tail end of merge
afeldman-nm Jul 10, 2025
129479a
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 10, 2025
ac1509f
refactor
afeldman-nm Jul 10, 2025
5b85255
wip
afeldman-nm Jul 10, 2025
d7499db
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 10, 2025
ee74904
all lp plugins are loaded; can pass lp types to LLM; refactor
afeldman-nm Jul 11, 2025
be9e750
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 11, 2025
4f80bee
unit test fix
afeldman-nm Jul 11, 2025
683b99f
typo
afeldman-nm Jul 11, 2025
f7ee5ee
refactor
afeldman-nm Jul 11, 2025
ad08c45
refactor
afeldman-nm Jul 11, 2025
bbabe50
abstract __init__
abf149 Jul 14, 2025
9e88f37
fqn
abf149 Jul 14, 2025
ae55b2e
merge
abf149 Jul 14, 2025
d08a89d
type checking
abf149 Jul 14, 2025
e7cb8e1
merge
afeldman-nm Jul 16, 2025
aae02a0
small fix
afeldman-nm Jul 16, 2025
2d807af
fixes
afeldman-nm Jul 16, 2025
c0cdd27
merge
afeldman-nm Jul 16, 2025
2a79d4a
merge
afeldman-nm Jul 16, 2025
7c49fe1
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 16, 2025
84112c0
Merge branch 'lp_ext_merge' into lp_ext
afeldman-nm Jul 16, 2025
bd243c9
refactor
afeldman-nm Jul 16, 2025
07d6056
fix
afeldman-nm Jul 16, 2025
83eca33
fix test bug
afeldman-nm Jul 16, 2025
6a4597b
cli test works again
afeldman-nm Jul 16, 2025
9d2156c
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 17, 2025
8c2d16c
LLM entrypoint testing
afeldman-nm Jul 17, 2025
35999c8
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 17, 2025
e6123e7
cli entrypoints test
afeldman-nm Jul 17, 2025
da8aa76
fixed example
afeldman-nm Jul 17, 2025
12d48a7
adding prompt tokens to added requests
afeldman-nm Jul 17, 2025
c4a76be
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 17, 2025
4a57631
initial feedback
afeldman-nm Jul 17, 2025
6697a30
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 17, 2025
5c350a3
wip
afeldman-nm Jul 17, 2025
4aa5c86
merge load.py into __init__.py
afeldman-nm Jul 17, 2025
d3099b4
refactor
afeldman-nm Jul 17, 2025
f6cbcad
resetting test.txt
afeldman-nm Jul 17, 2025
ffbc6f2
logitsprocs in input batch
afeldman-nm Jul 17, 2025
ea3c970
Merge branch 'main' into lp_ext_merge
afeldman-nm Jul 17, 2025
104 changes: 104 additions & 0 deletions examples/offline_inference/logits_processor.py
@@ -0,0 +1,104 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

from typing import Optional

import torch

from vllm import LLM, SamplingParams
from vllm.v1.sample.logits_processor import (
    BatchUpdate,
    LogitsProcessor,
    MoveDirectionality,
)


def make_dummy_logitproc_type():
    class DummyLogitsProcessor(LogitsProcessor):
        """Fake logit processor to support unit testing and examples"""

        def __init__(self, _):
            super().__init__()
            self.req_info = {}

        def is_argmax_invariant(self) -> bool:
            """Not argmax-invariant: masking out tokens can change the
            greedy sampling outcome"""
            return False

        def update_state(self, batch_update: Optional[BatchUpdate]):
            if not batch_update:
                return

            # Process added requests.
            for index, params, _ in batch_update.added:
                if isinstance(params, SamplingParams) and params.extra_args:
                    target_token = params.extra_args.get("target_token", None)
                else:
                    target_token = None
                self.req_info[index] = target_token

            if self.req_info:
                # Process removed requests.
                for index in batch_update.removed:
                    self.req_info.pop(index, None)

                # Process moved requests, unidirectional (a->b) and
                # swap (a<->b)
                for adx, bdx, direct in batch_update.moved:
                    if direct == MoveDirectionality.SWAP:
                        (self.req_info[adx], self.req_info[bdx]) = (
                            self.req_info[bdx],
                            self.req_info[adx],
                        )
                    else:
                        self.req_info[bdx] = self.req_info[adx]

        def apply(self, logits: torch.Tensor) -> torch.Tensor:
            for bdx in range(logits.shape[0]):
                if (target_token := self.req_info[bdx]) is not None:
                    mask = torch.ones_like(logits[bdx, :], dtype=torch.bool)
                    mask[target_token] = False
                    logits[bdx, mask] = float("-inf")

            return logits

    return DummyLogitsProcessor


# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a mixture of requests which do and don't utilize the dummy logitproc
sampling_params_list = [
    SamplingParams(temperature=0.0, extra_args={"target_token": 128}),
    SamplingParams(temperature=0.0),
    SamplingParams(temperature=0.0, extra_args={"target_token": 67}),
    SamplingParams(temperature=0.0),
]


def main():
    # Create an LLM.
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors=[make_dummy_logitproc_type()],
    )
    # Generate texts from the prompts.
    # The output is a list of RequestOutput objects
    # that contain the prompt, generated text, and other information.
    outputs = llm.generate(prompts, sampling_params_list)
    # Print the outputs.
    print("\nGenerated Outputs:\n" + "-" * 60)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}")
        print(f"Output: {generated_text!r}")
    print("-" * 60)


if __name__ == "__main__":
    main()
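
The heart of apply() above is a one-token mask: for each row that registered a `target_token`, every other vocab entry is driven to -inf. Here is a self-contained sketch of that masking logic in plain PyTorch (no vLLM imports; the batch and vocab sizes are made up for illustration):

import torch

# Toy stand-in for DummyLogitsProcessor.apply(): row 0 targets token 5,
# row 1 opted out (None), so only row 0 is masked.
logits = torch.randn(2, 8)   # (batch=2, vocab=8), made-up sizes
req_info = {0: 5, 1: None}

for bdx in range(logits.shape[0]):
    if (target_token := req_info[bdx]) is not None:
        mask = torch.ones_like(logits[bdx, :], dtype=torch.bool)
        mask[target_token] = False      # keep only the target token's logit
        logits[bdx, mask] = float("-inf")

assert logits[0].argmax().item() == 5  # row 0 can now only emit token 5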
@@ -0,0 +1,53 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import argparse

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"


def parse_args():
    parser = argparse.ArgumentParser(description="Client for vLLM API server")
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming response"
    )
    return parser.parse_args()


def main(args):
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    models = client.models.list()
    model = models.data[0].id

    # Completion API
    completion = client.completions.create(
        model=model,
        prompt="A robot may not injure a human being",
        echo=False,
        n=2,
        stream=args.stream,
        logprobs=3,
    )

    print("-" * 50)
    print("Completion results:")
    if args.stream:
        for c in completion:
            print(c)
    else:
        print(completion)
    print("-" * 50)


if __name__ == "__main__":
    args = parse_args()
    main(args)
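
This client does not itself activate the custom logitproc; as the CLI test later in this PR shows, a request opts in by passing `target_token` through `vllm_xargs` in `extra_body`. A minimal sketch of that request shape, assuming the server was launched with the dummy logitproc loaded (the `target_token` key name matches the example above):

# Sketch: per-request activation of the dummy logitproc, assuming the
# server has the custom logits processor loaded.
completion = client.completions.create(
    model=model,
    prompt="A robot may not injure a human being",
    temperature=0.0,
    extra_body={"vllm_xargs": {"target_token": 128}},  # opt in per request
)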
2 changes: 1 addition & 1 deletion tests/v1/entrypoints/openai/test_multi_api_servers.py
@@ -2,11 +2,11 @@
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 import asyncio
 import os
-import re

 import openai  # use the official client for correctness check
 import pytest
 import pytest_asyncio
+import regex as re
 import requests

 from tests.utils import RemoteOpenAIServer
@@ -14,6 +14,7 @@
     create_prompt_tokens_tensor,
     fake_apply_logitsprocs,
     fake_update_logitsprocs_state)
+from vllm.config import VllmConfig
 from vllm.platforms import current_platform
 from vllm.sampling_params import SamplingParams
 from vllm.utils import is_pin_memory_available
@@ -23,9 +24,9 @@
                                              LogitsProcessor,
                                              MinPLogitsProcessor,
                                              MinTokensLogitsProcessor,
-                                             MoveDirectionality,
-                                             init_builtin_logitsprocs)
+                                             MoveDirectionality)
 # yapf: enable
+from vllm.v1.sample.logits_processor.load import build_logitsprocs
 from vllm.v1.sample.metadata import SamplingMetadata

 PIN_MEMORY_AVAILABLE = is_pin_memory_available()
@@ -70,7 +71,6 @@ def __str__(self):
         summ = ', '.join(f'{k}={v}' for k, v in vars(self).items())
         return f"MyClass({summ})"

-
 def _generate_fake_sampling_metadata(
     num_output_tokens: int,
     batch_size: int,
@@ -88,11 +88,11 @@ def _generate_fake_sampling_metadata(
             vocab_size,
             size=np.random.randint(
                 1, MAX_NUM_PROMPT_TOKENS)).tolist())
-    logitsprocs = init_builtin_logitsprocs(
-        pin_memory_available=PIN_MEMORY_AVAILABLE,
-        max_num_reqs=MAX_NUM_REQS + 1,
-        device=device)
-
+    logitsprocs = build_logitsprocs(
+        vllm_config=VllmConfig(),
+        device=device,
+        is_pin_memory=PIN_MEMORY_AVAILABLE,
+    )
     fake_sampling_metadata = SamplingMetadata(
         temperature=torch.full((batch_size, ), 0.0),
         all_greedy=True,
113 changes: 113 additions & 0 deletions tests/v1/sample/logits_processors/test_custom_cli.py
@@ -0,0 +1,113 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import random

import openai  # use the official client for correctness check
import pytest
import pytest_asyncio

from tests.utils import RemoteOpenAIServer
from tests.v1.sample.logits_processors.utils import (
    DUMMY_LOGITPROC_ARG, DUMMY_LOGITPROC_ENTRYPOINT, DUMMY_LOGITPROC_FQN,
    MAX_TOKENS, MODEL_NAME, TEMP_GREEDY, prompts)


@pytest.fixture(scope="module")
def default_server_args():
    return [
        # use half precision for speed and memory savings in CI environment
        "--dtype",
        "bfloat16",
        "--max-model-len",
        "2048",
        "--max-num-seqs",
        "128",
        "--enforce-eager"
    ]


@pytest.fixture(
    scope="module",
    params=[[
        "--logits-processors-entrypoints",
        DUMMY_LOGITPROC_ENTRYPOINT + "," + DUMMY_LOGITPROC_ENTRYPOINT
    ],
            [
                "--logits-processors-fqns",
                DUMMY_LOGITPROC_FQN + "," + DUMMY_LOGITPROC_FQN
            ]])
def server(default_server_args, request):
    if request.param:
        default_server_args = default_server_args + request.param
    with RemoteOpenAIServer(MODEL_NAME, default_server_args) as remote_server:
        yield remote_server


@pytest_asyncio.fixture
async def client(server):
    async with server.get_async_client() as async_client:
        yield async_client


api_kwargs = {
    "temperature": TEMP_GREEDY,
    "max_tokens": MAX_TOKENS,
    "logprobs": 0,
}

extra_body_kwargs = {"vllm_xargs": {DUMMY_LOGITPROC_ARG: 128}}


@pytest.mark.asyncio
@pytest.mark.parametrize(
    "model_name",
    [MODEL_NAME],
)
async def test_custom_logitsprocs_cli(client: openai.AsyncOpenAI,
                                      model_name: str):
    """Test CLI interface for passing custom logitsprocs.

    Launch a vLLM OpenAI-compatible server with a CLI argument that loads a
    custom logitproc with well-defined behavior (mask out all tokens except
    one `target_token`). The test is implicitly parameterized by the
    logitproc source (fully-qualified class name or entrypoint).

    Pass in requests, 50% of which pass a `target_token` value through
    `extra_body["vllm_xargs"]` and 50% of which do not.

    Validate that requests which activate the custom logitproc only output
    `target_token`.
    """
    use_dummy_logitproc = True
    for prompt in prompts:
        # Send vLLM API request; for some requests, activate dummy logitproc
        kwargs = {
            **api_kwargs,
        }
        if use_dummy_logitproc:
            target_token = random.choice([128, 67])
            # For requests which activate the dummy logitproc, choose one of
            # two `target_token` values which are known not to be EOS tokens
            kwargs["extra_body"] = {
                "vllm_xargs": {
                    DUMMY_LOGITPROC_ARG: target_token
                }
            }
        batch = await client.completions.create(
            model=model_name,
            prompt=prompt,
            **kwargs,
        )

        if use_dummy_logitproc:
            # Only for requests which activate dummy logitproc - validate that
            # only `target_token` is generated
            choices: list[openai.types.CompletionChoice] = batch.choices
            toks = choices[0].logprobs.tokens
            if not all([x == toks[0] for x in toks]):
                raise AssertionError(
                    f"Generated {toks} should all be {toks[0]}")

        # Alternate whether to activate dummy logitproc for each request
        use_dummy_logitproc = not use_dummy_logitproc

Check failure on line 92 in tests/v1/sample/logits_processors/test_custom_cli.py (GitHub Actions / pre-commit): Incompatible types in assignment (expression has type "dict[str, dict[str, int]]", target has type "float") [assignment]
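
The same server-side wiring can be exercised outside pytest. A minimal sketch, reusing the `RemoteOpenAIServer` helper and the `--logits-processors-fqns` flag from the test above; `my_pkg.logitsprocs:DummyLogitsProcessor` is a hypothetical import path, and the exact FQN format is an assumption:

from tests.utils import RemoteOpenAIServer

# Hypothetical FQN; substitute the real module path of your LogitsProcessor
# subclass. The flag name is taken from the test fixture above.
LOGITPROC_FQN = "my_pkg.logitsprocs:DummyLogitsProcessor"

server_args = [
    "--dtype", "bfloat16",
    "--enforce-eager",
    "--logits-processors-fqns", LOGITPROC_FQN,
]

with RemoteOpenAIServer("facebook/opt-125m", server_args) as server:
    client = server.get_client()  # synchronous OpenAI client
    out = client.completions.create(
        model="facebook/opt-125m",
        prompt="Hello, my name is",
        temperature=0.0,
        max_tokens=8,
        extra_body={"vllm_xargs": {"target_token": 128}},
    )
    print(out.choices[0].text)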