[V1] Logits processors extensibility #19912
Open: afeldman-nm wants to merge 340 commits into vllm-project:main from neuralmagic:lp_ext
+965 −298
Commits: changes shown from 250 of 340 commits (all authored by afeldman-nm unless noted)
849d829 bugfix - redundant batch update
03a836b min tokens test bugfix
5ab8af1 Merge branch 'main' into logitsprocs_merge
d92a3f3 remove prints
1f87ec8 Merge branch 'main' into logitsprocs_merge
ef2294d rejection sampling test bugfix
198db48 sampler test bugfix
2f2550b removed logitsprocs where not needed in test
1b1f8ca Merge branch 'main' into logitsprocs_merge
5fc130b refactor
7b8f299 sampling_params min-p check
4d5ea01 Merge branch 'main' into logitsprocs_merge
d8a6761 Merge branch 'logitsprocs' into logitsprocs_valid
0515848 small test optimization
17e7f62 Merge branch 'main' into logitsprocs_merge
c392898 refactor
dc4b6b8 wip tests
fd26581 Merge branch 'main' into logitsprocs_merge
5fb16a6 refactor
b0658c2 passing mixed batch test for min_p and none
ac608f1 Merge branch 'main' into logitsprocs_merge
7f44262 Merge branch 'logitsprocs' into logitsprocs_reorder
0a20965 mix batch test passes without reorder
bdea83c refactor
7f4d72e Merge branch 'main' into logitsprocs_merge
ec25ab5 Merge branch 'main' into logitsprocs_merge
5a5e38f move-only
19d3882 Merge branch 'main' into logitsprocs_merge
588b845 Merge branch 'main' into logitsprocs_merge
38746ae Merge branch 'main' into logitsprocs_merge
9703c4a fake reordering logic
6078602 fake logitsproc invocation against fake batch
ae5b600 almost passing
d5679bb Merge branch 'main' into logitsprocs_merge
84cad20 Merge branch 'logitsprocs' into logitsprocs_reorder
89ea6dd wip refactor
76438fb test fix
e03c561 Merge branch 'main' into logitsprocs_merge
045bc01 Merge branch 'main' into logitsprocs_merge
9ac6190 latest
c1b8e69 Merge branch 'main' into logitsprocs_merge
e83f90b Merge branch 'main' into logitsprocs_merge
360f2c4 removed tpu hack
eac0c82 Merge branch 'main' into logitsprocs_merge
395d472 wip tpu backward compat
5c53a8c typing
e08e4f4 Merge branch 'logitsprocs' into logitsprocs_tpu
f7969c5 wip
1117f51 first pass at tpu/gpu separation
e0fd74b Merge branch 'main' into logitsprocs_merge
2a4e09c first pass at new TPU approach
1f4cad3 docstrings
a6be23c Merge branch 'main' into tpu-isolate
b28588c merged in GPU/TPU decoupling PR
ca87319 bugfix
32e4275 type checking
9aeb49d Merge branch 'main' into tpu-isolate
0383e73 InputBatch fix
9564879 Merge branch 'tpu-isolate' into logitsprocs_merge
c02ef1b merge
b804423 vllm_xargs/kv_transfer_params compatibility
17f02ee fix
061ac67 remove unnecessary unit test
421c278 precedence
f315e0e pre-commit fix
3d92a07 Merge branch 'main' into extra_args_merge
873b89f merge
f8609ff Merge branch 'main' into extra_args_merge
9c5f407 Documentation changes
0857dc4 refactor
f9c4e19 typing
03c6010 typing
95e1b0d typing
9daeaed Update vllm/entrypoints/openai/protocol.py
baf90c9 feedback
4f04198 remove swap type
da23801 refactor
34c9866 move/swap refactoring
e1f0455 refactoring
fe088ea Merge branch 'main' into logitsprocs_merge
f506dd7 small fixes
3257deb Merge branch 'main' into logitsprocs_merge
c0b2068 Merge branch 'main' into extra_args_merge
5894110 Merge branch 'extra_args' into lp_ext
3885bc5 merge
e06f9e9 Merge branch 'main' into logitsprocs_merge
7d89720 batch update builder
33e0f14 comments
26c18d6 Merge branch 'logitsprocs' into lp_ext
2e56aec add custom logitsprocs arg
2213b44 Merge branch 'main' into logitsprocs_merge
36d6f69 logitsprocs+pooling bugfix
28b6606 Merge branch 'logitsprocs' into lp_ext_merge
e422caa Merge branch 'lp_ext' into lp_ext_py
3cca78f small tweaks
4177594 refactor
40407b7 Merge branch 'main' into logitsprocs_merge
5209ffe Fixed min tokens bug
301db58 Merge branch 'main' into logitsprocs_merge
6f41503 fixed logit bias bug
36f161d Merge branch 'main' into logitsprocs_merge
a14d3a4 Merge branch 'logitsprocs' into lp_ext_merge
1716f07 Merge branch 'lp_ext' into lp_ext_py
b429c10 Merge branch 'main' into logitsprocs_merge
fbdb595 comment Re: output tokens list ref
e3dc71e Merge branch 'logitsprocs' into logitsprocs_merge
aa4c519 Merge branch 'main' into logitsprocs_merge
3ae8a6b Merge branch 'logitsprocs' into lp_ext_merge
d58bf24 Merge branch 'lp_ext' into lp_ext_py
77bba48 refactor
890a9cd refactor
6b3ea9f Update vllm/v1/sample/logits_processor.py
8a8f9c2 wip
070d71d Merge branch 'main' into logitsprocs_merge
5384732 feedback
9aebc9f Update vllm/v1/sample/sampler.py
8bb6bf0 revert some changes
0a88e16 refactor
18721da Merge branch 'logitsprocs' of https://github.com/neuralmagic/vllm int…
dc0b23a refactor
21ad212 Merge branch 'main' into logitsprocs_merge
2f0de77 argmax_invariant
8d97a7c batch update builder impl
2abd24d Merge branch 'main' into logitsprocs_merge
d1c6607 refactor
9fe0bc3 wip dict removal
aa18e8f Merge branch 'main' into logitsprocs_merge
f7a162c Merge branch 'main' into logitsprocs_merge
de81e42 updated unit tests
20928f0 refactor
a0e5398 iterators
d4704d7 refactor
729729d reorg
9948fd3 Merge branch 'main' into logitsprocs_merge
bc48f38 Merge branch 'main' into logitsprocs_merge
9eeea03 feedback
1078a24 Merge branch 'main' into logitsprocs_merge
cd766a4 feedback
2628f98 Merge branch 'main' into logitsprocs_merge
2ecb37d Merge branch 'main' into logitsprocs_merge
64ac2cf input batch tests
4da82cc Merge branch 'main' into logitsprocs_merge
bd62df4 refactor
8455bb6 Merge branch 'main' into logitsprocs_merge
a6dc218 attempted fmt fix
a870259 wip
072ee00 wip
55fd6e7 fixed cancellation bug
6d4e073 Merge branch 'logitsprocs' into lp_ext_merge
ab3a985 Merge branch 'lp_ext' into lp_ext_py
b55f88e Merge branch 'main' into logitsprocs_merge
348a100 Merge branch 'logitsprocs' into lp_ext
c397e24 Merge branch 'lp_ext' into lp_ext_py
1217b74 wip
402d012 Update vllm/v1/worker/gpu_model_runner.py
7c15b43 CLI
06fc926 pr feedback
8d229ed Merge branch 'main' into logitsprocs_merge
4d0b612 Merge branch 'logitsprocs' into lp_ext
4b1884b Merge branch 'lp_ext' into lp_ext_py
99c0c18 skeleton of example
aabd1dd fixes
63b640c wip
45dade4 mem util
d377a6b Merge branch 'main' into logitsprocs_merge
6ae7574 memory util
5203324 Merge branch 'main' into logitsprocs_merge
68aab25 Merge branch 'main' into logitsprocs_merge
066736d merge'
31597e9 Merge branch 'logitsprocs' into lp_ext
957bd86 Merge branch 'lp_ext' into lp_ext_py
3a5564d refactor
663bff1 refactor
69c2a0d merge
538c378 Merge branch 'main' into lp_ext
270b184 wip
195f651 Merge branch 'main' into lp_ext_merge
f9df850 Merge branch 'lp_ext' into lp_ext_py
fc9c308 py llm plumbing
3aa383e wip lp example
b420aac wip
a475fe9 Merge branch 'main' into lp_ext_merge
01d640c Merge branch 'lp_ext' into lp_ext_py
138dc07 Merge branch 'main' into lp_ext_merge
699768a Merge branch 'lp_ext' into lp_ext_py
ee88fdf Merge branch 'main' into lp_ext_py
6a405ab first pass at lp loading system
0de1e73 wip
ef51732 Merge branch 'main' into lp_ext_py
c8e8671 loading logitsprocs
52146dc refactor
e79f9ad lp tests passing
4c16135 Merge branch 'main' into lp_ext_py
2e330e1 refactor
7a60363 logitsprocs
18129b4 example w/ dummy logitproc
e73c00c refactor
f612fcf Merge branch 'main' into lp_ext
4af5159 entrypoint example
be7177a cli arg
0ad8b1c removed regex
c21a2ec fqn/entrypoint examples
4730d7a cli tests
1617747 Merge branch 'main' into lp_ext
1784079 Merge branch 'main' into lp_ext_merge
f078ce7 tail end of merge
129479a Merge branch 'main' into lp_ext_merge
ac1509f refactor
5b85255 wip
d7499db Merge branch 'main' into lp_ext_merge
ee74904 all lp plugins are loaded; can pass lp types to LLM; refactor
be9e750 Merge branch 'main' into lp_ext_merge
4f80bee unit test fix
683b99f typo
f7ee5ee refactor
ad08c45 refactor
bbabe50 abstract __init__ (abf149)
9e88f37 fqn (abf149)
ae55b2e merge (abf149)
d08a89d type checking (abf149)
e7cb8e1 merge
aae02a0 small fix
2d807af fixes
c0cdd27 merge
2a79d4a merge
7c49fe1 Merge branch 'main' into lp_ext_merge
84112c0 Merge branch 'lp_ext_merge' into lp_ext
bd243c9 refactor
07d6056 fix
83eca33 fix test bug
6a4597b cli test works again
9d2156c Merge branch 'main' into lp_ext_merge
8c2d16c LLM entrypoint testing
35999c8 Merge branch 'main' into lp_ext_merge
e6123e7 cli entrypoints test
da8aa76 fixed example
12d48a7 adding prompt tokens to added requests
c4a76be Merge branch 'main' into lp_ext_merge
4a57631 initial feedback
6697a30 Merge branch 'main' into lp_ext_merge
5c350a3 wip
4aa5c86 merge load.py into __init__.py
d3099b4 refactor
f6cbcad resetting test.txt
ffbc6f2 logitsprocs in input batch
ea3c970 Merge branch 'main' into lp_ext_merge
90472d8 merge
961c863 wip feedback
c50b51a Merge branch 'main' into lp_ext_merge
89618f1 prompt token ids ordering
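The three new files below (an offline-inference example, an OpenAI-client example, and an entrypoints test) all exercise the same extension point. For orientation, here is a minimal sketch of what a custom V1 logits processor could look like under this PR. The import path follows the touched file vllm/v1/sample/logits_processor.py, but the class name, constructor signature, and hook names are assumptions inferred from the commit messages ("argmax_invariant", "batch update builder", "abstract __init__"), not a verbatim copy of the merged interface.

import torch
from typing import Optional

# Assumed import path, based on the file this PR modifies
from vllm.v1.sample.logits_processor import BatchUpdate, LogitsProcessor


class TargetTokenLogitsProcessor(LogitsProcessor):  # hypothetical example class
    """Mask out all tokens except a per-request `target_token`."""

    def __init__(self, vllm_config, device, is_pin_memory):  # assumed signature
        # Maps batch row index -> target token id for active requests
        self.req_info: dict[int, int] = {}

    def is_argmax_invariant(self) -> bool:
        # Masking changes which token has the highest logit, so this
        # processor must also run under greedy (argmax) sampling
        return False

    def update_state(self, batch_update: Optional[BatchUpdate]) -> None:
        # Keep per-request state aligned with batch rows as requests are
        # added/removed/moved (see the "batch update builder" commits);
        # elided here, sketched after the offline example below
        ...

    def apply(self, logits: torch.Tensor) -> torch.Tensor:
        # In-place masking: keep only the target token's logit in each
        # active row, set everything else to -inf
        for row, target in self.req_info.items():
            kept = logits[row, target].item()
            logits[row] = float("-inf")
            logits[row, target] = kept
        return logits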
@@ -0,0 +1,73 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

"""This example demonstrates instantiating vLLM with a custom logits processor
class object.

For testing purposes, a dummy logits processor is employed which, if
`target_token` is passed as a keyword argument to `SamplingParams.extra_args`,
will mask out all tokens except `target_token`.

A batch is constructed with `temperature=0.0` and 50% of requests specifying
`target_token`, and for these requests - and *only* these requests - we
expect the `target_token` to be decoded in each step, yielding an output
similar to that shown below:

Generated Outputs:
------------------------------------------------------------
Prompt: 'Hello, my name is'
Output: " ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '"
------------------------------------------------------------
Prompt: 'The president of the United States is'
Output: " not a racist. He is a racist.\nHe's a racist because he"
------------------------------------------------------------
Prompt: 'The capital of France is'
Output: ' also also also also also also also also also also also also also
also also also'
------------------------------------------------------------
Prompt: 'The future of AI is'
Output: ' in the hands of the people.\n\nThe future of AI is in the'
------------------------------------------------------------
"""

from vllm import LLM, SamplingParams
from vllm.test_utils import DummyLogitsProcessor

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a mixture of requests which do and don't utilize the dummy logitproc
sampling_params_list = [
    SamplingParams(temperature=0.0, extra_args={"target_token": 128}),
    SamplingParams(temperature=0.0),
    SamplingParams(temperature=0.0, extra_args={"target_token": 67}),
    SamplingParams(temperature=0.0),
]


def main():
    # Create an LLM.
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors=[DummyLogitsProcessor],
    )
    # Generate texts from the prompts.
    # The output is a list of RequestOutput objects
    # that contain the prompt, generated text, and other information.
    outputs = llm.generate(prompts, sampling_params_list)
    # Print the outputs.
    print("\nGenerated Outputs:\n" + "-" * 60)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}")
        print(f"Output: {generated_text!r}")
        print("-" * 60)


if __name__ == "__main__":
    main()
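This example relies on `DummyLogitsProcessor` reading `target_token` from each request's `SamplingParams.extra_args`. As a hedged continuation of the sketch above, this is roughly how the elided `update_state` hook might consume per-request arguments; the tuple layout of `batch_update.added` is an assumption (the commit "adding prompt tokens to added requests" suggests added entries carry a row index, the request's sampling params, and token-id lists).

    # Inside the sketched processor class above (hypothetical, not the
    # merged implementation):
    def update_state(self, batch_update):
        if batch_update is None:
            return  # batch composition unchanged this step
        for added in batch_update.added:
            index, params = added[0], added[1]  # assumed tuple layout
            target = (params.extra_args or {}).get("target_token")
            if target is not None:
                self.req_info[index] = target
        for index in batch_update.removed:
            # Request finished or was cancelled; drop its state
            self.req_info.pop(index, None)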
examples/online_serving/openai_completion_client_logits_processor.py (53 additions, 0 deletions)
@@ -0,0 +1,53 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import argparse

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"


def parse_args():
    parser = argparse.ArgumentParser(description="Client for vLLM API server")
    parser.add_argument(
        "--stream", action="store_true", help="Enable streaming response"
    )
    return parser.parse_args()


def main(args):
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    models = client.models.list()
    model = models.data[0].id

    # Completion API
    completion = client.completions.create(
        model=model,
        prompt="A robot may not injure a human being",
        echo=False,
        n=2,
        stream=args.stream,
        logprobs=3,
    )

    print("-" * 50)
    print("Completion results:")
    if args.stream:
        for c in completion:
            print(c)
    else:
        print(completion)
    print("-" * 50)


if __name__ == "__main__":
    args = parse_args()
    main(args)
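As written, this client exercises the Completions API but never activates the custom logits processor. Wiring it up from the client side goes through `extra_body["vllm_xargs"]`, as the entrypoints test below does. A minimal variation of the request above, reusing the `client` and `model` variables from this example (`target_token` mirrors the offline example's argument name):

    completion = client.completions.create(
        model=model,
        prompt="A robot may not injure a human being",
        # vllm_xargs forwards per-request extra args into
        # SamplingParams.extra_args on the server, activating the
        # dummy logitproc with the given target token id
        extra_body={"vllm_xargs": {"target_token": 128}},
    )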
@@ -0,0 +1,118 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import random

import openai  # use the official client for correctness check
import pytest
import pytest_asyncio

from tests.utils import RemoteOpenAIServer
from tests.v1.sample.logits_processors.utils import (DUMMY_LOGITPROC_ARG,
                                                     MAX_TOKENS, MODEL_NAME,
                                                     TEMP_GREEDY, prompts)
from vllm.test_utils import DUMMY_LOGITPROC_FQCN


@pytest.fixture(scope="module")
def default_server_args():
    return [
        # use half precision for speed and memory savings in CI environment
        "--dtype",
        "bfloat16",
        "--max-model-len",
        "2048",
        "--max-num-seqs",
        "128",
        "--enforce-eager",
    ]


@pytest.fixture(scope="function",
                params=[[], ["--logits-processors", DUMMY_LOGITPROC_FQCN]])
def server(default_server_args, request, monkeypatch):
    """Server CLI arg list is parameterized by logitproc source: either a
    fully-qualified class name (FQCN) specified via `--logits-processors`, or
    an entrypoint.

    The entrypoint requires no CLI argument, but for testing purposes an
    environment variable must be set to mock a dummy logits processor
    entrypoint.
    """
    monkeypatch.setenv("VLLM_ENABLE_V1_MULTIPROCESSING", "1")
    if request.param:
        # Append FQCN argument
        default_server_args = default_server_args + request.param
    else:
        # Enable mock logit processor entrypoint
        monkeypatch.setenv("VLLM_MOCK_LP_ENTRYPOINT", "1")

    with RemoteOpenAIServer(MODEL_NAME, default_server_args) as remote_server:
        yield remote_server


@pytest_asyncio.fixture
async def client(server):
    async with server.get_async_client() as async_client:
        yield async_client


api_kwargs = {
    "temperature": TEMP_GREEDY,
    "max_tokens": MAX_TOKENS,
    "logprobs": 0,
}

extra_body_kwargs = {"vllm_xargs": {DUMMY_LOGITPROC_ARG: 128}}


@pytest.mark.asyncio
@pytest.mark.parametrize(
    "model_name",
    [MODEL_NAME],
)
async def test_custom_logitsprocs_cli(client: openai.AsyncOpenAI,
                                      model_name: str):
    """Test the CLI interface for passing custom logitsprocs.

    Launch a vLLM OpenAI-compatible server with a CLI argument that loads a
    custom logitproc with well-defined behavior (mask out all tokens except
    one `target_token`). The logitproc is specified by fully-qualified class
    name (FQCN) or by entrypoint, per the `server` fixture above.

    Pass in requests, 50% of which pass a `target_token` value through
    `extra_body["vllm_xargs"]` and 50% of which do not.

    Validate that requests which activate the custom logitproc only output
    `target_token`.
    """
    use_dummy_logitproc = True
    for prompt in prompts:
        # Send vLLM API request; for some requests, activate dummy logitproc
        kwargs = {
            **api_kwargs,
        }
        if use_dummy_logitproc:
            # For requests which activate the dummy logitproc, choose one of
            # two `target_token` values which are known not to be EOS tokens
            target_token = random.choice([128, 67])
            kwargs["extra_body"] = {
                "vllm_xargs": {
                    DUMMY_LOGITPROC_ARG: target_token
                }
            }
        batch = await client.completions.create(
            model=model_name,
            prompt=prompt,
            **kwargs,
        )

        if use_dummy_logitproc:
            # Only for requests which activate dummy logitproc - validate that
            # only `target_token` is generated
            choices: list[openai.types.CompletionChoice] = batch.choices
            toks = choices[0].logprobs.tokens
            if not all(x == toks[0] for x in toks):
                raise AssertionError(
                    f"Generated {toks} should all be {toks[0]}")

        # Alternate whether to activate dummy logitproc for each request
        use_dummy_logitproc = not use_dummy_logitproc
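For reference, outside the test harness the FQCN variant of this configuration plausibly corresponds to a server launch like the following. The dotted path is an assumption consistent with `DummyLogitsProcessor` living in `vllm.test_utils`; the exact FQCN spelling accepted by the flag may differ.

vllm serve facebook/opt-125m --logits-processors vllm.test_utils.DummyLogitsProcessor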