Commit e0a1d15

daniel-salib authored and lk-chen committed
Fix raw_request extraction in load_aware_call decorator (vllm-project#15382)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
1 parent 5c188cf commit e0a1d15

1 file changed, +10 -3 lines changed

vllm/entrypoints/utils.py

Lines changed: 10 additions & 3 deletions
@@ -68,13 +68,20 @@ def decrement_server_load(request: Request):
 def load_aware_call(func):

     @functools.wraps(func)
-    async def wrapper(*args, raw_request: Request, **kwargs):
+    async def wrapper(*args, **kwargs):
+        raw_request = kwargs.get("raw_request",
+                                 args[1] if len(args) > 1 else None)
+
+        if raw_request is None:
+            raise ValueError(
+                "raw_request required when server load tracking is enabled")
+
         if not raw_request.app.state.enable_server_load_tracking:
-            return await func(*args, raw_request=raw_request, **kwargs)
+            return await func(*args, **kwargs)

         raw_request.app.state.server_load_metrics += 1
         try:
-            response = await func(*args, raw_request=raw_request, **kwargs)
+            response = await func(*args, **kwargs)
         except Exception:
             raw_request.app.state.server_load_metrics -= 1
             raise
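
The extraction pattern in this change can be tried in isolation. The following is a minimal sketch, not the vLLM implementation: `make_fake_request`, `load_aware_call_sketch`, and `handler` are hypothetical names, the request is a plain `SimpleNamespace` stand-in rather than a real `Request`, and the counter is decremented in a `finally` block purely to keep the example short (the diff above only shows the decrement on the exception path).

```python
import asyncio
import functools
from types import SimpleNamespace


def make_fake_request(tracking_enabled: bool) -> SimpleNamespace:
    """Hypothetical stand-in for a request; it only exposes .app.state."""
    state = SimpleNamespace(enable_server_load_tracking=tracking_enabled,
                            server_load_metrics=0)
    return SimpleNamespace(app=SimpleNamespace(state=state))


def load_aware_call_sketch(func):
    """Simplified decorator mirroring the extraction logic in the diff."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # Accept raw_request by keyword, or as the second positional argument.
        raw_request = kwargs.get("raw_request",
                                 args[1] if len(args) > 1 else None)
        if raw_request is None:
            raise ValueError(
                "raw_request required when server load tracking is enabled")

        state = raw_request.app.state
        if not state.enable_server_load_tracking:
            return await func(*args, **kwargs)

        state.server_load_metrics += 1
        try:
            return await func(*args, **kwargs)
        finally:
            # Assumption for this sketch only: always decrement on exit.
            state.server_load_metrics -= 1

    return wrapper


@load_aware_call_sketch
async def handler(request, raw_request):
    # Return the load counter as seen while the handler is running.
    return raw_request.app.state.server_load_metrics


async def demo():
    by_keyword = await handler("payload", raw_request=make_fake_request(True))
    by_position = await handler("payload", make_fake_request(True))
    disabled = await handler("payload", make_fake_request(False))
    try:
        await handler("payload")  # no raw_request at all
        missing_raises = False
    except ValueError:
        missing_raises = True
    return by_keyword, by_position, disabled, missing_raises


results = asyncio.run(demo())
```

Because the wrapper no longer declares `raw_request` as a keyword-only parameter, the decorated function receives its arguments exactly as the caller passed them, whether `raw_request` arrived by keyword or by position.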
