Commit 4d46368

Respect GUIDELLM__PREFERRED_ROUTE during backend validation (#223)
`Backend.validate()` always issued a smoke-test call to the legacy /v1/completions endpoint, even when the caller specified `GUIDELLM__PREFERRED_ROUTE=chat_completions`. That broke validation against deployments that expose only the chat-completions route. This PR makes backend validation honor the `GUIDELLM__PREFERRED_ROUTE` setting: instead of always hitting /v1/completions, it now chooses between text_completions and chat_completions based on the configured preference.
1 parent 4a422e4 commit 4d46368
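The routing decision this commit introduces can be sketched in isolation. The helper below, `choose_validation_route`, is hypothetical (not part of guidellm) and only mirrors the branch the PR adds: an environment-style setting `GUIDELLM__PREFERRED_ROUTE` selects which endpoint the validation smoke test targets. The real project reads this through a richer settings object rather than a raw dict.

```python
def choose_validation_route(env: dict) -> str:
    """Pick the smoke-test endpoint from a GUIDELLM__PREFERRED_ROUTE-style setting.

    Hypothetical sketch: guidellm's actual settings layer is more elaborate;
    this only reproduces the endpoint-selection branch added in this commit.
    """
    preferred = env.get("GUIDELLM__PREFERRED_ROUTE", "text_completions")
    if preferred == "chat_completions":
        # Deployment exposes only the chat-completions route.
        return "/v1/chat/completions"
    # Legacy default used before this change.
    return "/v1/completions"


# Usage: the chat preference steers validation away from /v1/completions.
route = choose_validation_route({"GUIDELLM__PREFERRED_ROUTE": "chat_completions"})
```

With no preference set, the function falls back to the legacy /v1/completions route, matching the pre-existing behavior of `Backend.validate()`.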

File tree

1 file changed: +15 additions, -4 deletions


src/guidellm/backend/backend.py

Lines changed: 15 additions & 4 deletions

@@ -7,6 +7,7 @@
 from PIL import Image

 from guidellm.backend.response import ResponseSummary, StreamingTextResponse
+from guidellm.config import settings

 __all__ = [
     "Backend",
@@ -129,10 +130,20 @@ async def validate(self):
         if not models:
             raise ValueError("No models available for the backend")

-        async for _ in self.text_completions(
-            prompt="Test connection", output_token_count=1
-        ):  # type: ignore[attr-defined]
-            pass
+        # Use the preferred route defined in the global settings when performing the
+        # validation request. This avoids calling an unavailable endpoint (ie
+        # /v1/completions) when the deployment only supports the chat completions
+        # endpoint.
+        if settings.preferred_route == "chat_completions":
+            async for _ in self.chat_completions(  # type: ignore[attr-defined]
+                content="Test connection", output_token_count=1
+            ):
+                pass
+        else:
+            async for _ in self.text_completions(  # type: ignore[attr-defined]
+                prompt="Test connection", output_token_count=1
+            ):
+                pass

         await self.reset()
