vLLM serve wrong output #1525
Closed
Martins6 started this conversation in Weird model output
Replies: 2 comments 1 reply
-
This is likely the result of missing chat templating (docs here). In general I'm not totally sure why we still provide the outlines serve functionality. I usually refer people to vLLM instead, which supports Outlines as a constrained decoding backend.
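In case it helps, a rough sketch of applying the chat template yourself before sending a prompt to the server (the message content here is just a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Wrap the raw prompt in the model's chat format instead of sending it bare.
messages = [{"role": "user", "content": "Describe a character for my story."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # this templated string is what the model should actually see
```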
-
From what I've inspected, it seems it was just a wrong line of code! I've opened this PR to fix it, in case you want to keep maintaining the vLLM serve, @cpfiffer. But nice, I'll also check out vLLM directly! Thank you very much! 🚀
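For anyone who lands here later, a rough sketch of querying vLLM directly with guided JSON decoding (the schema fields are placeholders, not the exact model I used):

```python
# Assumes a vLLM OpenAI-compatible server is running, e.g.:
#   vllm serve microsoft/Phi-3-mini-4k-instruct
from openai import OpenAI
from pydantic import BaseModel

class Character(BaseModel):  # placeholder schema
    name: str
    age: int

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=[{"role": "user", "content": "Create a character."}],
    # vLLM-specific extension: constrain the output to the JSON schema.
    extra_body={"guided_json": Character.model_json_schema()},
)
print(response.choices[0].message.content)
```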
-
As I was testing the vLLM serve from a pure Outlines perspective, I noticed that the output has a weird format: it contains the prompt I used followed by the result I want.
To replicate, first run in a terminal:
python -m outlines.serve.serve --model="microsoft/Phi-3-mini-4k-instruct"
Then run this Python script, which depends on requests and Pydantic. I'm using Python 3.11.
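A minimal sketch of such a script, assuming the /generate endpoint and payload keys from the Outlines serve docs and a placeholder Pydantic model:

```python
import requests
from pydantic import BaseModel

class Character(BaseModel):  # placeholder model
    name: str
    age: int

prompt = "Describe a fantasy character."

# Outlines serve accepts a prompt plus a JSON schema to constrain generation.
response = requests.post(
    "http://127.0.0.1:8000/generate",
    json={"prompt": prompt, "schema": Character.model_json_schema()},
)
print(response.json())
```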
The result that I get is:
See? I believe the expected result is just: