As mentioned at https://dottxt-ai.github.io/outlines/latest/features/models/, with server-based models Outlines has limited control over text generation, so some output types are not supported.

However, for vLLM it seems like all output types are supported, in direct contrast to the other server-based models. Also, surprisingly, it uses the OpenAI client, while the actual OpenAI server-based model doesn't support anything but JSON Schema. As a result I have a few questions:
1. Can I host any Hugging Face model that can be served by vLLM and get full support for all output types?
2. If so, what is special about the vLLM integration that allows full support for all output types compared to the other server-based models? (A rough sketch of how I understand the connection follows this list.)
3. If I wanted to use https://github.com/EricLBuehler/mistral.rs as the inference server, relying on its OpenAI-compatible endpoint, what changes would need to be made in mistral.rs, Outlines, and anything else so that it supports all output types in the same manner as vLLM? (Are there PRs I could look at as a more hands-on guide?)
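For context on question 2, here is a minimal sketch of how I currently picture the vLLM integration based on the features page: Outlines wraps a standard openai.OpenAI client pointed at the vLLM server. I have not verified this against the source; the loader name from_vllm is my reading of the docs, and the regex example just illustrates one of the output types the matrix marks as vLLM-only among server backends.

```python
# Sketch only, not verified against the Outlines source. Assumes a vLLM
# OpenAI-compatible server running locally (e.g. `vllm serve <model>`).
import openai
import outlines

# The transport is a plain OpenAI client, even though the backend is vLLM.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical loader name, taken from my reading of the models docs page.
model = outlines.from_vllm(client)

# Regex-constrained generation: one of the output types the features matrix
# marks as supported for vLLM but not for most server-based models.
answer = model("What's a 3-digit number?", outlines.types.Regex(r"[0-9]{3}"))
print(answer)
```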
Features Matrix: https://dottxt-ai.github.io/outlines/latest/features/models/#features-matrix

RE: #3, I assume (without reading the code) that it's using extra parameters from https://docs.vllm.ai/en/stable/features/structured_outputs.html#online-serving-openai-api - it would be good to know more details and get a confirmation.
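If that guess is right, the mechanism would look roughly like this from the client side. This is a sketch assuming a local vLLM server; the parameter names (guided_choice, guided_regex, guided_json, ...) follow the structured-outputs page linked above and have changed across vLLM versions, and the model name is a placeholder:

```python
# Sketch: calling vLLM's OpenAI-compatible server directly with the
# guided-decoding extras documented in the structured-outputs docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder: whatever the server loaded
    messages=[{"role": "user", "content": "Pick one primary color."}],
    # Non-standard fields travel in extra_body; a stock OpenAI endpoint has
    # no equivalent, which would explain why only vLLM gets full output-type
    # coverage in the features matrix.
    extra_body={"guided_choice": ["red", "yellow", "blue"]},
)
print(completion.choices[0].message.content)
```

If so, then for mistral.rs (question #3) the two places to change would presumably be: teach its OpenAI-compatible endpoint the same kind of extra parameters, and teach Outlines to map its output types onto whatever constrained-decoding options mistral.rs exposes.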