Support different backends on top of outlines_core #1689


Merged: 5 commits, Jul 25, 2025
56 changes: 56 additions & 0 deletions docs/features/advanced/backends.md
@@ -0,0 +1,56 @@
---
title: Structured Generation Backends
---

# Structured Generation Backends

Outlines relies on a structured generation backend to control text generation for steerable models so that their output conforms to the output type provided. One of those backends is `outlines-core`, but you also have access to two other libraries that serve the same purpose: `llguidance` and `xgrammar`.

## Overview

To select the backend to use for your generation, provide a value for the `backend` argument when calling a model or a generator.

For instance:

```python
from typing import Literal
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer

output_type = Literal["Paris", "London", "Rome", "Berlin"]

model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct"),
    AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
)

result = model("What is the capital of France?", output_type, backend="llguidance")
print(result) # 'Paris'

generator = outlines.Generator(model, output_type)
result = generator("What is the capital of France?", backend="xgrammar")
print(result) # 'Paris'
```

If you do not provide a value for the `backend` argument, a default backend is used, as shown in the example below. The default depends on the output type:

- JSON schema: `outlines_core`
- Regex: `outlines_core`
- Context-free grammar: `llguidance`
- Interegular FSM: `outlines_core`
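
For instance, reusing the `model` defined above, calling it without a `backend` argument falls back to the default for the output type. A minimal sketch, assuming `Regex` from `outlines.types` (the prompt and pattern are illustrative):

```python
from outlines.types import Regex

# No `backend` argument: a regex output type uses the default
# backend, `outlines_core`.
result = model("Give me a three-digit number.", Regex(r"[0-9]{3}"))
print(result)  # e.g. '142'
```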

## Features matrix

As mentioned previously, selecting the structured generation backend only applies to steerable models, namely `Transformers`, `LlamaCpp` and `MLXLM`. Additionally, some backends do not support all of those models or all output types.

| | outlines_core | llguidance | xgrammar |
|---|---|---|---|
| **Models** | | | |
| Transformers | ✅ | ✅ | ✅ |
| LlamaCpp | ✅ | ✅ | ❌ |
| MLXLM | ✅ | ✅ | ❌ |
| **Output Types** | | | |
| JSON Schema | ✅ | ✅ | ✅ |
| Regex | ✅ | ✅ | ✅ |
| Grammar | ✅ | ✅ | ✅ |
| FSM | ✅ | ❌ | ❌ |
25 changes: 24 additions & 1 deletion docs/features/models/llamacpp.md
@@ -109,7 +109,7 @@ for chunk in model.stream("Write a short story about a cat.", max_tokens=100):

## Structured Generation

- The `LlamaCpp` model supports all output types available in Outlines except for context-free grammars. Simply provide an `output_type` after the prompt when calling the model.
+ The `LlamaCpp` model supports all output types available in Outlines. Simply provide an `output_type` after the prompt when calling the model.

### Basic Type

@@ -195,6 +195,29 @@ result = model("Generate a fake social security number.", output_type)
print(result) # '782-32-3789'
```

### Context-free grammar

```python
from outlines.types import CFG
import outlines
from llama_cpp import Llama

output_type = CFG("""
root ::= answer
answer ::= "yes" | "no"
""")

model = outlines.from_llamacpp(
    Llama.from_pretrained(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    )
)

result = model("Are you feeling good today?", output_type)
print(result) # 'yes'
```

## Inference Arguments

When calling the model, you can provide optional inference parameters on top of the prompt and the output type. These parameters will be passed on to the `__call__` method of the `llama_cpp.Llama` model. Some common inference arguments include `max_tokens`, `temperature`, `frequency_penalty` and `top_p`.
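
For instance, a minimal sketch reusing the `model` defined above (the argument values are illustrative):

```python
result = model(
    "Write a haiku about autumn.",
    max_tokens=50,
    temperature=0.7,
    top_p=0.9,
)
print(result)
```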
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -162,6 +162,7 @@ nav:

  - Advanced:
      - Logits Processors: features/advanced/logits_processors.md
      - Structured Generation Backends: features/advanced/backends.md

- API Reference: api_reference/

160 changes: 160 additions & 0 deletions outlines/backends/__init__.py
@@ -0,0 +1,160 @@
"""Module to define the backends in charge of creating logits processors."""

import interegular

from outlines.backends.base import (
    BaseBackend,
    LogitsProcessorType,
)
from outlines.backends.llguidance import LLGuidanceBackend
from outlines.backends.outlines_core import OutlinesCoreBackend
from outlines.backends.xgrammar import XGrammarBackend
from outlines.models import SteerableModel


CFG_DEFAULT_BACKEND = "llguidance"
FSM_DEFAULT_BACKEND = "outlines_core"
JSON_SCHEMA_DEFAULT_BACKEND = "outlines_core"
REGEX_DEFAULT_BACKEND = "outlines_core"


def _get_backend(backend_name: str, model: SteerableModel) -> BaseBackend:
    """Create a Backend instance.

    Parameters
    ----------
    backend_name: str
        The name of the backend to get.
    model: Model
        The Outlines model of the user.

    Returns
    -------
    backend: BaseBackend
        The backend instance.

    """
    if backend_name == "outlines_core":
        return OutlinesCoreBackend(model)
    elif backend_name == "xgrammar":
        return XGrammarBackend(model)
    elif backend_name == "llguidance":
        return LLGuidanceBackend(model)
    else:
        raise ValueError(f"Backend {backend_name} not supported")


def get_json_schema_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
    json_schema: str,
) -> LogitsProcessorType:
    """Create a logits processor from a JSON schema.

    Parameters
    ----------
    backend_name: str | None
        The name of the backend to use.
    model: Model
        The Outlines model of the user.
    json_schema: str
        The JSON schema to create a logits processor from.

    Returns
    -------
    LogitsProcessorType
        The logits processor.

    """
    backend = _get_backend(
        backend_name or JSON_SCHEMA_DEFAULT_BACKEND,
        model,
    )
    return backend.get_json_schema_logits_processor(json_schema)


def get_regex_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
    regex: str,
) -> LogitsProcessorType:
    """Create a logits processor from a regex.

    Parameters
    ----------
    backend_name: str | None
        The name of the backend to use.
    model: Model
        The Outlines model of the user.
    regex: str
        The regex to create a logits processor from.

    Returns
    -------
    LogitsProcessorType
        The logits processor.

    """
    backend = _get_backend(
        backend_name or REGEX_DEFAULT_BACKEND,
        model,
    )
    return backend.get_regex_logits_processor(regex)


def get_cfg_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
    grammar: str,
) -> LogitsProcessorType:
    """Create a logits processor from a context-free grammar.

    Parameters
    ----------
    backend_name: str | None
        The name of the backend to use.
    model: Model
        The Outlines model of the user.
    grammar: str
        The context-free grammar to create a logits processor from.

    Returns
    -------
    LogitsProcessorType
        The logits processor.

    """
    backend = _get_backend(
        backend_name or CFG_DEFAULT_BACKEND,
        model,
    )
    return backend.get_cfg_logits_processor(grammar)


def get_fsm_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
    fsm: interegular.fsm.FSM,
) -> LogitsProcessorType:
    """Create a logits processor from an interegular FSM.

    Parameters
    ----------
    backend_name: str | None
        The name of the backend to use.
    model: Model
        The Outlines model of the user.
    fsm: interegular.fsm.FSM
        The interegular FSM to create a logits processor from.

    Returns
    -------
    LogitsProcessorType
        The logits processor.

    """
    backend = _get_backend(
        backend_name or FSM_DEFAULT_BACKEND,
        model,
    )
    return backend.get_fsm_logits_processor(fsm)
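
# Illustrative usage sketch (comment only, not part of this module): assuming
# `model` is a SteerableModel instance, a processor could be obtained with
#
#     processor = get_json_schema_logits_processor(
#         None,  # no backend name: falls back to JSON_SCHEMA_DEFAULT_BACKEND
#         model,
#         '{"type": "object", "properties": {"name": {"type": "string"}}}',
#     )
#
# Passing backend_name="xgrammar" instead would dispatch to XGrammarBackend.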
88 changes: 88 additions & 0 deletions outlines/backends/base.py
@@ -0,0 +1,88 @@
"""Base class for all backends."""

from abc import ABC, abstractmethod
from typing import Any

from interegular.fsm import FSM


LogitsProcessorType = Any


class BaseBackend(ABC):
    """Base class for all backends.

    The subclasses must implement methods that create a logits processor
    from a JSON schema, regex, CFG or FSM.

    """

    @abstractmethod
    def get_json_schema_logits_processor(
        self, json_schema: str
    ) -> LogitsProcessorType:
        """Create a logits processor from a JSON schema.

        Parameters
        ----------
        json_schema: str
            The JSON schema to create a logits processor from.

        Returns
        -------
        LogitsProcessorType
            The logits processor.

        """
        ...

    @abstractmethod
    def get_regex_logits_processor(self, regex: str) -> LogitsProcessorType:
        """Create a logits processor from a regex.

        Parameters
        ----------
        regex: str
            The regex to create a logits processor from.

        Returns
        -------
        LogitsProcessorType
            The logits processor.

        """
        ...

    @abstractmethod
    def get_cfg_logits_processor(self, grammar: str) -> LogitsProcessorType:
        """Create a logits processor from a context-free grammar.

        Parameters
        ----------
        grammar: str
            The context-free grammar to create a logits processor from.

        Returns
        -------
        LogitsProcessorType
            The logits processor.

        """
        ...

    @abstractmethod
    def get_fsm_logits_processor(self, fsm: FSM) -> LogitsProcessorType:
        """Create a logits processor from an interegular FSM.

        Parameters
        ----------
        fsm: interegular.fsm.FSM
            The interegular FSM to create a logits processor from.

        Returns
        -------
        LogitsProcessorType
            The logits processor.

        """
        ...
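
# Hypothetical sketch (comment only, not part of this PR) of how a concrete
# backend could satisfy this interface; `MyBackend` and `MyProcessor` are
# illustrative assumptions, not real Outlines classes:
#
#     class MyBackend(BaseBackend):
#         def __init__(self, model):
#             self.model = model
#
#         def get_json_schema_logits_processor(self, json_schema):
#             return MyProcessor.from_json_schema(json_schema)
#
#         def get_regex_logits_processor(self, regex):
#             return MyProcessor.from_regex(regex)
#
#         def get_cfg_logits_processor(self, grammar):
#             return MyProcessor.from_cfg(grammar)
#
#         def get_fsm_logits_processor(self, fsm):
#             return MyProcessor.from_fsm(fsm)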