
Feature: support structured generation through Outlines #101

@RobinPicard


Adding support for structured generation to LangExtract would make a lot of sense, since it would ensure that the model's output conforms to the user's desired format. For that purpose, I propose integrating with the Outlines library, which specializes in structured generation.

I think there are several reasons to rely on Outlines rather than implementing this ourselves in LangExtract:

  • Managing the output formats specific to each model is cumbersome for LangExtract to implement and maintain.
  • Expected formats and keyword arguments differ from one model to another, so users cannot easily switch models, whereas Outlines offers a unified interface for structured generation.
  • Local models (transformers, llama_cpp, mlx...) do not have an easy-to-use argument for structured generation; instead, they rely on the caller providing a logits processor. That means LangExtract would have to handle the creation of the logits processor itself. With Outlines, that is taken care of, and users could simply provide the same output format they would give to an API-based model.
  • Outlines exposes a high-level Python interface that is convenient to work with.
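To make the logits-processor point more concrete, here is a toy sketch of the general technique (it is not Outlines' implementation or API). At each decoding step, tokens that would break the required format get a score of -inf, so the decoder can only choose among valid tokens. The vocabulary, the `FORMAT` list, and the function names are hypothetical. A real library would compile a JSON schema or regex into an automaton and derive the allowed tokens from the prefix generated so far, rather than using a fixed per-position list.

```python
from __future__ import annotations

import math

# Toy vocabulary standing in for a real tokenizer (hypothetical, for illustration only).
VOCAB = ["yes", "no", "maybe", "{", "}", "<eos>"]

# Hypothetical output format: `{`, then `yes` or `no`, then `}`, then `<eos>`.
# Each entry is the set of tokens allowed at that position.
FORMAT = [{"{"}, {"yes", "no"}, {"}"}, {"<eos>"}]


def mask_logits(logits: list[float], allowed: set[str]) -> list[float]:
    """Logits-processor step: give every disallowed token a score of -inf."""
    return [x if VOCAB[i] in allowed else -math.inf for i, x in enumerate(logits)]


def constrained_greedy_decode(step_logits: list[list[float]]) -> list[str]:
    """Greedy decoding where each step's logits are masked by FORMAT."""
    output: list[str] = []
    for position, logits in enumerate(step_logits):
        masked = mask_logits(logits, FORMAT[position])
        # Pick the highest-scoring token among those still allowed.
        best = max(range(len(masked)), key=lambda i: masked[i])
        output.append(VOCAB[best])
        if VOCAB[best] == "<eos>":
            break
    return output


if __name__ == "__main__":
    # Made-up scores in which the unconstrained favourite is always "maybe".
    raw = [0.1, 0.2, 3.0, 0.5, 0.0, 0.0]
    print(constrained_greedy_decode([raw] * 4))
```

With these made-up scores, plain greedy decoding would pick `maybe` at every step. The mask instead forces the `{ ... }` shape and, among `yes` and `no`, picks the higher-scoring `no`. Writing and maintaining this masking logic for every local backend is the work the bullet above suggests delegating to Outlines.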

Let me know if you think it's something worth looking into. I would be glad to open a PR for it.
