Structured Output: Format Instructions vs with_structured_output #27709
-
Hello, I've been exploring ways to obtain structured output from LLMs and wanted to share my experience on the topic. While the `with_structured_output` method is convenient, you can also get reliable structured output by explicitly describing the required format in the prompt. For example, you could use a prompt like the following:

```text
Please extract all URLs and dates from the following user input.
The response must be a JSON array of objects, each containing a "url" and "date" field.

## OUTPUT - YOU MUST ALWAYS RESPOND IN THIS JSON FORMAT
[
  {"url": "<detected URL>", "date": "<detected date>"},
  // Repeat for each URL and date found
]

Return the response as a plain JSON string without any formatting or code blocks. I only need the raw JSON data in a single-line format.
```
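To make this concrete, here is a minimal sketch of how such a prompt might be sent to a chat model and parsed, assuming `ChatOpenAI` from `langchain_openai` (the model name, example input, and variable names are illustrative, not part of the original example):

```python
# Minimal sketch: format-instructions prompt + manual JSON parsing.
# Assumes langchain_openai is installed and OPENAI_API_KEY is set.
import json

from langchain_openai import ChatOpenAI

format_instructions = (
    "Please extract all URLs and dates from the following user input.\n"
    'The response must be a JSON array of objects, each containing a "url" and "date" field.\n'
    "Return the response as a plain JSON string without any formatting or code blocks."
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
reply = llm.invoke(f"{format_instructions}\n\nUser input: Meet at https://example.com on 2025-02-04.")

# Nothing enforces the format, so parsing can still fail and may need a retry or fallback.
data = json.loads(reply.content)
print(data)  # e.g. [{"url": "https://example.com", "date": "2025-02-04"}]
```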
By specifying the format and reinforcing it in the prompt, the LLM is more likely to produce the structured output you need. Moreover, one of the advantages of this approach is that you can use LangChain's Prompt Templates to turn the JSON structure, or certain elements within it, into variables. This allows you to dynamically generate prompts based on different inputs or contexts, making your solution more flexible and reusable. For example:

```python
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    '''Please extract all {entity_type} from the following user input.
The response must be a JSON array containing all detected {entity_type}.

## OUTPUT - YOU MUST ALWAYS RESPOND IN THIS JSON FORMAT
[
  {{"{entity_type}": "<detected {entity_type}>"}},
  // Repeat for each {entity_type} found
]

Return the response as a plain JSON string without any formatting or code blocks. I only need the raw JSON data in a single-line format.
'''
)

prompt = prompt_template.invoke({"entity_type": "URLs"})
```

In this example, the `{entity_type}` placeholder lets you reuse the same template for URLs, dates, or any other entity you want to extract.

Additionally, I wanted to mention that there is an experimental class called JSONFormer, which constrains the model's generation so that the output follows a JSON schema. You can find more information here: LangChain Documentation - JSONFormer. This could be another avenue to explore for achieving robust structured outputs.

Ultimately, the choice between using methods like `with_structured_output` and prompt-based format instructions depends on the models you are working with and how strictly the output needs to match a schema.
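As a follow-up to the template example above, here is a rough sketch of chaining such a template with a model and a JSON output parser (assuming `langchain_openai`; the extra `{user_input}` variable and the simplified instructions are additions for illustration, and `JsonOutputParser` only parses the reply, it does not enforce the schema):

```python
# Rough sketch: prompt template -> chat model -> JSON parser (LCEL chain).
# Assumes langchain_openai is installed and OPENAI_API_KEY is set.
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Same idea as the template above, with a {user_input} variable added so the
# text to analyse can be supplied at invoke time.
prompt_template = PromptTemplate.from_template(
    '''Please extract all {entity_type} from the following user input.
Return a plain JSON array of strings, one per detected {entity_type},
without any formatting or code blocks.

User input: {user_input}
'''
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt_template | llm | JsonOutputParser()

result = chain.invoke(
    {"entity_type": "URLs", "user_input": "Docs: https://example.com/a and https://example.com/b"}
)
print(result)  # e.g. ["https://example.com/a", "https://example.com/b"]
```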
-
Hi, not sure if you are still interested in the details of structured outputs, but I have a blog post on what I understand about it: ***@***.***/mastering-structured-output-in-llms-choosing-the-right-model-for-json-output-with-langchain-be29fb6f6675. Let me know if you find it useful or have any questions.

@torrresagus I hadn't heard of JSONFormer; it looks interesting, but I don't fully understand what it is doing. How does this compare to JSON mode, which constrains generation to JSON only?
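For reference, this is roughly what I mean by JSON mode, sketched with an OpenAI chat model (assuming `langchain_openai`; the flag is provider-specific and the model name is just an example):

```python
# Sketch of OpenAI-style JSON mode: decoding is constrained to valid JSON,
# but no particular schema is enforced, so the prompt still describes the fields.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
json_llm = llm.bind(response_format={"type": "json_object"})

result = json_llm.invoke(
    "Extract all URLs and dates from: 'See https://example.com on 2025-02-04.' "
    "Respond as a JSON object with a 'results' array of objects with 'url' and 'date' keys."
)
print(result.content)
```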
-
Dear Andrew,
I will for sure check it out.
Regards.
-
Hello All,
I am interested in a robust way to get structured output from the LLM. I have seen that `with_structured_output` internally uses `bind` or `bind_tools`. However, other methods rely on format instructions that are provided to the LLM in the prompt. Which approach is better and more robust?
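For context, this is roughly how I am using `with_structured_output` today (a minimal sketch assuming `ChatOpenAI` and a Pydantic schema; the model and field names are placeholders, not a recommendation):

```python
# Minimal sketch: with_structured_output with a Pydantic schema.
# Assumes langchain_openai is installed and OPENAI_API_KEY is set.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class UrlDate(BaseModel):
    """A URL mentioned in the text together with its associated date."""
    url: str = Field(description="The detected URL")
    date: str = Field(description="The detected date, e.g. 2025-02-04")


class Extraction(BaseModel):
    """All URL/date pairs found in the input."""
    results: list[UrlDate]


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(Extraction)

# The schema is passed to the model as a tool/function definition under the hood,
# so the result is a validated Extraction instance rather than raw text.
extraction = structured_llm.invoke("The docs moved to https://example.com/docs on 2024-11-30.")
print(extraction.results)
```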