Extracting a structured (pydantic) output from documents #28371

idan-ben-ami · 2024-11-26T22:38:01Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

from typing import Optional

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )

class People(BaseModel):
    """A collection of people."""

    people: list[Person] = Field(
        default_factory=list, description="A list of people"
    )

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert model excelling at extracting human information\n{context}"),
])
structured_llm = llm.with_structured_output(schema=People)
chain = create_stuff_documents_chain(structured_llm, prompt)

docs = [Document(metadata={"team": "dev"}, page_content="The dev team consists of Bob and Alice. Bob is 5'8 with black hair. Alice has the same hair color but is 20cm shorter than Bob")]
chain.invoke({"context": docs})

Description

The problem is that create_stuff_documents_chain appends a StrOutputParser by default (its docstring says "output_parser: Output parser. Defaults to StrOutputParser."). Since I already use with_structured_output, which parses the model response into a Pydantic instance, the input to the (unwanted) StrOutputParser is a Pydantic instance rather than a string.

I ended up patching create_stuff_documents_chain to skip the output parser when none is supplied (or, worse, reimplementing it in my code).

What is the right way of using it?

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 24.1.0: Thu Oct 10 21:03:15 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T6000
Python Version: 3.12.7 (main, Nov 22 2024, 09:38:06) [Clang 16.0.0 (clang-1600.0.26.4)]

Package Information

langchain_core: 0.3.21
langchain: 0.3.8
langchain_community: 0.3.8
langsmith: 0.1.146
langchain_ollama: 0.2.0
langchain_openai: 0.2.9
langchain_text_splitters: 0.3.2
langgraph_sdk: 0.1.36

Answered by nitinahuja

May 11, 2025

Ran into the same issue, and here is what I did to override the default StrOutputParser.
Ensure that the model you use supports tool calling.

custom_parser = lambda x: x
stuff_chain = create_stuff_documents_chain(llm, prompt=my_prompt, output_parser=custom_parser)
# Then invoke normally - this replaces the default StrOutputParser with this no-op lambda


SetonLiang · 2024-11-27T11:45:12Z

chain = prompt | llm.with_structured_output(schema=People)
You can use this chain to invoke.

2 replies

idan-ben-ami Nov 29, 2024
Author

This solution lacks the document combining and formatting, and relies on the repr of the documents. It also lacks the validation of the "context" input variable (which is done when the chain is created, not dynamically on invocation).

SetonLiang Nov 29, 2024

Actually, I use this method and get the entities list successfully.
