Streaming parsing performance (high cpu load) #27396

domsj-foodpairing · 2024-10-16T15:32:13Z

domsj-foodpairing
Oct 16, 2024

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from langchain_google_vertexai import VertexAI
from pydantic import BaseModel, Field


class Recipe(BaseModel):
    name: str
    description: str
    ingredients: list[str]
    steps: list[str]

class Response(BaseModel):
    recipes: list[tuple[int, Recipe]] = Field(
        description="List of recipes. Each recipe is a tuple with the index and the recipe."
    )

generated_concepts_parser = PydanticOutputParser[Response](pydantic_object=Response)

model = VertexAI(max_output_tokens=8192, model_name="gemini-1.5-flash")

prompt = PromptTemplate(
    template="Generate 20 recipes that are easy to make and healthy. {format_instructions}",
    input_variables=[],
    partial_variables={
        "format_instructions": generated_concepts_parser.get_format_instructions(),
    },
)

chain = prompt | model | generated_concepts_parser

# soon after it starts generating output cpu usage climbs to ~100%
for res in chain.stream({}):
    print(res)

# same in async variant
# (note: while this isn't proper python as is to have top level in a script,
#  you can however execute this directly in ipython)
async for res in chain.astream({}):
    print(res)

# regularly invoking the chain does not consume so much cpu
# as it only parses the output once
chain.invoke({})

Description

I'm generating some output with an LLM, and parsing it in a streaming manner.
This results in the parser getting invoked MANY times (for each bunch of tokens the LLM emits), and parsing isn't cheap.
Given a large enough output this quickly consumes the entire python thread (presumably while holding the GIL global interpreter lock).
This has a negative impact on other things one tries to do in the same python process.

Note that in my actual code I would additionally be detecting when e.g. a full recipe was emitted by the LLM and emit towards the next step in my pipeline.

It would be nice if this can be throttled somehow. Some ideas

only retrigger parsing once there's e.g. an additional } received from the LLM
batch up more LLM replies before retriggering the parsing
make the parsing cheaper/faster, e.g. by making the re-parse incremental (keep some state around)
do (part of) the parsing while not holding the GIL (still wouldn't be ideal, but at least it would mean less impact)

obviously the first 2 ideas are quite context dependent. so a solution offering these would need to allow plugging in code or letting the user configure how they want it to be throttled

System Info

$ pip freeze | grep langchain
langchain==0.3.3
langchain-core==0.3.10
langchain-google-vertexai==2.0.4
langchain-text-splitters==0.3.0
openinference-instrumentation-langchain==0.1.28

platform = ubuntu (inside WSL)

Python 3.11.10

tianshangwuyun · 2025-01-08T02:00:59Z

tianshangwuyun
Jan 8, 2025

How to solve this problem? I used Langgraph astream_events ainvoke, but the CPU was 100%, My chain is v_compt | chat_llm and executed through ainvokewithout parsing operation，and after the execution was completed, the program did not do any business, so the CPU did not decrease and remained at 100%
langgraph==0.2.34

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Streaming parsing performance (high cpu load) #27396

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Streaming parsing performance (high cpu load) #27396

Uh oh!

Uh oh!

domsj-foodpairing Oct 16, 2024

Checked other resources

Commit to Help

Example Code

Description

System Info

Replies: 1 comment

Uh oh!

Uh oh!

tianshangwuyun Jan 8, 2025

domsj-foodpairing
Oct 16, 2024

tianshangwuyun
Jan 8, 2025