Streaming parsing performance (high CPU load) #27396
Unanswered
domsj-foodpairing asked this question in Q&A
Replies: 1 comment
How can this be solved? I used LangGraph's astream_events / ainvoke and the CPU went to 100%. My chain is `v_compt | chat_llm`, executed through ainvoke with no parsing step, and even after execution completed the program was doing no further work, yet the CPU did not drop and stayed at 100%.
Checked other resources
Commit to Help
Example Code
Description
I'm generating some output with an LLM, and parsing it in a streaming manner.
This results in the parser being invoked MANY times (once for each chunk of tokens the LLM emits), and parsing isn't cheap.
Given a large enough output this quickly consumes an entire Python thread (presumably while holding the GIL, the global interpreter lock).
This has a negative impact on other things one tries to do in the same Python process.
Note that in my actual code I would additionally detect when e.g. a full recipe has been emitted by the LLM and emit it to the next step in my pipeline.
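A minimal sketch of that kind of chain, assuming a Vertex AI chat model piped into a JSON output parser (the model name and prompt are placeholders, not the original code):

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_google_vertexai import ChatVertexAI

# Assumed setup: a streaming chat model piped into a streaming output parser.
chat_llm = ChatVertexAI(model_name="gemini-1.5-pro")  # placeholder model
chain = chat_llm | JsonOutputParser()

async def consume() -> None:
    # On every streamed chunk the JsonOutputParser re-parses the JSON
    # accumulated so far, so a long output gets parsed roughly once per
    # chunk, all on the event-loop thread (and under the GIL).
    async for partial in chain.astream("Generate 50 recipes as a JSON array"):
        ...  # each `partial` is the best-effort parse of everything so far
```

With a long enough output, most of the work inside that loop is repeated parsing rather than waiting on the model.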
It would be nice if this could be throttled somehow. Some ideas:

- only invoke the parser once a complete item (e.g. a full recipe) has been emitted
- only invoke the parser when a closing `}` is received from the LLM
- only invoke the parser every N chunks or every N seconds

Obviously the first 2 ideas are quite context dependent, so a solution offering these would need to allow plugging in code or letting the user configure how they want the parsing to be throttled.
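As a sketch of the "plugging in code" option (nothing here is an existing LangChain API; `throttled_parse` and `every_n` are made-up names), the model's stream can be wrapped in a custom async generator that only attempts a parse at chosen points:

```python
import json
from typing import Any, AsyncIterator

from langchain_core.messages import AIMessageChunk

async def throttled_parse(
    chunks: AsyncIterator[AIMessageChunk], every_n: int = 25
) -> AsyncIterator[Any]:
    """Accumulate streamed text and only try to parse it every `every_n`
    chunks, or when a trailing '}' / ']' hints that the output just completed."""
    buffer = ""
    since_last = 0
    async for chunk in chunks:
        buffer += chunk.content if isinstance(chunk.content, str) else ""
        since_last += 1
        if since_last >= every_n or buffer.rstrip().endswith(("}", "]")):
            since_last = 0
            try:
                yield json.loads(buffer)  # parse only at throttled points
            except json.JSONDecodeError:
                pass  # output isn't valid JSON yet; keep accumulating

# usage (assuming the `chat_llm` from the sketch above):
#   async for doc in throttled_parse(chat_llm.astream(prompt)):
#       ...
```

This keeps the expensive parse off the hot path of every token batch while still surfacing partial results periodically.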
System Info
```
$ pip freeze | grep langchain
langchain==0.3.3
langchain-core==0.3.10
langchain-google-vertexai==2.0.4
langchain-text-splitters==0.3.0
openinference-instrumentation-langchain==0.1.28
```
platform = ubuntu (inside WSL)
Python 3.11.10