Facing problem while using asynchronous streaming with ChatOllama #31632
Unanswered
sourav-eegrab
asked this question in
Q&A
Checked other resources
Commit to Help
Example Code
Description
I've successfully achieved asynchronous behavior in the application with num_gpu=48. In this configuration the system offloads part of the workload to the CPU, which enables async execution, though startup is slightly slower. However, when num_gpu is increased beyond 48, the application switches to synchronous execution but performs noticeably faster. This behavior was observed and validated through parallel runs across multiple systems. It is also worth noting that in both cases certain tasks are still offloaded to the CPU.

How can I achieve asynchronous behavior while the model is fully offloaded to the GPU? @baskaryan @hwchase17 any help is highly appreciated.
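For reference, the asynchronous behavior described above means two streams interleave instead of one blocking the other. This can be demonstrated with plain asyncio, no Ollama server required; the `fake_stream` generator below is a stand-in for `ChatOllama.astream(...)` (its name, chunk format, and delays are illustrative assumptions, not the real API):

```python
import asyncio

async def fake_stream(name, n, delay):
    # Stand-in for ChatOllama.astream(): yields token-like chunks,
    # awaiting between them so other tasks get a chance to run.
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"{name}-{i}"

async def consume(name, n, delay, log):
    # Collect chunks into a shared log so we can inspect the ordering.
    async for chunk in fake_stream(name, n, delay):
        log.append(chunk)

async def main():
    log = []
    # When streaming is truly asynchronous, the two consumers interleave:
    # chunks from "a" and "b" arrive mixed together rather than one
    # stream finishing before the other starts.
    await asyncio.gather(
        consume("a", 3, 0.01, log),
        consume("b", 3, 0.01, log),
    )
    return log

log = asyncio.run(main())
print(log)
```

If the same pattern with real `ChatOllama` instances produces one fully-ordered stream after the other, the calls are effectively running synchronously, which matches the behavior reported above for num_gpu values beyond 48.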
System configuration:
System Info