How to optionally send a multimodal message into a ChatPromptTemplate #23374

jefflavallee · 2024-06-24T22:14:10Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

This works, but only if the base64 data is written in when the prompt is created. Ideally the HumanMessage would only optionally add the image_url when the prompt is invoked.

import base64

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

with open("./Output/sample.jpg", "rb") as image_file:
    base64_encoded_data = base64.b64encode(image_file.read()).decode("UTF-8")

chat_prompt_template = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessage(content='You are a helpful assistant'),
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_data}"}},
            {"type": "text", "text": "describe this image"},
        ]),
    ]
)

output_parser = StrOutputParser()
# llm is a chat model instance created elsewhere
chain = chat_prompt_template | llm | output_parser
result = chain.invoke({})

Description

I'm trying to pass a multimodal message into my LLM, specifically Claude 3.5, but I'm unsure how to pass a base64 value as a variable from the invoke method.
When I hard-code the base64 string into the prompt, as above within the f-string, it works fine.

chat_prompt_template = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessage(content='You are a helpful assistant'),
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_data}"}},
            {"type": "text", "text": "describe this image"},
        ]),
    ]
)
...
result = chain.invoke({})
print(result)
# This image shows a close-up portrait of an adorable black Labrador puppy....

However, when I use a template variable as shown below, I get this error: "anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.0.content.0.image.source.base64: invalid base64 data'}}"

chat_prompt_template = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessage(content='You are a helpful assistant'),
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{img_base64}"}},
            {"type": "text", "text": "describe this image"},
        ]),
    ]
)
...
result = chain.invoke({"img_base64":base64_encoded_data})
# anthropic.BadRequestError: Error code: 400 - {'ty

Ultimately, I'd like to pass the whole human message, which may or may not include an image, into the prompt when I invoke it. Something like:

chat_prompt_template = ChatPromptTemplate.from_messages(
messages=[
    SystemMessage(content='You are a helpful assistant'),
    HumanMessage(content="{human_msg}"
....
result = chain.invoke({"human_msg":[{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_data}"}},
                            {"type":"text","text":"describe this image"}]})

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z)
Python Version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

Package Information

langchain_core: 0.2.7
langchain: 0.2.5
langchain_community: 0.2.5
langsmith: 0.1.77
langchain_anthropic: 0.1.15
langchain_aws: 0.1.7
langchain_experimental: 0.0.61
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.1
langchainhub: 0.1.20
langgraph: 0.0.69

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langserve

@jefflavallee · 2024-06-24T22:17:45Z

dosubot[bot]
bot Jun 24, 2024

Hey @jefflavallee! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's work together to solve this!

To optionally send a multimodal message into a ChatPromptTemplate in LangChain, allowing the base64 image data to be passed as a variable when invoking the prompt, you can follow this approach:

1. Define the template with placeholders: create a ChatPromptTemplate with placeholders for the dynamic content.
2. Format the template with dynamic values: use the format_messages method to assign the values when invoking the prompt.

Here's an example:

import base64
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate

# Read and encode the image
with open("./Output/sample.jpg", "rb") as image_file:
    base64_encoded_data = base64.b64encode(image_file.read()).decode("UTF-8")

# Define the template with placeholders. Use (role, content) tuples rather than
# SystemMessage/HumanMessage instances: message instances are passed through
# unchanged, so "{img_base64}" inside a HumanMessage is never substituted.
chat_prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("human", [
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{img_base64}"}},
            {"type": "text", "text": "describe this image"},
        ]),
    ]
)

# Format the template with the dynamic base64 value
formatted_messages = chat_prompt_template.format_messages(img_base64=base64_encoded_data)

# A formatted message list is not a Runnable, so it cannot start a chain;
# pass it to the model as input instead (llm is a chat model defined elsewhere)
chain = llm | StrOutputParser()

# Invoke the chain
result = chain.invoke(formatted_messages)
print(result)

This approach ensures that the base64 image data is dynamically assigned when invoking the prompt, avoiding the need to hard-code it into the template.
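
For reference, the 400 error in the question is the kind of response a provider returns when the text after `base64,` is not valid base64, for example when a literal, unsubstituted placeholder is sent. A minimal standard-library sketch of a local check that could run before a request is made; `is_valid_base64` is a hypothetical helper name, not part of LangChain or the Anthropic SDK:

```python
import base64
import binascii


def is_valid_base64(payload: str) -> bool:
    """Return True if payload decodes cleanly as standard base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(payload, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# A genuinely encoded payload (the bytes are placeholders, not a real image)
encoded = base64.b64encode(b"\xff\xd8\xff\xe0 placeholder bytes").decode("utf-8")
ok_real = is_valid_base64(encoded)

# An unsubstituted template variable, as it would arrive at the API
ok_literal = is_valid_base64("{img_base64}")

print(ok_real, ok_literal)
```

If the check fails on the value that is about to be sent, the template variable was most likely never filled in.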

Additionally, if you want to optionally include the image in the message, you can conditionally format the message content based on whether the image data is provided:

import base64
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate, MessagesPlaceholder

# Read and encode the image
with open("./Output/sample.jpg", "rb") as image_file:
    base64_encoded_data = base64.b64encode(image_file.read()).decode("UTF-8")

# Define the template with a message placeholder. A "{human_msg}" string
# variable would be rendered as text, turning the content list into a string.
chat_prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        MessagesPlaceholder(variable_name="human_msg"),
    ]
)

# Create the human message content dynamically; leave out the image block
# for a text-only message
human_msg = [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_data}"}},
    {"type": "text", "text": "describe this image"}
]

# Create the chain with the template, the LLM, and the output parser
# (llm is a chat model defined elsewhere)
chain = chat_prompt_template | llm | StrOutputParser()

# Invoke the chain, passing the whole human message as a variable
result = chain.invoke({"human_msg": [HumanMessage(content=human_msg)]})
print(result)

This way, you can pass the entire human message, including the image, as a variable when invoking the prompt ^[1]^[2]^[3].
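
The "may or may not include an image" part can be isolated in a small helper that builds the content list. This is only a sketch using the standard library; `build_human_content` and its parameters are hypothetical names introduced here, and the block layout simply mirrors the `image_url` / `text` dictionaries used in this thread:

```python
import base64
from typing import Optional


def build_human_content(
    text: str,
    image_bytes: Optional[bytes] = None,
    mime_type: str = "image/jpeg",
) -> list:
    """Build a list of content blocks; the image block is added only when bytes are given."""
    content = []
    if image_bytes is not None:
        encoded = base64.b64encode(image_bytes).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        })
    content.append({"type": "text", "text": text})
    return content


# Text-only message: a single text block
text_only = build_human_content("hello")

# Message with an image (the bytes are a stand-in, not a complete JPEG)
with_image = build_human_content("describe this image", image_bytes=b"\xff\xd8\xff")

print(len(text_only), len(with_image))
```

The returned list can then be used as the `content` of a `HumanMessage`, so the calling code does not need to know in advance whether an image is attached.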

To continue talking to Dosu, mention @dosu.

2 replies

jefflavallee Jun 25, 2024
Author

@dosu
How would this work if I wanted to pass the chain into an agent_executor so that it can interpret images directly rather than relying on a custom tool?
For instance:

import base64
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage

# Define the template with placeholders
chat_prompt_template = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessage(content='You are a helpful assistant'),
        HumanMessage(content="{input}")
    ]
)

# llm, tools, and memory are defined elsewhere
agent = create_tool_calling_agent(llm, tools, chat_prompt_template)
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True, handle_parsing_errors=True, return_intermediate_steps=True)

...
# Some other steps where a message is provided by the user and it was determined there was an image attached.  It's stored in /Output/sample.jpg
# msg from user: "Describe this image" stored in user_text
# attached file path: /Output/sample.jpg stored in path

# Read and encode the image
with open(path, "rb") as image_file:
    base64_encoded_data = base64.b64encode(image_file.read()).decode("UTF-8")

ret = agent_executor.invoke({"input": [
            {"type": "image_url", "image_url": f"data:image/jpeg;base64,{base64_encoded_data}"},
            {"type": "text", "text": user_text}
        ]})

In this case the input structure is not known until very late in the processing, and again the message being passed in may or may not contain an image; it's typically just text.
Is there a way to pass the full message from the agent executor all the way through to the LLM's ChatPromptTemplate?

Thanks

dosubot[bot] bot Jun 25, 2024

To pass the full message, including both text and image data, from the AgentExecutor to the LLM ChatPromptTemplate, you can use the RunnableLambda to format the inputs and then pass them into your prompt template. Here is how you can achieve this:

1. Define the template with placeholders: create a ChatPromptTemplate with placeholders for the dynamic content.
2. Format the template with dynamic values: use the RunnableLambda to assign the values when invoking the prompt.

Here's an example:

import base64
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.messages import HumanMessage
from langchain_core.prompts.chat import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

# Define the template with placeholders. "input" receives a list of messages so
# multimodal content is not turned into a string; create_tool_calling_agent
# also requires an "agent_scratchpad" placeholder.
chat_prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        MessagesPlaceholder(variable_name="input"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

# Function to format the input message; the image block is added only when present
def format_input(data_dict):
    content = []
    if data_dict.get("image_base64"):
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{data_dict['image_base64']}"},
        })
    if data_dict.get("text"):
        content.append({"type": "text", "text": data_dict["text"]})
    return {"input": [HumanMessage(content=content)]}

# Create the agent
llm = ChatOpenAI(temperature=0, model="gpt-4-vision-preview", max_tokens=1024)
tools = []  # Define your tools here if any
agent = create_tool_calling_agent(llm, tools, chat_prompt_template)

# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=None, verbose=True, handle_parsing_errors=True, return_intermediate_steps=True)

# Format the raw inputs with a RunnableLambda before they reach the executor
chain = RunnableLambda(format_input) | agent_executor

# Read and encode the image
path = "./Output/sample.jpg"
with open(path, "rb") as image_file:
    base64_encoded_data = base64.b64encode(image_file.read()).decode("UTF-8")

# User message
user_text = "Describe this image"

# Invoke the chain; omit "image_base64" for a text-only message
ret = chain.invoke({
    "image_base64": base64_encoded_data,
    "text": user_text
})

print(ret)

This approach ensures that the value of "image_base64" and the text are dynamically assigned in your ChatPromptTemplate when using the AgentExecutor. The RunnableLambda is used to format the input message, allowing you to pass the full message, including both text and image data, to the LLM ^[1]^[2]^[3]^[4].
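
Because the attachment path is only known late in processing, the `image/jpeg` prefix hard-coded above may not match the actual file. A minimal sketch of deriving the data URL from the path with the standard library; `image_path_to_data_url` and `default_mime` are hypothetical names, and falling back to JPEG is an assumption made here for illustration:

```python
import base64
import mimetypes
import os
import tempfile


def image_path_to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Read a file and return a data URL whose MIME type is guessed from the file name."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumption: fall back to a default when the extension is unknown
        mime_type = default_mime
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demonstration with a throwaway file (the bytes are not a real image)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.png")
    with open(sample_path, "wb") as sample_file:
        sample_file.write(b"not a real image")
    data_url = image_path_to_data_url(sample_path)

print(data_url.split(",", 1)[0])
```

The resulting string can replace the hand-built f-string in the `image_url` block.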

MrShameer · 2025-02-05T09:08:22Z

This may or may not be related, but to pass an image through a template tag/f-string variable, you need to make sure the image content stays in a list.
Passing it directly through "{human_msg}" turns the list into a string, which then cannot be processed or read as image content.
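
The effect can be reproduced without LangChain. In the sketch below, plain Python `str.format` stands in for the prompt template's f-string rendering (an assumption made for illustration; the payload `AAAA` is a dummy value):

```python
# A multimodal content list like the ones used in this thread
content = [
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    {"type": "text", "text": "what is this"},
]

# Substituting the list into a string template renders it with str()
rendered = "{human_msg}".format(human_msg=content)

print(type(content).__name__)    # list
print(type(rendered).__name__)   # str
print(rendered == str(content))  # True
```

Once the list has become one long string, the model receives it as ordinary text rather than as separate image and text blocks.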

One way to fix this is to use the "placeholder" tag.

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a chatbot"),
        ("human", "{input}"),
        ("placeholder", "{image}"),
    ]
)

agent = prompt | llm

agent.invoke({
    "input":"what is this",
    "image" : [
        ("human", [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_1}"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_2}"},
            },
        ]),
    ]
})

From here you can add multiple images or make any other modifications before passing them into the prompt.
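
As a sketch of that idea, the value for the "image" variable could be built by a small helper that returns an empty list when there is nothing to attach. `build_image_messages` is a hypothetical name, and the example assumes the placeholder accepts an empty list, which is how the ("placeholder", ...) form is understood to behave:

```python
from typing import List, Tuple


def build_image_messages(
    images_b64: List[str], mime_type: str = "image/png"
) -> List[Tuple[str, list]]:
    """Return the value for the "image" placeholder: one human message holding
    every image, or an empty list when there are no images."""
    if not images_b64:
        return []
    blocks = [
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{b64}"}}
        for b64 in images_b64
    ]
    return [("human", blocks)]


# Dummy payloads stand in for real base64-encoded images
no_images = build_image_messages([])
two_images = build_image_messages(["AAAA", "BBBB"])

print(len(no_images), len(two_images[0][1]))
```

The result would be passed as `"image": build_image_messages(...)` in the `invoke` call, so text-only and multi-image requests share the same prompt.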

2 replies

Olyray Apr 3, 2025

Damn! This was Godsent. I was stuck for hours!

Thanks!

gutbash Jun 6, 2025

but how to add memory?
