-
Hey @VStev, I'm here to assist you with any bugs, questions, or contribution inquiries. The issue you're encountering is due to the LangChain-agent workflow not being configured to handle image inputs, while the native Azure OpenAI library can process images through its specific image analysis capabilities. To resolve this, you need to ensure that the LangChain-agent is properly configured to use the Azure Cognitive Services Image Analysis API. Here are the steps to set up and use the AzureAiServicesToolkit.
Here is the relevant code snippet for setting up and using the toolkit:
import os
from langchain_community.agent_toolkits import AzureAiServicesToolkit
from langchain import hub
from langchain.agents import AgentExecutor, create_structured_chat_agent
from langchain_openai import OpenAI
# Set environment variables
os.environ["AZURE_AI_SERVICES_KEY"] = "your_azure_cogs_key"
os.environ["AZURE_AI_SERVICES_ENDPOINT"] = "your_azure_cogs_endpoint"
os.environ["AZURE_AI_SERVICES_REGION"] = "your_azure_cogs_region"
# Create the toolkit
toolkit = AzureAiServicesToolkit()
# Initialize the agent
llm = OpenAI(temperature=0)
tools = toolkit.get_tools()
prompt = hub.pull("hwchase17/structured-chat-agent")
agent = create_structured_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
# Analyze an image (local or remote)
result = agent_executor.invoke(
    {
        "input": "Analyze this image: https://path_or_url_to_image"
    }
)
print(result)
Ensure that the AZURE_AI_SERVICES_* environment variables are set to your own key, endpoint, and region. Additionally, the AzureChatOpenAI model can process image inputs directly when the image is supplied as base64-encoded data in a HumanMessage:
import os
import base64
import httpx
from langchain_core.messages import HumanMessage
from langchain_openai.chat_models.azure import AzureChatOpenAI
# Set environment variables
os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-endpoint.openai.azure.com/"
# Initialize the AzureChatOpenAI model (an API version is also required,
# e.g. via the OPENAI_API_VERSION environment variable)
llm = AzureChatOpenAI(azure_deployment="your-deployment-name")
# Encode the image data
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
# Create the message with image content
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ]
)
# Invoke the model
ai_msg = llm.invoke([message])
print(ai_msg.content)
This example demonstrates how to use the AzureChatOpenAI model to process image inputs directly.
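If the image needs to flow through an agent rather than a bare model call, one option worth trying is a tool-calling agent built on a multimodal chat prompt, so the base64 image reaches the model as an image block instead of being flattened into text. This is only a sketch, assuming a recent langchain version that provides create_tool_calling_agent; it reuses the AzureChatOpenAI llm, the toolkit tools, and the image_data variable from the snippets above, and the prompt wording is a placeholder.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Chat prompt whose human turn carries both a text block and an image_url block;
# {question} and {image_data} are filled in at invoke time.
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        (
            "human",
            [
                {"type": "text", "text": "{question}"},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                },
            ],
        ),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)
# Reuse the AzureChatOpenAI `llm` and the toolkit `tools` defined above;
# a tool-calling agent needs a chat model and at least one tool.
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke(
    {"question": "describe the weather in this image", "image_data": image_data}
)
print(result["output"])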
-
@dosu I already tried a base64 image; in the best case it still says it cannot view or describe the image.
-
Checked other resources
Commit to Help
Example Code
Description
I'm trying to do multimodal input using GPT-4o. I tested it following the native Azure documentation, and it works.
But when I integrated it into my langchain-agent workflow, the model says it cannot process images. Why is this?
All it says is
I'm sorry for any confusion, but I don't have the capability to view or describe images. However, if you describe the details of the image to me, I'll do my best to help you understand or interpret it!
Meanwhile, using the Azure documentation here produces the desired output (the image is described).
The prompt template I'm using is basically hwchase17's react-chat (sketched below).
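For reference, a minimal sketch of that setup, assuming the hwchase17/react-chat hub prompt; the deployment name and API version are placeholders. Note that react-chat is a plain-text prompt template, so whatever goes into the input variable is formatted into a string:
import os
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import AzureChatOpenAI
os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-endpoint.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2024-02-01"  # placeholder API version
llm = AzureChatOpenAI(azure_deployment="gpt-4o")  # placeholder deployment name
# hwchase17/react-chat is a text PromptTemplate with {input}, {chat_history},
# {tools}, {tool_names} and {agent_scratchpad} slots.
prompt = hub.pull("hwchase17/react-chat")
tools = []  # the tools used in the workflow
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# The input is interpolated into the text prompt, so a base64 data URL arrives
# at the model as plain text rather than as an image block, which is why the
# model replies that it cannot view images.
result = agent_executor.invoke(
    {"input": "Describe this image: data:image/jpeg;base64,<...>", "chat_history": ""}
)
print(result["output"])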
System Info
System Information
Package Information
Optional packages not installed
Other Dependencies