docs/how_to/multimodal_inputs/ #27702
Replies: 9 comments 10 replies
-
Hi, is there any LCEL way of implementing image input?
-
Is there a way to have chat history, and also automatic memory management, using chains/runnables?
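One way to sketch this (assuming an in-memory history store and an OpenAI chat model; both are illustrative choices, not the only option) is `RunnableWithMessageHistory`, which wraps a runnable and loads/saves history per session automatically:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Simple per-session store; swap for a persistent backend in a real app.
store: dict[str, InMemoryChatMessageHistory] = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chat = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

# History is injected and recorded automatically, keyed by session_id.
chat.invoke({"input": "Hi there"}, config={"configurable": {"session_id": "abc"}})
```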
-
LCEL way: messages | llm
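A minimal sketch of that for the image-input question above (the model name and image URL are placeholders; I'm assuming an OpenAI vision-capable model and a base64-encoded image): a `ChatPromptTemplate` containing an image block, piped into the chat model.

```python
import base64

import httpx
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Replace with a real image URL, or read the bytes from disk instead.
image_data = base64.b64encode(httpx.get("https://example.com/photo.jpg").content).decode("utf-8")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer questions about the supplied image."),
    (
        "human",
        [
            {"type": "text", "text": "{question}"},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
            },
        ],
    ),
])

chain = prompt | ChatOpenAI(model="gpt-4o-mini")
response = chain.invoke({"question": "What is in this picture?", "image_data": image_data})
print(response.content)
```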
-
What about audio input?
-
If anyone wants to allow the agent to use multimodal output from a tool call (a common scenario in complex tasks), for example letting the model capture a screenshot, check this discussion on how to do it:
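As a rough sketch of one approach (the `capture_screen()` helper is hypothetical; `response_format="content_and_artifact"` is a real tool option, the rest is illustrative): have the tool return the raw image as an artifact, then feed it back to the model as an image block in a follow-up human message, since some providers reject image content inside tool-role messages.

```python
import base64

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def take_screenshot() -> tuple[str, dict]:
    """Capture a screenshot of the current screen."""
    png_bytes = capture_screen()  # hypothetical helper returning PNG bytes
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    # The first element becomes the ToolMessage content the model sees;
    # the second is stored on ToolMessage.artifact for downstream use.
    return "Screenshot captured.", {"image_base64": encoded}


def artifact_to_message(artifact: dict) -> HumanMessage:
    """Turn the stored screenshot into a message the model can actually see."""
    return HumanMessage(content=[
        {"type": "text", "text": "Here is the screenshot from the tool call."},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{artifact['image_base64']}"},
        },
    ])
```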
-
This simply does not work with
-
Hi all, I'm also facing the same issue that @Albaeld reported. On the llm node I have:

```python
messages = prompt + state["messages"]
chat = ChatPromptTemplate.from_messages(messages)
logger.debug("-------------------------------------------------------------------------------->> invoking LLM with config")
logger.debug(f"{chat}")
logger.debug("-------------------------------------------------------------------------------->> invoking LLM...")
response = await model.ainvoke(chat.format_prompt(), config)
```

This outputs:

```
== APP == -------------------------------------------------------------------------------->> invoking LLM with config
== APP == input_variables=[] input_types={} partial_variables={} messages=[SystemMessage(content="You are a concierge agent. You are very helpful and friendly. You are very polite and respectful. You are always ready to help.\n\ncurrent date/time: 2025-06-05 19:32:40\n\nuser context: {'id': '123123123', 'name': 'Bruno Figueiredo', 'avatar': ''}", additional_kwargs={}, response_metadata={}), HumanMessage(content='olá', additional_kwargs={}, response_metadata={}, id='8fa5db18-69b0-4085-9d4e-aa16847da243'), AIMessage(content='Olá, Bruno! 😊 Como posso ajudar você hoje?', additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-2024-11-20', 'system_fingerprint': 'fp_ee1d74bde0'}, id='run-2df83ba9-a843-4029-a020-0d41465fddf3'), HumanMessage(content=[{'type': 'text', 'text': 'resume'}, {'type': 'file', 'source_type': 'url', 'url': 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'}], additional_kwargs={}, response_metadata={}, id='363b871a-d9bc-4f40-ab56-5138486e7f07')]
== APP == -------------------------------------------------------------------------------->> invoking LLM...
== APP == Error invoking model: Error code: 400 - {'error': {'message': "Missing required parameter: 'messages[3].content[1].file'.", 'type': 'invalid_request_error', 'param': 'messages[3].content[1].file', 'code': 'missing_required_parameter'}}
```

any thoughts? thanks
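Not certain, but that 400 usually means the `file` block could not be converted to OpenAI's `file` parameter; as far as I know the chat completions API does not accept URL-sourced PDFs, only base64 data or an uploaded file ID. A sketch of a possible workaround (downloading the PDF and passing it as a base64 `file` block; the `ChatOpenAI` setup here is just an example, use your own configured model):

```python
import base64

import httpx
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
pdf_data = base64.b64encode(httpx.get(pdf_url).content).decode("utf-8")

message = HumanMessage(content=[
    {"type": "text", "text": "resume"},
    {
        "type": "file",
        "source_type": "base64",
        "mime_type": "application/pdf",
        "data": pdf_data,
        "filename": "dummy.pdf",  # OpenAI appears to require a filename for PDF inputs
    },
])

model = ChatOpenAI(model="gpt-4o")  # example model; reuse your existing instance
response = model.invoke([message])
print(response.content)
```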
-
Bro, I don't understand why multimodality is still not really supported and definitely not well documented. They should've added multimodal handling before LCEL even existed.
-
I am having a bit of a misunderstanding. I am following the guidelines of this page to build the following input message:
However, I am getting the following error with both the latest langchain-openai and langchain-google-vertexai library versions:
-
docs/how_to/multimodal_inputs/
Here we demonstrate how to pass multimodal input directly to models.
https://python.langchain.com/docs/how_to/multimodal_inputs/
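A minimal example in the spirit of that page (assuming an OpenAI vision-capable model; the model name and image URL are placeholders):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the weather in this image."},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/boardwalk.jpg"},  # placeholder URL
        },
    ],
}

response = llm.invoke([message])
print(response.content)
```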