Is it a good idea to utilize the multimodal input and output capabilities of BaseChatModel to support text-to-image/audio/video models, such as speech-to-text models, text-to-speech models, Stable Diffusion or Flux models? #31495

zhcn000000 · 2025-06-05T11:29:54Z

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

Would it be a good idea to use the multimodal input and output capabilities of BaseChatModel to support text-to-image, text-to-audio, and text-to-video models, as well as speech-to-text models? Examples include text-to-speech models and image generators such as Stable Diffusion or Flux.
For an example of this approach, see my own repository: langchain-multimedia
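
To make the idea more concrete, here is a rough, standard-library-only sketch of what "one chat-style interface for every modality" could look like. It does not use LangChain and is not how langchain-multimedia is implemented. `Message`, `ChatLike`, `FakeTextToImage`, `FakeSpeechToText`, and `run` are hypothetical names, the content-block keys (`type`, `text`, `data`, `mime_type`) are assumptions chosen for illustration, and the "models" only return placeholder data.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    """Hypothetical stand-in for a chat message; not a LangChain class."""
    role: str            # "human" or "ai"
    content: list[dict]  # typed content blocks, e.g. {"type": "text", "text": "..."}


class ChatLike(Protocol):
    """The single interface every model in this sketch exposes."""
    def invoke(self, messages: list[Message]) -> Message: ...


class FakeTextToImage:
    """Pretends to be a text-to-image model behind the chat-style interface."""

    def invoke(self, messages: list[Message]) -> Message:
        # Use the text blocks of the last message as the prompt.
        prompt = " ".join(
            b["text"] for b in messages[-1].content if b["type"] == "text"
        )
        fake_bytes = f"image for: {prompt}".encode()  # placeholder, not a real PNG
        block = {
            "type": "image",
            "mime_type": "image/png",
            "data": base64.b64encode(fake_bytes).decode(),
        }
        return Message(role="ai", content=[block])


class FakeSpeechToText:
    """Pretends to be a speech-to-text model behind the same interface."""

    def invoke(self, messages: list[Message]) -> Message:
        audio = [b for b in messages[-1].content if b["type"] == "audio"]
        text = f"<transcript of {len(audio)} audio block(s)>"  # placeholder output
        return Message(role="ai", content=[{"type": "text", "text": text}])


def run(model: ChatLike, blocks: list[dict]) -> Message:
    """Call any model the same way, whatever modality it consumes or produces."""
    return model.invoke([Message(role="human", content=blocks)])
```

The point of the sketch is that the caller only needs `run(model, blocks)`: a text prompt goes in and an image block comes out, or an audio block goes in and a text block comes out, without a separate API per modality.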

Motivation

This would provide a unified interface for text-to-image, text-to-audio, and text-to-video models.

Proposal (If applicable)

No response

Replies: 0 comments