Is it a good idea to utilize the multimodal input and output capabilities of BaseChatModel to support text-to-image/audio/video models, such as speech-to-text models, text-to-speech models, Stable Diffusion or Flux models? #31495
zhcn000000
announced in
Ideas
Replies: 0 comments
Checked
Feature request
Would it be a good idea to use the multimodal input and output capabilities of BaseChatModel to support media models, such as speech-to-text models, text-to-speech models, and text-to-image/audio/video models like Stable Diffusion or Flux?
For an example of this approach, see my own repository: langchain-multimedia
Motivation
This would provide a single, unified interface for text-to-image, text-to-audio, and text-to-video models.
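As a rough illustration of what such a unified interface might look like, here is a minimal sketch. It deliberately does not use LangChain's real classes and is not taken from langchain-multimedia: `Message`, `MediaChatModel`, the content-block dict shape, and the two fake backends are all hypothetical names made up for this example. The only point is that a text-to-image and a text-to-speech backend can sit behind the same chat-style `invoke` call and return their output as a typed content block.

```python
import base64
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical content block: a plain dict such as {"type": "text", "text": "..."}.
ContentBlock = Dict[str, str]


@dataclass
class Message:
    """Hypothetical chat message: a role plus a list of content blocks."""
    role: str  # "human" or "ai"
    content: List[ContentBlock] = field(default_factory=list)


class MediaChatModel:
    """Hypothetical stand-in for a chat-model wrapper around a media backend."""

    def __init__(self, backend: Callable[[str], bytes], block_type: str, mime_type: str):
        self.backend = backend        # function: prompt text -> raw media bytes
        self.block_type = block_type  # e.g. "image" or "audio"
        self.mime_type = mime_type    # e.g. "image/png"

    def invoke(self, messages: List[Message]) -> Message:
        # Use the text blocks of the most recent human message as the prompt.
        human = [m for m in messages if m.role == "human"]
        if not human:
            raise ValueError("expected at least one human message")
        prompt = " ".join(b["text"] for b in human[-1].content if b["type"] == "text")

        # Run the backend and wrap its bytes as a base64 content block.
        raw = self.backend(prompt)
        block = {
            "type": self.block_type,
            "mime_type": self.mime_type,
            "data": base64.b64encode(raw).decode("ascii"),
        }
        return Message(role="ai", content=[block])


def fake_text_to_image(prompt: str) -> bytes:
    # Placeholder backend; a real one would call Stable Diffusion, Flux, etc.
    return b"PNG:" + prompt.encode("utf-8")


def fake_text_to_speech(prompt: str) -> bytes:
    # Placeholder backend; a real one would call a text-to-speech model.
    return b"WAV:" + prompt.encode("utf-8")


# Both backends are driven through the same call.
image_model = MediaChatModel(fake_text_to_image, "image", "image/png")
speech_model = MediaChatModel(fake_text_to_speech, "audio", "audio/wav")

request = [Message("human", [{"type": "text", "text": "a red fox"}])]
image_reply = image_model.invoke(request)
speech_reply = speech_model.invoke(request)
```

In a real integration the wrapper would presumably subclass BaseChatModel and return the media inside an AI message, but the exact content-block format to use there is an open question this sketch does not settle.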
Proposal (If applicable)
No response