OpenAI TTS and Gemini (Speech Generation) availability in Langchain Python #31907
michelhabib asked this question in Q&A (unanswered)
Checked other resources
Commit to Help
Example Code
# It's not an implemented feature; that is my question.
Description
TTS is now available in the major model lineups from both OpenAI and Gemini, but so far I haven't been able to access these models through LangChain.
In OpenAI, examples are tts-1 and gpt-4o-mini-tts: you pass in text along with voice instructions, and the output is that same text rendered as audio. This is different from models such as gpt-4o-mini-audio-preview, which take audio/text inputs and respond to them with audio.
https://platform.openai.com/docs/guides/audio
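For reference, this is roughly how the native OpenAI SDK exposes these models today (a minimal sketch based on the guide linked above; the exact response helpers may vary by SDK version):

```python
# Minimal sketch of OpenAI text-to-speech via the native openai SDK
# (not LangChain). Model/voice names follow the guide linked above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Hello from a text-to-speech model!",
    instructions="Speak in a calm, friendly tone.",  # supported by gpt-4o-mini-tts
)

# The response is binary audio content; write it to a file.
with open("speech.mp3", "wb") as f:
    f.write(response.read())
```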
In Gemini, this is called speech generation, with models such as gemini-2.5-flash-preview-tts.
https://ai.google.dev/gemini-api/docs/speech-generation
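The equivalent call through the native google-genai SDK looks roughly like this (a sketch following the speech-generation guide linked above; per the docs, the returned audio is raw PCM that still needs to be wrapped in a WAV container):

```python
# Minimal sketch of Gemini speech generation via the native google-genai SDK
# (not langchain-google-genai). Follows the guide linked above.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Say cheerfully: have a wonderful day!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# Per the docs, the returned audio is raw 16-bit PCM at 24 kHz.
pcm_bytes = response.candidates[0].content.parts[0].inline_data.data
```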
I couldn't find any documentation or code examples covering the models above, with the only exception being Gemini speech-generation support in google/libs/vertexai v2.0.26 (released 3 weeks ago); see langchain-ai/langchain-google#949.
But it's not in google/libs/genai v2.1.6, which is the release line I'm using.
My questions:
1. Is it possible to use OpenAI/Gemini TTS with genai v2.1.6? If not, is there a plan to add it? I understand these models were released a while back, so I'm just wondering.
2. Is there a LangChain-idiomatic way to integrate the native SDK code while still benefiting from LangChain/LangGraph? (Something along the lines of the sketch below is what I have in mind.)
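To illustrate question 2, this is the kind of workaround I'm considering: wrapping the native call as a custom LangChain tool so it can be bound to an agent or invoked from a LangGraph node. This is only a sketch of a possible workaround, not an official integration; the wrapped call reuses the OpenAI snippet above.

```python
# Sketch: wrap a native TTS call as a custom LangChain tool so it can be
# used by an agent or from a LangGraph node.
from langchain_core.tools import tool
from openai import OpenAI

client = OpenAI()


@tool
def synthesize_speech(text: str, output_path: str = "speech.mp3") -> str:
    """Convert text to speech with gpt-4o-mini-tts and save it to a file."""
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",
        input=text,
    )
    with open(output_path, "wb") as f:
        f.write(response.read())
    return output_path


# The tool can then be bound to an agent, or called directly:
# synthesize_speech.invoke({"text": "Hello!", "output_path": "hello.mp3"})
```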
System Info