A demo project for experimenting with the Azure OpenAI GPT-4o Mini TTS (Text-to-Speech) API. Includes a Gradio-powered soundboard UI and sample scripts for generating speech from text using a variety of voices and vibes.
- Interactive Gradio soundboard to try different voices and vibes
- Sample scripts for streaming and saving TTS audio
- Easily configurable for your Azure OpenAI resource
- Python 3.8+
- An Azure OpenAI resource created in eastus2 with GPT-4o Mini TTS model deployed as a Global Standard deployment type
Quickly deploy the GPT-4o Mini TTS model:
- Sign in to the Azure AI Foundry portal.
- Navigate to Catalog, filter by Azure OpenAI, and select gpt-4o-mini-tts.
- Click Deploy, choose or create your resource, and enter a deployment name. Click Deploy again to finalize.
For detailed instructions, see the full deployment guide.
- Clone the repository:
git clone https://github.com/Azure-Samples/azure-openai-tts-demo.git
cd azure-openai-tts-demo
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Copy .env.example to .env and fill in your Azure OpenAI endpoint and API key:
cp .env.example .env
# Edit .env with your values
Example:
AZURE_OPENAI_ENDPOINT="https://<your-resource-name>.openai.azure.com/"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
AZURE_OPENAI_API_VERSION="2025-03-01-preview"
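A script can fail early with a clear message if any of these three settings is missing. The sketch below is illustrative only: `AzureSettings` and `load_settings` are hypothetical names, not part of this repository, and it assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader, which is an assumption about how the scripts are set up).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The three variable names come from the .env example above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


@dataclass(frozen=True)
class AzureSettings:
    """Hypothetical container for the values read from the environment."""
    endpoint: str
    api_key: str
    api_version: str


def load_settings(env: Mapping[str, str] = os.environ) -> AzureSettings:
    """Read the required settings, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return AzureSettings(
        # Normalize the endpoint so it always ends with exactly one slash.
        endpoint=env["AZURE_OPENAI_ENDPOINT"].rstrip("/") + "/",
        api_key=env["AZURE_OPENAI_API_KEY"],
        api_version=env["AZURE_OPENAI_API_VERSION"],
    )
```

Calling `load_settings()` with no arguments reads from `os.environ`; passing a plain dictionary instead makes the check easy to exercise without touching real credentials.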
To launch the interactive soundboard:
python soundboard.py
- Select a voice and vibe, then click Play to generate and listen to speech.
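The voice and vibe selection described above could be wired up roughly as follows. This is a minimal sketch and not the repository's soundboard.py: the voice subset, the vibe presets, and the names `make_play_handler` and `build_ui` are all assumptions made for illustration, and the `synthesize` callable is a placeholder for whatever function actually calls the TTS API and returns a path to an audio file. Treating a vibe as a block of free-text speaking instructions is likewise an assumption.

```python
# Assumed subset of voices; check your deployment for the full list.
VOICES = ["alloy", "coral", "sage"]

# Hypothetical vibe presets: each maps a display name to speaking instructions.
VIBES = {
    "Calm": "Speak in a calm, soothing tone at a relaxed pace.",
    "Excited": "Speak with high energy and enthusiasm.",
}


def make_play_handler(synthesize):
    """Return a handler that maps UI selections to one synthesize() call."""
    def play(voice, vibe_name, text):
        if voice not in VOICES:
            raise ValueError("Unknown voice: " + voice)
        if vibe_name not in VIBES:
            raise ValueError("Unknown vibe: " + vibe_name)
        # synthesize is expected to return a path to the generated audio file.
        return synthesize(text=text, voice=voice, instructions=VIBES[vibe_name])
    return play


def build_ui(synthesize):
    """Build a small Gradio interface around the handler."""
    import gradio as gr  # imported lazily so the handler can be used without it

    return gr.Interface(
        fn=make_play_handler(synthesize),
        inputs=[
            gr.Dropdown(choices=VOICES, value=VOICES[0], label="Voice"),
            gr.Dropdown(choices=list(VIBES), value="Calm", label="Vibe"),
            gr.Textbox(label="Text"),
        ],
        outputs=gr.Audio(type="filepath", label="Speech"),
    )
```

With a real synthesis function in hand, `build_ui(my_synthesize).launch()` would start the interface locally. Keeping the handler separate from the UI makes the voice and vibe mapping testable without Gradio or network access.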
- streaming-tts-to-file-sample.py: Streams TTS audio to a file.
- async-streaming-tts-sample.py: Streams and plays TTS audio asynchronously.
Run a sample script with:
python streaming-tts-to-file-sample.py
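For orientation, streaming speech to a file with the `openai` Python package might look roughly like the sketch below. It is not the contents of streaming-tts-to-file-sample.py. The helper names `build_speech_request` and `stream_speech_to_file` are hypothetical, the default deployment name `gpt-4o-mini-tts` and the voice `alloy` are assumptions (use the deployment name you chose in the portal), and passing the vibe through the `instructions` parameter is an assumption about how vibes are applied.

```python
import os
from pathlib import Path


def build_speech_request(text, voice="alloy", vibe=None, deployment="gpt-4o-mini-tts"):
    """Assemble keyword arguments for a speech request.

    On Azure OpenAI, the model argument carries the deployment name.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    request = {"model": deployment, "voice": voice, "input": text}
    if vibe:
        # Assumption: the vibe is sent as free-text speaking instructions.
        request["instructions"] = vibe
    return request


def stream_speech_to_file(text, out_path, **kwargs):
    """Stream synthesized audio to out_path and return it as a Path."""
    from openai import AzureOpenAI  # imported lazily; requires the openai package

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    )
    request = build_speech_request(text, **kwargs)
    # The streaming response writes audio chunks to disk as they arrive.
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
    return Path(out_path)
```

A call such as `stream_speech_to_file("Hello from Azure OpenAI.", "hello.mp3", voice="coral", vibe="Speak calmly.")` would write the audio to hello.mp3, provided the three environment variables are set and the deployment exists.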
When using this soundboard or any output generated by it, you must comply with the Microsoft Enterprise AI Services Code of Conduct, including but not limited to:
- Disclosure: Clearly disclose when audio is AI-generated. Do not mislead others into believing the synthetic voice is a real person or attributable to a specific individual without their consent.
- Prohibited Uses: Do not use this tool or its output to:
  - Deceive, impersonate, or misinform others.
  - Generate or distribute harmful, illegal, or abusive content (including hate speech, violence, harassment, or sexually explicit material).
  - Attempt to infer or simulate sensitive personal attributes or emotional states.
  - Create chatbots for erotic, romantic, or impersonation purposes.
  - Violate any applicable law or regulation.
- Human Oversight: Ensure appropriate human oversight and do not use the tool for consequential decisions affecting legal, financial, or human rights.
- Content Rights: You are responsible for ensuring you have the rights to any content you input and for the responsible use of all output.
- Feedback and Abuse: Provide a way for users to report abuse or issues with generated content.
For the full list of requirements and restrictions, see the Microsoft Enterprise AI Services Code of Conduct.
MIT License. See LICENSE.md for details.
Disclaimer:
This project is for educational and personal use only. It is not affiliated with, endorsed by, or officially supported by OpenAI or Microsoft. It was inspired by OpenAI's openai.fm, an interactive demo that lets developers try the new text-to-speech model in the OpenAI API. The sole purpose of this sample is to help developers understand the API and how to use these models within the context of Azure OpenAI Service.