Start building interactive voice experiences with Deepgram's Voice Agent API using this Python Flask starter application. This project demonstrates how to create a voice agent that can engage in natural conversations using Deepgram's advanced AI capabilities.
Deepgram's voice AI platform provides APIs for speech-to-text, text-to-speech, and full speech-to-speech voice agents. More than 200,000 developers use Deepgram to build voice AI products and features.
Before you start, it's essential to generate a Deepgram API key to use in this project. Sign up now for Deepgram and create an API key.
Before you start, you'll need:
- Node.js (version 18 or higher)
- npm or yarn package manager
- A Deepgram API key (Sign up now for Deepgram)
Follow these steps to get started with this starter application.
Go to GitHub and clone the repository.
Install the project dependencies:
npm install
Create a .env file by copying the contents from sample.env:
cp sample.env .env
Then edit the .env file and replace the placeholder with your actual Deepgram API key:
DEEPGRAM_API_KEY=your_deepgram_api_key_here
You can get your API key from the Deepgram Console.
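To make the configuration step concrete, here is a minimal sketch of how a .env file like the one above could be read and checked before the application starts. It is not taken from this repository: the helper names `parseEnv` and `requireDeepgramKey` are hypothetical, and the starter may load its configuration differently (for example through a dedicated package).

```javascript
// Hypothetical helpers, for illustration only; not part of this starter.

// Parse the text of a .env file into an object of KEY=value pairs.
// Blank lines and lines starting with "#" are ignored.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip lines that are not KEY=value
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    const last = value[value.length - 1];
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Return the Deepgram API key, or throw if it is missing or still
// set to the placeholder value from sample.env.
function requireDeepgramKey(vars) {
  const key = vars.DEEPGRAM_API_KEY;
  if (!key || key === "your_deepgram_api_key_here") {
    throw new Error("Set DEEPGRAM_API_KEY in .env to your actual Deepgram API key");
  }
  return key;
}
```

In Node.js, the file contents could be obtained with `require("node:fs").readFileSync(".env", "utf8")` and passed to `parseEnv`. Failing early on the placeholder value avoids a less obvious authentication error later.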
There are two ways to run this starter application:
Development Mode:
npm run dev
Web Server Mode:
npm run start
Once running, you can access the application in your browser at http://localhost:3000 (development) or the port specified for server mode.
- Allow microphone access when prompted.
- Speak into your microphone to interact with the Deepgram Voice Agent.
- You should hear the agent's responses played back in your browser.
This application fully supports Firefox as well as Chrome and Safari. Firefox requires special handling due to its unique Web Audio API implementation:
- Sample Rate: Firefox uses a default 48kHz sample rate and ignores getUserMedia sample rate constraints, while Chrome/Safari use 24kHz as requested
- Technical Solution: Audio from Firefox is automatically downsampled from 48kHz to 24kHz before transmission to ensure consistent voice recognition across all browsers
- User Experience: There are no differences in functionality; all browsers provide the same real-time voice interaction capabilities, although you will likely see better performance in Chrome and Safari.
If you see "48kHz" in the browser logs when using Firefox, this is expected behavior.
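The 48kHz to 24kHz conversion described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used by this application: the function name `downsampleByAveraging` is hypothetical, and it averages each group of input samples, which is one simple way to downsample. The actual implementation may use a different resampling method.

```javascript
// Hypothetical helper, for illustration only; not part of this starter.

// Downsample mono audio by averaging the input samples that fall
// into each output sample's time window. For 48000 -> 24000 this
// averages each consecutive pair of samples.
function downsampleByAveraging(samples, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError("outputRate must not exceed inputRate");
  }
  if (outputRate === inputRate) {
    return Float32Array.from(samples); // nothing to do, return a copy
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += samples[j];
    }
    out[i] = sum / (end - start);
  }
  return out;
}
```

In a browser, the actual capture rate can be read from the `sampleRate` property of the `AudioContext`, so a sketch like this would only need to run when that value differs from the 24kHz target. Averaging acts as a crude low-pass filter, which limits aliasing compared with simply dropping every other sample.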
- Clone or Fork this repo.
- Modify the app-requirements.mdc file.
- Add the necessary configuration settings in the file.
- You can refer to the MDC file used to help build this starter application by reviewing app-requirements.mdc
npm test
We love to hear from you. If you have questions or comments, or find a bug in the project, let us know! You can:
- Open an issue in this repository
- Join the Deepgram GitHub Discussions Community
- Join the Deepgram Discord Community
We welcome contributions! Please see our Contributing Guidelines for details on how to get started.
For security concerns and vulnerability reporting, please refer to our Security Policy.
This project adheres to the Deepgram Code of Conduct. By participating, you are expected to uphold this code.
This project is licensed under the MIT license. See the LICENSE file for more info.