DeepgramCoach is an interactive AI-powered language learning application that provides real-time voice-based language practice. Using Deepgram's advanced Voice Agent API, this application offers personalized conversation practice and pronunciation coaching across multiple languages.
The purpose of this demo is to showcase Deepgram's voice agent working seamlessly and quickly across different languages. With Deepgram's multilingual models, speech-to-text (STT) and text-to-speech (TTS) run in real time in each supported language. The Spanish TTS is currently still a demo, but it will be available soon!
DeepgramCoach is your personal AI language tutor that helps you:
- Practice Conversations: Engage in natural, flowing conversations in your target language
- Improve Pronunciation: Get real-time feedback on your pronunciation and speaking clarity
- Learn Multiple Languages: Support for 13+ languages including English, Spanish, French, German, Japanese, Chinese, and more
- Adaptive Learning: AI adapts to your proficiency level and learning pace
- Immersive Experience: Voice-first learning that prioritizes speaking practice
Two Learning Modes:
- Conversation Practice: Focus on fluency, vocabulary, and natural communication
- Pronunciation Practice: Detailed feedback on accent, clarity, and pronunciation accuracy
Multi-Language Support:
- English, Spanish, French, German, Italian, Portuguese
- Japanese, Korean, Chinese (Mandarin), Russian
- Dutch, Hindi, Arabic
- Automatic language detection and voice adaptation
AI-Powered Coaching:
- GPT-4o-mini powered conversation engine
- Deepgram's Nova-3 speech recognition
- Aura-2 text-to-speech with native speaker voices
- Contextual grammar correction and vocabulary expansion
Modern Web Interface:
- Real-time voice interaction
- Visual audio feedback
- Conversation history
- Responsive design for desktop and mobile
- Backend: Flask with Flask-SocketIO for real-time communication
- AI Services: Deepgram Voice Agent API (Speech-to-Text, Text-to-Speech, AI Conversation)
- Frontend: Vanilla JavaScript with Web Audio API
- Styling: Modern CSS with responsive design
- Testing: pytest for comprehensive test coverage
Before installing DeepgramCoach, ensure you have:
- Python 3.8 or higher installed
- Modern web browser with microphone support (Chrome, Firefox, Safari, Edge)
- Deepgram API key (sign up at deepgram.com for free)
- Active internet connection for AI services
- Operating System: Windows 10+, macOS 10.14+, or Linux (Ubuntu 18.04+)
- RAM: Minimum 4GB, Recommended 8GB+
- Audio: Working microphone and speakers/headphones
- Browser: Chrome 88+, Firefox 85+, Safari 14+, Edge 88+
git clone https://github.com/deepgram-starters/flask-voice-agent.git
cd flask-voice-agent
Create a virtual environment to isolate dependencies:
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
Required Dependencies:
- flask==3.0.2: Web framework
- deepgram-sdk==4.0.0: Deepgram Voice Agent API
- python-dotenv==1.0.1: Environment variable management
- PyAudio==0.2.14: Audio processing (may require system audio libraries)
- flask-cors==4.0.0: Cross-origin resource sharing
- flask-socketio==5.3.6: Real-time WebSocket communication
- pytest==8.0.0: Testing framework
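If installation seems to have gone wrong, it can help to confirm that the pinned versions actually landed in the active virtual environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The `PINNED` table and the `check_pins` helper are hypothetical names introduced for illustration; they are not part of the project.

```python
from importlib import metadata

# Pins copied from the dependency list in this README.
PINNED = {
    "flask": "3.0.2",
    "deepgram-sdk": "4.0.0",
    "python-dotenv": "1.0.1",
    "PyAudio": "0.2.14",
    "flask-cors": "4.0.0",
    "flask-socketio": "5.3.6",
    "pytest": "8.0.0",
}


def check_pins(pins):
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Hypothetical usage: print one line per problem, nothing if all pins match.
    for name, problem in check_pins(PINNED).items():
        print(f"{name}: {problem}")
```

Run it inside the activated virtual environment; an empty output means every pinned package was found at the expected version.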
- Create a .env file in the project root:
touch .env
- Add your Deepgram API key to the .env file:
DEEPGRAM_API_KEY=your_deepgram_api_key_here
Getting a Deepgram API Key:
- Visit console.deepgram.com
- Sign up for a free account
- Navigate to API Keys section
- Create a new API key
- Copy the key to your .env file
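A common failure at startup is a missing key or a key that still holds the placeholder value. The sketch below shows one way the key could be validated before any Deepgram call is made. It is an assumption about how such a check might look, not the project's actual code: `get_api_key`, `MissingApiKeyError`, and `PLACEHOLDER` are hypothetical names, and the example reads from a plain mapping so it runs without extra packages.

```python
import os

# Placeholder value shown in the .env example in this README.
PLACEHOLDER = "your_deepgram_api_key_here"


class MissingApiKeyError(RuntimeError):
    """Raised when DEEPGRAM_API_KEY is absent, empty, or still the placeholder."""


def get_api_key(env=None):
    """Return the Deepgram API key from `env` (defaults to os.environ).

    With python-dotenv installed, calling `load_dotenv()` (imported from
    `dotenv`) beforehand copies the entries of .env into os.environ.
    """
    env = os.environ if env is None else env
    key = (env.get("DEEPGRAM_API_KEY") or "").strip()
    if not key or key == PLACEHOLDER:
        raise MissingApiKeyError(
            "Set DEEPGRAM_API_KEY in .env before starting the app."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later from inside a voice session.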
PyAudio may require additional system dependencies:
On macOS:
brew install portaudio
pip install pyaudio
On Ubuntu/Debian:
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
On Windows: PyAudio should install automatically with pip. If you encounter issues:
pip install pipwin
pipwin install pyaudio
python app.py
The application will start on http://localhost:3000
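If the page does not load, a quick way to tell a server problem from a browser problem is to check whether anything is accepting connections on the port. This is a small standard-library sketch; the helper name `is_listening` is hypothetical, and the default port simply mirrors the address given above.

```python
import socket


def is_listening(host="localhost", port=3000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is reachable at the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "up" if is_listening() else "not reachable"
    print(f"http://localhost:3000 is {state}")
```

Run it in a second terminal after `python app.py`. A "not reachable" result suggests the server did not start or is bound to a different port.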
- Open your browser and navigate to http://localhost:3000
- Allow microphone access when prompted by your browser
- Select your target language using the language buttons or by speaking (e.g., "I want to practice Spanish")
- Choose your learning mode:
- "I want to practice conversation" for fluency training
- "Let's work on pronunciation" for pronunciation coaching
- Click "Start Speaking" to begin your language learning session
- Start talking! The AI will respond in real time and adapt to your level
- "Hello, I'm learning [language]. Can we have a conversation about daily activities?"
- "I'd like to practice ordering food in a restaurant."
- "Can you help me with pronunciation of difficult words?"
- "Let's talk about travel and culture."
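The spoken selection phrases in the steps above ("I want to practice Spanish", "Let's work on pronunciation") are most likely interpreted by the voice agent itself. Purely as an illustration of the idea, the sketch below maps such phrases to a language code and a learning mode with simple keyword matching. Everything here is an assumption: the function names are hypothetical, and the ISO 639-1 codes are an illustrative choice rather than the app's confirmed configuration.

```python
import re

# The thirteen languages listed in this README, keyed by lowercase name.
# Codes are ISO 639-1; "mandarin" is treated as an alias for Chinese.
SUPPORTED_LANGUAGES = {
    "english": "en", "spanish": "es", "french": "fr", "german": "de",
    "italian": "it", "portuguese": "pt", "japanese": "ja", "korean": "ko",
    "chinese": "zh", "mandarin": "zh", "russian": "ru", "dutch": "nl",
    "hindi": "hi", "arabic": "ar",
}


def detect_language_request(utterance):
    """Return the code of the first supported language named, else None."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        code = SUPPORTED_LANGUAGES.get(word)
        if code:
            return code
    return None


def detect_mode_request(utterance):
    """Return "pronunciation" or "conversation" if the phrase names a mode."""
    text = utterance.lower()
    if "pronunciation" in text:
        return "pronunciation"
    if "conversation" in text:
        return "conversation"
    return None


if __name__ == "__main__":
    print(detect_language_request("I want to practice Spanish"))
    print(detect_mode_request("Let's work on pronunciation"))
```

Keyword matching like this breaks down quickly on free-form speech, which is one reason a language model is a better fit for the real application.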
We welcome contributions! Please see our Contributing Guidelines for details on:
- Code style and standards
- Pull request process
- Issue reporting
- Feature requests
This project is licensed under the MIT License. See the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Deepgram Community
- Documentation: Deepgram Docs
Deepgram is an AI speech platform providing real-time speech-to-text, text-to-speech, and voice AI solutions. More than 200,000 developers use Deepgram to build voice AI products and features.
Why Deepgram?
- Industry-leading accuracy and speed
- Support for 30+ languages
- Real-time and batch processing
- Easy-to-use APIs
- Comprehensive documentation and support
Ready to master a new language? Start your journey with DeepgramCoach today!