This project enables real-time speech-to-text transcription using AssemblyAI, generates AI responses with DeepSeek R1 (7B model) via Ollama, and converts text responses into speech using ElevenLabs. The entire process happens in real-time, allowing for seamless interaction.
- Real-time speech-to-text using AssemblyAI
- AI-powered responses with DeepSeek R1 (7B model) via Ollama
- Instant text-to-speech conversion with ElevenLabs
- Live audio streaming for an interactive experience
- AssemblyAI (for speech-to-text): Sign up for a free API key
- ElevenLabs (for text-to-speech): Sign up for an account
DeepSeek R1 is accessed via Ollama. Install Ollama from:
🔗 Download Ollama
-
Debian/Ubuntu:
apt install portaudio19-dev
MacOS:
brew install portaudio
####✅ Install Python Libraries
Before running the script, install the required dependencies:
pip install "assemblyai[extras]"
pip install ollama
pip install elevenlabs
✅ (MacOS Only) Install MPV for Audio Streaming
brew install mpv
Since this script uses DeepSeek R1 via Ollama, download the model locally by running:
ollama pull deepseek-r1:7b
Once all dependencies are installed and the model is downloaded, simply run:
python AIVoiceAgent.py