This project is a real-time AI-powered voice assistant designed to help tourists explore London, UK. It transcribes live speech, generates AI responses using OpenAI's GPT, and provides voice feedback using ElevenLabs.
-
Real-time speech-to-text transcription using AssemblyAI
-
AI-generated responses powered by OpenAI's GPT-4o
-
Text-to-speech voice output using ElevenLabs
-
Interactive and conversational travel guide experience
1οΈβ£ The assistant starts by greeting the user with a voice message.
2οΈβ£ It listens to the user's voice and transcribes it in real-time.
3οΈβ£ The query is sent to GPT-4o for a smart AI-generated response.
4οΈβ£ The assistant then speaks the response using ElevenLabs.
5οΈβ£ The loop continues, making it an interactive conversation! π£οΈ
-
Python π
-
OpenAI GPT-4o π€
-
AssemblyAI (Speech-to-Text) ποΈ
-
ElevenLabs (Text-to-Speech) π
Before running the script, make sure the following dependencies are installed on your system
π΅ MPV (Required for ElevenLabs Audio Streaming)
This is required for ElevenLabs to stream audio.
-
π₯οΈ Windows
-
Download mpv from here
-
Add it to your system PATH.
-
-
π Mac (macOS)
brew install mpv
-
π§ Linux (Ubuntu/Debian)
sudo apt update && sudo apt install mpv
π€ PortAudio & PyAudio (Required for AssemblyAI Transcription)
PortAudio is required to use PyAudio, which AssemblyAI needs for real-time transcription.
-
π₯οΈ Windows
pip install pyaudio
-
π Mac (macOS)
brew install portaudio pip install pyaudio
-
π§ Linux (Ubuntu/Debian)
sudo apt update && sudo apt install portaudio19-dev pip install pyaudio