Welcome to my journey through the Murf AI Voice Agent 30-Day Challenge! I'm building a smart and interactive voice agent using Murf AI's powerful TTS capabilities and integrating it with real-time tech like AssemblyAI, FastAPI, LLM APIs, and WebSockets.
This project is a complete voice agent system that enables natural voice conversations with AI. The system features:
- Real-time audio streaming via WebSockets
- Speech-to-text transcription using AssemblyAI
- AI-powered responses via Google Gemini API
- Text-to-speech conversion using Murf AI
- Session-based chat memory for contextual conversations
- Web search capabilities using Tavily API
- Streaming responses for natural conversation flow
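The feature list above boils down to one conversational turn: audio in, transcript, LLM reply, audio out. The sketch below shows that data flow with stand-in stubs for each external service (AssemblyAI, Gemini, Murf); the stub names and return values are illustrative, not the project's actual service functions.

```python
# One conversational turn: audio -> transcript -> LLM reply -> audio.
# Each external service is replaced by a stub so the flow is runnable.

def transcribe(audio_bytes):
    """Stub for AssemblyAI speech-to-text."""
    return "what is the weather today"

def generate_reply(transcript, history):
    """Stub for a Gemini chat completion that uses session history."""
    history.append({"role": "user", "content": transcript})
    reply = f"You asked: {transcript}"
    history.append({"role": "assistant", "content": reply})
    return reply

def synthesize(text):
    """Stub for Murf AI text-to-speech."""
    return text.encode("utf-8")

def voice_turn(audio_bytes, history):
    """One full turn: audio in -> audio out, updating the session history."""
    transcript = transcribe(audio_bytes)
    reply = generate_reply(transcript, history)
    return synthesize(reply)

history = []
audio_out = voice_turn(b"\x00\x01", history)
```

In the real system each stage is streamed over WebSockets rather than called sequentially, but the shape of the data handed between stages is the same.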
*Diagram (flow.png): the complete voice agent workflow from user input to AI response*
```
Voice-Agent/
├── Agent/
│   ├── Routes/
│   │   └── transcriber.py
│   ├── Services/
│   │   ├── Badmosh.py
│   │   └── Gemini_service.py
│   ├── utils/
│   │   └── logging.py
│   ├── index.html
│   ├── main.py
│   ├── script.js
│   └── style.css
├── .env
├── .gitignore
├── README.md
├── requirements.txt
└── flow.png
```
- Set up FastAPI backend with Murf AI TTS integration
- Built basic UI with text input and audio playback
- Polished UX for natural voice interactions
- Added microphone recording with MediaRecorder API
- Implemented audio upload to server
- Integrated AssemblyAI for speech-to-text
- Created voice-to-voice echo pipeline
- Integrated Google Gemini API for AI responses
- Built audio-to-AI conversation pipeline
- Added session-based chat memory
- Implemented robust error handling
- Revamped UI for better user experience
- Organized codebase with proper folder structure
- Created helper functions and services
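The session-based chat memory mentioned above keeps each conversation's context separate. A minimal sketch of the idea: a mapping from `session_id` to that session's message list, with the history capped before it is sent to the LLM. The helper names and the cap value here are illustrative, not the project's actual implementation.

```python
# Minimal session-based chat memory: each session_id gets its own
# message history, so concurrent users keep separate contexts.
from collections import defaultdict

chat_sessions = defaultdict(list)  # session_id -> list of message dicts

def add_message(session_id, role, content):
    chat_sessions[session_id].append({"role": role, "content": content})

def get_history(session_id, max_turns=20):
    # Cap the context sent to the LLM to bound prompt size (value assumed).
    return chat_sessions[session_id][-max_turns:]

add_message("abc123", "user", "Hello!")
add_message("abc123", "assistant", "Hi there!")
add_message("xyz789", "user", "Different session")
```

An in-memory dict like this is simple but resets on server restart; a production agent would typically persist sessions in a store like Redis.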
- Implemented WebSocket endpoint for real-time communication
- Built client-side audio streaming to server via WebSockets
- Integrated AssemblyAI's Streaming API for real-time transcription
- Implemented turn detection for natural conversation flow
- Added Tavily web search integration
- Implemented streaming responses for natural conversation
- Enhanced error handling and user feedback
- Optimized performance and reliability
- Added comprehensive documentation
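The streaming responses above work by flushing complete sentences to TTS as LLM tokens arrive, instead of waiting for the full reply. A sketch of that chunking step, with a simulated token stream (the generator name and sentence-boundary rule are assumptions for illustration):

```python
# Flush complete sentences to TTS as soon as they are ready,
# rather than waiting for the whole LLM reply.

SENTENCE_END = (".", "!", "?")

def sentences_from_stream(token_stream):
    """Yield complete sentences as tokens arrive from the LLM."""
    buffer = ""
    for token in token_stream:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()

# Simulated incremental tokens from the LLM:
tokens = ["Hello", " there", "!", " How", " can", " I", " help", "?"]
chunks = list(sentences_from_stream(tokens))
```

Each yielded sentence can be handed to the TTS service immediately, so the user starts hearing the reply while the rest is still being generated; this is what makes the conversation feel natural rather than turn-by-turn batch processing.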
- Python 3.8+
- API keys for:
  - Murf AI
  - AssemblyAI
  - Google Gemini
  - Tavily (optional)
```bash
git clone https://github.com/Vishalpandey1799/Murf-AI-Voice-Agent.git
cd Murf-AI-Voice-Agent

# Windows
python -m venv .venv
.venv\Scripts\activate

# Mac/Linux
python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
```
Create a `.env` file in the root directory:
```env
MURF_API_KEY=your_murf_api_key_here
ASSEMBLY_API_KEY=your_assemblyai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here  # optional
```
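These keys are read into the process environment at startup. FastAPI projects typically use `python-dotenv` (`load_dotenv()`) for this; the stdlib-only sketch below parses the same `KEY=value` format by hand, just to show what that loading amounts to (the function name is illustrative).

```python
# Parse a .env-style string into a dict, ignoring comments and blank lines.

def parse_env(text):
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments / blanks
        if "=" in line:
            key, value = line.split("=", 1)
            env[key.strip()] = value.strip()
    return env

sample = """MURF_API_KEY=abc123
# a comment line
GEMINI_API_KEY=xyz789"""
keys = parse_env(sample)
```

Keeping keys in `.env` (and listing it in `.gitignore`, as this repo does) keeps credentials out of version control.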
```bash
uvicorn main:app --reload
```
Open your browser and navigate to `http://localhost:8000`
- Start a Conversation: Click the microphone button to start speaking
- Real-time Transcription: Watch as your speech is transcribed in real-time
- AI Processing: The system processes your query using Gemini AI
- Voice Response: Listen to the AI's response generated with Murf AI
- Continuous Conversation: The system maintains context throughout the conversation
- Web Search: Enable web search for queries requiring current information
- Session Management: Conversations are maintained with session-based memory
- Streaming Responses: Responses are streamed for natural conversation flow
- `POST /agent/chat/{session_id}` - Session-based chat
- `GET /ws` - WebSocket for real-time streaming
- Backend: FastAPI, Python
- Frontend: HTML, CSS, JavaScript
- APIs: Murf AI, AssemblyAI, Google Gemini, Tavily
- Real-time: WebSockets, MediaRecorder API
- Audio Processing: `wave`, `pydub`
Special thanks to:
- Murf AI for organizing this challenge and providing excellent TTS capabilities
- AssemblyAI for accurate speech-to-text transcription
- Google Gemini for powerful AI conversation capabilities
- Tavily for web search functionality
Follow my progress on LinkedIn with the hashtag #30DayVoiceAgent
Let's build the future of voice interfaces together! 🚀
This project was developed as part of the Murf AI Voice Agent 30-Day Challenge