Calm Guide is the AI persona powering this app. Designed to be friendly, calm, supportive, and concise, Calm Guide provides thoughtful responses and maintains a composed tone in every interaction. Calm Guide is developed by Harshit Sharma and is always here to help as best it can.
- Live Voice Input: Real-time browser audio capture (WebAudio API)
- WebSocket Streaming: Low-latency audio streaming to the backend
- Speech-to-Text: High-accuracy, streaming transcription (AssemblyAI)
- AI Processing: Google Gemini LLM for intelligent, contextual responses
- Web Search: Built-in DuckDuckGo search for up-to-date answers
- Text-to-Speech: Natural, streaming voice synthesis (Murf AI)
- Audio Output: Seamless, real-time playback in the browser
- Conversational Memory: Maintains context across turns
- Responsive UI: Works on desktop and mobile
- Session Management: Persistent, isolated conversations
- True Real-Time: WebSocket pipeline for instant feedback
- Robust Error Handling: Graceful fallback and health checks
+--------------+       +--------------------+       +-----------------+
|   Browser    | <---> |  FastAPI Backend   | <---> |   AI Services   |
|  (WebAudio)  |       |  (WebSocket/REST)  |       |  (STT/LLM/TTS)  |
+--------------+       +--------------------+       +-----------------+
       |                         |                          |
       v                         v                          v
  Audio Input    ->    Real-Time Processing    ->    Audio/Text Output
- Capture: User records voice in the browser
- Stream: Audio is streamed via WebSocket to the backend
- Transcribe: AssemblyAI provides live transcription
- Web Search: (Optional) The AI can trigger a web search for up-to-date info
- Respond: Gemini LLM generates a contextual reply
- Synthesize: Murf AI streams natural speech back
- Playback: Audio is streamed to the browser for instant feedback
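
The steps above can be sketched as a chain of async stages. This is a minimal illustration only, not the project's actual handler: every function here (`capture`, `transcribe`, `respond`, `synthesize`, `run_turn`) is a hypothetical stub standing in for the browser, AssemblyAI, Gemini, and Murf AI, and no real service is called.

```python
import asyncio
from typing import AsyncIterator


async def capture(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for audio chunks arriving from the browser over a WebSocket."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield chunk


async def transcribe(audio: AsyncIterator[bytes]) -> str:
    """Hypothetical STT stub: consumes the stream and returns a fake transcript."""
    total = 0
    async for chunk in audio:
        total += len(chunk)
    return f"<transcript of {total} bytes>"


async def respond(transcript: str, history: list[str]) -> str:
    """Hypothetical LLM stub: records the turn and returns a canned reply."""
    history.append(transcript)
    return f"reply to {transcript}"


async def synthesize(text: str, chunk_size: int = 8) -> AsyncIterator[bytes]:
    """Hypothetical TTS stub: streams the reply back in small byte chunks."""
    data = text.encode("utf-8")
    for i in range(0, len(data), chunk_size):
        await asyncio.sleep(0)
        yield data[i:i + chunk_size]


async def run_turn(chunks: list[bytes], history: list[str]) -> bytes:
    """Run one conversational turn: capture, transcribe, respond, synthesize."""
    transcript = await transcribe(capture(chunks))
    reply = await respond(transcript, history)
    out = bytearray()
    async for audio_chunk in synthesize(reply):
        out.extend(audio_chunk)  # a real app would send each chunk to the browser
    return bytes(out)
```

Calling `asyncio.run(run_turn([b"ab", b"cde"], []))` drives one full turn. In a streaming setup each stage would forward partial results as they arrive rather than waiting for the whole input, which is where the low latency comes from.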
- FastAPI: Modern async Python web framework
- Uvicorn: ASGI server
- Python 3.12+
- Pydantic: Data validation
- WebSocket: Real-time streaming
- AssemblyAI: Streaming speech-to-text
- Google Gemini: LLM for conversation and search
- DuckDuckGo: Web search integration
- Murf AI: Streaming text-to-speech
- Vanilla JavaScript: WebAudio API, WebSocket
- Tailwind CSS: Responsive, modern UI
FastAPI/
├── main.py                    # FastAPI app entry, WebSocket/REST routes
├── websocket_handler.py       # WebSocket handler for real-time pipeline
├── app/
│   ├── api/
│   │   ├── health.py          # Health check endpoints
│   │   └── search.py          # Web search endpoints
│   ├── core/
│   │   ├── config.py          # Settings, API key management
│   │   └── logging.py         # (Optional) Logging config
│   ├── models/
│   │   └── schemas.py         # Pydantic models
│   └── services/
│       ├── stt_service.py     # Streaming STT (AssemblyAI)
│       ├── llm_service.py     # LLM (Gemini) with context & search
│       ├── tts_service.py     # Streaming TTS (Murf AI)
│       └── health_service.py  # Health monitoring
├── static/
│   ├── script.js              # Main app JavaScript
│   ├── styles.css             # Global styles
│   └── settings.js            # API key configuration
├── templates/
│   ├── index.html             # Main HTML template
│   ├── about.html             # About page template
│   └── settings.html          # Settings page template
└── requirements.txt           # Python dependencies
- Python 3.12 or higher
- API keys for AssemblyAI, Google Gemini, Murf AI
git clone https://github.com/HsAhRaSrHmIaT/FastAPI-Murf.git
cd FastAPI
pip install -r requirements.txt
Create a .env file in the root directory:
GOOGLE_API_KEY=your_gemini_api_key_here
MURF_API_KEY=your_murf_api_key_here
WS_MURF_URL=your_murf_websocket_url_here
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
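
As a rough sketch of how these four variables might be read and validated at startup: the `Settings` class, `parse_env_file`, and `load_settings` below are hypothetical names for illustration, not the contents of the project's `config.py`. Treating all four keys as required is also an assumption, since keys may be supplied through the settings page instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MURF_API_KEY", "WS_MURF_URL", "ASSEMBLYAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container (not the project's actual class)."""
    google_api_key: str
    murf_api_key: str
    ws_murf_url: str
    assemblyai_api_key: str


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, failing fast if any key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        murf_api_key=env["MURF_API_KEY"],
        ws_murf_url=env["WS_MURF_URL"],
        assemblyai_api_key=env["ASSEMBLYAI_API_KEY"],
    )
```

Failing at startup with the list of missing names is usually easier to debug than a failed API call in the middle of a conversation.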
# Start with Uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Or run the entry script directly
python main.py
Visit http://localhost:8000 to start your voice conversations!
| Endpoint | Description |
|---|---|
| /ws | Real-time voice chat (audio/text) |

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Main web interface |
| /health/ | GET | System health status |
| /api/search/duckduckgo | GET | Web search (DuckDuckGo) |
| /settings | GET | API key management UI |
| /about | GET | About page |
| /docs | GET | Interactive API documentation |
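
To illustrate what a health endpoint such as /health/ might compute, here is a small sketch that combines per-service checks into one status. The function name `aggregate_health`, the `up`/`down` labels, and the `healthy`/`degraded`/`unhealthy` response shape are all assumptions for illustration; the project's actual response format is not documented here.

```python
from typing import Callable


def aggregate_health(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run each check and summarize the results (hypothetical response shape)."""
    services: dict[str, str] = {}
    for name, check in checks.items():
        try:
            services[name] = "up" if check() else "down"
        except Exception:
            # A check that raises (timeout, bad key, ...) counts as down.
            services[name] = "down"

    states = list(services.values())
    if all(state == "up" for state in states):
        overall = "healthy"
    elif any(state == "up" for state in states):
        overall = "degraded"
    else:
        overall = "unhealthy"
    return {"status": overall, "services": services}
```

Each check is a plain callable, so a real deployment could pass in functions that ping the STT, LLM, and TTS providers, while tests pass in simple lambdas.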
- Context Awareness: Maintains conversation history for natural flow
- Web Search: The AI can fetch up-to-date info from the web
- Session Isolation: Multiple users, independent conversations
- Streaming STT/TTS: Real-time, low-latency audio pipeline
- High-Quality Recording: WebAudio API, noise suppression
- Multiple Formats: Supports WAV, MP3, WebM, OGG, MP4
- Real-time Feedback: Visual indicators for recording, processing, playback
- Responsive Design: Works on all devices
- Accessibility: Keyboard navigation, screen reader support
- Real-time, bidirectional audio/text streaming
- Handles turn detection, session management
- Streaming transcription (AssemblyAI)
- Real-time, multi-format audio support
- Google Gemini LLM, context memory
- Web search integration
- Streaming TTS (Murf AI)
- Natural, low-latency voice output
- Monitors all external service availability
- Provides health status for UI and API
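
The context memory and session isolation mentioned above could be sketched as a per-session, bounded history. This is an in-memory illustration under stated assumptions: the `SessionMemory` class, its method names, and the `max_turns` limit are hypothetical, and the project may store history differently.

```python
from collections import defaultdict, deque


class SessionMemory:
    """Keeps a bounded message history per session ID (hypothetical helper)."""

    def __init__(self, max_turns: int = 10):
        # One turn is a user message plus an assistant reply, hence * 2.
        self._history = defaultdict(lambda: deque(maxlen=max_turns * 2))

    def add(self, session_id: str, role: str, text: str) -> None:
        """Append a message; the oldest message is dropped once the limit is hit."""
        self._history[session_id].append({"role": role, "text": text})

    def context(self, session_id: str) -> list[dict]:
        """Return this session's history only, oldest message first."""
        return list(self._history.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        """Forget a session, for example when its WebSocket closes."""
        self._history.pop(session_id, None)
```

Keying everything by session ID keeps one user's conversation out of another's prompt, and the bounded deque stops long sessions from growing the context without limit.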
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes with proper testing
- Commit:
git commit -m 'Add amazing feature'
- Push:
git push origin feature/amazing-feature
- Open a Pull Request
Built with modern AI, real-time streaming, and web search for seamless voice interaction.
Production Ready & Actively Maintained