HsAhRaSrHmIaT/Calm-Guide

Meet Calm Guide: Your AI Assistant

Calm Guide is the AI persona powering this app. Designed to be friendly, calm, supportive, and concise, it gives thoughtful responses and keeps a composed tone in every interaction. Calm Guide is developed by Harshit Sharma.

📸 Screenshots

Screenshots: Main Interface, AI Conversation, About, Settings, Reset, Disconnect.
🌟 Core Features

Real-Time Speech-to-Speech Pipeline (WebSocket)

  • 🎀 Live Voice Input β€” Real-time browser audio capture (WebAudio API)
  • πŸ”— WebSocket Streaming β€” Instant, low-latency audio streaming to backend
  • πŸ“ Speech-to-Text β€” High-accuracy, streaming transcription (AssemblyAI)
  • πŸ€– AI Processing β€” Google Gemini LLM for intelligent, contextual responses
  • 🌐 Web Search β€” Built-in DuckDuckGo search for up-to-date answers
  • πŸ”Š Text-to-Speech β€” Natural, streaming voice synthesis (Murf AI)
  • 🎧 Audio Output β€” Seamless, real-time playback in browser

Advanced Capabilities

  • 💬 Conversational Memory: Maintains context across turns
  • 📱 Responsive UI: Works on desktop and mobile
  • 🔄 Session Management: Persistent, isolated conversations
  • ⚡ True Real-Time: WebSocket pipeline for instant feedback
  • 🛡️ Robust Error Handling: Graceful fallback and health checks

🏗️ Modern Architecture

Real-Time Streaming Pipeline

┌─────────────┐    ┌────────────────────┐    ┌─────────────────┐
│   Browser   │<──>│   FastAPI Backend  │<──>│   AI Services   │
│  (WebAudio) │    │ (WebSocket/REST)   │    │ (STT/LLM/TTS)   │
└─────────────┘    └────────────────────┘    └─────────────────┘
       │                     │                        │
       ▼                     ▼                        ▼
   Audio Input  ⇄  Real-Time Processing  ⇄  Audio/Text Output

Workflow

  1. 🎀 Capture β€” User records voice in browser
  2. οΏ½ Stream β€” Audio streamed via WebSocket to backend
  3. πŸ“ Transcribe β€” AssemblyAI provides live transcription
  4. 🌐 Web Search β€” (Optional) AI can trigger web search for up-to-date info
  5. πŸ€– Respond β€” Gemini LLM generates contextual reply
  6. πŸ”Š Synthesize β€” Murf AI streams natural speech back
  7. 🎧 Playback β€” Audio streamed to browser for instant feedback
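The workflow above can be sketched as a single asynchronous turn. This is only an illustration of the stage order, not the project's code: every `fake_*` function is a hypothetical stand-in for the real AssemblyAI, DuckDuckGo, Gemini, and Murf AI clients, and `needs_search` is a toy heuristic, since the README does not say how the AI decides to search.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, List


@dataclass
class TurnResult:
    transcript: str
    search_snippets: List[str] = field(default_factory=list)
    reply: str = ""
    audio_chunks: List[bytes] = field(default_factory=list)


# Hypothetical stand-ins for the real STT, search, LLM, and TTS services.
async def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


async def fake_search(query: str) -> List[str]:
    return [f"snippet about {query}"]


async def fake_llm(transcript: str, snippets: List[str]) -> str:
    context = f" (using {len(snippets)} search result(s))" if snippets else ""
    return f"You said: {transcript}{context}"


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    # Yield audio in small chunks so playback could begin before synthesis ends.
    data = text.encode("utf-8")
    for i in range(0, len(data), 16):
        yield data[i:i + 16]


def needs_search(transcript: str) -> bool:
    # Toy keyword heuristic (assumption); the real trigger is not documented here.
    return any(word in transcript.lower() for word in ("latest", "today", "news"))


async def run_turn(audio: bytes) -> TurnResult:
    """Run one conversation turn: transcribe, optionally search, respond, synthesize."""
    result = TurnResult(transcript=await fake_transcribe(audio))
    if needs_search(result.transcript):
        result.search_snippets = await fake_search(result.transcript)
    result.reply = await fake_llm(result.transcript, result.search_snippets)
    async for chunk in fake_tts(result.reply):
        result.audio_chunks.append(chunk)
    return result
```

Calling `asyncio.run(run_turn(b"latest news"))` exercises the optional search branch; a plain greeting skips it.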

🛠️ Technology Stack

Backend

  • FastAPI: Modern async Python web framework
  • Uvicorn: ASGI server
  • Python 3.12+
  • Pydantic: Data validation
  • WebSocket: Real-time streaming

AI & Audio Services

  • AssemblyAI: Streaming speech-to-text
  • Google Gemini: LLM for conversation and search
  • DuckDuckGo: Web search integration
  • Murf AI: Streaming text-to-speech

Frontend

  • Vanilla JavaScript: WebAudio API, WebSocket
  • Tailwind CSS: Responsive, modern UI

📁 Project Structure

📁FastAPI/
├──📄main.py                       # FastAPI app entry, WebSocket/REST routes
├──📄websocket_handler.py          # WebSocket handler for real-time pipeline
├──📁app/
│   ├──📁api/
│   │   ├──📄health.py             # Health check endpoints
│   │   └──📄search.py             # Web search endpoints
│   ├──📁core/
│   │   ├──📄config.py             # Settings, API key management
│   │   └──📄logging.py            # (Optional) Logging config
│   ├──📁models/
│   │   └──📄schemas.py            # Pydantic models
│   └──📁services/
│       ├──📄stt_service.py        # Streaming STT (AssemblyAI)
│       ├──📄llm_service.py        # LLM (Gemini) with context & search
│       ├──📄tts_service.py        # Streaming TTS (Murf AI)
│       └──📄health_service.py     # Health monitoring
├──📁static/
│   ├──📄script.js                 # Main app JavaScript
│   ├──📄styles.css                # Global styles
│   └──📄settings.js               # API key configuration
├──📁templates/
│   ├──📄index.html                # Main HTML template
│   ├──📄about.html                # About page template
│   └──📄settings.html             # Settings page template
└──📄requirements.txt              # Python dependencies

🚀 Quick Start

Prerequisites

  • Python 3.12 or higher
  • API keys for AssemblyAI, Google Gemini, Murf AI

Installation

git clone https://github.com/HsAhRaSrHmIaT/FastAPI-Murf.git
cd FastAPI-Murf
pip install -r requirements.txt

Environment Setup

Create a .env file in the root directory:

GOOGLE_API_KEY=your_gemini_api_key_here
MURF_API_KEY=your_murf_api_key_here
WS_MURF_URL=your_murf_websocket_url_here
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
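As a rough illustration of how these four variables could be validated at startup, here is a minimal sketch. It is not the project's `config.py`: the `Settings` dataclass and `load_settings` function are hypothetical names, and the sketch assumes the values from `.env` have already been loaded into the process environment by whatever loader the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MURF_API_KEY", "WS_MURF_URL", "ASSEMBLYAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    murf_api_key: str
    ws_murf_url: str
    assemblyai_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required keys from `env`, failing early with a clear message."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        murf_api_key=env["MURF_API_KEY"],
        ws_murf_url=env["WS_MURF_URL"],
        assemblyai_api_key=env["ASSEMBLYAI_API_KEY"],
    )
```

Failing at startup with the list of missing names tends to be easier to debug than a service error on the first voice request.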

Run the Application

# Start with Uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Or run the entry point directly
python main.py

Visit http://localhost:8000 to start your voice conversations!

🔧 API & WebSocket Endpoints

WebSocket

| Endpoint | Description |
|----------|-------------|
| /ws | Real-time voice chat (audio/text) |

REST API

| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Main web interface |
| /health/ | GET | System health status |
| /api/search/duckduckgo | GET | Web search (DuckDuckGo) |
| /settings | GET | API key management UI |
| /about | GET | About page |
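A small client for the REST endpoints might look like the sketch below, using only the standard library. Two things here are assumptions rather than documented behavior: the search query parameter name `q` is hypothetical, and `get_json` assumes the endpoints return JSON. The interactive documentation at /docs is the place to confirm both.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the Quick Start section


def health_url(base: str = BASE_URL) -> str:
    return urljoin(base, "/health/")


def search_url(query: str, base: str = BASE_URL) -> str:
    # "q" is an assumed parameter name; check /docs for the real one.
    return urljoin(base, "/api/search/duckduckgo") + "?" + urlencode({"q": query})


def get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `get_json(health_url())` would fetch the health status.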

Docs

| Endpoint | Method | Description |
|----------|--------|-------------|
| /docs | GET | Interactive API documentation |

🎯 Feature Highlights

Conversation Intelligence

  • Context Awareness: Maintains conversation history for natural flow
  • Web Search: The AI can fetch up-to-date info from the web
  • Session Isolation: Multiple users, independent conversations
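One way to picture context awareness combined with session isolation is a per-session, bounded message history. The sketch below is an assumption about the general shape, not the project's implementation: `SessionStore`, its method names, and the 20-message cap are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text), e.g. ("user", "hello")


class SessionStore:
    """Keeps an independent, size-limited history for each session ID."""

    def __init__(self, max_messages: int = 20) -> None:
        self._max = max_messages
        self._sessions: Dict[str, Deque[Message]] = {}

    def append(self, session_id: str, role: str, text: str) -> None:
        # deque(maxlen=...) silently drops the oldest message once full.
        history = self._sessions.setdefault(session_id, deque(maxlen=self._max))
        history.append((role, text))

    def history(self, session_id: str) -> List[Message]:
        return list(self._sessions.get(session_id, ()))

    def reset(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

Capping the history keeps the prompt sent to the LLM from growing without bound over a long conversation, at the cost of forgetting the oldest turns.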

Audio Processing

  • Streaming STT/TTS: Real-time, low-latency audio pipeline
  • High-Quality Recording: WebAudio API, noise suppression
  • Multiple Formats: Supports WAV, MP3, WebM, OGG, MP4
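Handling several container formats usually starts with telling them apart. The following sketch identifies the listed formats from their leading bytes; it is an illustration only, `sniff_audio_format` is a hypothetical helper, and the README does not say how the project itself detects formats.

```python
from typing import Optional


def sniff_audio_format(data: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of an audio payload."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"\x1a\x45\xdf\xa3":
        # EBML header; shared by WebM and Matroska, so "webm" is a best guess.
        return "webm"
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        # MPEG audio frame sync (11 set bits); can also match other MPEG audio streams.
        return "mp3"
    return None
```

Sniffing the bytes rather than trusting a file extension or MIME type helps when the browser labels a recording loosely.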

User Experience

  • Real-time Feedback: Visual indicators for recording, processing, playback
  • Responsive Design: Works on desktop and mobile
  • Accessibility: Keyboard navigation, screen reader support

🧩 Service Overview

WebSocket Handler (websocket_handler.py)

  • Real-time, bidirectional audio/text streaming
  • Handles turn detection, session management

STT Service (stt_service.py)

  • Streaming transcription (AssemblyAI)
  • Real-time, multi-format audio support

LLM Service (llm_service.py)

  • Google Gemini LLM, context memory
  • Web search integration

TTS Service (tts_service.py)

  • Streaming TTS (Murf AI)
  • Natural, low-latency voice output
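Low-latency voice output is often achieved by sending text to the TTS engine sentence by sentence, so synthesis can begin before the LLM has finished its reply. The sketch below shows that chunking step in isolation. It is a general technique, not a description of `tts_service.py`, and `sentence_chunks` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of text fragments into complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

This simple rule also splits after abbreviations such as "Dr.", which is usually acceptable for speech but worth knowing about.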

Health Service (health_service.py)

  • Monitors all external service availability
  • Provides health status for UI and API
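Monitoring several external services can be sketched as running one check per service concurrently and folding the results into a single status. The code below is a hypothetical illustration: `run_checks`, the status labels, and the 3-second timeout are assumptions, not the actual `health_service.py` API.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Check = Callable[[], Awaitable[bool]]


async def run_checks(checks: Dict[str, Check], timeout: float = 3.0) -> Dict[str, object]:
    """Run all checks concurrently; a failure or timeout counts as unavailable."""

    async def guarded(check: Check) -> bool:
        try:
            return bool(await asyncio.wait_for(check(), timeout))
        except Exception:
            return False

    names = list(checks)
    results = await asyncio.gather(*(guarded(checks[name]) for name in names))
    services = dict(zip(names, results))
    if all(results):
        status = "healthy"
    elif any(results):
        status = "degraded"
    else:
        status = "unhealthy"
    return {"status": status, "services": services}
```

Wrapping each check in a timeout means one slow provider cannot stall the whole health response.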

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with proper testing
  4. Commit: git commit -m 'Add amazing feature'
  5. Push: git push origin feature/amazing-feature
  6. Open a Pull Request

Built with modern AI, real-time streaming, and web search for seamless voice interaction.

Production Ready & Actively Maintained 🚀

About

A voice agent that has the functionality to search the web
