A real-time voice AI assistant built with LiveKit and Gemini, providing ultra-low latency voice interactions with screen sharing capabilities.
- Real-time Voice AI: Natural voice conversations with AI
- Screen Sharing: AI can see and guide you through your screen
- Ultra-low Latency: WebRTC-based media streaming via LiveKit
- Multimodal AI: Gemini Live API for voice, vision, and text processing
- Modern UI: React-based interface with real-time controls
- Scalable Architecture: Production-ready backend with FastAPI
Frontend (React) → LiveKit Client → LiveKit Server → Backend (FastAPI) → Gemini API
- LiveKit: WebRTC media streaming and room management
- Gemini Live API: Multimodal AI processing (STT, VAD, VLM, LLM, TTS)
- FastAPI: Modern async backend with REST API
- React: Frontend with real-time voice controls
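In this flow, the backend typically mints the LiveKit access tokens that the frontend uses to join a room. Below is a minimal sketch of that step using the `livekit-api` Python package; the function name, room, and identity values are illustrative, not the project's actual code.

```python
# Hypothetical sketch: mint a LiveKit join token on the backend
# (assumes the livekit-api package: pip install livekit-api)
import os

from livekit import api


def create_join_token(room_name: str, identity: str) -> str:
    """Return a JWT the frontend can use to join `room_name` as `identity`."""
    token = (
        api.AccessToken(
            os.environ["LIVEKIT_API_KEY"],
            os.environ["LIVEKIT_API_SECRET"],
        )
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room_name))
    )
    return token.to_jwt()


if __name__ == "__main__":
    print(create_join_token("voice-assistant", "demo-user"))
```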
- Node.js 18+
- Python 3.12+
- LiveKit API credentials
- Gemini API key
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Cloudy
  ```

- Run the setup script:

  ```bash
  # Windows
  ./setup-livekit.bat

  # Linux/Mac
  ./setup-livekit.sh
  ```

- Configure API keys:

  - Get LiveKit API keys from https://cloud.livekit.io
  - Get a Gemini API key from https://makersuite.google.com/app/apikey
  - Update the `.env` and `backend/.env` files

- Start the application:

  ```bash
  ./start-all.bat   # Windows
  ./start-all.sh    # Linux/Mac
  ```
If you prefer to set things up manually, install the frontend dependencies:

```bash
npm install
```

Install the backend package:

```bash
cd backend
pip install -e .
```

Create `.env` in the project root:

```env
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
```

Create `backend/.env`:

```env
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
GEMINI_API_KEY=your-gemini-api-key
SECRET_KEY=your-secret-key-here
```
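The backend reads these variables at runtime. Below is a minimal sketch of loading them, assuming python-dotenv; the actual settings handling in the service may differ.

```python
# Hypothetical settings loader for the values defined in backend/.env
# (assumes python-dotenv: pip install python-dotenv)
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory (backend/)

LIVEKIT_URL = os.getenv("LIVEKIT_URL", "ws://localhost:7880")
LIVEKIT_API_KEY = os.environ["LIVEKIT_API_KEY"]        # required
LIVEKIT_API_SECRET = os.environ["LIVEKIT_API_SECRET"]  # required
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]          # required
SECRET_KEY = os.getenv("SECRET_KEY", "change-me")
```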
Start everything with the provided scripts:

```bash
./start-all.bat   # Windows
./start-all.sh    # Linux/Mac
```

Or start each service manually:

```bash
# Terminal 1: LiveKit Server
livekit-server --dev

# Terminal 2: Backend
cd backend
uvicorn src.realtime_assistant_service.main:app --reload

# Terminal 3: Frontend
npm run dev
```

Once everything is running, the services are available at:

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- LiveKit Server: ws://localhost:7880
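To verify the backend from a script rather than the browser, you can call its REST API directly. The sketch below requests a LiveKit join token; the `/livekit/token` path and its payload are assumptions made for illustration — check the live API documentation at http://localhost:8000/docs for the actual routes.

```python
# Hypothetical smoke test against the backend REST API
# (assumes the requests package: pip install requests)
import requests

BASE_URL = "http://localhost:8000"

# The exact route and payload are assumptions; see /docs for the real schema.
resp = requests.post(
    f"{BASE_URL}/livekit/token",
    json={"room": "voice-assistant", "identity": "demo-user"},
    timeout=5,
)
resp.raise_for_status()
print("Token response:", resp.json())
```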
- LIVEKIT_INTEGRATION.md - Complete architecture overview
- PROJECT_STRUCTURE.md - Clean project structure
- CONFIGURATION_GUIDE.md - Detailed setup instructions
- Open http://localhost:5173 in your browser
- Log in with any credentials (demo mode)
- Navigate to "Voice AI Assistant"
- Start audio capture and screen sharing
- Begin voice interaction with AI
- LiveKit Connection Failed

  - Check if the LiveKit server is running
  - Verify the API keys in the `.env` files
  - Check network connectivity

- Audio Not Working

  - Check browser microphone permissions
  - Verify audio settings in the browser
  - Test with browser audio tools

- AI Not Responding

  - Check the Gemini API key (see the sketch after this list)
  - Verify backend logs
  - Test the API endpoints
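If the assistant connects but never answers, the Gemini key itself is a common culprit. A quick way to test the key in isolation, assuming the `google-generativeai` package (the backend may use a different client, and the model name below is only an example):

```python
# Hypothetical standalone check that the Gemini API key works
# (assumes google-generativeai: pip install google-generativeai)
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Any lightweight model works for a connectivity test; the name is an example.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Reply with the single word: pong")
print(response.text)
```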
Quick health checks for each service:

```bash
# Check LiveKit server
curl http://localhost:7880/health

# Check backend
curl http://localhost:8000/livekit/health

# Check frontend
curl http://localhost:5173
```
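The /livekit/health route above is served by the FastAPI backend. Here is a minimal sketch of what such a route can look like; the real implementation lives in the backend service and may return different fields.

```python
# Hypothetical health route, illustrating the shape of /livekit/health
# (the actual implementation is in backend/src/realtime_assistant_service/)
import os

from fastapi import FastAPI

app = FastAPI()


@app.get("/livekit/health")
async def livekit_health() -> dict:
    """Report whether LiveKit credentials are configured for this service."""
    configured = bool(os.getenv("LIVEKIT_API_KEY") and os.getenv("LIVEKIT_API_SECRET"))
    return {
        "status": "ok" if configured else "missing-credentials",
        "livekit_url": os.getenv("LIVEKIT_URL", "ws://localhost:7880"),
    }
```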
Project layout:

```
Cloudy/
├── components/    # React components
├── services/      # Frontend services
├── config/        # Configuration files
├── backend/       # FastAPI backend
├── App.tsx        # Main React app
└── package.json   # Frontend dependencies
```
- `services/livekitService.ts` - LiveKit client service
- `components/VoiceAgentPage.tsx` - Voice AI interface
- `backend/src/realtime_assistant_service/connectors/livekit_connector.py` - LiveKit backend
- `config/livekit.ts` - LiveKit configuration
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LiveKit - Real-time media infrastructure
- Google Gemini - Multimodal AI capabilities
- FastAPI - Modern web framework
- React - Frontend framework