A real-time voice AI assistant built with LiveKit and Gemini, providing ultra-low latency voice interactions with screen sharing capabilities.
- Real-time Voice AI: Natural voice conversations with AI
- Screen Sharing: AI can see and guide you through your screen
- Ultra-low Latency: WebRTC-based media streaming via LiveKit
- Multimodal AI: Gemini Live API for voice, vision, and text processing
- Modern UI: React-based interface with real-time controls
- Scalable Architecture: Production-ready backend with FastAPI
Frontend (React) → LiveKit Client → LiveKit Server → Backend (FastAPI) → Gemini API
- LiveKit: WebRTC media streaming and room management
- Gemini Live API: Multimodal AI processing (STT, VAD, VLM, LLM, TTS)
- FastAPI: Modern async backend with REST API
- React: Frontend with real-time voice controls
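In this flow, the backend typically mints the LiveKit access tokens that the frontend uses to join a room. Below is a minimal sketch of that step using the `livekit-api` Python package; the function name, room, and identity values are illustrative, not the project's actual code.

```python
# Hypothetical sketch: mint a LiveKit join token on the backend
# (assumes the livekit-api package: pip install livekit-api)
import os

from livekit import api


def create_join_token(room_name: str, identity: str) -> str:
    """Return a JWT the frontend can use to join `room_name` as `identity`."""
    token = (
        api.AccessToken(
            os.environ["LIVEKIT_API_KEY"],
            os.environ["LIVEKIT_API_SECRET"],
        )
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room_name))
    )
    return token.to_jwt()


if __name__ == "__main__":
    print(create_join_token("voice-assistant", "demo-user"))
```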
- Node.js 18+
- Python 3.12+
- LiveKit API credentials
- Gemini API key
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Cloudy
  ```

- Run the setup script:

  ```bash
  # Windows
  ./setup-livekit.bat

  # Linux/Mac
  ./setup-livekit.sh
  ```

- Configure API keys:

  - Get LiveKit API keys from https://cloud.livekit.io
  - Get a Gemini API key from https://makersuite.google.com/app/apikey
  - Update the `.env` and `backend/.env` files

- Start the application:

  ```bash
  ./start-all.bat   # Windows
  ./start-all.sh    # Linux/Mac
  ```
If you prefer to set things up manually, install the frontend dependencies:

```bash
npm install
```

Install the backend package:

```bash
cd backend
pip install -e .
```

Create `.env` in the project root:

```env
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
```

Create `backend/.env`:

```env
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
GEMINI_API_KEY=your-gemini-api-key
SECRET_KEY=your-secret-key-here
```
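The backend reads these variables at runtime. Below is a minimal sketch of loading them, assuming python-dotenv; the actual settings handling in the service may differ.

```python
# Hypothetical settings loader for the values defined in backend/.env
# (assumes python-dotenv: pip install python-dotenv)
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory (backend/)

LIVEKIT_URL = os.getenv("LIVEKIT_URL", "ws://localhost:7880")
LIVEKIT_API_KEY = os.environ["LIVEKIT_API_KEY"]        # required
LIVEKIT_API_SECRET = os.environ["LIVEKIT_API_SECRET"]  # required
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]          # required
SECRET_KEY = os.getenv("SECRET_KEY", "change-me")
```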
Start everything with the provided scripts:

```bash
./start-all.bat   # Windows
./start-all.sh    # Linux/Mac
```

Or start each service manually:

```bash
# Terminal 1: LiveKit Server
livekit-server --dev

# Terminal 2: Backend
cd backend
uvicorn src.realtime_assistant_service.main:app --reload

# Terminal 3: Frontend
npm run dev
```

Once everything is running, the services are available at:

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- LiveKit Server: ws://localhost:7880
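To verify the backend from a script rather than the browser, you can call its REST API directly. The sketch below requests a LiveKit join token; the `/livekit/token` path and its payload are assumptions made for illustration — check the live API documentation at http://localhost:8000/docs for the actual routes.

```python
# Hypothetical smoke test against the backend REST API
# (assumes the requests package: pip install requests)
import requests

BASE_URL = "http://localhost:8000"

# The exact route and payload are assumptions; see /docs for the real schema.
resp = requests.post(
    f"{BASE_URL}/livekit/token",
    json={"room": "voice-assistant", "identity": "demo-user"},
    timeout=5,
)
resp.raise_for_status()
print("Token response:", resp.json())
```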
- LIVEKIT_INTEGRATION.md - Complete architecture overview
- PROJECT_STRUCTURE.md - Clean project structure
- CONFIGURATION_GUIDE.md - Detailed setup instructions
- Open http://localhost:5173 in your browser
- Log in with any credentials (demo mode)
- Navigate to "Voice AI Assistant"
- Start audio capture and screen sharing
- Begin voice interaction with AI
- LiveKit Connection Failed

  - Check if the LiveKit server is running
  - Verify the API keys in the `.env` files
  - Check network connectivity

- Audio Not Working

  - Check browser microphone permissions
  - Verify audio settings in the browser
  - Test with browser audio tools

- AI Not Responding

  - Check the Gemini API key (see the sketch after this list)
  - Verify backend logs
  - Test the API endpoints
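If the assistant connects but never answers, the Gemini key itself is a common culprit. A quick way to test the key in isolation, assuming the `google-generativeai` package (the backend may use a different client, and the model name below is only an example):

```python
# Hypothetical standalone check that the Gemini API key works
# (assumes google-generativeai: pip install google-generativeai)
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Any lightweight model works for a connectivity test; the name is an example.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Reply with the single word: pong")
print(response.text)
```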
Quick health checks for each service:

```bash
# Check LiveKit server
curl http://localhost:7880/health

# Check backend
curl http://localhost:8000/livekit/health

# Check frontend
curl http://localhost:5173
```
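The /livekit/health route above is served by the FastAPI backend. Here is a minimal sketch of what such a route can look like; the real implementation lives in the backend service and may return different fields.

```python
# Hypothetical health route, illustrating the shape of /livekit/health
# (the actual implementation is in backend/src/realtime_assistant_service/)
import os

from fastapi import FastAPI

app = FastAPI()


@app.get("/livekit/health")
async def livekit_health() -> dict:
    """Report whether LiveKit credentials are configured for this service."""
    configured = bool(os.getenv("LIVEKIT_API_KEY") and os.getenv("LIVEKIT_API_SECRET"))
    return {
        "status": "ok" if configured else "missing-credentials",
        "livekit_url": os.getenv("LIVEKIT_URL", "ws://localhost:7880"),
    }
```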
Project layout:

```
Cloudy/
├── components/    # React components
├── services/      # Frontend services
├── config/        # Configuration files
├── backend/       # FastAPI backend
├── App.tsx        # Main React app
└── package.json   # Frontend dependencies
```
- `services/livekitService.ts` - LiveKit client service
- `components/VoiceAgentPage.tsx` - Voice AI interface
- `backend/src/realtime_assistant_service/connectors/livekit_connector.py` - LiveKit backend
- `config/livekit.ts` - LiveKit configuration
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LiveKit - Real-time media infrastructure
- Google Gemini - Multimodal AI capabilities
- FastAPI - Modern web framework
- React - Frontend framework