AI Voice Assistant Chrome Extension

🧩 A Salesforce-native Chrome extension integrating πŸ—£οΈ Speech Recognition, πŸ“Ή WebRTC, and 🌐 WebSocket protocols to stream πŸ‘€ screen/audio context and fetch πŸ€– AI responses using Gemini LLMs.

A Chrome extension with voice commands and AI integration, featuring a modular architecture, comprehensive testing, and secure environment-variable management.

πŸš€ Quick Setup

Prerequisites

  • Node.js (v14 or higher)
  • Chrome browser
  • Google AI API key

Environment Variables Setup

  1. Backend Setup:

    cd backend/server
    cp .env.example .env
    # Edit .env and add your Google AI API key
  2. Frontend Setup:

    cd frontend
    cp .env.example .env
    # Edit .env and add your LLM Gateway API key (optional)

Required Environment Variables

Backend (backend/server/.env)

  • GOOGLE_AI_API_KEY - Required - Get from Google AI Studio
  • PORT - Optional (defaults to 4000)

Frontend (frontend/.env)

  • LLM_GATEWAY_API_KEY - Optional - For LLM gateway integration
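
For reference, a filled-in backend/server/.env based on the variables above might look like the following; the key value is a placeholder, not a real credential:

# backend/server/.env -- never commit this file
GOOGLE_AI_API_KEY=your-google-ai-studio-key
PORT=4000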

Installation & Running

  1. Install Backend Dependencies:

    cd backend/server
    npm install
  2. Install Frontend Dependencies:

    cd frontend
    npm install
  3. Start Backend Server:

    cd backend/server
    npm start
  4. Load Chrome Extension:

    • Open Chrome and go to chrome://extensions/
    • Enable "Developer mode"
    • Click "Load unpacked" and select the frontend directory
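
For orientation, a minimal Manifest V3 file for an extension of this shape might look like the sketch below. The permissions and script paths are assumptions based on the structure described in this README; the repo's actual manifest.json is authoritative:

{
  "manifest_version": 3,
  "name": "AI Voice Assistant",
  "version": "1.0.0",
  "permissions": ["activeTab"],
  "host_permissions": ["<all_urls>"],
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["src/main.js"] }
  ],
  "icons": { "128": "assets/icons/einstein.png" }
}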

πŸ—οΈ Architecture Overview

The extension features a modular architecture with clear separation of concerns:

Frontend Structure

frontend/src/
β”œβ”€β”€ config/
β”‚   └── config.js          # Configuration constants and settings
β”œβ”€β”€ services/
β”‚   β”œβ”€β”€ speechService.js   # Text-to-speech functionality
β”‚   β”œβ”€β”€ webrtcService.js   # WebRTC and WebSocket communication
β”‚   └── recognitionService.js  # Speech recognition handling
β”œβ”€β”€ ui/
β”‚   β”œβ”€β”€ micButton.js       # Microphone button component
β”‚   β”œβ”€β”€ subtitleBar.js     # Subtitle display component
β”‚   └── styles.js          # CSS styling injection
β”œβ”€β”€ utils/
β”‚   └── constants.js       # Global state and shared utilities
└── main.js                # Application entry point

assets/
β”œβ”€β”€ icons/
β”‚   └── einstein.png       # Extension icon
└── images/
    └── bot3d.png          # 3D bot icon for mic button

Backend Structure

backend/
β”œβ”€β”€ client/          # Test client for development
└── server/          # Node.js backend server
    β”œβ”€β”€ .env.example # Environment variables template
    β”œβ”€β”€ server.js    # Main server file
    └── screenshots/ # Captured screenshots directory
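
To make the data flow concrete, here is a minimal sketch of what a server like server.js could do: accept a WebSocket connection, forward a transcript to the Gemini REST API, and return the answer. It assumes the ws package and Node 18+ (for global fetch); the message shape ({ transcript } in, { answer } out) and the model name are illustrative, not the repo's actual protocol:

// Illustrative sketch only -- the repo's server.js defines the real behavior.
import 'dotenv/config';
import { WebSocketServer } from 'ws';

const PORT = process.env.PORT || 4000;     // optional, defaults to 4000
const KEY = process.env.GOOGLE_AI_API_KEY; // required
if (!KEY) throw new Error('GOOGLE_AI_API_KEY environment variable is required');

const wss = new WebSocketServer({ port: PORT });

wss.on('connection', (socket) => {
  socket.on('message', async (data) => {
    const { transcript } = JSON.parse(data.toString()); // assumed message shape
    // Uses the global fetch available in Node 18+.
    const res = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${KEY}`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ contents: [{ parts: [{ text: transcript }] }] }),
      }
    );
    const body = await res.json();
    const answer = body.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
    socket.send(JSON.stringify({ answer }));
  });
});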

πŸš€ Features

  • Voice Recognition: Continuous speech recognition with automatic retry logic
  • Screen Sharing: WebRTC-based screen sharing with AI analysis
  • Text-to-Speech: High-quality voice synthesis with subtitle display (see the sketch after this list)
  • Real-time Communication: WebSocket connection for instant AI responses
  • Visual Feedback: Animated UI components with speaking waves and ripple effects
  • Secure Configuration: Environment variable management for API keys
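
As referenced in the Text-to-Speech bullet above, a minimal browser-side sketch of voice synthesis coordinated with a subtitle element could look like this; the function and element names are illustrative, not the repo's actual speechService.js API:

// Illustrative sketch using the standard Web Speech synthesis API.
function speakWithSubtitles(text, subtitleEl) {
  const utterance = new SpeechSynthesisUtterance(text);
  // Prefer an English voice when one is available.
  const voice = speechSynthesis.getVoices().find((v) => v.lang.startsWith('en'));
  if (voice) utterance.voice = voice;
  subtitleEl.textContent = text;                        // show subtitle while speaking
  utterance.onend = () => { subtitleEl.textContent = ''; };
  speechSynthesis.speak(utterance);
}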

πŸ“¦ Module Descriptions

Configuration (config/)

  • config.js: Centralized configuration including WebSocket URLs, API keys, speech settings, and UI positioning
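
A config module of this kind usually just exports plain constants; a hypothetical shape (all values are placeholders) might be:

// Illustrative shape of a config module -- values are placeholders.
export const CONFIG = {
  WEBSOCKET_URL: 'ws://localhost:4000', // use wss:// in production (see Security)
  SPEECH: { lang: 'en-US', continuous: true, interimResults: true },
  UI: { micButton: { bottom: '24px', right: '24px' } },
};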

Services (services/)

  • speechService.js: Handles text-to-speech synthesis with voice selection and subtitle coordination
  • webrtcService.js: Manages screen sharing, WebRTC peer connections, and WebSocket communication
  • recognitionService.js: Speech recognition with retry logic and error handling
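
The retry behavior mentioned for recognitionService.js can be illustrated with the standard Web Speech recognition API: restarting the engine whenever it stops is one common way to keep recognition continuous. The repo's actual retry logic may differ:

// Illustrative sketch of continuous recognition with automatic restart.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function startRecognition(onTranscript) {
  const recognition = new SpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal) onTranscript(result[0].transcript);
  };
  recognition.onerror = (event) => console.warn('recognition error:', event.error);
  recognition.onend = () => recognition.start(); // automatic retry on stop
  recognition.start();
  return recognition;
}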

UI Components (ui/)

  • micButton.js: Floating microphone button with ripple effects and visual states
  • subtitleBar.js: Subtitle display component for speech feedback
  • styles.js: CSS injection for all UI components
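
CSS injection of the kind styles.js performs typically appends a style element to the host page; a minimal sketch (class names are hypothetical) follows:

// Illustrative sketch of injecting extension CSS into the host page.
export function injectStyles() {
  const style = document.createElement('style');
  style.textContent = `
    .ai-mic-button   { position: fixed; bottom: 24px; right: 24px; z-index: 99999; }
    .ai-subtitle-bar { position: fixed; bottom: 80px; left: 50%; transform: translateX(-50%); }
  `;
  document.head.appendChild(style);
}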

Utilities (utils/)

  • constants.js: Global state management and shared constants

πŸ” Security

  • Never commit API keys - they are stored in .env files, which are gitignored
  • Environment variables - all sensitive data is handled via environment variables
  • API key management - get your Google AI API key from the official Google AI Studio
  • Permission management for microphone and screen access
  • WebSocket connections should use secure protocols (wss://) in production

πŸ§ͺ Testing

Comprehensive unit tests for all modules using Jest.

Running Tests

cd frontend

# Run all tests
npm test

# Run tests in watch mode (for development)
npm run test:watch

# Run tests with coverage report
npm run test:coverage
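
A test in this suite might look like the following Jest sketch, reusing the hypothetical CONFIG shape from the configuration sketch above; the repo's actual tests under frontend are authoritative:

// Illustrative Jest test -- the real suite lives in the repo.
import { CONFIG } from '../src/config/config.js';

describe('config', () => {
  test('defines a WebSocket URL', () => {
    expect(CONFIG.WEBSOCKET_URL).toMatch(/^wss?:\/\//);
  });
});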

Test Coverage

The test suite covers:

  • βœ… Configuration: All settings and constants validation
  • βœ… Speech Services: Text-to-speech functionality and voice selection
  • βœ… WebRTC Services: Screen sharing, WebSocket communication, peer connections
  • βœ… Recognition Services: Speech recognition, retry logic, error handling
  • βœ… UI Components: Button states, subtitle display, DOM manipulation
  • βœ… Utilities: Global state management and helper functions

πŸ› οΈ Development

Configuration

Update frontend/src/config/config.js to modify:

  • WebSocket server URLs
  • Speech recognition settings
  • Voice synthesis preferences
  • UI positioning and styling

Module System

  • Uses ES6 modules with import/export syntax
  • Configured as "type": "module" in manifest.json
  • Clear dependency injection and service composition
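
Composition at the entry point can be pictured as plain ES-module imports; the file paths match the structure above, while the exported names and signatures are assumptions:

// Illustrative wiring in main.js (exported names are assumed).
import { CONFIG } from './config/config.js';
import { injectStyles } from './ui/styles.js';
import { createMicButton } from './ui/micButton.js';
import { startRecognition } from './services/recognitionService.js';

injectStyles();
createMicButton(() => startRecognition(console.log), CONFIG);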

Error Handling

  • Comprehensive error handling across all services
  • Graceful degradation for missing browser features
  • User-friendly error messages via speech synthesis
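
A graceful-degradation check paired with a spoken error message can be combined in a few lines; this sketch uses only standard browser APIs and is illustrative, not the repo's actual handler:

// Illustrative feature check with a user-friendly spoken error.
const supported = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window;
if (!supported) {
  const msg = 'Speech recognition is not supported in this browser.';
  speechSynthesis.speak(new SpeechSynthesisUtterance(msg)); // speak the error aloud
}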

🎯 Benefits of Modular Architecture

  1. Maintainability: Each module has a single responsibility
  2. Testability: Services can be tested in isolation
  3. Reusability: Components can be reused across different contexts
  4. Scalability: Easy to add new features without affecting existing code
  5. Debugging: Easier to locate and fix issues

πŸ“ Notes

  • Ensure your .env files are properly configured before running
  • The backend server must be running for full functionality
  • Check the console for any environment variable errors
  • Grant necessary permissions for microphone and screen access when prompted

πŸ”§ Troubleshooting

  1. "GOOGLE_AI_API_KEY environment variable is required"

    • Make sure you've created backend/server/.env with your API key
  2. Extension not working

    • Check that the backend server is running on the correct port
    • Verify WebSocket URL in frontend config matches your server
  3. API calls failing

    • Verify your Google AI API key is valid and has proper permissions
  4. Speech recognition not working

    • Ensure microphone permissions are granted
    • Check browser speech recognition support
  5. Screen sharing issues

    • Grant screen sharing permissions when prompted
    • Verify WebRTC peer connection establishment
