AI Voice Assistant Chrome Extension

🧩 A Salesforce-native Chrome extension integrating πŸ—£οΈ Speech Recognition, πŸ“Ή WebRTC, and 🌐 WebSocket protocols to stream πŸ‘€ screen/audio context and fetch πŸ€– AI responses using Gemini LLMs.

A Chrome extension with voice commands and AI integration, featuring a modular architecture, comprehensive testing, and secure environment-variable management.

πŸš€ Quick Setup

Prerequisites

  • Node.js (v14 or higher)
  • Chrome browser
  • Google AI API key

Environment Variables Setup

  1. Backend Setup:

    cd backend/server
    cp .env.example .env
    # Edit .env and add your Google AI API key
  2. Frontend Setup:

    cd frontend
    cp .env.example .env
    # Edit .env and add your LLM Gateway API key (optional)

Required Environment Variables

Backend (backend/server/.env)

  • GOOGLE_AI_API_KEY - Required - Get from Google AI Studio
  • PORT - Optional (defaults to 4000)

Frontend (frontend/.env)

  • LLM_GATEWAY_API_KEY - Optional - For LLM gateway integration
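
For reference, a filled-in backend/server/.env based on the variables above might look like the following; the key value is a placeholder, not a real credential:

# backend/server/.env -- never commit this file
GOOGLE_AI_API_KEY=your-google-ai-studio-key
PORT=4000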

Installation & Running

  1. Install Backend Dependencies:

    cd backend/server
    npm install
  2. Install Frontend Dependencies:

    cd frontend
    npm install
  3. Start Backend Server:

    cd backend/server
    npm start
  4. Load Chrome Extension:

    • Open Chrome and go to chrome://extensions/
    • Enable "Developer mode"
    • Click "Load unpacked" and select the frontend directory
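
For orientation, a minimal Manifest V3 file for an extension of this shape might look like the sketch below. The permissions and script paths are assumptions based on the structure described in this README; the repo's actual manifest.json is authoritative:

{
  "manifest_version": 3,
  "name": "AI Voice Assistant",
  "version": "1.0.0",
  "permissions": ["activeTab"],
  "host_permissions": ["<all_urls>"],
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["src/main.js"] }
  ],
  "icons": { "128": "assets/icons/einstein.png" }
}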

πŸ—οΈ Architecture Overview

The extension features a modular architecture with clear separation of concerns:

Frontend Structure

frontend/src/
β”œβ”€β”€ config/
β”‚   └── config.js          # Configuration constants and settings
β”œβ”€β”€ services/
β”‚   β”œβ”€β”€ speechService.js   # Text-to-speech functionality
β”‚   β”œβ”€β”€ webrtcService.js   # WebRTC and WebSocket communication
β”‚   └── recognitionService.js  # Speech recognition handling
β”œβ”€β”€ ui/
β”‚   β”œβ”€β”€ micButton.js       # Microphone button component
β”‚   β”œβ”€β”€ subtitleBar.js     # Subtitle display component
β”‚   └── styles.js          # CSS styling injection
β”œβ”€β”€ utils/
β”‚   └── constants.js       # Global state and shared utilities
└── main.js                # Application entry point

assets/
β”œβ”€β”€ icons/
β”‚   └── einstein.png       # Extension icon
└── images/
    └── bot3d.png          # 3D bot icon for mic button

Backend Structure

backend/
β”œβ”€β”€ client/          # Test client for development
└── server/          # Node.js backend server
    β”œβ”€β”€ .env.example # Environment variables template
    β”œβ”€β”€ server.js    # Main server file
    └── screenshots/ # Captured screenshots directory
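
To make the data flow concrete, here is a minimal sketch of what a server like server.js could do: accept a WebSocket connection, forward a transcript to the Gemini REST API, and return the answer. It assumes the ws package and Node 18+ (for global fetch); the message shape ({ transcript } in, { answer } out) and the model name are illustrative, not the repo's actual protocol:

// Illustrative sketch only -- the repo's server.js defines the real behavior.
import 'dotenv/config';
import { WebSocketServer } from 'ws';

const PORT = process.env.PORT || 4000;     // optional, defaults to 4000
const KEY = process.env.GOOGLE_AI_API_KEY; // required
if (!KEY) throw new Error('GOOGLE_AI_API_KEY environment variable is required');

const wss = new WebSocketServer({ port: PORT });

wss.on('connection', (socket) => {
  socket.on('message', async (data) => {
    const { transcript } = JSON.parse(data.toString()); // assumed message shape
    // Uses the global fetch available in Node 18+.
    const res = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${KEY}`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ contents: [{ parts: [{ text: transcript }] }] }),
      }
    );
    const body = await res.json();
    const answer = body.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
    socket.send(JSON.stringify({ answer }));
  });
});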

πŸš€ Features

  • Voice Recognition: Continuous speech recognition with automatic retry logic
  • Screen Sharing: WebRTC-based screen sharing with AI analysis
  • Text-to-Speech: High-quality voice synthesis with subtitle display (see the sketch after this list)
  • Real-time Communication: WebSocket connection for instant AI responses
  • Visual Feedback: Animated UI components with speaking waves and ripple effects
  • Secure Configuration: Environment variable management for API keys
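
As referenced in the Text-to-Speech bullet above, a minimal browser-side sketch of voice synthesis coordinated with a subtitle element could look like this; the function and element names are illustrative, not the repo's actual speechService.js API:

// Illustrative sketch using the standard Web Speech synthesis API.
function speakWithSubtitles(text, subtitleEl) {
  const utterance = new SpeechSynthesisUtterance(text);
  // Prefer an English voice when one is available.
  const voice = speechSynthesis.getVoices().find((v) => v.lang.startsWith('en'));
  if (voice) utterance.voice = voice;
  subtitleEl.textContent = text;                        // show subtitle while speaking
  utterance.onend = () => { subtitleEl.textContent = ''; };
  speechSynthesis.speak(utterance);
}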

πŸ“¦ Module Descriptions

Configuration (config/)

  • config.js: Centralized configuration including WebSocket URLs, API keys, speech settings, and UI positioning
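
A config module of this kind usually just exports plain constants; a hypothetical shape (all values are placeholders) might be:

// Illustrative shape of a config module -- values are placeholders.
export const CONFIG = {
  WEBSOCKET_URL: 'ws://localhost:4000', // use wss:// in production (see Security)
  SPEECH: { lang: 'en-US', continuous: true, interimResults: true },
  UI: { micButton: { bottom: '24px', right: '24px' } },
};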

Services (services/)

  • speechService.js: Handles text-to-speech synthesis with voice selection and subtitle coordination
  • webrtcService.js: Manages screen sharing, WebRTC peer connections, and WebSocket communication
  • recognitionService.js: Speech recognition with retry logic and error handling
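
The retry behavior mentioned for recognitionService.js can be illustrated with the standard Web Speech recognition API: restarting the engine whenever it stops is one common way to keep recognition continuous. The repo's actual retry logic may differ:

// Illustrative sketch of continuous recognition with automatic restart.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function startRecognition(onTranscript) {
  const recognition = new SpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal) onTranscript(result[0].transcript);
  };
  recognition.onerror = (event) => console.warn('recognition error:', event.error);
  recognition.onend = () => recognition.start(); // automatic retry on stop
  recognition.start();
  return recognition;
}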

UI Components (ui/)

  • micButton.js: Floating microphone button with ripple effects and visual states
  • subtitleBar.js: Subtitle display component for speech feedback
  • styles.js: CSS injection for all UI components
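
CSS injection of the kind styles.js performs typically appends a style element to the host page; a minimal sketch (class names are hypothetical) follows:

// Illustrative sketch of injecting extension CSS into the host page.
export function injectStyles() {
  const style = document.createElement('style');
  style.textContent = `
    .ai-mic-button   { position: fixed; bottom: 24px; right: 24px; z-index: 99999; }
    .ai-subtitle-bar { position: fixed; bottom: 80px; left: 50%; transform: translateX(-50%); }
  `;
  document.head.appendChild(style);
}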

Utilities (utils/)

  • constants.js: Global state management and shared constants

πŸ” Security

  • Never commit API keys - they are stored in .env files, which are gitignored
  • Environment variables - all sensitive data is handled via environment variables
  • API key management - get your Google AI API key from the official Google AI Studio
  • Permission management for microphone and screen access
  • WebSocket connections should use secure protocols (wss://) in production

πŸ§ͺ Testing

Comprehensive unit tests for all modules using Jest.

Running Tests

cd frontend

# Run all tests
npm test

# Run tests in watch mode (for development)
npm run test:watch

# Run tests with coverage report
npm run test:coverage
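
A test in this suite might look like the following Jest sketch, reusing the hypothetical CONFIG shape from the configuration sketch above; the repo's actual tests under frontend are authoritative:

// Illustrative Jest test -- the real suite lives in the repo.
import { CONFIG } from '../src/config/config.js';

describe('config', () => {
  test('defines a WebSocket URL', () => {
    expect(CONFIG.WEBSOCKET_URL).toMatch(/^wss?:\/\//);
  });
});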

Test Coverage

The test suite covers:

  • βœ… Configuration: All settings and constants validation
  • βœ… Speech Services: Text-to-speech functionality and voice selection
  • βœ… WebRTC Services: Screen sharing, WebSocket communication, peer connections
  • βœ… Recognition Services: Speech recognition, retry logic, error handling
  • βœ… UI Components: Button states, subtitle display, DOM manipulation
  • βœ… Utilities: Global state management and helper functions

πŸ› οΈ Development

Configuration

Update frontend/src/config/config.js to modify:

  • WebSocket server URLs
  • Speech recognition settings
  • Voice synthesis preferences
  • UI positioning and styling

Module System

  • Uses ES6 modules with import/export syntax
  • Configured as "type": "module" in manifest.json
  • Clear dependency injection and service composition
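
Composition at the entry point can be pictured as plain ES-module imports; the file paths match the structure above, while the exported names and signatures are assumptions:

// Illustrative wiring in main.js (exported names are assumed).
import { CONFIG } from './config/config.js';
import { injectStyles } from './ui/styles.js';
import { createMicButton } from './ui/micButton.js';
import { startRecognition } from './services/recognitionService.js';

injectStyles();
createMicButton(() => startRecognition(console.log), CONFIG);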

Error Handling

  • Comprehensive error handling across all services
  • Graceful degradation for missing browser features
  • User-friendly error messages via speech synthesis
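
A graceful-degradation check paired with a spoken error message can be combined in a few lines; this sketch uses only standard browser APIs and is illustrative, not the repo's actual handler:

// Illustrative feature check with a user-friendly spoken error.
const supported = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window;
if (!supported) {
  const msg = 'Speech recognition is not supported in this browser.';
  speechSynthesis.speak(new SpeechSynthesisUtterance(msg)); // speak the error aloud
}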

🎯 Benefits of Modular Architecture

  1. Maintainability: Each module has a single responsibility
  2. Testability: Services can be tested in isolation
  3. Reusability: Components can be reused across different contexts
  4. Scalability: Easy to add new features without affecting existing code
  5. Debugging: Easier to locate and fix issues

πŸ“ Notes

  • Ensure your .env files are properly configured before running
  • The backend server must be running for full functionality
  • Check the console for any environment variable errors
  • Grant necessary permissions for microphone and screen access when prompted

πŸ”§ Troubleshooting

  1. "GOOGLE_AI_API_KEY environment variable is required"

    • Make sure you've created backend/server/.env with your API key
  2. Extension not working

    • Check that the backend server is running on the correct port
    • Verify WebSocket URL in frontend config matches your server
  3. API calls failing

    • Verify your Google AI API key is valid and has proper permissions
  4. Speech recognition not working

    • Ensure microphone permissions are granted
    • Check browser speech recognition support
  5. Screen sharing issues

    • Grant screen sharing permissions when prompted
    • Verify WebRTC peer connection establishment
