A Chrome extension with voice command capabilities and AI integration, featuring modular architecture, comprehensive testing, and secure environment variable management.
Prerequisites:
- Node.js (v14 or higher)
- Chrome browser
- Google AI API key
- Backend Setup:

  ```bash
  cd backend/server
  cp .env.example .env
  # Edit .env and add your Google AI API key
  ```
- Frontend Setup:

  ```bash
  cd frontend
  cp .env.example .env
  # Edit .env and add your LLM Gateway API key (optional)
  ```
Environment variables:
- `GOOGLE_AI_API_KEY` - Required. Get it from Google AI Studio.
- `PORT` - Optional (defaults to 4000).
- `LLM_GATEWAY_API_KEY` - Optional. Used for LLM gateway integration.
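As a minimal sketch of how these are consumed, assuming the backend loads them with the `dotenv` package (the `.env.example` template suggests this, but the actual `server.js` may differ):

```js
// Illustrative excerpt only, not the actual backend/server/server.js
import dotenv from 'dotenv'; // assumes an ESM setup; a CommonJS require() works too

dotenv.config(); // copies variables from .env into process.env

// Fail fast when the one required variable is missing
if (!process.env.GOOGLE_AI_API_KEY) {
  throw new Error('GOOGLE_AI_API_KEY environment variable is required');
}

// Optional variables fall back to defaults
const PORT = process.env.PORT || 4000;
const LLM_GATEWAY_API_KEY = process.env.LLM_GATEWAY_API_KEY; // may be undefined
```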
- Install Backend Dependencies:

  ```bash
  cd backend/server
  npm install
  ```
- Install Frontend Dependencies:

  ```bash
  cd frontend
  npm install
  ```
- Start Backend Server:

  ```bash
  cd backend/server
  npm start
  ```
- Load Chrome Extension:
  - Open Chrome and go to `chrome://extensions/`
  - Enable "Developer mode"
  - Click "Load unpacked" and select the `frontend` directory
The extension features a modular architecture with clear separation of concerns:
```
frontend/src/
├── config/
│   └── config.js             # Configuration constants and settings
├── services/
│   ├── speechService.js      # Text-to-speech functionality
│   ├── webrtcService.js      # WebRTC and WebSocket communication
│   └── recognitionService.js # Speech recognition handling
├── ui/
│   ├── micButton.js          # Microphone button component
│   ├── subtitleBar.js        # Subtitle display component
│   └── styles.js             # CSS styling injection
├── utils/
│   └── constants.js          # Global state and shared utilities
└── main.js                   # Application entry point
```
```
assets/
├── icons/
│   └── einstein.png          # Extension icon
└── images/
    └── bot3d.png             # 3D bot icon for mic button
```
```
backend/
├── client/                   # Test client for development
└── server/                   # Node.js backend server
    ├── .env.example          # Environment variables template
    ├── server.js             # Main server file
    └── screenshots/          # Captured screenshots directory
```
- Voice Recognition: Continuous speech recognition with automatic retry logic
- Screen Sharing: WebRTC-based screen sharing with AI analysis
- Text-to-Speech: High-quality voice synthesis with subtitle display (see the sketch after this list)
- Real-time Communication: WebSocket connection for instant AI responses
- Visual Feedback: Animated UI components with speaking waves and ripple effects
- Secure Configuration: Environment variable management for API keys
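For illustration, the text-to-speech and subtitle behaviors map naturally onto the browser's Web Speech API. The following is a minimal sketch; `speakWithSubtitles` and its signature are hypothetical, not the extension's actual exports:

```js
// Speak a message aloud and mirror it in a subtitle element (illustrative only)
function speakWithSubtitles(text, subtitleEl) {
  const utterance = new SpeechSynthesisUtterance(text);

  // Prefer an English voice if one is loaded (getVoices() can be empty at first)
  const voice = speechSynthesis.getVoices().find((v) => v.lang.startsWith('en'));
  if (voice) utterance.voice = voice;

  subtitleEl.textContent = text; // show the subtitle while speaking
  utterance.onend = () => {
    subtitleEl.textContent = ''; // clear it once speech finishes
  };

  speechSynthesis.speak(utterance);
}
```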
- config.js: Centralized configuration including WebSocket URLs, API keys, speech settings, and UI positioning (a sketch of its shape follows this list)
- speechService.js: Handles text-to-speech synthesis with voice selection and subtitle coordination
- webrtcService.js: Manages screen sharing, WebRTC peer connections, and WebSocket communication
- recognitionService.js: Speech recognition with retry logic and error handling
- micButton.js: Floating microphone button with ripple effects and visual states
- subtitleBar.js: Subtitle display component for speech feedback
- styles.js: CSS injection for all UI components
- constants.js: Global state management and shared constants
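To make the config module's role concrete, here is a sketch of the shape `config.js` might have; every key name and value below is an assumption for illustration, not the file's actual contents:

```js
// frontend/src/config/config.js -- hypothetical shape, not the real file
export const CONFIG = {
  websocket: {
    url: 'ws://localhost:4000', // must match the backend server's port
  },
  speech: {
    lang: 'en-US',
    rate: 1.0,
    pitch: 1.0,
  },
  ui: {
    micButtonPosition: { bottom: '20px', right: '20px' },
  },
};
```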
- Never commit API keys: they are stored in `.env` files, which are gitignored
- Environment variables: all sensitive data is handled via environment variables
- API Key Management: get your Google AI API key from the official Google AI Studio
- Permission management for microphone and screen access
- WebSocket connections should use secure protocols in production
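One common way to satisfy the last point, shown here as a hedged sketch rather than the project's actual config code:

```js
// Use wss:// when the page itself is served over HTTPS, ws:// otherwise
const scheme = window.location.protocol === 'https:' ? 'wss' : 'ws';
const WEBSOCKET_URL = `${scheme}://your-server.example.com:4000`; // hypothetical host
```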
Comprehensive unit tests for all modules using Jest.
```bash
cd frontend

# Run all tests
npm test

# Run tests in watch mode (for development)
npm run test:watch

# Run tests with coverage report
npm run test:coverage
```
The test suite covers:
- ✅ Configuration: validation of all settings and constants
- ✅ Speech Services: text-to-speech functionality and voice selection
- ✅ WebRTC Services: screen sharing, WebSocket communication, peer connections
- ✅ Recognition Services: speech recognition, retry logic, error handling
- ✅ UI Components: button states, subtitle display, DOM manipulation
- ✅ Utilities: global state management and helper functions
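A representative test might look like the following; this is a hypothetical example in Jest's style, not a test copied from the suite (it reuses the assumed `CONFIG` shape sketched earlier):

```js
// frontend/src/config/config.test.js -- hypothetical example test
import { CONFIG } from './config.js';

describe('config', () => {
  test('defines a WebSocket URL with a ws:// or wss:// scheme', () => {
    expect(CONFIG.websocket.url).toMatch(/^wss?:\/\//);
  });
});
```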
Update `frontend/src/config/config.js` to modify:
- WebSocket server URLs
- Speech recognition settings
- Voice synthesis preferences
- UI positioning and styling
- Uses ES6 modules with `import`/`export` syntax
- Configured as `"type": "module"` in `manifest.json`
- Clear dependency injection and service composition
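As an illustration of that composition style, here is a sketch of how the entry point might wire services together; the import paths match the tree above, but the export names and wiring are assumptions:

```js
// frontend/src/main.js -- illustrative wiring, not the actual entry point
import { CONFIG } from './config/config.js';
import { injectStyles } from './ui/styles.js';       // hypothetical export
import { createMicButton } from './ui/micButton.js'; // hypothetical export
import { startRecognition } from './services/recognitionService.js'; // hypothetical export

injectStyles();
const micButton = createMicButton(CONFIG.ui); // assume it returns a DOM element
micButton.addEventListener('click', () => startRecognition(CONFIG.speech));
```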
- Comprehensive error handling across all services
- Graceful degradation for missing browser features
- User-friendly error messages via speech synthesis
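A sketch of what that degradation and retry logic can look like; the Web Speech API calls are real browser APIs, while the surrounding structure is assumed:

```js
// Feature-detect speech recognition; Chrome exposes it under a webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function listenWithRetry(onResult, retries = 3) {
  if (!SpeechRecognition) {
    console.warn('Speech recognition is not supported in this browser.');
    return; // degrade gracefully instead of throwing
  }
  const recognition = new SpeechRecognition();
  recognition.continuous = true;
  recognition.onresult = (event) => onResult(event.results);
  recognition.onerror = () => {
    if (retries > 0) listenWithRetry(onResult, retries - 1); // retry transient failures
  };
  recognition.start();
}
```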
- Maintainability: Each module has a single responsibility
- Testability: Services can be tested in isolation
- Reusability: Components can be reused across different contexts
- Scalability: Easy to add new features without affecting existing code
- Debugging: Easier to locate and fix issues
- Ensure your `.env` files are properly configured before running
- The backend server must be running for full functionality
- Check the console for any environment variable errors
- Grant necessary permissions for microphone and screen access when prompted
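For reference, screen access in the browser goes through `navigator.mediaDevices.getDisplayMedia`, which triggers Chrome's permission picker; the helper below is a sketch, and the extension's actual capture code may differ:

```js
// Ask the user to pick a screen/window/tab to share (illustrative helper)
async function startScreenShare() {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  return stream; // its tracks would be added to the WebRTC peer connection
}
```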
Troubleshooting:
- "GOOGLE_AI_API_KEY environment variable is required"
  - Make sure you've created `backend/server/.env` with your API key
- Extension not working
  - Check that the backend server is running on the correct port
  - Verify that the WebSocket URL in the frontend config matches your server
- API calls failing
  - Verify that your Google AI API key is valid and has the proper permissions
- Speech recognition not working
  - Ensure microphone permissions are granted
  - Check your browser's speech recognition support
- Screen sharing issues
  - Grant screen sharing permissions when prompted
  - Verify that the WebRTC peer connection is established
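To check that last point from the DevTools console, here is a small sketch, assuming you can reach the extension's `RTCPeerConnection` instance (called `pc` here):

```js
// Log ICE connection state changes to confirm the peer connection comes up
pc.addEventListener('iceconnectionstatechange', () => {
  console.log('ICE state:', pc.iceConnectionState); // 'connected'/'completed' = success
});
```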