A sophisticated voice-powered meeting assistant that provides real-time transcription, automatic question detection, and AI-generated summaries in Persian (Farsi). Built with Preact and Node.js, and integrated with the Google Cloud Speech-to-Text and OpenAI APIs.
- Real-time Voice Transcription: Live speech-to-text conversion in Persian using Google Cloud Speech-to-Text API
- Automatic Participant Detection: Automatically detects and adds meeting participants by scanning Google Meet interface elements
- Speaker Diarization: Automatic speaker identification and management with intelligent participant mapping
- Intelligent Question Detection: Automatically identifies and highlights questions during meetings
- AI-Powered Summaries: Generate comprehensive meeting summaries using OpenAI GPT
- Multi-speaker Support: Add, edit, and manage multiple meeting participants with automatic detection
- Dynamic Participant Management: Real-time updates when participants join or leave the meeting
- Export Functionality: Download meeting transcripts and summaries with participant information
- Offline Fallback: Works with browser-based speech recognition when server is unavailable
- Real-time Processing: Continuous audio processing with 30-second chunks
- Browser Extension: Fully functional as a Chrome/Firefox browser extension
- Dual Mode Support: Works as both standalone web app and browser extension
- Express.js Server: RESTful API with middleware for CORS and file-upload handling
- Google Cloud Speech-to-Text: Advanced Persian language transcription
- OpenAI Integration: GPT-powered summary generation
- Multer: File upload handling for audio processing
- Real-time Audio Processing: Chunked audio processing for continuous transcription
- Preact: Lightweight React alternative (3KB) with full React compatibility
- React Hooks Support: Complete hooks compatibility via preact/compat
- Lucide Icons: Clean and consistent iconography
- Tailwind CSS: Responsive and modern styling
- Web APIs: MediaRecorder, SpeechRecognition for browser-based functionality
- DOM Observer: Intelligent participant detection through Google Meet interface monitoring
- `POST /api/transcribe` - Audio file transcription with speaker diarization
- `POST /api/detect-questions` - Persian question detection
- `POST /api/generate-summary` - AI-powered meeting summary generation
- `GET /api/health` - Server health check
- Node.js 22+
- Docker (optional)
- Google Cloud Speech-to-Text API credentials
- OpenAI API key
- Clone the repository

```bash
git clone https://github.com/Alaleh-Mohseni/Meeting-assistant.git
cd Meeting-assistant
```

- Install dependencies

```bash
npm install
```

- Environment Setup
Create a `.env` file in the root directory:

```
OPENAI_API_KEY=your_openai_api_key_here
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/google-credentials.json
PORT=5173
```
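Before starting the server, it can help to verify that these variables are actually set. The following startup check is a hypothetical sketch (not part of the repository); only the variable names from the `.env` template above are assumed:

```javascript
// check-env.js — hypothetical startup sanity check; the variable names
// below come from the .env template, but this file is not part of the repo.
const REQUIRED_VARS = ['OPENAI_API_KEY', 'GOOGLE_APPLICATION_CREDENTIALS'];

function missingEnvVars(env) {
  // Return the names of required variables that are absent or empty.
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(', ')}`);
}
```

Running such a check early produces a clear error message instead of an opaque API failure later.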
- Google Cloud Setup
  - Create a Google Cloud project
  - Enable the Speech-to-Text API
  - Create service account credentials
  - Download the JSON key file
- Start Development Server

```bash
npm run nodemon
```

The application will be available at http://localhost:5173
```bash
# Build and run with Docker Compose
docker-compose up -d
```

The application can be packaged and used as a browser extension for Chrome, Firefox, and other Chromium-based browsers.
- Build the extension files:

```bash
npm run build:extension
```

- Package the extension:

```bash
npm run pack:extension
```

This creates `meeting-assistant-extension.zip` in the `dist/extension/` directory.
- Open Chrome and go to `chrome://extensions/`
- Enable "Developer mode" (toggle in the top right)
- Click "Load unpacked"
- Select the `dist/extension` folder (or extract and select the zip contents)
- The extension icon will appear in your browser toolbar
- Go to `about:debugging`
- Click "This Firefox"
- Click "Load Temporary Add-on"
- Select any file from the `dist/extension` folder
- Popup Interface: Clean, responsive UI optimized for browser extension popup
- Background Processing: Runs background scripts for continuous functionality
- Content Script Integration: Can interact with web pages when needed
- Offline Capable: Works without constant server connection using browser APIs
- Cross-browser Compatible: Works on Chrome, Firefox, Edge, and other modern browsers
```
dist/extension/
├── manifest.json            # Extension configuration
├── background.js            # Background script for persistent functionality
├── content.js               # Content script for web page interaction
├── index-extension.html     # Extension popup interface
├── popup.js                 # Popup functionality
├── assets/                  # Compiled CSS and JS
└── icons/                   # Extension icons (16x16, 32x32, 128x128)
```
The extension requests minimal permissions:
- Microphone: For audio recording and transcription
- Active Tab: To interact with the current webpage (optional)
- Storage: To save user preferences and temporary data
- Start Recording: Click the microphone button to begin recording
- Automatic Participant Detection: The system automatically detects meeting participants from Google Meet
- Manage Speakers: Add, edit, or remove meeting participants (auto-detected participants are included)
- Real-time Transcription: View live transcription with speaker identification
- Question Detection: Questions are automatically highlighted
- Generate Summary: Create AI-powered meeting summaries
- Export Data: Download transcripts and summaries as text files
- Click Extension Icon: Open the meeting assistant popup
- Grant Permissions: Allow microphone access when prompted
- Automatic Detection: Extension automatically detects Google Meet participants
- Start Meeting: Begin recording directly from the popup
- Minimal Interface: Streamlined UI optimized for extension usage
- Background Processing: Meeting continues recording even when popup is closed
- Quick Access: Instant access from any webpage
The system automatically detects meeting participants using multiple DOM selectors:
- User Detection: Self-identification from `[data-self-name]` attributes
- Participant Elements: Detection from `[data-participant-id]` and related selectors
- Name Extraction: Clean extraction of participant names with filtering
- Dynamic Updates: Real-time monitoring for participant changes
- Fallback Names: Default participant names when detection fails
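The scanning flow above can be sketched roughly as follows. The two selectors come from the project description; the function names and the observer wiring are illustrative assumptions, not the project's actual content.js:

```javascript
// Illustrative sketch of Google Meet participant scanning; function names
// are hypothetical and do not reflect the project's actual content.js.
const PARTICIPANT_SELECTORS = ['[data-self-name]', '[data-participant-id]'];

function extractNames(rawNames) {
  // Trim whitespace, drop empty strings, and deduplicate preserving order.
  const seen = new Set();
  const result = [];
  for (const raw of rawNames) {
    const name = raw.trim();
    if (name && !seen.has(name)) {
      seen.add(name);
      result.push(name);
    }
  }
  return result;
}

function scanParticipants(doc) {
  // Collect candidate names from known Meet DOM attributes.
  const raw = [];
  for (const selector of PARTICIPANT_SELECTORS) {
    for (const el of doc.querySelectorAll(selector)) {
      raw.push(el.getAttribute('data-self-name') || el.textContent || '');
    }
  }
  return extractNames(raw);
}

// In the browser, a MutationObserver would re-scan whenever the DOM changes:
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
  new MutationObserver(() => scanParticipants(document))
    .observe(document.body, { childList: true, subtree: true });
}
```

A `MutationObserver` with `subtree: true` is the standard way to react to participants joining or leaving without polling the page.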
- Sample Rate: 16kHz (configurable in transcribe.js)
- Encoding: WEBM_OPUS (auto-detected)
- Language: Persian (fa-IR)
- Processing Chunks: 30-second intervals
- Default Speakers: Auto-detected from meeting (minimum 2)
- Maximum Speakers: Unlimited
- Speaker Labels: Automatically mapped to detected participant names
- Manual Override: Users can add additional speakers manually
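Putting the audio and speaker settings above together, the request configuration passed to Google Cloud Speech-to-Text would look roughly like this. This is a hedged sketch using the documented `RecognitionConfig` field names; the real transcribe.js may assemble it differently:

```javascript
// Hypothetical sketch of the recognition config described above; field names
// follow the Google Cloud Speech-to-Text v1 RecognitionConfig, but this is
// not the project's actual transcribe.js.
function buildRecognitionConfig(speakerCount) {
  const count = Math.max(2, speakerCount || 0); // minimum of 2 speakers
  return {
    encoding: 'WEBM_OPUS',
    sampleRateHertz: 16000,
    languageCode: 'fa-IR',
    diarizationConfig: {
      enableSpeakerDiarization: true,
      minSpeakerCount: 2,
      maxSpeakerCount: count,
    },
  };
}
```

The returned object would be passed as the `config` field of a `recognize` request alongside the base64-encoded audio chunk.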
The system detects Persian questions using:
- Question mark (؟)
- Question words: چی (what), چه (what/which), کی (who/when), کجا (where), چرا (why), چطور (how), آیا (yes/no question particle)
- Regular expression patterns for Persian interrogatives
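A minimal version of this heuristic might look like the following. This is an illustrative sketch, not the project's actual questions.js; a naive substring check like this will over-match in some sentences:

```javascript
// Illustrative Persian question heuristic; not the actual questions.js logic.
const QUESTION_WORDS = ['چی', 'چه', 'کی', 'کجا', 'چرا', 'چطور', 'آیا'];

function isQuestion(sentence) {
  const text = sentence.trim();
  // Persian (؟) or Latin (?) question mark at the end is a strong signal.
  if (text.endsWith('؟') || text.endsWith('?')) return true;
  // Otherwise fall back to checking for interrogative words.
  return QUESTION_WORDS.some((word) => text.includes(word));
}
```

A production version would anchor the interrogatives with word-boundary-aware regular expressions to avoid false positives on substrings.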
- Popup Size: 400x600px (optimized for various screen sizes)
- Background Persistence: Service worker for Chrome, background page for Firefox
- Storage: Local storage for user preferences and temporary meeting data
- Icon Themes: Adaptive icons that work with light/dark browser themes
```
Meeting-assistant/
├── backend/
│   └── apis/
│       ├── index.js             # API routes aggregation
│       ├── transcribe.js        # Audio transcription endpoint
│       ├── questions.js         # Question detection logic
│       └── summary.js           # AI summary generation
├── src/
│   ├── App.jsx                  # Main Preact component
│   ├── App.css                  # Component styles
│   ├── index.html               # Web app HTML template
│   ├── index-extension.html     # Extension popup HTML template
│   └── server.js                # Express server setup
├── public/
│   ├── manifest.json            # Browser extension manifest
│   ├── background.js            # Extension background script
│   ├── content.js               # Extension content script (includes participant detection)
│   └── icons/                   # Extension icons
├── docker-compose.yml           # Docker configuration
├── vite.config.js               # Vite configuration with extension mode
├── package.json                 # Dependencies and scripts
├── .env.example                 # Environment variables template
└── README.md                    # Project documentation
```
```
POST /api/transcribe
Content-Type: multipart/form-data

Parameters:
- audio: Audio file (WebM/WAV/MP3)
- speakerCount: Number of speakers (auto-detected or specified)
```

```
POST /api/generate-summary
Content-Type: application/json

{
  "transcript": [...],
  "speakerNames": [...]  // Includes auto-detected participant names
}
```

```
POST /api/detect-questions
Content-Type: application/json

{
  "transcript": [...]
}
```

- `npm run nodemon` - Start the development server with auto-reload
- `npm run dev` - Start the Vite development server
- `npm run build` - Build the web application for production
- `npm run build:extension` - Build the browser extension
- `npm run dev:extension` - Development mode with extension build watching
- `npm run pack:extension` - Build and package the extension as a ZIP file
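From the client side, calling one of these endpoints with `fetch` could look like this. The base URL and payload shape follow the API description above; the helper names are hypothetical:

```javascript
// Hypothetical client helper for the summary endpoint; the URL and payload
// shape follow the API reference above, not an actual file in the repo.
const BASE_URL = 'http://localhost:5173';

function buildSummaryRequest(transcript, speakerNames) {
  return {
    url: `${BASE_URL}/api/generate-summary`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ transcript, speakerNames }),
    },
  };
}

async function generateSummary(transcript, speakerNames) {
  const { url, options } = buildSummaryRequest(transcript, speakerNames);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Summary request failed: ${res.status}`);
  return res.json();
}
```

Separating request construction from the network call keeps the payload shape easy to test without a running server.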
- Start development with extension mode:

```bash
npm run dev:extension
```

This will:
- Start the backend server
- Build extension files with watch mode
- Auto-rebuild when source files change

Testing:
- Load the unpacked extension from `dist/extension`
- Changes will be reflected after reloading the extension
The system uses advanced DOM scanning techniques to identify meeting participants:
Primary Selectors:
- `[data-self-name]` - User's own name
- `[data-participant-id]` - Participant identifiers
- `.zWGUib` - Google Meet participant name elements
- `[jsname="xvfV4b"]` - Name container elements
Advanced Detection:
- Name Cleaning: Removes parenthetical information and extra content
- Duplicate Prevention: Ensures unique participant names
- Length Validation: Filters realistic name lengths (1-50 characters)
- Real-time Updates: Monitors DOM changes for participant additions/removals
Fallback Handling:
- Default participant names when detection fails
- Manual participant addition capability
- Graceful degradation for unsupported Meet interfaces
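The name-cleaning and validation rules above can be sketched as a pure function. This is illustrative only; the real content.js may clean names differently:

```javascript
// Illustrative name-cleaning sketch matching the rules described above;
// not the project's actual implementation.
function cleanName(raw) {
  // Strip parenthetical information, e.g. "Ali (Host)" -> "Ali".
  const stripped = raw.replace(/\s*\([^)]*\)\s*/g, ' ').trim();
  // Keep only realistic name lengths (1-50 characters); otherwise reject.
  return stripped.length >= 1 && stripped.length <= 50 ? stripped : null;
}
```

Returning `null` for rejected values lets the caller fall back to a default participant name, as described under fallback handling.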
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Persian language conventions for UI text
- Maintain RTL (Right-to-Left) text direction
- Test with various Persian accents and dialects
- Ensure mobile responsiveness
- Test both web app and extension modes
- Test participant detection with different Google Meet configurations
- Follow extension store guidelines for submissions
- Real-time Processing: < 2 second latency for transcription
- Audio Chunk Size: 30 seconds for optimal processing
- Memory Usage: Efficient with chunked processing
- Bundle Size: Preact keeps bundle size under 150KB
- Extension Performance: Minimal background resource usage
- Participant Detection: Lightweight DOM monitoring with minimal performance impact
- Offline Capability: Browser-based fallback when server unavailable
- No audio data stored permanently on server
- Temporary file cleanup after processing
- API key validation for external services
- CORS protection for cross-origin requests
- Extension permissions follow principle of least privilege
- No sensitive data stored in extension local storage
- Participant names are processed locally and not transmitted unnecessarily
Microphone Permission Denied
- Ensure browser microphone permissions are granted
- Check HTTPS requirement for audio access
- In extension: grant microphone permission in popup
Participant Detection Not Working
- Ensure you're using the extension on Google Meet pages
- Check that the Google Meet interface has loaded completely
- Verify participants are visible in the Meet interface
- Try refreshing the page if detection seems incomplete
Transcription Not Working
- Verify Google Cloud credentials
- Check API quotas and billing
- Ensure Persian language model availability
Summary Generation Fails
- Validate OpenAI API key
- Check API usage limits
- Verify internet connection
Docker Issues
- Ensure user permissions (UID 1000)
- Check port availability (5173, 24678)
- Verify volume mounts
Extension Issues
- Reload extension after code changes
- Check browser console for errors
- Verify manifest.json is valid
- Ensure all required files are in dist/extension
Chrome
- Use Manifest V3 format (already implemented)
- Service worker background script
Firefox
- Background page instead of service worker
- Different permission handling
- ✅ Chrome 80+
- ✅ Firefox 75+
- ✅ Safari 14+
- ✅ Edge 80+
- ✅ Chrome/Chromium (Manifest V3)
- ✅ Firefox (Manifest V2/V3)
- ✅ Edge
- ✅ Opera
- ⚠️ Safari (requires conversion to a Safari Web Extension)
- ✅ Standard Google Meet interface
- ✅ Grid view and spotlight view
- ✅ Multiple participants support
- ✅ Real-time participant changes
- ⚠️ Some custom Meet themes may affect detection
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Cloud Speech-to-Text for Persian language support
- OpenAI for advanced summary generation capabilities
- Preact team for the lightweight React alternative
- Persian NLP community for language processing insights
- Browser extension development community
- Google Meet for providing accessible DOM structure for participant detection
For support, please open an issue on GitHub or contact alalamohseni@gmail.com
Made with ❤️ for the Persian-speaking community

