OnVoice is a real-time lecture transcription and translation service. Speakers can conduct lectures through Bluetooth microphones, and participants can scan QR codes to view real-time subtitles and translations.
- Real-Time Speech Recognition: Browser-based STT using Web Speech API (instant transcription)
- Session Management: Lecture title, description, language settings
- Automatic QR Code Generation: Real-time QR codes for easy participant access
- Session Persistence: Automatic session recovery after browser restart
- Real-Time Caption Display: Instant audio processing for text conversion
- Live Participant Monitoring: Real-time display of connected participants
- Lifetime Storage: Unlimited storage for speaker sessions
- Auto-Restart: Automatic restart every 4.5 minutes to prevent Web Speech API timeout
- QR Code Access: Scan QR codes with smartphones for instant participation
- No Authentication Required: Public links for online sessions
- Multi-Language Translation: Real-time translation in 50+ languages
- Personalized Settings: Font size, dark mode, auto-scroll, etc.
- Remote Access: Support for online conferences, webinars, and remote participation
- 30-Day Free Storage: Free storage of participated sessions for 30 days (with login)
- Frontend: Next.js 15, React 19, TypeScript
- UI: Tailwind CSS, Radix UI, react-qr-code
- Authentication: Supabase Auth (Google OAuth)
- Database: Supabase PostgreSQL
- Real-time Communication: Supabase Realtime
- Speech Recognition: Web Speech API (browser-based STT)
- QR Code: react-qr-code, qrcode
- Audio Processing: MediaRecorder API (WebRTC)
- Translation: Google Translate API / Azure Translator
git clone <repository-url>
cd onvoice
pnpm install
pnpm dev
- Create a new project on Supabase
- Enable Google OAuth in Authentication > Providers
- Create OAuth 2.0 Client ID in Google Cloud Console
- Configure Google OAuth in Supabase project settings
Create a `.env.local` file and add the following:
# Supabase
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY=your_supabase_publishable_key
SUPABASE_SECRET_KEY=your_supabase_secret_key
# Google OAuth
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your_google_client_id
# OpenAI (Optional - for future STT fallback)
OPENAI_API_KEY=your_openai_api_key
# Gemini (Required - for STT review and translation)
GEMINI_API_KEY=your_gemini_api_key
# Google Translate (Optional - for translation features)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key
# Feature Flags (Optional)
ENABLE_AI_REVIEW=false # Set to 'true' to enable AI review of transcripts (disabled by default)
# Next.js (Optional)
NEXTAUTH_SECRET=your_nextauth_secret
NEXTAUTH_URL=http://localhost:3000
- Supabase Keys:
  - Supabase Dashboard → Settings → API
  - Copy the URL and publishable key
  - `SUPABASE_SECRET_KEY` is the secret key (never expose it!)
- Google Client ID:
  - Google Cloud Console → APIs & Services → Credentials
  - Create an OAuth 2.0 Client ID for a Web application
  - Add your domain to the authorized domains
- OpenAI API Key (Optional):
  - OpenAI Platform → API Keys
  - Create a new secret key
  - Note: Currently used for fallback STT functionality
- Gemini API Key (Required):
  - Google AI Studio → API Keys
  - Create a new API key
  - Note: Used for STT review and translation
- Google Translate API Key (Optional):
  - Google Cloud Console → APIs & Services → Library
  - Enable the Cloud Translation API
  - Create an API key
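Since a required key that is missing only surfaces as a confusing runtime failure, a small startup guard can fail fast instead. This is a hypothetical helper, not part of the codebase; the variable names match the `.env.local` example above:

```typescript
// Hypothetical startup guard: read a required environment variable or throw,
// so a missing GEMINI_API_KEY fails at boot instead of mid-session.
// The env map is passed explicitly (e.g. process.env) to keep this testable.
export function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Calling `requireEnv("GEMINI_API_KEY", process.env)` once at server startup surfaces configuration mistakes immediately.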
Execute the SQL files in the `sqls/` directory in the Supabase SQL Editor in the following order:
- Initial Schema: `supabase-schema.sql` - creates all tables and policies
- Session Migration: `migrate-sessions-table.sql` - adds category and summary columns
- Category & Summary: `add-session-category-summary.sql` - adds category constraints
- Summary Cache: `create-session-summary-cache.sql` - creates the summary translation cache
- Fix Schema: `fix-db-schema.sql` - fixes the translation cache structure
- Fix Summary Cache: `fix-session-summary-cache.sql` - fixes the summary cache structure
- Add Reviewed Text: `add-reviewed-text-column.sql` - adds the `reviewed_text` column for Gemini review

Important: Execute these SQL files in order to ensure the proper database structure.
pnpm dev
- Login with Google account
- Click "Start as Host"
- Set session title, description, and language
- Click "Start Session" to begin speech recognition
- Display QR code for participants to access
- Scan QR code provided by speaker
- Login with Google account (optional)
- Select desired language
- View real-time captions and translations
onvoice/
├── app/ # Next.js App Router
│ ├── api/ # API routes
│ │ ├── session/ # Session management APIs
│ │ ├── stt/ # Speech-to-text API
│ │ ├── stt-stream/ # Real-time STT streaming
│ │ └── translate/ # Translation API
│ ├── auth/ # Authentication pages
│ ├── host/ # Speaker dashboard
│ ├── session/ # Session participation
│ ├── profile/ # My sessions management
│ ├── s/[slug]/ # Public session access
│ └── demo/ # Demo page
├── components/ # React components
│ ├── auth/ # Authentication components
│ ├── ui/ # UI components
│ ├── RealtimeSTT.tsx # Legacy Web Speech API component
│ └── OpenAIRealtimeSTT.tsx # OpenAI Realtime transcription component
├── lib/ # Utilities and configuration
│ ├── supabase.ts # Supabase client
│ ├── types.ts # TypeScript type definitions
│ └── utils.ts # Utility functions
└── supabase-schema.sql # Database schema
- Web Speech API Integration: Real-time browser-based speech recognition
- Instant Processing: Low-latency transcription with immediate interim results
- Auto-Restart: Automatic restart every 4.5 minutes to prevent API timeout
- Cost-Free: No external API costs for speech recognition
- On-Demand Translation: Only translates when translation tab is active
- Cost Efficiency: 50-70% cost reduction through selective translation
- Multiple Providers: Support for Google Translate and Azure Translator
- Language Auto-Detection: Automatic language detection from browser settings
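The on-demand approach above can be sketched as a small caching wrapper: each line is translated only the first time a given target language requests it, so switching tabs back and forth does not re-bill the same text. Names here (`makeLazyTranslator`, the `Translate` signature) are illustrative, not the actual implementation:

```typescript
// Illustrative on-demand translation cache. The underlying provider call
// (Google Translate or Azure Translator) is passed in as a function.
type Translate = (text: string, target: string) => Promise<string>;

export function makeLazyTranslator(translate: Translate) {
  const cache = new Map<string, string>();
  let apiCalls = 0; // tracks billable provider calls

  return {
    async get(text: string, target: string): Promise<string> {
      const key = `${target}:${text}`;
      const hit = cache.get(key);
      if (hit !== undefined) return hit; // cache hit: no API cost
      apiCalls++;
      const out = await translate(text, target);
      cache.set(key, out);
      return out;
    },
    calls: () => apiCalls,
  };
}
```

Only translating when the translation tab is active, plus this caching, is where the claimed cost reduction comes from.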
- Configurable AI Review: Toggle AI review and translation of transcripts with the `ENABLE_AI_REVIEW` environment variable (disabled by default)
- Cost Control: AI review is disabled by default to reduce API costs; enable it only when high-accuracy review is needed
- Fallback Mode: When disabled, original transcripts are saved directly without AI processing
- Status Tracking: Transcripts are marked with 'skipped' status when AI review is disabled
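The flag behavior above can be sketched as follows. Function and field names are hypothetical; the point is the branch: flag off means no AI call and a `'skipped'` status, flag on means the review function (e.g. a Gemini call) is awaited:

```typescript
// Sketch of the ENABLE_AI_REVIEW gate (hypothetical names).
type ReviewStatus = "reviewed" | "skipped";

interface StoredTranscript {
  text: string;
  reviewedText: string | null;
  status: ReviewStatus;
}

export function isAiReviewEnabled(
  env: Record<string, string | undefined>
): boolean {
  return env.ENABLE_AI_REVIEW === "true"; // anything else counts as disabled
}

export async function storeTranscript(
  text: string,
  review: (t: string) => Promise<string>,
  env: Record<string, string | undefined>
): Promise<StoredTranscript> {
  if (!isAiReviewEnabled(env)) {
    // Fallback mode: save the original transcript directly, no AI processing
    return { text, reviewedText: null, status: "skipped" };
  }
  return { text, reviewedText: await review(text), status: "reviewed" };
}
```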
- Network IP Detection: Automatic network IP detection using WebRTC
- Public/Private URLs: Support for both public and private session access
- Mobile Optimization: Responsive design for mobile devices
- Real-Time Updates: Live participant count and transcript updates
- Session Persistence: Automatic session recovery and state management
- Guest Access: Support for unauthenticated guest participation
- Real-Time STT: Web Speech API for instant transcription
- High Accuracy: Optimized recognition settings for lecture content
- Multi-Language Support: Auto-detection and manual language selection
- Continuous Recognition: Seamless speech-to-text conversion
- 5-Minute Timeout Prevention: Automatic restart every 4.5 minutes to prevent Web Speech API timeout
- Network Error Recovery: Automatic reconnection on network issues
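The 4.5-minute restart cycle can be sketched as a timer that stops and immediately restarts the recognizer before the Web Speech API's roughly 5-minute continuous-recognition limit. The `Recognizer` interface below stands in for the browser's `SpeechRecognition` object so the scheduling logic is shown on its own:

```typescript
// Sketch of the auto-restart pattern used to dodge the ~5-minute
// Web Speech API session limit (interval and names are illustrative).
const RESTART_INTERVAL_MS = 4.5 * 60 * 1000;

interface Recognizer {
  start(): void;
  stop(): void;
}

// Starts a periodic stop/start cycle; returns a cancel function.
export function scheduleAutoRestart(
  rec: Recognizer,
  intervalMs: number = RESTART_INTERVAL_MS
): () => void {
  const timer = setInterval(() => {
    rec.stop();  // ends the current recognition session…
    rec.start(); // …and begins a fresh one before the cutoff
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In the real component the restart would also need to ignore the `onend` event fired by the deliberate `stop()`, so it is not mistaken for a network error.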
- Push code to GitHub
- Connect project in Vercel
- Configure environment variables
- Deploy
After deployment, verify these environment variables are correctly set:
- `NEXT_PUBLIC_SUPABASE_URL`
- `NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY`
- `NEXT_PUBLIC_GOOGLE_CLIENT_ID`
- `GEMINI_API_KEY`
- `OPENAI_API_KEY`
- Web Speech API: Free (browser-based, no server costs)
- Deepgram: $0.0043/minute (requires Growth plan for WebSocket streaming)
- Google Translate: $20 per 1M characters (~$1.20 for a 1-hour lecture)
- Azure Translator: ~50% cheaper than Google Translate
- Optimization: Only translate when translation tab is active
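As a back-of-envelope check on the figure above: at roughly 150 spoken words per minute and 6-7 characters per word, a 1-hour lecture is on the order of 60,000 characters, so $20 per 1M characters works out to about $1.20. The helper name is illustrative:

```typescript
// Back-of-envelope translation cost: characters × price per million.
export function translationCostUSD(
  characters: number,
  usdPerMillionChars: number = 20
): number {
  return (characters / 1_000_000) * usdPerMillionChars;
}
```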
- STT Not Working: Check browser compatibility and microphone permissions
- QR Code Not Generating: Verify network connectivity and IP detection
- Translation Failing: Ensure Google Translate API key is set
- Session Not Saving: Check Supabase connection and permissions
- Transcript Not Showing on Summary Page: Database access policy issue for ended sessions
Problem: After a session ends, audience members cannot view transcripts on the summary page, even though the summary appears correctly.
Root Cause: Supabase RLS (Row Level Security) policies only allow transcript access for:
- Active sessions (anyone can view)
- Session hosts (can always view their own sessions)
Solution: Execute the following SQL in your Supabase SQL Editor:
-- Fix transcript access policy for ended sessions
-- File: sqls/fix-transcript-access-policy.sql
-- Add new policy for users who have saved sessions
CREATE POLICY "Users can view transcripts for saved sessions" ON transcripts
FOR SELECT USING (
EXISTS (
SELECT 1 FROM user_sessions
WHERE user_sessions.session_id = transcripts.session_id
AND user_sessions.user_id = auth.uid()
)
);
-- Add new policy for public summary pages (anyone can view transcripts for ended sessions)
CREATE POLICY "Anyone can view transcripts for ended sessions on summary pages" ON transcripts
FOR SELECT USING (
EXISTS (
SELECT 1 FROM sessions
WHERE sessions.id = transcripts.session_id
AND sessions.status = 'ended'
)
);
-- Update sessions policy to allow viewing ended sessions for summary pages
DROP POLICY IF EXISTS "Anyone can view ended sessions" ON sessions;
CREATE POLICY "Anyone can view ended sessions" ON sessions
FOR SELECT USING (status = 'ended');
Steps to Fix:
- Go to your Supabase Dashboard
- Navigate to SQL Editor
- Execute the SQL commands above
- Test by accessing a completed session's summary page
- Use browser developer tools to monitor WebSocket connections
- Check Supabase logs for database errors
- Monitor API usage to optimize costs
- Check browser console for detailed transcript loading logs
MIT License
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
For support and questions, please open an issue on GitHub or contact the development team.
OnVoice - Making lectures accessible to everyone, everywhere. 🌍