OnVoice - Real-Time Lecture Transcription & Translation

OnVoice is a real-time lecture transcription and translation service. Speakers can conduct lectures through Bluetooth microphones, and participants can scan QR codes to view real-time subtitles and translations.

Key Features

Speaker (Host)

Real-Time Speech Recognition: Browser-based STT using Web Speech API (instant transcription)
Session Management: Lecture title, description, language settings
Automatic QR Code Generation: Real-time QR codes for easy participant access
Session Persistence: Automatic session recovery after browser restart
Real-Time Caption Display: Captions appear on screen as speech is converted to text
Live Participant Monitoring: Real-time display of connected participants
Lifetime Storage: Unlimited storage for speaker sessions
Auto-Restart: Automatic restart every 4.5 minutes to prevent Web Speech API timeout

Audience (Participants)

QR Code Access: Scan QR codes with smartphones for instant participation
No Authentication Required: Public links for online sessions
Multi-Language Translation: Real-time translation in 50+ languages
Personalized Settings: Font size, dark mode, auto-scroll, etc.
Remote Access: Support for online conferences, webinars, and remote participation
30-Day Free Storage: Free storage of participated sessions for 30 days (with login)

Technology Stack

Frontend: Next.js 15, React 19, TypeScript
UI: Tailwind CSS, Radix UI, react-qr-code
Authentication: Supabase Auth (Google OAuth)
Database: Supabase PostgreSQL
Real-time Communication: Supabase Realtime
Speech Recognition: Web Speech API (browser-based STT)
QR Code: react-qr-code, qrcode
Audio Processing: MediaRecorder API (WebRTC)
Translation: Google Translate API / Azure Translator

Installation & Setup

1. Clone Project and Install Dependencies

git clone <repository-url>
cd onvoice
pnpm install

2. Supabase Setup

Create a new project on Supabase
Enable Google OAuth in Authentication > Providers
Create OAuth 2.0 Client ID in Google Cloud Console
Configure Google OAuth in Supabase project settings

3. Environment Variables

Create a .env.local file in the project root and add the following:

# Supabase
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY=your_supabase_publishable_key
SUPABASE_SECRET_KEY=your_supabase_secret_key

# Google OAuth
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your_google_client_id

# OpenAI (Optional - for STT fallback)
OPENAI_API_KEY=your_openai_api_key

# Gemini (Required - for STT review and translation)
GEMINI_API_KEY=your_gemini_api_key

# Google Translate (Optional - for translation features)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# Feature Flags (Optional)
# Set to 'true' to enable AI review of transcripts (disabled by default)
ENABLE_AI_REVIEW=false

# NextAuth (Optional)
NEXTAUTH_SECRET=your_nextauth_secret
NEXTAUTH_URL=http://localhost:3000

How to Obtain Environment Variables

Supabase Keys:
- Supabase Dashboard → Settings → API
- Copy URL and publishable key
- SUPABASE_SECRET_KEY is the secret key: use it server-side only, and never expose it to the browser or commit it to the repository
Google Client ID:
- Google Cloud Console → APIs & Services → Credentials
- Create OAuth 2.0 Client ID for Web application
- Add your domain to authorized domains
OpenAI API Key (Optional):
- OpenAI Platform → API Keys
- Create new secret key
- Note: Currently used for fallback STT functionality
Gemini API Key (Required):
- Google AI Studio → API Keys
- Create new API key
- Note: Used for STT review and translation
Google Translate API Key (Optional):
- Google Cloud Console → APIs & Services → Library
- Enable Cloud Translation API
- Create API key

4. Database Schema Setup

Execute the SQL files in the sqls/ directory in the Supabase SQL Editor, in the following order:

1. Initial Schema: supabase-schema.sql - Creates all tables and policies
2. Session Migration: migrate-sessions-table.sql - Adds category and summary columns
3. Category & Summary: add-session-category-summary.sql - Adds category constraints
4. Summary Cache: create-session-summary-cache.sql - Creates summary translation cache
5. Fix Schema: fix-db-schema.sql - Fixes translation cache structure
6. Fix Summary Cache: fix-session-summary-cache.sql - Fixes summary cache structure
7. Add Reviewed Text: add-reviewed-text-column.sql - Adds reviewed_text column for Gemini review

Important: Execute these SQL files in order to ensure proper database structure.

5. Start Development Server

pnpm dev

Usage

Starting a Session as Speaker

Log in with a Google account
Click "Start as Host"
Set session title, description, and language
Click "Start Session" to begin speech recognition
Display QR code for participants to access

Joining a Session as Participant

Scan QR code provided by speaker
Log in with a Google account (optional)
Select desired language
View real-time captions and translations

📁 Project Structure

onvoice/
├── app/                    # Next.js App Router
│   ├── api/               # API routes
│   │   ├── session/       # Session management APIs
│   │   ├── stt/           # Speech-to-text API
│   │   ├── stt-stream/    # Real-time STT streaming
│   │   └── translate/     # Translation API
│   ├── auth/              # Authentication pages
│   ├── host/              # Speaker dashboard
│   ├── session/           # Session participation
│   ├── profile/           # My sessions management
│   ├── s/[slug]/          # Public session access
│   └── demo/              # Demo page
├── components/            # React components
│   ├── auth/             # Authentication components
│   ├── ui/               # UI components
│   ├── RealtimeSTT.tsx   # Legacy Web Speech API component
│   └── OpenAIRealtimeSTT.tsx # OpenAI Realtime transcription component
├── lib/                  # Utilities and configuration
│   ├── supabase.ts       # Supabase client
│   ├── types.ts          # TypeScript type definitions
│   └── utils.ts          # Utility functions
└── supabase-schema.sql   # Database schema

🔧 Key Features Explained

Real-Time STT System

Web Speech API Integration: Real-time browser-based speech recognition
Instant Processing: Low-latency transcription, with text appearing as the speaker talks
Auto-Restart: Automatic restart every 4.5 minutes to prevent API timeout
Cost-Free: No external API costs for speech recognition
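
The auto-restart idea can be sketched as follows. This is a minimal illustration, not the code in RealtimeSTT.tsx: the RecognizerLike interface only mirrors the parts of the browser's SpeechRecognition object that the sketch needs, and startWithAutoRestart and its injected factory are hypothetical names. It assumes that stopping the recognizer shortly before the timeout and starting a fresh one from the onend handler is enough to keep transcription running.

```typescript
// Minimal shape of the browser's SpeechRecognition object used by this sketch.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// 4.5 minutes, the restart interval quoted in this README.
const RESTART_INTERVAL_MS = 4.5 * 60 * 1000;

// Starts recognition and replaces the recognizer every `intervalMs`.
// Returns a function that stops recognition for good.
function startWithAutoRestart(
  createRecognizer: () => RecognizerLike, // hypothetical factory, injected so the logic can be tested
  lang: string,
  intervalMs: number = RESTART_INTERVAL_MS,
): () => void {
  let active = true;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let current: RecognizerLike | undefined;

  const launch = (): void => {
    if (!active) return;
    const recognizer = createRecognizer();
    recognizer.continuous = true;
    recognizer.interimResults = true;
    recognizer.lang = lang;
    // `onend` fires after stop() and also when the browser ends recognition on its own.
    recognizer.onend = () => {
      if (current !== recognizer) return; // ignore events from a recognizer that was already replaced
      if (timer !== undefined) clearTimeout(timer);
      launch();
    };
    current = recognizer;
    recognizer.start();
    // Stop before the assumed timeout; the `onend` handler then starts a fresh recognizer.
    timer = setTimeout(() => recognizer.stop(), intervalMs);
  };

  launch();

  return () => {
    active = false;
    if (timer !== undefined) clearTimeout(timer);
    current?.stop();
  };
}
```

In a browser, the factory would typically return a new SpeechRecognition (or webkitSpeechRecognition) instance with its onresult handler attached. Injecting it keeps the restart logic independent of the browser.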

Translation System

On-Demand Translation: Only translates when translation tab is active
Cost Efficiency: 50-70% cost reduction through selective translation
Multiple Providers: Support for Google Translate and Azure Translator
Language Auto-Detection: Automatic language detection from browser settings
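
The two decisions described above, picking a target language from browser settings and translating only when the translation tab is active, can be sketched as pure functions. The names pickTargetLanguage and shouldTranslate are hypothetical, and skipping translation when source and target share the same primary language is an extra assumption not stated in this README.

```typescript
// Returns the primary subtag of a language tag in lower case, e.g. "ko-KR" -> "ko".
function primarySubtag(tag: string): string {
  const [primary = ""] = tag.toLowerCase().split("-");
  return primary;
}

// Picks the first supported language matching the browser's preference list
// (for example `navigator.languages`), trying an exact match before a primary-subtag match.
function pickTargetLanguage(
  preferred: readonly string[],
  supported: readonly string[],
  fallback: string,
): string {
  for (const tag of preferred) {
    const exact = supported.find((s) => s.toLowerCase() === tag.toLowerCase());
    if (exact !== undefined) return exact;
    const loose = supported.find((s) => primarySubtag(s) === primarySubtag(tag));
    if (loose !== undefined) return loose;
  }
  return fallback;
}

// On-demand gate: call the translation provider only while the translation tab is open.
// Assumption: nothing is translated when source and target are the same language.
function shouldTranslate(
  translationTabActive: boolean,
  sourceLang: string,
  targetLang: string,
): boolean {
  return translationTabActive && primarySubtag(sourceLang) !== primarySubtag(targetLang);
}
```

For example, a browser reporting ["ko-KR", "en-US"] against a supported list of ["en", "ko", "ja"] resolves to "ko", and no provider call is made until the participant opens the translation tab.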

AI Review System

Configurable AI Review: Toggle AI review and translation of transcripts using the ENABLE_AI_REVIEW environment variable (disabled by default)
Cost Control: AI review is disabled by default to reduce API costs - enable only when high accuracy review is needed
Fallback Mode: When disabled, original transcripts are saved directly without AI processing
Status Tracking: Transcripts are marked with 'skipped' status when AI review is disabled
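
A minimal sketch of how the ENABLE_AI_REVIEW flag could drive the status of a saved transcript. Only the flag name, the 'skipped' status, and the reviewed_text column come from this README; the original_text and review_status field names, the 'pending' status, and both function names are assumptions made for illustration.

```typescript
type ReviewStatus = "pending" | "skipped";

// Hypothetical row shape; only `reviewed_text` is named in this README.
interface TranscriptRow {
  original_text: string;
  reviewed_text: string | null;
  review_status: ReviewStatus;
}

// The flag is opt-in: anything other than the exact string "true" leaves review disabled.
function isAiReviewEnabled(env: Record<string, string | undefined>): boolean {
  return env["ENABLE_AI_REVIEW"] === "true";
}

// Builds the row to store. With review disabled, the original text is saved as-is
// and marked 'skipped', so no AI request is made for it.
function buildTranscriptRow(
  text: string,
  env: Record<string, string | undefined>,
): TranscriptRow {
  return {
    original_text: text,
    reviewed_text: null,
    review_status: isAiReviewEnabled(env) ? "pending" : "skipped",
  };
}
```

On the server this would be called with process.env. Passing the environment in as an argument keeps the function easy to test.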

QR Code System

Network IP Detection: Automatic network IP detection using WebRTC
Public/Private URLs: Support for both public and private session access
Mobile Optimization: Responsive design for mobile devices
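
As an illustration of what the QR code encodes, the sketch below builds a participant link for the public s/[slug] route shown in the project structure. The function name buildSessionUrl is hypothetical, and it assumes the origin (a public domain or a detected network IP with port) is already known.

```typescript
// Builds the link a participant opens after scanning the QR code.
// `origin` is the public domain or the detected network address, e.g. "http://192.168.0.10:3000".
function buildSessionUrl(origin: string, slug: string): string {
  // encodeURIComponent keeps spaces and other unsafe characters out of the path segment.
  return new URL(`/s/${encodeURIComponent(slug)}`, origin).toString();
}
```

The resulting string can be passed as the value prop of the QRCode component from react-qr-code.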

Session Management

Real-Time Updates: Live participant count and transcript updates
Session Persistence: Automatic session recovery and state management
Guest Access: Support for unauthenticated guest participation
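
One way session recovery after a browser restart could work is to keep a small record of the active session in browser storage. This is a sketch under that assumption: the README does not say where session state is stored, and the storage key, the ActiveSession fields, and the function names are all hypothetical.

```typescript
// Subset of the Web Storage API (localStorage) that the sketch relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical shape of the state needed to resume a lecture.
interface ActiveSession {
  sessionId: string;
  title: string;
  language: string;
  startedAt: string; // ISO timestamp
}

const STORAGE_KEY = "onvoice.activeSession"; // hypothetical key name

function saveActiveSession(storage: StorageLike, session: ActiveSession): void {
  storage.setItem(STORAGE_KEY, JSON.stringify(session));
}

// Returns the stored session, or null when nothing usable is stored.
// Corrupt or incomplete entries are removed so they cannot block a new session.
function loadActiveSession(storage: StorageLike): ActiveSession | null {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<ActiveSession>;
    if (
      typeof parsed.sessionId === "string" &&
      typeof parsed.title === "string" &&
      typeof parsed.language === "string" &&
      typeof parsed.startedAt === "string"
    ) {
      return parsed as ActiveSession;
    }
  } catch {
    // Invalid JSON or a non-object value: fall through to cleanup.
  }
  storage.removeItem(STORAGE_KEY);
  return null;
}
```

In the browser, window.localStorage satisfies StorageLike, so the host page could call loadActiveSession(window.localStorage) on load and offer to resume.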

Speech Recognition System

Real-Time STT: Web Speech API for instant transcription
High Accuracy: Optimized recognition settings for lecture content
Multi-Language Support: Auto-detection and manual language selection
Continuous Recognition: Seamless speech-to-text conversion
5-Minute Timeout Prevention: Automatic restart every 4.5 minutes to prevent Web Speech API timeout
Network Error Recovery: Automatic reconnection on network issues

Deployment

Vercel Deployment

Push code to GitHub
Connect project in Vercel
Configure environment variables
Deploy

Environment Variables Verification

After deployment, verify these environment variables are correctly set:

NEXT_PUBLIC_SUPABASE_URL
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY
SUPABASE_SECRET_KEY
NEXT_PUBLIC_GOOGLE_CLIENT_ID
GEMINI_API_KEY
OPENAI_API_KEY (optional)
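
The check can be automated with a small helper such as the sketch below. The list of required names is an assumption based on the setup section (the Supabase and Google variables plus GEMINI_API_KEY, which is marked as required there), and findMissingEnvVars is a hypothetical name.

```typescript
// Assumed set of required variables; optional keys such as OPENAI_API_KEY are left out.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY",
  "SUPABASE_SECRET_KEY",
  "NEXT_PUBLIC_GOOGLE_CLIENT_ID",
  "GEMINI_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling findMissingEnvVars(process.env) in a server-side startup script and logging the result gives an early warning before a session fails at runtime.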

💡 Cost Optimization

STT Costs

Web Speech API: Free (browser-based, no server costs)
Deepgram: $0.0043/minute (requires Growth plan for WebSocket streaming)

Translation Costs

Google Translate: $20/1M characters (~$1.20 for a 1-hour lecture, which corresponds to roughly 60,000 translated characters)
Azure Translator: ~50% cheaper than Google Translate
Optimization: Only translate when translation tab is active
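
The figures above reduce to simple arithmetic, sketched here. The 60,000-character lecture length is derived from the README's own numbers ($1.20 at $20 per 1M characters); the $10 rate used for Azure in the usage note is an assumption that follows from the "~50% cheaper" estimate, not a published price.

```typescript
// Estimated translation cost in USD.
// characters: source characters to translate
// targetLanguages: number of languages the text is translated into
// usdPerMillionChars: provider price per one million characters
function estimateTranslationCostUsd(
  characters: number,
  targetLanguages: number = 1,
  usdPerMillionChars: number = 20,
): number {
  return (characters * targetLanguages * usdPerMillionChars) / 1e6;
}
```

With these inputs, estimateTranslationCostUsd(60000) gives about 1.2, three target languages give about 3.6, and an assumed rate of 10 per million characters gives about 0.6 for one language.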

🐛 Troubleshooting

Common Issues

STT Not Working: Check browser compatibility and microphone permissions
QR Code Not Generating: Verify network connectivity and IP detection
Translation Failing: Ensure Google Translate API key is set
Session Not Saving: Check Supabase connection and permissions
Transcript Not Showing on Summary Page: Usually a database access policy (RLS) issue for ended sessions; see "Fix for Transcript Access Issue"

🔧 Fix for Transcript Access Issue

Problem: After a session ends, audience members cannot view transcripts on the summary page, even though the summary appears correctly.

Root Cause: Supabase RLS (Row Level Security) policies only allow transcript access for:

Active sessions (anyone can view)
Session hosts (can always view their own sessions)

Solution: Execute the following SQL in your Supabase SQL Editor:

-- Fix transcript access policy for ended sessions
-- File: sqls/fix-transcript-access-policy.sql

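-- Add new policy for users who have saved sessions
-- (dropped first so the script can be re-run without a "policy already exists" error)
DROP POLICY IF EXISTS "Users can view transcripts for saved sessions" ON transcripts;
CREATE POLICY "Users can view transcripts for saved sessions" ON transcripts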
  FOR SELECT USING (
    EXISTS (
      SELECT 1 FROM user_sessions
      WHERE user_sessions.session_id = transcripts.session_id
      AND user_sessions.user_id = auth.uid()
    )
  );

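-- Add new policy for public summary pages (anyone can view transcripts for ended sessions)
-- (dropped first so the script can be re-run without a "policy already exists" error)
DROP POLICY IF EXISTS "Anyone can view transcripts for ended sessions on summary pages" ON transcripts;
CREATE POLICY "Anyone can view transcripts for ended sessions on summary pages" ON transcripts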
  FOR SELECT USING (
    EXISTS (
      SELECT 1 FROM sessions
      WHERE sessions.id = transcripts.session_id
      AND sessions.status = 'ended'
    )
  );

-- Update sessions policy to allow viewing ended sessions for summary pages
DROP POLICY IF EXISTS "Anyone can view ended sessions" ON sessions;
CREATE POLICY "Anyone can view ended sessions" ON sessions
  FOR SELECT USING (status = 'ended');

Steps to Fix:

Go to your Supabase Dashboard
Navigate to SQL Editor
Execute the SQL commands above
Test by accessing a completed session's summary page

Development Tips

Use browser developer tools to monitor WebSocket connections
Check Supabase logs for database errors
Monitor API usage to optimize costs
Check browser console for detailed transcript loading logs

📄 License

MIT License

🤝 Contributing

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

📞 Support

For support and questions, please open an issue on GitHub or contact the development team.

OnVoice - Making lectures accessible to everyone, everywhere. 🌍
