Real-time “Silent Disco” with Live Language Toggle for Any Stream
Late-night streamers often mute their mics to avoid waking family, making chat the only way to follow the action. Non-English viewers, meanwhile, rely on slow, post-stream captions. We wanted to give every viewer instant, private audio—in any language—without forcing creators to change their setup.

WhisperPlay is a browser extension + local server combo that:
- Captures a creator’s mic audio (or any tab's audio) locally.
- Transmits it over an encrypted WebRTC data channel directly from an audio source to the viewer's browser.
- Lets each viewer toggle between live AI-translated dubs in different languages.
All processing happens with < 1.5s latency, and the architecture ensures privacy and performance.
- Google Chrome
- Python 3.9+
- An environment that can run PyTorch (for the Whisper model)
You will need API keys from the following services:
- DeepL API (for translation)
- ElevenLabs API (for text-to-speech)
-
Clone the repository:
git clone <your-repo-url> cd WhisperPlay/server
-
Install Python dependencies:
pip install -r requirements.txt
-
Set Environment Variables: Set your API keys in your terminal session before running the server. On Windows (PowerShell):
$env:DEEPL_API_KEY="your_deepl_key_here" $env:ELEVEN_API_KEY="your_elevenlabs_key_here"
On macOS/Linux:
export DEEPL_API_KEY="your_deepl_key_here" export ELEVEN_API_KEY="your_elevenlabs_key_here"
- Open Google Chrome and navigate to
chrome://extensions
. - Enable “Developer mode” using the toggle in the top-right corner.
- Click “Load unpacked”.
- Select the
WhisperPlay/extension
folder from the project directory.
-
Start the Signaling Server: This server manages the initial WebRTC connection.
# In the /server directory python signaling_server.py
-
Start the AI Server: This server handles transcription, translation, and speech synthesis.
# In a new terminal, in the /server directory python server.py
-
Use the Extension:
- Open a browser tab with audio you want to translate (e.g., a YouTube video).
- Click the WhisperPlay icon in your browser's toolbar.
- Click "Connect". The status should change to "Connected."
- Select your desired language.
- The translated audio will begin playing automatically.
Component | Tech Stack |
---|---|
Signaling | Python websockets |
Backend | Python, Flask, WebRTC |
Browser Extension | Manifest V3, Web Audio API, WebRTC |
Speech-to-Text | OpenAI Whisper tiny model |
Translation | DeepL API |
Text-to-Speech | ElevenLabs Streaming API |
UI | HTML, CSS, JavaScript |
Architecture Flow:
Audio Source Tab → WebRTC → Extension → AI Server (Transcribe/Translate/Synthesize) → Translated Audio → Headphones
- Sub-second TTS: ElevenLabs streaming was key to reducing round-trip latency.
- WebRTC in Extensions: Service workers don’t natively support WebRTC, requiring a signaling server and careful state management between the popup, background, and content scripts.
- API Rate-Limits: The current implementation is for demo purposes; a production version would need caching and graceful fallbacks.
- Client-Side Transcription: Move Whisper transcription into the browser using
onnxruntime-web
to reduce server load. - Multi-speaker Separation: Identify and separate game audio vs. creator voice.
- Voice-Clone Consent Flow: Let streamers upload an audio sample for personalized TTS.
- Offline Mode: Bundle a lightweight WASM-based TTS engine for when APIs are unreachable.
A real-time audio streaming and translation system with OBS plugin and browser extension.
-
OBS Plugin (
/obs-plugin
)- Audio capture and encryption
- WebRTC streaming
- Written in C++
-
Browser Extension (
/extension
)- Audio reception and decryption
- Real-time translation
- Language toggle
- Written in JavaScript/TypeScript
-
Signaling Server (
/server
)- WebRTC signaling
- Written in Node.js
- CMake 3.x
- Visual Studio 2019+ (Windows)
- OBS Studio SDK
- libwebrtc
- OpenSSL
- Node.js 18.x+
- npm/yarn
- Node.js 18.x+
- npm/yarn
Detailed setup instructions for each component coming soon.
- Real-time audio streaming via WebRTC
- End-to-end encryption
- Multi-language support with real-time translation
- Low latency (<200ms for audio, <1s for translation)
- Cross-platform compatibility