Real-time streaming speech-to-text transcription that runs entirely in your browser using Rust and WebAssembly (WASM). After downloading a ~950MB speech recognition model, the demo processes all audio offline on your CPU.
Demo video: demo-video.mp4
Try it live: https://huggingface.co/spaces/efficient-nlp/wasm-streaming-speech
Built with:

- Kyutai STT Model - a 1B-parameter streaming speech recognition model for English and French; this demo uses a 4-bit quantized version of the model.
- Candle - Hugging Face's ML framework for Rust
- Rayon - CPU parallelization for Rust
- wasm-bindgen-rayon - WASM bindings for Rayon, so the thread pool can run in the browser (see the sketch after this list)
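As a rough illustration of how these pieces fit together, here is a minimal, hypothetical sketch of exposing a streaming transcriber to JavaScript with wasm-bindgen. The `StreamingTranscriber` type and `push_audio` method are illustrative only (the repository's actual API may differ); the `init_thread_pool` re-export follows wasm-bindgen-rayon's documented usage pattern.

```rust
use wasm_bindgen::prelude::*;

// wasm-bindgen-rayon pattern: re-export init_thread_pool so the JavaScript
// side can start the Rayon worker pool before transcription begins.
pub use wasm_bindgen_rayon::init_thread_pool;

/// Hypothetical handle for a streaming transcriber (illustrative only).
#[wasm_bindgen]
pub struct StreamingTranscriber {
    buffer: Vec<f32>, // accumulated PCM samples awaiting decoding
}

#[wasm_bindgen]
impl StreamingTranscriber {
    #[wasm_bindgen(constructor)]
    pub fn new() -> StreamingTranscriber {
        StreamingTranscriber { buffer: Vec::new() }
    }

    /// Accept a chunk of PCM samples from the browser (a JS Float32Array
    /// maps to `&[f32]`) and return any newly decoded text.
    pub fn push_audio(&mut self, samples: &[f32]) -> String {
        self.buffer.extend_from_slice(samples);
        // The real demo would run the 4-bit quantized Kyutai STT model here
        // with Candle, parallelized across CPU cores by Rayon.
        String::new()
    }
}
```

On the JavaScript side, the exported `initThreadPool` is awaited once (typically with `navigator.hardwareConcurrency` threads) before audio chunks are pushed.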
This is a research/tech demo. For more accurate cloud transcription and real-time LLM grammar correction, check out Voice Writer.
Performance varies by device.
- On Apple Silicon or other recent CPUs, it typically runs in real time.
- On older devices, it may not keep up (real-time factor below 1; see the note after this list).
- Mobile devices are not supported.
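For clarity, the real-time factor is used here as the ratio of audio duration to processing time, so a value of at least 1 means transcription keeps pace with the incoming stream. A minimal sketch of that arithmetic:

```rust
/// Real-time factor as used above: seconds of audio transcribed per second
/// of wall-clock compute. At or above 1.0, transcription keeps up with the
/// live stream; below 1.0, it gradually falls behind.
fn real_time_factor(audio_seconds: f64, processing_seconds: f64) -> f64 {
    audio_seconds / processing_seconds
}

fn main() {
    // Example: 10 s of audio processed in 8 s of compute -> RTF = 1.25.
    println!("{:.2}", real_time_factor(10.0, 8.0));
}
```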
To build and run the demo locally, you will need:

- Rust, Cargo, and the `wasm32-unknown-unknown` target:

  ```sh
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  rustup target add wasm32-unknown-unknown
  ```

- wasm-bindgen-cli:

  ```sh
  cargo install wasm-bindgen-cli
  ```

- wasm-opt (Binaryen), optional but recommended:
  - macOS: `brew install binaryen`
  - Ubuntu/Debian: `sudo apt install binaryen`

- Python 3
- curl
To run the demo:

- Clone the repository:

  ```sh
  git clone https://github.com/lucky-bai/wasm-speech-streaming
  cd wasm-speech-streaming
  ```

- Build the Rust/WASM library:

  ```sh
  ./build-lib.sh
  ```

- Serve the project directory with a local HTTP server on port 8000, then open your browser and go to:

  http://localhost:8000
MIT License