OtosakuStreamingASR is a lightweight on-device streaming speech recognition engine for iOS. It performs real-time audio processing using a Conformer-based architecture and CTC decoding.
- ✅ Fully offline
- 🎙 Real-time streaming speech recognition
- 🛠 Modular architecture (feature extractor, encoder, decoder)
Watch the model running live on iPhone 13:
Add the Swift Package to your Xcode project:
https://github.com/Otosaku/OtosakuStreamingASR-iOS
import OtosakuStreamingASR
let asr = OtosakuStreamingASR()
try asr.prepareModel(from: modelURL)
asr.subscribe { text in
print("🗣 Recognized: \(text)")
}
// Raw audio chunk: [Double] in range [-1.0, 1.0], strictly 2559 samples per chunk (80ms at 16kHz)
try asr.predictChunk(rawChunk: yourRawAudioChunk)
try asr.stop() // Finalize and decode remaining buffer
asr.reset() // Reset internal model state
-
Architecture: Fast Conformer (Cache-Aware Streaming)
-
Language: 🇷🇺 Russian (fine-tuned from English)
-
Training: 250 hours of Russian speech (30 epochs)
-
WER (Word Error Rate):
- Russian (fine-tuned): 11%
- English (before fine-tuning): 6.5% on LibriSpeech
test-other
🔗 Download Russian model: Link to model
For other languages or custom domains, contact me:
Method | Description |
---|---|
prepareModel(from:) |
Load model from directory |
predictChunk(rawChunk:) |
Submit audio frame ([Double] ) |
stop() |
Finalize and decode remaining buffer |
reset() |
Reset model state |
subscribe { String in } |
Receive transcribed text in real time |
16kHz
and normalized to [-1.0, 1.0]
, strictly 2559 samples per chunk (80ms at 16kHz)
This package is designed with privacy in mind:
- Runs entirely on-device
- No cloud calls or external dependencies
If you have any questions, suggestions, or are interested in adapting the model to another language or domain:
Email: otosaku.dsp@gmail.com
MIT License