A multilingual voice recognition server that processes audio commands and communicates with Android applications via WebSocket. The system supports French and English voice commands for navigation, emergency calls, device control, and object detection.
Important: The models could not be deployed publicly because hosting files of this size is not free. The system runs from a Docker image and was used on presentation day with the IoT device and the mobile app.
- Multilingual Support: French and English voice recognition
- Real-time Audio Processing: WebSocket-based audio streaming
- Voice Activity Detection: Intelligent speech detection
- Command Categories:
- Language switching
- Navigation controls
- Emergency calls
- Device management
- Object detection
- Android App Integration: Seamless communication with mobile clients
- Audio Preprocessing: Noise filtering and normalization
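The server relies on webrtcvad for voice activity detection; the idea can be illustrated with a simplified, dependency-free energy gate. The function names and the threshold below are illustrative, not taken from the server source:

```python
import math
import struct

FRAME_MS = 30          # frame duration used by the server
SAMPLE_RATE = 16000    # 16 kHz, 16-bit PCM mono
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per 30 ms frame

def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of one 16-bit little-endian PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Energy gate: treat a frame as speech when its RMS exceeds the threshold."""
    return frame_rms(frame) > threshold

# A silent frame (all zeros) vs. a loud square-wave frame:
silence = struct.pack("<%dh" % FRAME_SAMPLES, *([0] * FRAME_SAMPLES))
tone = struct.pack("<%dh" % FRAME_SAMPLES, *([8000, -8000] * (FRAME_SAMPLES // 2)))
print(is_speech(silence), is_speech(tone))  # False True
```

webrtcvad is considerably more robust than a plain energy gate (it models speech spectra, not just loudness), which is why the server depends on it.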
- Python 3.7 or higher
- Audio input capability
- Network connectivity for WebSocket communication
```
pip install numpy scipy rapidfuzz websockets vosk pyaudio webrtcvad
```

(`asyncio` ships with Python 3.7+ and does not need to be installed separately.)

Download the required Vosk models:
English Model (US):

```
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
mv vosk-model-en-us-0.22 model_us
```

French Model:
```
wget https://alphacephei.com/vosk/models/vosk-model-fr-0.22.zip
unzip vosk-model-fr-0.22.zip
mv vosk-model-fr-0.22 model_fr
```

- Clone or download the project files
- Install dependencies:
```
pip install -r requirements.txt
```
- Download voice models (see Prerequisites section)
- Verify directory structure:
```
project/
├── voice_command_system.py
├── model_us/          # English model
├── model_fr/          # French model
└── requirements.txt
```
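A small startup check can catch a missing model directory before the server tries to load it. This helper is hypothetical (not part of the server source); it only assumes the `model_us`/`model_fr` layout above:

```python
from pathlib import Path

# Directory names from the project layout above
REQUIRED_MODELS = {"en": "model_us", "fr": "model_fr"}

def missing_models(root="."):
    """Return the language codes whose Vosk model directory is absent."""
    return [lang for lang, d in REQUIRED_MODELS.items()
            if not (Path(root) / d).is_dir()]

missing = missing_models()
if missing:
    print("Missing Vosk models for: %s" % ", ".join(missing))
```

Running this before starting the server gives a clear error instead of a Vosk load failure deep in startup.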
```
python voice_command_system.py
```

The server will:
- Start on `ws://0.0.0.0:8765`
- Load the default French model
- Wait for Android app connections
- Display available commands and status
Your Android app should connect to:
```
ws://[SERVER_IP]:8765
```

Send audio data as binary WebSocket messages and receive command responses as JSON or plain text. The server host and the mobile app must be on the same network, or the server must be exposed with ngrok.
English to French:
- "switch to french"
- "change to french"
- "français"
- "parler français"
French to English:
- "changer en anglais"
- "passer en anglais"
- "english"
- "speak english"
English:
- "main menu" → MainScreen
- "profile" → Profil
- "settings" → Parametre
- "information" → Information
French:
- "menu principal" → MainScreen
- "profil" → Profil
- "paramètres" → Parametre
- "informations" → Information
English:
- "call assistant" → CALL_ASSISTANT
- "emergency" → CALL_EMERGENCY
- "police" → CALL_POLICE
- "ambulance" → CALL_AMBULANCE
French:
- "appeler assistance" → CALL_ASSISTANT
- "urgence" → CALL_EMERGENCY
- "police" → CALL_POLICE
- "ambulance" → CALL_AMBULANCE
English:
- "battery" → CHECK_BATTERY
- "device status" → CHECK_DEVICE_STATUS
- "connection" → CHECK_CONNECTION
French:
- "batterie" → CHECK_BATTERY
- "état appareil" → CHECK_DEVICE_STATUS
- "connexion" → CHECK_CONNECTION
English:
- "detect objects" → START_OBJECT_DETECT
- "stop detection" → STOP_OBJECT_DETECT
French:
- "détecter objets" → START_OBJECT_DETECT
- "arrêter détection" → STOP_OBJECT_DETECT
Audio Data: Send raw audio as binary WebSocket messages (16kHz, 16-bit PCM)
Language Change:
```
{
  "language": "en"   // or "fr"
}
```

Command Messages:
```
COMMAND:LANGUAGE_CHANGED:en
CONFIRM_LANGUAGE_CHANGE:fr
```
Command Execution:
```
COMMAND:NAVIGATE_TO:MainScreen
COMMAND:CALL_EMERGENCY
COMMAND:CHECK_BATTERY
```
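On the client side, these plain-text messages are easy to split into an action and an optional argument. This parser is a hypothetical client-side helper (not part of the server source); it assumes only the `COMMAND:<action>[:<argument>]` shape shown above:

```python
from typing import Optional, Tuple

def parse_command(message: str) -> Tuple[str, Optional[str]]:
    """Split 'COMMAND:<action>[:<argument>]' into (action, argument)."""
    if not message.startswith("COMMAND:"):
        raise ValueError("not a command message: %r" % message)
    parts = message.split(":", 2)  # at most: "COMMAND", action, argument
    action = parts[1]
    argument = parts[2] if len(parts) == 3 else None
    return action, argument

print(parse_command("COMMAND:NAVIGATE_TO:MainScreen"))  # ('NAVIGATE_TO', 'MainScreen')
print(parse_command("COMMAND:CHECK_BATTERY"))           # ('CHECK_BATTERY', None)
```

The same split translates directly to Kotlin/Java on the Android side.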
Language Change Confirmation:
```
{
  "type": "language_changed",
  "language": "fr",
  "status": "success",
  "source": "voice_command"
}
```

- Default Language: French (`selected_lang = "fr"`)
- WebSocket Port: 8765
- Audio Sample Rate: 16kHz
- Frame Duration: 30ms
- Similarity Threshold: 70% (65% for language commands)
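The server matches transcripts against known phrases with rapidfuzz, accepting a phrase when its similarity score clears the 70% threshold (65% for language commands). The idea can be sketched with the stdlib's difflib as a stand-in for rapidfuzz (the function names and command list here are illustrative):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Percent similarity between two phrases (stand-in for rapidfuzz.fuzz.ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

def best_match(heard, phrases, threshold=70.0):
    """Return the known phrase closest to the transcript, or None below threshold."""
    score, phrase = max((similarity(heard, p), p) for p in phrases)
    return phrase if score >= threshold else None

commands = ["main menu", "profile", "settings", "information"]
print(best_match("setings", commands))   # a near-miss still resolves to "settings"
print(best_match("weather", commands))   # unrelated input is rejected
```

Fuzzy matching is what makes the system tolerant of recognition errors: a slightly garbled transcript still maps to the intended command, while unrelated speech falls below the threshold and is ignored.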
Modify these variables in the script:
- `selected_lang`: Change default language
- `LANGUAGE_COMMANDS`: Add new language switch phrases
- `NAVIGATION_COMMANDS`: Add navigation destinations
- `EMERGENCY_COMMANDS`: Customize emergency actions
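Extending a command table means adding one phrase per language. The exact dictionary shape lives in `voice_command_system.py` and may differ; the sketch below assumes a `{language: {phrase: action}}` layout, populated from the navigation commands listed earlier:

```python
# Assumed shape, illustrative only: {language: {phrase: action}}
NAVIGATION_COMMANDS = {
    "en": {"main menu": "MainScreen", "profile": "Profil",
           "settings": "Parametre", "information": "Information"},
    "fr": {"menu principal": "MainScreen", "profil": "Profil",
           "paramètres": "Parametre", "informations": "Information"},
}

# Adding a new destination means adding one phrase per language
# ("HelpScreen", "help", and "aide" are hypothetical examples):
NAVIGATION_COMMANDS["en"]["help"] = "HelpScreen"
NAVIGATION_COMMANDS["fr"]["aide"] = "HelpScreen"

print(NAVIGATION_COMMANDS["en"]["help"])  # HelpScreen
```

Keeping the English and French tables symmetric (same action names, language-specific phrases) is what lets the server switch languages without changing any downstream handling.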