A multilingual voice recognition server that processes audio commands and communicates with Android applications via WebSocket. The system supports French and English voice commands for navigation, emergency calls, device control, and object detection.
Important: The models could not be deployed publicly because hosting files of this size is not free. The system runs from a Docker image and was used on presentation day with the IoT device and the mobile app.
- Multilingual Support: French and English voice recognition
- Real-time Audio Processing: WebSocket-based audio streaming
- Voice Activity Detection: Intelligent speech detection
- Command Categories:
- Language switching
- Navigation controls
- Emergency calls
- Device management
- Object detection
- Android App Integration: Seamless communication with mobile clients
- Audio Preprocessing: Noise filtering and normalization
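The server relies on webrtcvad for voice activity detection; the idea can be illustrated with a simplified, dependency-free energy gate. The function names and the threshold below are illustrative, not taken from the server source:

```python
import math
import struct

FRAME_MS = 30          # frame duration used by the server
SAMPLE_RATE = 16000    # 16 kHz, 16-bit PCM mono
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per 30 ms frame

def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of one 16-bit little-endian PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Energy gate: treat a frame as speech when its RMS exceeds the threshold."""
    return frame_rms(frame) > threshold

# A silent frame (all zeros) vs. a loud square-wave frame:
silence = struct.pack("<%dh" % FRAME_SAMPLES, *([0] * FRAME_SAMPLES))
tone = struct.pack("<%dh" % FRAME_SAMPLES, *([8000, -8000] * (FRAME_SAMPLES // 2)))
print(is_speech(silence), is_speech(tone))  # False True
```

webrtcvad is considerably more robust than a plain energy gate (it models speech spectra, not just loudness), which is why the server depends on it.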
- Python 3.7 or higher
- Audio input capability
- Network connectivity for WebSocket communication
```
pip install numpy scipy rapidfuzz websockets vosk pyaudio webrtcvad
```

(`asyncio` ships with Python 3.7+ and does not need to be installed separately.)

Download the required Vosk models:
English Model (US):

```
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
mv vosk-model-en-us-0.22 model_us
```

French Model:
```
wget https://alphacephei.com/vosk/models/vosk-model-fr-0.22.zip
unzip vosk-model-fr-0.22.zip
mv vosk-model-fr-0.22 model_fr
```

- Clone or download the project files
- Install dependencies:
```
pip install -r requirements.txt
```
- Download voice models (see Prerequisites section)
- Verify directory structure:
```
project/
├── voice_command_system.py
├── model_us/          # English model
├── model_fr/          # French model
└── requirements.txt
```
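A small startup check can catch a missing model directory before the server tries to load it. This helper is hypothetical (not part of the server source); it only assumes the `model_us`/`model_fr` layout above:

```python
from pathlib import Path

# Directory names from the project layout above
REQUIRED_MODELS = {"en": "model_us", "fr": "model_fr"}

def missing_models(root="."):
    """Return the language codes whose Vosk model directory is absent."""
    return [lang for lang, d in REQUIRED_MODELS.items()
            if not (Path(root) / d).is_dir()]

missing = missing_models()
if missing:
    print("Missing Vosk models for: %s" % ", ".join(missing))
```

Running this before starting the server gives a clear error instead of a Vosk load failure deep in startup.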
```
python voice_command_system.py
```

The server will:
- Start on `ws://0.0.0.0:8765`
- Load the default French model
- Wait for Android app connections
- Display available commands and status
Your Android app should connect to:
```
ws://[SERVER_IP]:8765
```

Send audio data as binary WebSocket messages and receive command responses as JSON or plain text. The server host and the mobile app must be on the same network, or the server must be exposed with ngrok.
English to French:
- "switch to french"
- "change to french"
- "français"
- "parler français"
French to English:
- "changer en anglais"
- "passer en anglais"
- "english"
- "speak english"
English:
- "main menu" → MainScreen
- "profile" → Profil
- "settings" → Parametre
- "information" → Information
French:
- "menu principal" → MainScreen
- "profil" → Profil
- "paramètres" → Parametre
- "informations" → Information
English:
- "call assistant" → CALL_ASSISTANT
- "emergency" → CALL_EMERGENCY
- "police" → CALL_POLICE
- "ambulance" → CALL_AMBULANCE
French:
- "appeler assistance" → CALL_ASSISTANT
- "urgence" → CALL_EMERGENCY
- "police" → CALL_POLICE
- "ambulance" → CALL_AMBULANCE
English:
- "battery" → CHECK_BATTERY
- "device status" → CHECK_DEVICE_STATUS
- "connection" → CHECK_CONNECTION
French:
- "batterie" → CHECK_BATTERY
- "état appareil" → CHECK_DEVICE_STATUS
- "connexion" → CHECK_CONNECTION
English:
- "detect objects" → START_OBJECT_DETECT
- "stop detection" → STOP_OBJECT_DETECT
French:
- "détecter objets" → START_OBJECT_DETECT
- "arrêter détection" → STOP_OBJECT_DETECT
Audio Data: Send raw audio as binary WebSocket messages (16kHz, 16-bit PCM)
Language Change:
```
{
  "language": "en"   // or "fr"
}
```

Command Messages:
```
COMMAND:LANGUAGE_CHANGED:en
CONFIRM_LANGUAGE_CHANGE:fr
```
Command Execution:
```
COMMAND:NAVIGATE_TO:MainScreen
COMMAND:CALL_EMERGENCY
COMMAND:CHECK_BATTERY
```
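On the client side, these plain-text messages are easy to split into an action and an optional argument. This parser is a hypothetical client-side helper (not part of the server source); it assumes only the `COMMAND:<action>[:<argument>]` shape shown above:

```python
from typing import Optional, Tuple

def parse_command(message: str) -> Tuple[str, Optional[str]]:
    """Split 'COMMAND:<action>[:<argument>]' into (action, argument)."""
    if not message.startswith("COMMAND:"):
        raise ValueError("not a command message: %r" % message)
    parts = message.split(":", 2)  # at most: "COMMAND", action, argument
    action = parts[1]
    argument = parts[2] if len(parts) == 3 else None
    return action, argument

print(parse_command("COMMAND:NAVIGATE_TO:MainScreen"))  # ('NAVIGATE_TO', 'MainScreen')
print(parse_command("COMMAND:CHECK_BATTERY"))           # ('CHECK_BATTERY', None)
```

The same split translates directly to Kotlin/Java on the Android side.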
Language Change Confirmation:
```
{
  "type": "language_changed",
  "language": "fr",
  "status": "success",
  "source": "voice_command"
}
```

- Default Language: French (`selected_lang = "fr"`)
- WebSocket Port: 8765
- Audio Sample Rate: 16kHz
- Frame Duration: 30ms
- Similarity Threshold: 70% (65% for language commands)
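The server matches transcripts against known phrases with rapidfuzz, accepting a phrase when its similarity score clears the 70% threshold (65% for language commands). The idea can be sketched with the stdlib's difflib as a stand-in for rapidfuzz (the function names and command list here are illustrative):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Percent similarity between two phrases (stand-in for rapidfuzz.fuzz.ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

def best_match(heard, phrases, threshold=70.0):
    """Return the known phrase closest to the transcript, or None below threshold."""
    score, phrase = max((similarity(heard, p), p) for p in phrases)
    return phrase if score >= threshold else None

commands = ["main menu", "profile", "settings", "information"]
print(best_match("setings", commands))   # a near-miss still resolves to "settings"
print(best_match("weather", commands))   # unrelated input is rejected
```

Fuzzy matching is what makes the system tolerant of recognition errors: a slightly garbled transcript still maps to the intended command, while unrelated speech falls below the threshold and is ignored.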
Modify these variables in the script:
- `selected_lang`: Change default language
- `LANGUAGE_COMMANDS`: Add new language switch phrases
- `NAVIGATION_COMMANDS`: Add navigation destinations
- `EMERGENCY_COMMANDS`: Customize emergency actions
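Extending a command table means adding one phrase per language. The exact dictionary shape lives in `voice_command_system.py` and may differ; the sketch below assumes a `{language: {phrase: action}}` layout, populated from the navigation commands listed earlier:

```python
# Assumed shape, illustrative only: {language: {phrase: action}}
NAVIGATION_COMMANDS = {
    "en": {"main menu": "MainScreen", "profile": "Profil",
           "settings": "Parametre", "information": "Information"},
    "fr": {"menu principal": "MainScreen", "profil": "Profil",
           "paramètres": "Parametre", "informations": "Information"},
}

# Adding a new destination means adding one phrase per language
# ("HelpScreen", "help", and "aide" are hypothetical examples):
NAVIGATION_COMMANDS["en"]["help"] = "HelpScreen"
NAVIGATION_COMMANDS["fr"]["aide"] = "HelpScreen"

print(NAVIGATION_COMMANDS["en"]["help"])  # HelpScreen
```

Keeping the English and French tables symmetric (same action names, language-specific phrases) is what lets the server switch languages without changing any downstream handling.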