IrchadX/IrchadSTT

Voice Command System

A multilingual voice recognition server that processes audio commands and communicates with Android applications via WebSocket. The system supports French and English voice commands for navigation, emergency calls, device control, and object detection.

Deployment Notes

Important: After many attempts, the system could not be deployed to a hosting service because the models are too large to host for free. It was packaged as a Docker image instead and run on the day of the presentation together with the IoT device and the mobile app.

Features

  • Multilingual Support: French and English voice recognition
  • Real-time Audio Processing: WebSocket-based audio streaming
  • Voice Activity Detection: Intelligent speech detection
  • Command Categories:
    • Language switching
    • Navigation controls
    • Emergency calls
    • Device management
    • Object detection
  • Android App Integration: Seamless communication with mobile clients
  • Audio Preprocessing: Noise filtering and normalization

Prerequisites

System Requirements

  • Python 3.7 or higher
  • Audio input capability
  • Network connectivity for WebSocket communication

Required Python Packages

pip install numpy scipy rapidfuzz websockets vosk pyaudio webrtcvad

Voice Recognition Models

Download the required Vosk models:

English Model (US):

wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
mv vosk-model-en-us-0.22 model_us

French Model:

wget https://alphacephei.com/vosk/models/vosk-model-fr-0.22.zip
unzip vosk-model-fr-0.22.zip
mv vosk-model-fr-0.22 model_fr

Installation

  1. Clone or download the project files
  2. Install dependencies:
    pip install -r requirements.txt
  3. Download voice models (see Prerequisites section)
  4. Verify directory structure:
    project/
    ├── voice_command_system.py
    ├── model_us/          # English model
    ├── model_fr/          # French model
    └── requirements.txt
    

Usage

Starting the Server

python voice_command_system.py

The server will:

  • Start on ws://0.0.0.0:8765
  • Load the default French model
  • Wait for Android app connections
  • Display available commands and status

Android App Connection

Your Android app should connect to:

ws://[SERVER_IP]:8765

Send audio data as binary WebSocket messages and receive command responses as JSON or text. The server host and the mobile app must be on the same network; otherwise, expose the server through a tunnel such as ngrok.

Voice Commands

Language Switching

English to French:

  • "switch to french"
  • "change to french"
  • "français"
  • "parler français"

French to English:

  • "changer en anglais"
  • "passer en anglais"
  • "english"
  • "speak english"

Navigation Commands

English:

  • "main menu" → MainScreen
  • "profile" → Profil
  • "settings" → Parametre
  • "information" → Information

French:

  • "menu principal" → MainScreen
  • "profil" → Profil
  • "paramètres" → Parametre
  • "informations" → Information

Emergency Commands

English:

  • "call assistant" → CALL_ASSISTANT
  • "emergency" → CALL_EMERGENCY
  • "police" → CALL_POLICE
  • "ambulance" → CALL_AMBULANCE

French:

  • "appeler assistance" → CALL_ASSISTANT
  • "urgence" → CALL_EMERGENCY
  • "police" → CALL_POLICE
  • "ambulance" → CALL_AMBULANCE

Device Commands

English:

  • "battery" → CHECK_BATTERY
  • "device status" → CHECK_DEVICE_STATUS
  • "connection" → CHECK_CONNECTION

French:

  • "batterie" → CHECK_BATTERY
  • "état appareil" → CHECK_DEVICE_STATUS
  • "connexion" → CHECK_CONNECTION

Object Detection

English:

  • "detect objects" → START_OBJECT_DETECT
  • "stop detection" → STOP_OBJECT_DETECT

French:

  • "détecter objets" → START_OBJECT_DETECT
  • "arrêter détection" → STOP_OBJECT_DETECT
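
The phrase tables above, combined with the 70% similarity threshold listed under Default Settings, suggest a fuzzy lookup from transcript to outgoing message. The following is only a sketch of that idea, not the project's code: it uses the standard-library difflib as a stand-in for rapidfuzz, and the table layout and the names COMMANDS, similarity and match_command are hypothetical. Only a few phrases are included.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical table: a small subset of the documented phrases, keyed by language.
COMMANDS = {
    "en": {
        "main menu": "COMMAND:NAVIGATE_TO:MainScreen",
        "emergency": "COMMAND:CALL_EMERGENCY",
        "battery": "COMMAND:CHECK_BATTERY",
    },
    "fr": {
        "menu principal": "COMMAND:NAVIGATE_TO:MainScreen",
        "urgence": "COMMAND:CALL_EMERGENCY",
        "batterie": "COMMAND:CHECK_BATTERY",
    },
}


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score (difflib ratio scaled to percent).

    Assumption: difflib replaces rapidfuzz here, so scores will differ
    somewhat from what the real server computes.
    """
    return SequenceMatcher(None, a, b).ratio() * 100


def match_command(transcript: str, lang: str, threshold: float = 70.0) -> Optional[str]:
    """Return the message for the closest phrase, or None if below threshold."""
    text = transcript.strip().lower()
    best_message, best_score = None, 0.0
    for phrase, message in COMMANDS[lang].items():
        score = similarity(text, phrase)
        if score > best_score:
            best_message, best_score = message, score
    return best_message if best_score >= threshold else None
```

With this sketch, a slightly misrecognized transcript such as "main menue" still resolves to the MainScreen navigation message, while unrelated speech returns None and can be ignored.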

WebSocket Communication

Incoming Messages

Audio Data: Send raw audio as binary WebSocket messages (16 kHz, 16-bit PCM)
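
As a rough illustration of what this format implies, the sketch below works out the size of one frame and splits a received buffer into frames. It uses the 30 ms frame duration listed under Default Settings and assumes mono audio, which this document does not state. The names FRAME_BYTES and split_frames are hypothetical.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as documented
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 30             # frame duration from Default Settings

# Assumption: one channel (mono). 480 samples * 2 bytes = 960 bytes per frame.
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm: bytes) -> Tuple[List[bytes], bytes]:
    """Split a PCM buffer into full frames and return (frames, leftover).

    The leftover bytes are too short to form a frame and would be
    prepended to the next incoming message.
    """
    usable = len(pcm) - len(pcm) % FRAME_BYTES
    frames = [pcm[i:i + FRAME_BYTES] for i in range(0, usable, FRAME_BYTES)]
    return frames, pcm[usable:]
```

Fixed-size frames matter because webrtcvad, which is in the dependency list, only accepts frames of 10, 20, or 30 ms.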

Language Change:

{
  "language": "en"
}

Set "language" to "fr" to switch to French.

Command Messages:

COMMAND:LANGUAGE_CHANGED:en
CONFIRM_LANGUAGE_CHANGE:fr
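
The three incoming message kinds above can be told apart by type and prefix. The sketch below shows one possible way to do that, assuming the WebSocket library hands binary frames over as bytes and text frames as str (as the websockets package does). The function name classify_incoming and its return labels are hypothetical.

```python
import json
from typing import Tuple, Union


def classify_incoming(message: Union[bytes, str]) -> Tuple[str, object]:
    """Return a (kind, payload) pair for one incoming WebSocket message."""
    # Binary frames carry raw PCM audio.
    if isinstance(message, (bytes, bytearray)):
        return "audio", bytes(message)

    text = message.strip()

    # JSON text frames carry a language change request.
    if text.startswith("{"):
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            return "unknown", text
        if isinstance(payload, dict) and payload.get("language") in ("en", "fr"):
            return "language", payload["language"]
        return "unknown", text

    # Plain text frames use colon-separated fields.
    if text.startswith(("COMMAND:", "CONFIRM_LANGUAGE_CHANGE:")):
        return "command", text.split(":")

    return "unknown", text
```

Returning "unknown" instead of raising keeps a malformed message from closing the connection; whether the real server behaves this way is not stated here.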

Outgoing Messages

Command Execution:

COMMAND:NAVIGATE_TO:MainScreen
COMMAND:CALL_EMERGENCY
COMMAND:CHECK_BATTERY

Language Change Confirmation:

{
  "type": "language_changed",
  "language": "fr",
  "status": "success",
  "source": "voice_command"
}
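
For reference, the outgoing formats above could be produced by small helpers like the ones sketched here. The helper names are hypothetical, and the field values simply mirror the examples in this section.

```python
import json


def navigate_message(screen: str) -> str:
    """Build a navigation command, e.g. COMMAND:NAVIGATE_TO:MainScreen."""
    return f"COMMAND:NAVIGATE_TO:{screen}"


def language_changed_message(lang: str, source: str = "voice_command") -> str:
    """Build the JSON confirmation sent after a language switch."""
    return json.dumps(
        {
            "type": "language_changed",
            "language": lang,
            "status": "success",
            "source": source,
        }
    )
```

On the Android side, a message can then be routed by checking whether it starts with "COMMAND:" or parses as JSON with a "type" field.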

Configuration

Default Settings

  • Default Language: French (selected_lang = "fr")
  • WebSocket Port: 8765
  • Audio Sample Rate: 16 kHz
  • Frame Duration: 30 ms
  • Similarity Threshold: 70% (65% for language commands)

Customization

Modify these variables in the script:

  • selected_lang: Change default language
  • LANGUAGE_COMMANDS: Add new language switch phrases
  • NAVIGATION_COMMANDS: Add navigation destinations
  • EMERGENCY_COMMANDS: Customize emergency actions
