This has been vibe coded, do not use for training data
A web application that listens to voice commands and executes corresponding keyboard actions. Perfect for controlling games like Digital Combat Simulator (DCS) through voice.
- Voice command recognition using your system's default microphone
- Customizable keyboard scripts for each voice command
- Web interface for managing commands and monitoring speech recognition
- Toggle activation on/off through the web interface
- AI Sentiment Analysis Mode: Use OpenAI to understand voice commands by intent rather than exact matching
- Partial Matching: Trigger commands when phrases appear anywhere in speech, not just exact matches
- Script Execution Control: Enable/disable script execution while still listening for commands
- AI Mode Timeout: Automatically turn off AI mode after a specified period to save API usage
- Python 3.7+
- A microphone connected to your computer
- Web browser
- OpenAI API key (for AI Sentiment Analysis feature)
- Clone this repository
- Install the required Python packages:
  pip install -r requirements.txt
- Run the application:
  python app.py
- Open your web browser and navigate to:
  http://localhost:5000
You can compile the application into a standalone executable using PyInstaller:
- Install PyInstaller:
  pip install pyinstaller
- Compile the application (the ";" in --add-data is the Windows path separator; on Linux and macOS use "public:public"):
  pyinstaller --onefile --add-data "public;public" --icon=public/favicon.ico --name VoiceCommand app.py
  Or use the included build script:
  build.bat
- The executable will be created in the dist folder
This repository includes a GitHub Action workflow that can build and release the application:
- Go to the "Actions" tab in the GitHub repository
- Select the "Build and Release" workflow
- Click "Run workflow"
- Enter the version number (e.g., v1.0.0) and release notes
- Click "Run workflow" to start the build process
- Once complete, the executables for Windows, Linux, and macOS will be available in the "Releases" section of the repository
The compiled executable:
- Requires no Python installation on the target machine
- Needs an internet connection for speech recognition and OpenAI features
- Needs a working microphone
- May need administrator privileges to simulate keyboard input in some applications
- Windows:
  - May require running as administrator for keyboard simulation in some applications
  - Compatible with Windows 10/11
- Linux:
  - Requires PortAudio and PyAudio dependencies (installed automatically in the build)
  - May require libxcb-xinerama0 for GUI display (sudo apt-get install libxcb-xinerama0)
  - Needs appropriate permissions for microphone access
- macOS:
  - May require allowing microphone access in System Preferences/Settings
  - May need to approve the app in Security & Privacy settings
  - On newer versions, you might need to remove the quarantine attribute with:
    xattr -d com.apple.quarantine /path/to/VoiceCommand
- Use the web interface to add voice commands and their corresponding keyboard scripts.
- Toggle the "Listening" button to activate voice recognition.
- Speak your command, and the system will execute the corresponding keyboard script.
The system offers three ways to match your voice to commands:
- Exact Match: What you say must match the command phrase exactly (always active)
- Partial Match: Enable this per command to match when the phrase appears anywhere in your speech
- AI Mode: Uses OpenAI to understand command intent even with different phrasing
When using AI Mode, your phrases should be treated as descriptions of what the command does, rather than exact text to match. This helps the AI better understand the intent of your command.
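The difference between exact and partial matching can be sketched in a few lines of Python. This is only an illustration, not the application's actual code: the function names `normalize` and `matches` are hypothetical, and it assumes that "exact" means the whole utterance equals the phrase and that comparison ignores case and extra whitespace.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.lower().split())


def matches(phrase: str, heard: str, partial: bool = False) -> bool:
    """Return True if the recognized speech should trigger the command phrase.

    Assumption: exact matching is always active; partial matching is opt-in.
    """
    phrase_n, heard_n = normalize(phrase), normalize(heard)
    if not phrase_n:
        return False
    if heard_n == phrase_n:  # exact match
        return True
    if partial:
        # Pad with spaces so only whole words match
        # ("gear up" should not fire on "gear upper").
        return f" {phrase_n} " in f" {heard_n} "
    return False
```

With this sketch, `matches("gear up", "please gear up now")` is false unless `partial=True` is passed. AI Mode is not covered here because it delegates the decision to OpenAI.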
You can control system features from within scripts:
- sentiment_on(): Activate AI mode
- sentiment_off(): Deactivate AI mode
- scripts_on(): Enable script execution
- scripts_off(): Disable script execution
Note: Even when scripts are disabled, a script containing scripts_on() will still execute.
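The exception described in the note can be pictured as a small gate that runs before a script is executed. The sketch below is an assumption about how such a check might look, not the project's implementation; `should_execute` is a hypothetical name, and it assumes scripts_on() must appear on a line of its own, outside comments.

```python
def should_execute(script: str, scripts_enabled: bool) -> bool:
    """Decide whether a script may run given the current scripts on/off state."""
    if scripts_enabled:
        return True
    # Scripts are disabled: only let through a script that re-enables them.
    for line in script.splitlines():
        if line.lstrip().startswith("--"):  # whole-line comment
            continue
        code = line.split("#", 1)[0].strip()  # drop trailing "# ..." comment
        if code == "scripts_on()":
            return True
    return False
```

This keeps a voice command such as "resume scripts" usable even after scripts_off() has been called, which is the reason for the exception.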
The application is organized into modular components:
- app.py - Main entry point for the application
- db.py - Database operations
- api.py - API routes and endpoints
- speech_recognition_handler.py - Speech recognition functionality
- input_simulation.py - Keyboard scripting and execution
- speech_recognizer.py - Speech recognition interface
- public/ - Frontend static files
  - index.html - Main web interface
  - js/main.js - Alpine.js functionality
The application defaults to using pynput for keyboard simulation, but you can alternatively use pyautogui by:
- Editing requirements.txt to uncomment the pyautogui dependency
- Installing it with pip install pyautogui
- Modifying input_simulation.py accordingly
Both libraries can handle the same keyboard operations but offer different features:
- pynput: Lower-level with direct keyboard control, works well with background applications
- pyautogui: Higher-level with additional functions for mouse control and screen analysis
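If you want input_simulation.py to work with whichever library is installed, one option is to detect availability at startup. The helper below is a hypothetical sketch (the name `pick_backend` is not part of the project) and uses only the standard library; the actual key-sending code for each backend is left out.

```python
import importlib.util
from typing import Optional, Sequence


def pick_backend(preferred: Sequence[str] = ("pynput", "pyautogui")) -> Optional[str]:
    """Return the first library name in `preferred` that can be imported, else None."""
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

The default order mirrors the README: pynput first, pyautogui as the fallback.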
Scripts use a simple syntax to define keyboard actions:
- Comments:
  - Lines starting with -- are ignored.
  - Text after # on any line is ignored.
- Delays:
  - 100ms - Wait 100 milliseconds
  - 2s - Wait 2 seconds
- Typing:
  - type "Hello World" - Types the text "Hello World"
- Key Combinations:
  - ctrl+c - Press Ctrl+C
  - shift+f10 - Press Shift+F10
  - win+r - Press Windows+R (opens Run dialog)
- Single Keys:
  - enter - Press Enter key
  - escape - Press Escape key
  - f5 - Press F5 key
- Special Functions:
  - sentiment_on() - Activate AI mode
  - sentiment_off() - Deactivate AI mode
  - scripts_on() - Enable script execution
  - scripts_off() - Disable script execution
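To make the syntax concrete, here is a minimal parser sketch that turns a script into a list of actions. It is not the parser in input_simulation.py; the names `parse_script` and `strip_comment` and the action tuples are made up for illustration. One deliberate assumption: a # inside double quotes is kept as text, so that type "#1" still works, which may differ from the real behavior.

```python
import re
from typing import List, Tuple

Action = Tuple[str, object]

DELAY_RE = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")
TYPE_RE = re.compile(r'^type\s+"(.*)"$')
CALL_RE = re.compile(r"^(sentiment_on|sentiment_off|scripts_on|scripts_off)\(\)$")


def strip_comment(line: str) -> str:
    """Drop text after '#', unless the '#' sits inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i]
    return line


def parse_script(text: str) -> List[Action]:
    """Parse script text into (kind, value) actions, one per non-empty line."""
    actions: List[Action] = []
    for raw in text.splitlines():
        if raw.lstrip().startswith("--"):  # whole-line comment
            continue
        line = strip_comment(raw).strip()
        if not line:
            continue
        m = DELAY_RE.match(line)
        if m:  # delays are stored in seconds
            value, unit = float(m.group(1)), m.group(2)
            actions.append(("delay", value / 1000 if unit == "ms" else value))
            continue
        m = TYPE_RE.match(line)
        if m:
            actions.append(("type", m.group(1)))
            continue
        m = CALL_RE.match(line)
        if m:
            actions.append(("call", m.group(1)))
            continue
        # Anything else is treated as a single key or a "+"-joined combination.
        actions.append(("keys", [k.strip().lower() for k in line.split("+")]))
    return actions
```

An executor would then walk the list, sleeping on "delay", typing on "type", pressing keys on "keys", and toggling state on "call".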
-- Open Notepad and type text
win+r # Open run dialog
500ms # Wait for dialog
type "notepad" # Type notepad
500ms
enter # Press enter to open notepad
1s # Wait for notepad to open
type "Hello, world!" # Type text into notepad
-- Save a file and disable scripts
ctrl+s # Press save hotkey
500ms # Wait half a second
type "filename.txt" # Type the filename
enter # Press enter
1s # Wait for save to complete
scripts_off() # Disable script execution for safety
- Microphone not working: Make sure your default microphone is working and correctly set up in your operating system.
- Speech recognition errors: Make sure you have an active internet connection for the speech recognition service.
- Keyboard commands not working in some applications: Some applications may require elevated privileges to receive simulated keyboard input.
- AI Mode not working: Ensure you have set your OpenAI API key in the settings.
- Script execution not working: Check if scripts are enabled using the "Scripts On/Off" toggle.
MIT