This project demonstrates an AI-powered voice control interface built with Pygame, OpenCV, and speech recognition, with optional GPU (CUDA) acceleration for the visual feedback. Users control an ESP32-connected LED or bulb through voice commands or typed text.
- Voice/text command processing with LLM
- Real-time visual feedback with GPU acceleration
- Asynchronous command handling
- ESP32 device control
- CUDA-enabled processing support
- CUDA-compatible GPU (recommended)
- ESP32 development board
- Microphone for voice input
Why GPU Acceleration Matters - Full Details
Performance Comparison
Example matrix operation (20000x20000):
- CPU time: 213.05 seconds
- GPU time: 3.73 seconds
- GPU speedup: 57.15x faster
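The speedup figure is the ratio of the two wall-clock times. The following is a minimal sketch of how such a comparison could be timed and reported using only the standard library. `time_call` and `speedup` are hypothetical helper names, and the CPU and GPU workloads behind the numbers above are not shown in this README.

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[[], Any], repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Ratio of CPU time to GPU time; values above 1 mean the GPU was faster."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds
```

Calling `speedup(213.05, 3.73)` reproduces the ratio from the rounded timings quoted above. When timing real GPU work, the timed function should wait for the device to finish before returning, because GPU kernels are typically launched asynchronously.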
- Overview
- System Architecture
- Installation Guide
- Setup Instructions
- Interface Components
- Programming Guide
- Function Calling with Ollama
- ESP32 Integration
- Troubleshooting
- Training Exercises
This system demonstrates AI agency through voice/text control of ESP32 devices, featuring:
- Voice command processing
- Real-time visual feedback
- Asynchronous command handling
- ESP32 device control
graph TD
A[User Input] --> B[Interface Layer]
B --> C[Command Processor]
C --> D[Ollama LLM]
D --> E[Function Caller]
E --> F[ESP32 Controller]
F --> G[Device]
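The data flow in the diagram can be sketched as a chain of plain functions. This is an illustration only: `stub_llm` is a keyword-matching stand-in for the Ollama call, `esp32_controller` builds the request URL without touching the network, and all function names except `control_esp_light` are hypothetical.

```python
from typing import Any, Callable, Dict

ESP_IP = "192.168.0.249"  # address used in this README's configuration


def command_processor(raw: str) -> str:
    """Normalise user input before it reaches the model."""
    return raw.strip().lower()


def stub_llm(text: str) -> Dict[str, Any]:
    """Stand-in for the LLM: picks a tool call by keyword, not by inference."""
    state = "OFF" if "off" in text.split() else "ON"
    return {"name": "control_esp_light", "arguments": {"state": state}}


def esp32_controller(state: str) -> str:
    """Build the URL the controller would request (no network I/O here)."""
    return f"http://{ESP_IP}/{'H' if state == 'ON' else 'L'}"


def function_caller(call: Dict[str, Any],
                    registry: Dict[str, Callable[..., str]]) -> str:
    """Look up the requested tool by name and invoke it with its arguments."""
    return registry[call["name"]](**call["arguments"])


def run_pipeline(raw: str) -> str:
    """User input -> command processor -> LLM -> function caller -> controller."""
    registry = {"control_esp_light": esp32_controller}
    return function_caller(stub_llm(command_processor(raw)), registry)
```

Keeping each stage a separate function makes it possible to test the pipeline without a microphone, a model, or a board attached.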
# Ubuntu/Debian Dependencies
sudo apt-get update
sudo apt-get install -y \
python3-dev \
portaudio19-dev \
python3-pyaudio \
ffmpeg \
libsm6 \
libxext6
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
pip install -r requirements.txt
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Install Models
ollama pull llama3.2:3b
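After installation, it can help to confirm that the Ollama server is reachable and the model is present. The sketch below queries the server's `/api/tags` endpoint on the default port 11434 and uses only the standard library. The helper names are hypothetical, and the response shape (`{"models": [{"name": ...}]}`) should be checked against the current Ollama API documentation.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List, Optional


def parse_model_names(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m.get("name", "") for m in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return local model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def status_message(models: Optional[List[str]],
                   wanted: str = "llama3.2:3b") -> str:
    """Turn the lookup result into a hint for the user."""
    if models is None:
        return "Ollama is not reachable; try running `ollama serve`."
    if wanted not in models:
        return f"Model missing; run `ollama pull {wanted}`."
    return f"Ollama and {wanted} are ready."
```

A quick check from a Python shell would be `print(status_message(list_ollama_models()))`.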
project/
├── esp32_voice_control.py
├── requirements.txt
├── assets/
│   └── cool_ai_animation.mp4
└── README.md
# ESP32 Settings
ESP_IP = "192.168.0.249"
VIDEO_PATH = "assets/cool_ai_animation.mp4"
# Screen Dimensions
SCREEN_WIDTH = VIDEO_DISPLAY_WIDTH
SCREEN_HEIGHT = VIDEO_DISPLAY_HEIGHT + PADDING
# UI Elements
START_BUTTON = pygame.Rect(...)
STOP_BUTTON = pygame.Rect(...)
SEND_BUTTON = pygame.Rect(...)
INPUT_BOX = pygame.Rect(...)
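Because the ESP32 address differs from network to network, one option is to read it from the environment instead of editing the source. This is a minimal sketch, not part of the project: the `ESP_IP` environment variable and the `load_esp_ip` helper are assumptions, and the default is the address shown in the settings above.

```python
import ipaddress
import os


def load_esp_ip(default: str = "192.168.0.249") -> str:
    """Read the ESP32 address from the ESP_IP environment variable.

    Falls back to `default` when the variable is unset and raises
    ValueError when the value is not a valid IP address.
    """
    value = os.environ.get("ESP_IP", default).strip()
    ipaddress.ip_address(value)  # raises ValueError on malformed input
    return value
```

With this in place, `ESP_IP = load_esp_ip()` would replace the hard-coded assignment, and a typo in the address fails at startup rather than on the first request.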
def process_command_thread():
    while True:
        user_input = command_queue.get()
        response = ollama.chat(
            model='llama3.2:3b',
            messages=[{
                'role': 'user',
                'content': user_input
            }],
            tools=[control_esp_light],
        )
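The loop above blocks on a queue so the UI thread never waits for the model. The sketch below shows that pattern in isolation, with the model call replaced by an injected `handle` function. `start_worker`, `STOP`, and the `results` list are hypothetical names, and the real script may shut down its worker differently.

```python
import queue
import threading
from typing import Callable, List

STOP = object()  # sentinel that tells the worker to exit


def start_worker(commands: "queue.Queue",
                 handle: Callable[[str], str],
                 results: List[str]) -> threading.Thread:
    """Start a daemon thread that processes queued commands one at a time."""

    def worker() -> None:
        while True:
            item = commands.get()  # blocks until a command arrives
            try:
                if item is STOP:
                    return
                try:
                    results.append(handle(item))
                except Exception as exc:  # keep the loop alive on handler errors
                    results.append(f"error: {exc}")
            finally:
                commands.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The UI side only calls `commands.put(text)`. Putting `STOP` on the queue lets the worker finish pending commands and exit cleanly.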
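import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_for_command():
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source)
        command = recognizer.recognize_google(audio)
    return command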
def control_esp_light(state: str) -> str:
    """
    Control ESP32 LED state.

    Args:
        state: "ON" or "OFF"

    Returns:
        str: Operation result
    """
tools = [{
    "name": "control_esp_light",
    "description": "Control LED state",
    "parameters": {
        "type": "object",
        "properties": {
            "state": {
                "type": "string",
                "enum": ["ON", "OFF"]
            }
        }
    }
}]
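Once the model answers, any requested tool calls still have to be executed by the application. The sketch below assumes the shape used by Ollama's chat API at the time of writing: tool definitions wrapped in a `{"type": "function", "function": {...}}` envelope, and calls returned under `message["tool_calls"]` with `function.name` and `function.arguments`. Treat that shape as an assumption to verify against the current Ollama documentation. `dispatch_tool_calls` and `fake_light` are hypothetical, and the `required` entry is an addition not present in the schema above.

```python
from typing import Any, Callable, Dict, List

TOOL_SCHEMA = {
    "type": "function",
    "function": {
        "name": "control_esp_light",
        "description": "Control LED state",
        "parameters": {
            "type": "object",
            "properties": {
                "state": {"type": "string", "enum": ["ON", "OFF"]},
            },
            "required": ["state"],  # assumption: state is mandatory
        },
    },
}


def fake_light(state: str) -> str:
    """Stand-in for control_esp_light that touches no hardware."""
    if state not in ("ON", "OFF"):
        return f"rejected state: {state!r}"
    return f"light {state}"


def dispatch_tool_calls(message: Dict[str, Any],
                        registry: Dict[str, Callable[..., str]]) -> List[str]:
    """Run every tool call found in a chat message and collect the results."""
    outputs: List[str] = []
    for call in message.get("tool_calls") or []:
        fn_info = call["function"]
        name = fn_info["name"]
        args = fn_info.get("arguments") or {}
        handler = registry.get(name)
        if handler is None:
            outputs.append(f"unknown tool: {name}")
            continue
        try:
            outputs.append(handler(**args))
        except TypeError as exc:  # model supplied wrong or missing arguments
            outputs.append(f"bad arguments for {name}: {exc}")
    return outputs
```

Looking handlers up in a registry, rather than calling whatever name the model returns, keeps the model from invoking functions that were never offered to it.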
#include <WiFi.h>
const char* ssid = "Your_WiFi_Name";
const char* password = "Your_WiFi_Password";
WiFiServer server(80);
void setup() {
  pinMode(LED_PIN, OUTPUT);
  WiFi.begin(ssid, password);
}
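For testing the Python side without hardware, a small stand-in server can mimic the two paths the client requests, `/H` and `/L`. This is a sketch in Python rather than firmware: `FakeEsp32` and `start_fake_esp32` are hypothetical names, and the plain-text `ON`/`OFF` response body is an assumption, since the firmware's actual reply is not shown here.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeEsp32(BaseHTTPRequestHandler):
    """Mimics the two endpoints the Python client calls: /H and /L."""

    led_on = False  # shared state, standing in for the LED pin

    def do_GET(self) -> None:
        if self.path == "/H":
            FakeEsp32.led_on = True
        elif self.path == "/L":
            FakeEsp32.led_on = False
        else:
            self.send_error(404)
            return
        body = b"ON" if FakeEsp32.led_on else b"OFF"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def start_fake_esp32(port: int = 0) -> HTTPServer:
    """Start the fake device on localhost; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), FakeEsp32)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_fake_esp32()`, the chosen port is available as `server.server_address[1]`, and the client can be pointed at `127.0.0.1:<port>` instead of the real board.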
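import requests

def control_device(state):
    # "/H" switches the LED on, "/L" switches it off
    url = f"http://{ESP_IP}/{'H' if state == 'ON' else 'L'}"
    response = requests.get(url, timeout=5)  # avoid hanging if the board is offline
    return response.status_code == 200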
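Note that the function above maps any value other than `"ON"` to `/L`, so a misspelled state silently turns the light off, and a network failure raises an exception. A more defensive variant could look like the sketch below. It uses `urllib` from the standard library instead of `requests` to stay dependency-free, and `set_light` with its `host` and `timeout` parameters is a hypothetical name and signature, not the project's API.

```python
import urllib.error
import urllib.request


def set_light(state: str, host: str = "192.168.0.249",
              timeout: float = 3.0) -> bool:
    """Request /H (on) or /L (off) from the device; True on HTTP 200."""
    state = state.strip().upper()
    if state not in ("ON", "OFF"):
        raise ValueError(f"state must be 'ON' or 'OFF', got {state!r}")
    url = f"http://{host}/{'H' if state == 'ON' else 'L'}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Device unreachable, timed out, or returned an HTTP error status.
        return False
```

Returning `False` on connection problems lets the caller show a "device offline" message instead of crashing the command thread, while invalid states still fail loudly.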
- Audio Device Not Found
  sudo apt-get install python3-pyaudio
- Ollama Connection Error
  ollama serve
- ESP32 Connection Failed
  - Check WiFi connection
  - Verify IP address
  - Test ESP32 endpoint
- Install dependencies
- Configure ESP32
- Test voice recognition
- Add new functions
- Define LLM tools
- Test function calling
- Modify UI elements
- Add new controls
- Enhance visual feedback
This documentation is maintained by 3D & Robotics Lab. For updates and support, visit our GitHub repository.