Humanoid Robot Head with Real-Time Voice Interaction Using Ingescape

General Overview

This project, part of a distributed interaction course at UPSSITECH, aims to integrate Ingescape into a humanoid robot head's voice interaction interface. The head, designed in SolidWorks, incorporates 3D-printed components and multiple hardware and software integrations.

Key Features

  • Voice Interaction: Real-time response to voice commands and questions using speech recognition, transcription, and speech synthesis with gTTS (Google Text-To-Speech).
  • Dynamic Eye and Head Movements: Animated eyes on circular RGB LCD screens and head movements via MyActuator RMD-L-5005 Brushless Servomotors with CAN communication.
  • Real-Time Transcription with Whisper: Integration of OpenAI's Whisper for seamless audio transcription.
  • Future Development: Facial and gesture recognition via OpenCV and MediaPipe, leveraging a Raspberry Pi Camera Module v2 8MP.

Hardware Setup


Software Setup

To ensure seamless installation and configuration, use the provided setup_software.sh script. This script handles all necessary installations, including Python 3.11, Ingescape, and other dependencies.

Key Software

System Architecture

(System architecture diagram)

Installation

Please use the following command to install the dependencies:

sudo bash setup_laptop.sh

You can also adjust all device parameters in the main.py file:

# CONFIG RASPBERRY PI (commented out; uncomment when running on the Pi)
# simulation_mode = False
# device = "wlan0"
# playback_device_name = "UACDemoV1.0"
# sample_rate = 48000
# speed_factor_tts = 1.15
# recording_device_name = "USB PnP Sound Device"
# mic_sample_rate = 44100
# silence_threshold = 0.02
# silence_duration = 0.5

# YOUR LAPTOP CONFIG
simulation_mode = True                          # run without the physical robot hardware
device = "wlo1"                                 # network interface used by the Ingescape agent
playback_device_name = "UACDemoV1.0"            # audio output (speaker) device name
sample_rate = 48000                             # playback sample rate (Hz)
speed_factor_tts = 1.15                         # playback speed factor for gTTS output
recording_device_name = "USB PnP Sound Device"  # audio input (microphone) device name
mic_sample_rate = 44100                         # recording sample rate (Hz)
silence_threshold = 0.02                        # amplitude below which input counts as silence
silence_duration = 0.5                          # seconds of silence that end a recording
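
If you are unsure which names to use for playback_device_name and recording_device_name, one way to list the audio devices visible to Python is the following sketch. It assumes the sounddevice package (pip install sounddevice), which is not necessarily what this project uses internally:

# List every audio device with its index and whether it can record or play.
import sounddevice as sd

for idx, dev in enumerate(sd.query_devices()):
    roles = []
    if dev["max_input_channels"] > 0:
        roles.append("input")
    if dev["max_output_channels"] > 0:
        roles.append("output")
    print(f"{idx}: {dev['name']} ({'/'.join(roles)})")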

How It Works

The robot head is controlled by a Python-based system that listens to user input (voice commands) and responds with dynamic actions. The core functionalities are:

  1. Voice Interaction:

    • Whisper is used to transcribe speech in real time.
    • gTTS is used to generate speech responses from text.
  2. Dynamic Eye, Mouth, and Head Movements:

    • Eye and mouth animations are shown simultaneously on the Waveshare LCD displays and on the whiteboard.

    2.1. Whiteboard Interface: Animated Visual Feedback

    The whiteboard is a visual interface where animated graphics (GIFs) are displayed to represent the robot's "expressions." This interface leverages the LCD screens for a more engaging interaction experience. Key aspects include:

  • Dynamic Eye Movements: Depending on the robot's emotional state or context, the eyes can blink, look left or right, and display special animations (e.g., "amoureux" for love or "animal" for playful expressions). The animations are shown on the Waveshare 1.28inch Round LCD modules, with GIFs or specific visuals representing each state.

  • Mouth Animations: Along with the eye movements, mouth visuals change to reflect emotions (e.g., smile, wide open). These animations provide non-verbal feedback that complements the voice responses.

  • Integration with Decisions: The Decision class drives updates on the whiteboard interface by selecting the appropriate GIFs or animations based on user input and predefined responses.

    Example Workflow:

    • If the user says "heureux" (happy), the robot's eyes display the "star" GIF and the mouth shows a "moving mouth" animation.
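
As a rough illustration of this keyword-to-animation mapping, a lookup could be written as follows. The code and asset paths are hypothetical, not the repository's actual Decision class:

# Hypothetical mapping from recognized keywords to eye/mouth GIFs;
# the real Decision class and asset names in this repo may differ.
EYE_GIFS = {
    "heureux": "gifs/eyes_star.gif",    # happy -> star eyes
    "amoureux": "gifs/eyes_heart.gif",  # in love -> heart eyes
    "animal": "gifs/eyes_animal.gif",   # playful expression
}
MOUTH_GIFS = {"heureux": "gifs/mouth_moving.gif"}

def select_animations(transcript: str) -> tuple[str, str]:
    """Return (eye_gif, mouth_gif) for the first keyword found in the transcript."""
    text = transcript.lower()
    for keyword, eye_gif in EYE_GIFS.items():
        if keyword in text:
            return eye_gif, MOUTH_GIFS.get(keyword, "gifs/mouth_closed.gif")
    return "gifs/eyes_idle.gif", "gifs/mouth_closed.gif"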

    2.2. Chat Interface: Voice Interaction and Transcription

    The chat interface provides real-time transcription of user speech and displays the robot's textual responses. It simulates a conversation log, making it easy for users to follow the interaction. Key components include:

  • Speech Recognition: The Whisper model transcribes user speech into text, which is displayed in the chat. Example: if the user says "Bonjour, robot !" ("Hello, robot!"), the chat log shows: User: Bonjour, robot !

  • Text-to-Speech Responses: The robot generates a voice response using gTTS and simultaneously displays the response text in the chat. Example: if the robot responds "Bonjour, comment puis-je vous aider ?" ("Hello, how can I help you?"), the chat log shows: Robot: Bonjour, comment puis-je vous aider ?

  • Seamless Integration with Decisions: The Decision class matches the transcription to a predefined response and updates both the whiteboard and chat interfaces accordingly.
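
For orientation, one chat turn can be sketched end to end as follows. This assumes the openai-whisper and gTTS packages and a placeholder audio file; it is not the repository's actual implementation:

# Minimal sketch of a single chat turn: transcribe speech with Whisper,
# choose a canned reply, and synthesize it with gTTS.
# Assumes `pip install openai-whisper gTTS`; "question.wav" is a placeholder.
import whisper
from gtts import gTTS

model = whisper.load_model("base")         # small multilingual Whisper model
result = model.transcribe("question.wav")  # speech -> text
user_text = result["text"].strip()
print(f"User: {user_text}")

reply = "Bonjour, comment puis-je vous aider ?"  # placeholder decision step
print(f"Robot: {reply}")

gTTS(reply, lang="fr").save("reply.mp3")   # text -> speech audio file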
  3. Decision-Making Class:

    • The Decision class uses pre-programmed responses (e.g., greetings, commands) to decide how the robot should respond to a given input (see the sketch after this list).
    • The get_response function processes the message, checks for greetings and keywords, and updates the robot's movements and facial expressions accordingly.
  4. Future Enhancements:

    • Facial and gesture recognition using OpenCV and MediaPipe.
    • Integration of the Raspberry Pi Camera for improved interaction.
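
A minimal sketch of such a decision step, under the assumption that it is a simple keyword lookup (the actual get_response in this repository may differ):

# Hypothetical keyword-based decision step modeled on the description above;
# the return format (reply text plus display/movement cues) is an assumption.
GREETINGS = ("bonjour", "salut", "hello")

def get_response(message: str) -> dict:
    """Map a transcribed message to a reply and display/movement cues."""
    text = message.lower()
    if any(greeting in text for greeting in GREETINGS):
        return {"reply": "Bonjour, comment puis-je vous aider ?",
                "eyes": "blink", "head": "nod"}
    if "heureux" in text:
        return {"reply": "Je suis content aussi !",
                "eyes": "star", "mouth": "moving"}
    return {"reply": "Je n'ai pas compris.", "eyes": "idle", "head": "none"}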

V&V (Verification and Validation)

This section describes the testing strategy implemented to ensure proper functionality of the humanoid robot head.

1. Integration Testing

To perform integration testing:

  • Run the main.py script.
  • Adjust the device parameter to match your hardware setup: replace "wlan0" with your specific device (e.g., "Wi-Fi") by modifying line 38 in the main script:
    decision = Decision(device="wlan0", simulation_mode=simulation_mode)
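
For context, the device value is the network interface the Ingescape agent binds to at startup. In the public Ingescape Python API this typically looks like the following (a sketch based on the library's documented calls, not this repository's code):

# Sketch: start an Ingescape agent on a given network device and port.
import ingescape as igs

igs.agent_set_name("RobotHead")
igs.start_with_device("wlan0", 5670)  # replace "wlan0" with your interface
# ... the agent exchanges inputs/outputs with the whiteboard and chat here ...
igs.stop()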

2. Unit Testing

For unit testing:

  • Run the Python scripts for each module or file individually.
  • Adjust the device configuration in the agent initialization at the start of each script. Replace "Wi-Fi" with your specific device (e.g., "wlan0") as follows:
    agent = RobotHead(device="Wi-Fi", simulation_mode=True)

3. Agent Testing

To evaluate the agent's performance:

  • Execute the test_robothead.igsscript script, which contains predefined scenarios to test the agent's behavior.

Project Goals

The main goals of this project are:

  • Real-time voice interaction: Provide a smooth, conversational interaction with the robot using voice commands.
  • Dynamic feedback: Display visual feedback through dynamic eye movements and animated facial expressions.
  • Extendable platform: Build a foundation for further features like facial recognition, gesture tracking, and more complex interactions.

Notes

  • Check pre-commit errors: uv run pre-commit run -a
  • SSH connection to the Raspberry Pi: ...

For installation and setup, please refer to the setup_software.sh script. Follow its execution steps to prepare your environment.
