Humanoid Robot Head with Real-Time Voice Interaction Using Ingescape

General Overview

This project, part of a distributed interaction course at UPSSITECH, aims to integrate Ingescape into a humanoid robot head's voice interaction interface. The head, designed in SolidWorks, incorporates 3D-printed components and multiple hardware and software integrations.

Key Features

  • Voice Interaction: Real-time response to voice commands and questions using speech recognition, transcription, and speech synthesis with gTTS (Google Text-To-Speech).
  • Dynamic Eye and Head Movements: Animated eyes on circular RGB LCD screens and head movements via MyActuator RMD-L-5005 Brushless Servomotors with CAN communication.
  • Real-Time Transcription with Whisper: Integration of OpenAI's Whisper for seamless audio transcription.
  • Future Development: Facial and gesture recognition via OpenCV and MediaPipe, leveraging a Raspberry Pi Camera Module v2 8MP.

Hardware Setup


Software Setup

To ensure seamless installation and configuration, use the provided setup_software.sh script. This script handles all necessary installations, including Python 3.11, Ingescape, and other dependencies.

Key Software

System Architecture

(System architecture diagram)

Installation

Please use the following command to install the dependencies:

sudo bash setup_laptop.sh

You can also adjust all device parameters in the main.py file:

# CONFIG RASPBERRY PI (commented out; uncomment when running on the Pi)
# simulation_mode = False
# device = "wlan0"
# playback_device_name = "UACDemoV1.0"
# sample_rate = 48000
# speed_factor_tts = 1.15
# recording_device_name = "USB PnP Sound Device"
# mic_sample_rate = 44100
# silence_threshold = 0.02
# silence_duration = 0.5

# YOUR LAPTOP CONFIG
simulation_mode = True                          # run without the physical robot hardware
device = "wlo1"                                 # network interface used by the Ingescape agent
playback_device_name = "UACDemoV1.0"            # audio output (speaker) device name
sample_rate = 48000                             # playback sample rate (Hz)
speed_factor_tts = 1.15                         # playback speed factor for gTTS output
recording_device_name = "USB PnP Sound Device"  # audio input (microphone) device name
mic_sample_rate = 44100                         # recording sample rate (Hz)
silence_threshold = 0.02                        # amplitude below which input counts as silence
silence_duration = 0.5                          # seconds of silence that end a recording
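
If you are unsure which names to use for playback_device_name and recording_device_name, one way to list the audio devices visible to Python is the following sketch. It assumes the sounddevice package (pip install sounddevice), which is not necessarily what this project uses internally:

# List every audio device with its index and whether it can record or play.
import sounddevice as sd

for idx, dev in enumerate(sd.query_devices()):
    roles = []
    if dev["max_input_channels"] > 0:
        roles.append("input")
    if dev["max_output_channels"] > 0:
        roles.append("output")
    print(f"{idx}: {dev['name']} ({'/'.join(roles)})")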

How It Works

The robot head is controlled by a Python-based system that listens to user input (voice commands) and responds with dynamic actions. The core functionalities are:

  1. Voice Interaction:

    • Whisper is used to transcribe speech in real time.
    • gTTS is used to generate speech responses from text.
  2. Dynamic Eye, Mouth, and Head Movements:

    • Eye and mouth animations are shown simultaneously on the Waveshare LCD displays and on the whiteboard.

    2.1. Whiteboard Interface: Animated Visual Feedback

    The whiteboard is a visual interface where animated graphics (GIFs) are displayed to represent the robot's "expressions." This interface leverages the LCD screens for a more engaging interaction experience. Key aspects include:

  • Dynamic Eye Movements: Depending on the robot's emotional state or context, the eyes can blink, look left or right, and display special animations (e.g., "amoureux" for love or "animal" for playful expressions). The animations are shown on the Waveshare 1.28inch Round LCD modules, with GIFs or specific visuals representing each state.

  • Mouth Animations: Along with the eye movements, mouth visuals change to reflect emotions (e.g., smile, wide open). These animations provide non-verbal feedback that complements the voice responses.

  • Integration with Decisions: The Decision class drives updates on the whiteboard interface by selecting the appropriate GIFs or animations based on user input and predefined responses.

    Example Workflow:

    • If the user says "heureux" (happy), the robot's eyes display the "star" GIF and the mouth shows a "moving mouth" animation.
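
As a rough illustration of this keyword-to-animation mapping, a lookup could be written as follows. The code and asset paths are hypothetical, not the repository's actual Decision class:

# Hypothetical mapping from recognized keywords to eye/mouth GIFs;
# the real Decision class and asset names in this repo may differ.
EYE_GIFS = {
    "heureux": "gifs/eyes_star.gif",    # happy -> star eyes
    "amoureux": "gifs/eyes_heart.gif",  # in love -> heart eyes
    "animal": "gifs/eyes_animal.gif",   # playful expression
}
MOUTH_GIFS = {"heureux": "gifs/mouth_moving.gif"}

def select_animations(transcript: str) -> tuple[str, str]:
    """Return (eye_gif, mouth_gif) for the first keyword found in the transcript."""
    text = transcript.lower()
    for keyword, eye_gif in EYE_GIFS.items():
        if keyword in text:
            return eye_gif, MOUTH_GIFS.get(keyword, "gifs/mouth_closed.gif")
    return "gifs/eyes_idle.gif", "gifs/mouth_closed.gif"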

    2.2. Chat Interface: Voice Interaction and Transcription

    The chat interface provides real-time transcription of user speech and displays the robot's textual responses. It simulates a conversation log, making it easy for users to follow the interaction. Key components include:

  • Speech Recognition: The Whisper model transcribes user speech into text, which is displayed in the chat. Example: if the user says "Bonjour, robot !" ("Hello, robot!"), the chat log shows: User: Bonjour, robot !

  • Text-to-Speech Responses: The robot generates a voice response using gTTS and simultaneously displays the response text in the chat. Example: if the robot responds "Bonjour, comment puis-je vous aider ?" ("Hello, how can I help you?"), the chat log shows: Robot: Bonjour, comment puis-je vous aider ?

  • Seamless Integration with Decisions: The Decision class matches the transcription to a predefined response and updates both the whiteboard and chat interfaces accordingly.
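
For orientation, one chat turn can be sketched end to end as follows. This assumes the openai-whisper and gTTS packages and a placeholder audio file; it is not the repository's actual implementation:

# Minimal sketch of a single chat turn: transcribe speech with Whisper,
# choose a canned reply, and synthesize it with gTTS.
# Assumes `pip install openai-whisper gTTS`; "question.wav" is a placeholder.
import whisper
from gtts import gTTS

model = whisper.load_model("base")         # small multilingual Whisper model
result = model.transcribe("question.wav")  # speech -> text
user_text = result["text"].strip()
print(f"User: {user_text}")

reply = "Bonjour, comment puis-je vous aider ?"  # placeholder decision step
print(f"Robot: {reply}")

gTTS(reply, lang="fr").save("reply.mp3")   # text -> speech audio file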
  3. Decision-Making Class:

    • The Decision class uses pre-programmed responses (e.g., greetings, commands) to decide how the robot should respond to a given input (see the sketch after this list).
    • The get_response function processes the message, checks for greetings and keywords, and updates the robot's movements and facial expressions accordingly.
  4. Future Enhancements:

    • Facial and gesture recognition using OpenCV and MediaPipe.
    • Integration of the Raspberry Pi Camera for improved interaction.
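
A minimal sketch of such a decision step, under the assumption that it is a simple keyword lookup (the actual get_response in this repository may differ):

# Hypothetical keyword-based decision step modeled on the description above;
# the return format (reply text plus display/movement cues) is an assumption.
GREETINGS = ("bonjour", "salut", "hello")

def get_response(message: str) -> dict:
    """Map a transcribed message to a reply and display/movement cues."""
    text = message.lower()
    if any(greeting in text for greeting in GREETINGS):
        return {"reply": "Bonjour, comment puis-je vous aider ?",
                "eyes": "blink", "head": "nod"}
    if "heureux" in text:
        return {"reply": "Je suis content aussi !",
                "eyes": "star", "mouth": "moving"}
    return {"reply": "Je n'ai pas compris.", "eyes": "idle", "head": "none"}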

V&V (Verification and Validation)

This section describes the testing strategy implemented to ensure proper functionality of the humanoid robot head.

1. Integration Testing

To perform integration testing:

  • Run the main.py script.
  • Adjust the device parameter to match your hardware setup: replace "wlan0" with your specific device (e.g., "Wi-Fi") by modifying line 38 in the main script:
    decision = Decision(device="wlan0", simulation_mode=simulation_mode)
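
For context, the device value is the network interface the Ingescape agent binds to at startup. In the public Ingescape Python API this typically looks like the following (a sketch based on the library's documented calls, not this repository's code):

# Sketch: start an Ingescape agent on a given network device and port.
import ingescape as igs

igs.agent_set_name("RobotHead")
igs.start_with_device("wlan0", 5670)  # replace "wlan0" with your interface
# ... the agent exchanges inputs/outputs with the whiteboard and chat here ...
igs.stop()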

2. Unit Testing

For unit testing:

  • Run the Python scripts for each module or file individually.
  • Adjust the device configuration in the agent initialization at the start of each script. Replace "Wi-Fi" with your specific device (e.g., "wlan0") as follows:
    agent = RobotHead(device="Wi-Fi", simulation_mode=True)

3. Agent Testing

To evaluate the agent's performance:

  • Execute the test_robothead.igsscript script, which contains predefined scenarios to test the agent's behavior.

Project Goals

The main goals of this project are:

  • Real-time voice interaction: Provide a smooth, conversational interaction with the robot using voice commands.
  • Dynamic feedback: Display visual feedback through dynamic eye movements and animated facial expressions.
  • Extendable platform: Build a foundation for further features like facial recognition, gesture tracking, and more complex interactions.

Notes

  • Check pre-commit errors: uv run pre-commit run -a
  • SSH connection to the Raspberry Pi: ...

For installation and setup, please refer to the setup_software.sh script. Follow its execution steps to prepare your environment.
