The TIAGo Speech Recognition package provides speech recognition for the TIAGo robot. It uses advanced speech recognition models to interpret user instructions and convert them into actionable commands for the robot.
- Advanced Speech Recognition: Utilizes state-of-the-art models to accurately transcribe spoken language into text.
- Configurable Search Algorithms: Supports various search algorithms like beam search and diverse beam search for improved recognition accuracy.
- Error Handling: Includes mechanisms to handle common ASR (Automatic Speech Recognition) errors.
- ROS Integration: Seamlessly integrates with ROS, allowing easy communication with other ROS nodes.
- ROS version: Noetic
- Dependencies:
Follow the installation instructions in the socrob_speech_msgs package. Then install the audio_common package with:
sudo apt-get install ros-noetic-audio-common
Clone the repository into the src folder of your catkin workspace:
cd ~/<your_workspace>/src
git clone https://github.com/certafonso/tiago_speech_recognition.git
Navigate to the cloned repository and install the required dependencies:
cd tiago_speech_recognition
pip install -r requirements.txt
Navigate to your catkin workspace and build the package:
cd ~/<your_workspace>
catkin build
After building, source the workspace to update the environment:
source ~/<your_workspace>/devel/setup.bash
To launch only the speech recognition node, use the following command:
roslaunch tiago_speech_recognition_ros speech_recognition_node.launch
The launch file speech_recognition_node.launch accepts several arguments to customize the behavior of the speech recognition node:
- silence_level: The level of silence used to determine when to stop recording. Default is 300.
- energy_threshold_ratio: The ratio used to determine the energy threshold for speech detection. For example, if silence_level is set to 100 and energy_threshold_ratio is 1.5, the recording will stop when the energy level drops below 150 (i.e., 100 * 1.5). Default is 1.5.
- model: The ID of the speech recognition model from the Hugging Face hub. Default is openai/whisper-small.en.
- save_wav: If set to true, the system will save debug WAV files. Default is false.
- node_name: The name of the speech recognition node. Default is tiago_speech_recognition.
- transcript_topic: The topic name where the transcribed text will be published. Default is ~transcript.
- audio_topic: The topic name where the audio data will be published. Default is /microphone_node/audio.
- generation_config: Path to the ASR generation configuration file. Defaults to beam search.
These arguments allow you to fine-tune the speech recognition node's behavior to match your specific requirements and environment.
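The relation between silence_level and energy_threshold_ratio described above can be sketched in a few lines of Python. The function names here are hypothetical and only illustrate the arithmetic; they are not the package's actual implementation.

```python
def energy_threshold(silence_level: float = 300.0,
                     energy_threshold_ratio: float = 1.5) -> float:
    """Return the energy level below which audio is treated as silence."""
    return silence_level * energy_threshold_ratio


def is_below_threshold(energy: float,
                       silence_level: float = 300.0,
                       energy_threshold_ratio: float = 1.5) -> bool:
    """True when the measured energy has dropped below the threshold."""
    return energy < energy_threshold(silence_level, energy_threshold_ratio)


# Example from the text: silence_level=100 and ratio=1.5 give a threshold of 150.
print(energy_threshold(100, 1.5))  # 150.0
```

With the default values (300 and 1.5), the same calculation gives a threshold of 450.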
To launch the entire SocRob speech pipeline use:
roslaunch tiago_speech_recognition_ros tiago_speech_recognition.launch
This will launch the speech recognition node, the microphone node, and the keyword recognition node.
The launch file tiago_speech_recognition.launch accepts several arguments to customize the behavior of the speech recognition system:
- microphone_device: Specifies the microphone device to be used. Default is "default", which will use the default microphone of the computer.
- launch_mic: Determines whether to launch the microphone node. Default is true.
- launch_keyword: Determines whether to launch the keyword detection node. Default is true.
- save_wav: If set to true, the system will save debug WAV files. Default is true.
- asr_node_name: The name of the ASR (Automatic Speech Recognition) node. Default is "tiago_speech_recognition".
- ASR_generation_config: Path to the ASR generation configuration file. Defaults to beam search.
- transcript_topic: The topic name where the transcribed text will be published. Default is "/tiago_speech_recognition/transcript".
- audio_topic: The topic name where the audio data will be published. Default is "/microphone_node/audio".
These arguments allow you to tailor the speech recognition setup to your specific needs and hardware configuration.
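Launch arguments are passed to roslaunch in the usual name:=value form. As a small sketch, the helper below (a hypothetical function, not part of this package) assembles such a command line from Python keyword arguments, which can be handy when starting the pipeline from a script.

```python
import shlex


def roslaunch_command(package: str, launch_file: str, **launch_args) -> list:
    """Build a roslaunch argv list, rendering each argument as name:=value."""
    cmd = ["roslaunch", package, launch_file]
    for name, value in launch_args.items():
        # roslaunch expects lowercase true/false for boolean arguments.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd.append(f"{name}:={value}")
    return cmd


cmd = roslaunch_command(
    "tiago_speech_recognition_ros",
    "tiago_speech_recognition.launch",
    launch_keyword=False,
    save_wav=False,
)
print(shlex.join(cmd))
# roslaunch tiago_speech_recognition_ros tiago_speech_recognition.launch launch_keyword:=false save_wav:=false
```

The resulting list can be handed to subprocess.run, or the printed line can be pasted directly into a terminal.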
The decoding method for the transformer can be customized with YAML files. Each file sets keyword arguments for Hugging Face's model.generate method.
The following config files are included in the config folder:
- beam_search.yaml: Configuration for the beam search algorithm.
- diverse_beam_search.yaml: Configuration for the diverse beam search algorithm.
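To make the idea of a generation config concrete, the sketch below shows the kind of keyword arguments such a file might carry, written as Python dictionaries. The values are illustrative assumptions, not the contents of the shipped YAML files, and check_generation_config is a hypothetical helper. num_beams, num_beam_groups, and diversity_penalty are existing model.generate parameters in Transformers releases that include group (diverse) beam search.

```python
# Illustrative values only; NOT the contents of beam_search.yaml
# or diverse_beam_search.yaml.
BEAM_SEARCH = {"num_beams": 5}
DIVERSE_BEAM_SEARCH = {
    "num_beams": 6,
    "num_beam_groups": 3,
    "diversity_penalty": 0.5,
}


def check_generation_config(config: dict) -> dict:
    """Catch common mistakes before the config reaches model.generate."""
    num_beams = config.get("num_beams", 1)
    num_beam_groups = config.get("num_beam_groups", 1)
    if num_beams < 1 or num_beam_groups < 1:
        raise ValueError("num_beams and num_beam_groups must be at least 1")
    # Diverse beam search splits the beams evenly across groups.
    if num_beams % num_beam_groups != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    # Without a positive penalty, all groups would produce the same beams.
    if num_beam_groups > 1 and config.get("diversity_penalty", 0.0) <= 0.0:
        raise ValueError("diverse beam search needs diversity_penalty > 0")
    return config


check_generation_config(BEAM_SEARCH)
check_generation_config(DIVERSE_BEAM_SEARCH)
```

A validated dictionary like this would then be expanded into the call, for example model.generate(input_features, **config), where input_features stands for the preprocessed audio.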
When the node is initialized, it loads a list of common ASR errors from the common_asr_errors.yaml file in the config folder. This information is published as two ROS parameters so that downstream modules can use it to correct these errors:
- tiago_speech_recognition/common_asr_errors: Contains a dictionary mapping each word to every defined possible misspelling. See example:
pear: [bear]
crisps: [crisp]
pringles: [pringle]
tictac: [tic tac]
- tiago_speech_recognition/common_asr_errors_categorized: Contains the full data present in the common_asr_errors.yaml file, which is the same as common_asr_errors but divided into categories. See example:
fruits:
  pear: [bear]
snacks:
  crisps: [crisp]
  pringles: [pringle]
  tictac: [tic tac]
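The two parameters above carry the same data in different shapes. The sketch below, using the example data from this section, shows how the categorized form flattens into the plain mapping and how a consumer might use that mapping to fix a transcript. Both function names are hypothetical, and the whole-word replacement strategy is an assumption about how a consumer could apply the data, not the package's own correction logic.

```python
import re

# Example data in the shape of common_asr_errors_categorized.
categorized = {
    "fruits": {"pear": ["bear"]},
    "snacks": {
        "crisps": ["crisp"],
        "pringles": ["pringle"],
        "tictac": ["tic tac"],
    },
}


def flatten_errors(categorized_errors: dict) -> dict:
    """Drop the category level, giving the shape of common_asr_errors."""
    flat = {}
    for words in categorized_errors.values():
        for word, errors in words.items():
            flat.setdefault(word, []).extend(errors)
    return flat


def correct_transcript(transcript: str, common_errors: dict) -> str:
    """Replace each known misrecognition with its intended word."""
    corrected = transcript
    for word, errors in common_errors.items():
        for error in errors:
            # Match whole words only, so "crisp" does not touch "crisps".
            pattern = r"\b" + re.escape(error) + r"\b"
            corrected = re.sub(pattern, word, corrected, flags=re.IGNORECASE)
    return corrected


flat = flatten_errors(categorized)
print(correct_transcript("bring me the bear and a tic tac", flat))
# bring me the pear and a tictac
```

Inside a ROS node, the flat mapping would typically be read with rospy.get_param("tiago_speech_recognition/common_asr_errors") instead of being built by hand. Note that blind replacement also rewrites legitimate uses of a word such as "bear", so a real consumer may want to restrict corrections to the vocabulary it expects.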