This repository contains the code for the paper titled "End-to-End Robot Task Planning from Transcriptions of Voice Commands".
One of the primary challenges in building a General Purpose Service Robot (GPSR), a robot capable of executing generic human commands, lies in understanding natural language instructions. These instructions often contain speech recognition errors and incomplete information, complicating the extraction of clear goals and the formulation of an efficient action plan. This work presents an end-to-end pipeline that leverages a Large Language Model (LLM) to directly translate instruction transcripts into coherent action plans. Furthermore, the pipeline integrates environmental context into the model's input, allowing for the generation of more efficient and context-aware plans. The system's performance was evaluated using a simulator based on Generalized Stochastic Petri Nets, and the entire pipeline was successfully deployed at RoboCup 2024 in Eindhoven, where it secured second place in the GPSR task.
This repository contains the code for generating the dataset used in the paper, as well as the code for the ROS node used to implement the pipeline.
The model and dataset are available on the Hugging Face Hub: `certafonso/Phi-3-GPSR` and `certafonso/gpsr-dataset`.
- ROS version: Noetic
- Dependencies: follow the installation instructions in the `socrob_speech_msgs` and `socrob_planning_msgs` repositories.
Clone this repository into your workspace:

```shell
cd ~/<your_workspace>/src
git clone https://github.com/socrob/llm_gpsr.git
```
Navigate to your catkin workspace and build the package:
```shell
cd ~/<your_workspace>
catkin build
```
After building, source the workspace to update the environment:
```shell
source ~/<your_workspace>/devel/setup.bash
```
To use the LLM planner with ROS, you can either run the model locally or on an external server. In either case, you will launch a ROS node that listens for instructions on a topic you specify (of type `ASRNBestList`) and publishes a sequence of actions to the `~actions` topic as `action_msg` messages. The node also creates a `~ready` parameter on the ROS parameter server, which is set to `True` when the model is ready to receive instructions.
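Once the node is running, you can sanity-check this interface from the command line. The node name `llm_node` below is a placeholder; substitute whatever name your launch file assigns so the private names resolve correctly:

```shell
# Wait until the model has finished loading (the node's private ~ready parameter)
rosparam get /llm_node/ready

# Watch the planned actions as they are published
rostopic echo /llm_node/actions
```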
You can launch the LLM locally by running the following command:
```shell
roslaunch llm_gpsr llm_node.launch
```
The launch file has the following arguments:
- `base_model_id`: The Hugging Face ID or path of the base model to be used. To use the default model, leave this at its default value.
- `peft_model_id`: The Hugging Face ID or path of the LoRA adapter model. Download the model from the Hugging Face Hub and specify its path here.
- `load_models`: A boolean flag indicating whether to load the models on startup.
- `instruction_topic`: The topic where new instructions will appear.
- `venv`: The virtual environment used to run the node.
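The arguments above can be overridden on the command line with the usual `arg:=value` syntax. As a sketch, the adapter path and topic name below are placeholders, not values shipped with the package:

```shell
roslaunch llm_gpsr llm_node.launch \
    peft_model_id:=/path/to/Phi-3-GPSR \
    instruction_topic:=/speech/n_best \
    load_models:=true
```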
You can opt to run the LLM in an external server to save resources on the robot. To do so, you need to run the following command on the external server:
```shell
cd src/llm_gpsr_ros/llm_node
python3 llm_server.py
```
The code for this server is independent of ROS, so you can run it in any environment with Python. Make sure the necessary dependencies are installed on the server (see the README in the `llm_gpsr_ros/llm_node` folder).
To launch the ROS node that connects to the server, you can run the following command:
```shell
roslaunch llm_gpsr llm_client.launch
```
This launch file has the following arguments:

- `server_url`: The URL of the external server where the LLM is running.
- `instruction_topic`: The topic where new instructions will appear.
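For example, assuming the server from the previous section is reachable at the address below (the host and port are placeholders; use whatever `llm_server.py` actually binds to):

```shell
roslaunch llm_gpsr llm_client.launch \
    server_url:=http://<server_ip>:8000 \
    instruction_topic:=/speech/n_best
```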
For instructions on generating the dataset, see the README in the `src/llm_gpsr_ros/dataset` folder.