This repository contains the code for the paper titled "End-to-End Robot Task Planning from Transcriptions of Voice Commands".
One of the primary challenges in building a General Purpose Service Robot (GPSR), a robot capable of executing generic human commands, lies in understanding natural language instructions. These instructions often contain speech recognition errors and incomplete information, complicating the extraction of clear goals and the formulation of an efficient action plan. This work presents an end-to-end pipeline that leverages a Large Language Model (LLM) to directly translate instruction transcripts into coherent action plans. Furthermore, the pipeline integrates environmental context into the model's input, allowing for the generation of more efficient and context-aware plans. The system's performance was evaluated using a simulator based on Generalized Stochastic Petri Nets, and the entire pipeline was successfully deployed at RoboCup 2024 in Eindhoven, where it secured second place in the GPSR task.
This repository contains the code for generating the dataset used in the paper, as well as the code for the ROS node used to implement the pipeline.
The model and dataset are available on the Hugging Face Hub: `certafonso/Phi-3-GPSR` and `certafonso/gpsr-dataset`.
- ROS version: Noetic
- Dependencies: follow the installation instructions in the `socrob_speech_msgs` and `socrob_planning_msgs` repositories.
Clone this repository into your workspace:

```shell
cd ~/<your_workspace>/src
git clone https://github.com/socrob/llm_gpsr.git
```
Navigate to your catkin workspace and build the package:
```shell
cd ~/<your_workspace>
catkin build
```
After building, source the workspace to update the environment:
```shell
source ~/<your_workspace>/devel/setup.bash
```
To use the LLM planner with ROS, you can either run the model locally or on an external server. In either case, you will launch a ROS node that listens for instructions on a topic you specify (of type `ASRNBestList`) and publishes a sequence of actions to the `~actions` topic as `action_msg` messages. The node also creates a `~ready` parameter on the ROS parameter server, which is set to `True` when the model is ready to receive instructions.
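Once the node is running, you can sanity-check this interface from the command line. The node name `llm_node` below is a placeholder; substitute whatever name your launch file assigns so the private names resolve correctly:

```shell
# Wait until the model has finished loading (the node's private ~ready parameter)
rosparam get /llm_node/ready

# Watch the planned actions as they are published
rostopic echo /llm_node/actions
```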
You can launch the LLM locally by running the following command:
```shell
roslaunch llm_gpsr llm_node.launch
```
The launch file has the following arguments:
- `base_model_id`: The Hugging Face ID or path of the base model to be used. To use the default model, leave this at its default value.
- `peft_model_id`: The Hugging Face ID or path of the LoRA adapter model. Download the model from the Hugging Face Hub and specify its path here.
- `load_models`: A boolean flag indicating whether to load the models on startup.
- `instruction_topic`: The topic where new instructions will appear.
- `venv`: The virtual environment used to run the node.
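The arguments above can be overridden on the command line with the usual `arg:=value` syntax. As a sketch, the adapter path and topic name below are placeholders, not values shipped with the package:

```shell
roslaunch llm_gpsr llm_node.launch \
    peft_model_id:=/path/to/Phi-3-GPSR \
    instruction_topic:=/speech/n_best \
    load_models:=true
```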
You can opt to run the LLM in an external server to save resources on the robot. To do so, you need to run the following command on the external server:
```shell
cd src/llm_gpsr_ros/llm_node
python3 llm_server.py
```
The code for this server is independent of ROS, so you can run it in any environment with Python. Make sure the necessary dependencies are installed on the server (see the README in the `llm_gpsr_ros/llm_node` folder).
To launch the ROS node that connects to the server, you can run the following command:
```shell
roslaunch llm_gpsr llm_client.launch
```
This launch file has the following arguments:

- `server_url`: The URL of the external server where the LLM is running.
- `instruction_topic`: The topic where new instructions will appear.
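For example, assuming the server from the previous section is reachable at the address below (the host and port are placeholders; use whatever `llm_server.py` actually binds to):

```shell
roslaunch llm_gpsr llm_client.launch \
    server_url:=http://<server_ip>:8000 \
    instruction_topic:=/speech/n_best
```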
For instructions on generating the dataset, see the README in the `src/llm_gpsr_ros/dataset` folder.