Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-Horizon Task Decomposition
This is a demo implementation of the ICRA 2025 paper Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-Horizon Task Decomposition. LLMRL uses LLMs to decompose complex long-horizon tasks into subgoals and to generate policies that provide guidance for goal-based exploration, speeding up training.
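The actual prompts and parsing live in LLMRL/train_llm.py; as a rough, hypothetical illustration of the decomposition step (the prompt, model name, and function below are ours, not the repository's API), a query might look like:

```python
# Illustrative only -- the real prompts and parsing live in LLMRL/train_llm.py.
# Assumes the openai>=1.0 client; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

def decompose_task(task_description: str) -> list[str]:
    """Ask the LLM for an ordered list of subgoals for a long-horizon task."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Decompose the task into short, ordered subgoals, one per line."},
            {"role": "user", "content": task_description},
        ],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

subgoals = decompose_task("Heat the pancake in the microwave.")
```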
* The task environment is based on TWOSOME and ROMAN; the vanilla PPO implementation is based on PPO-Pytorch, which we modified to fit the proposed framework.
The code has been tested on Ubuntu 20.04, but it should also work on other distributions. The environment and dependencies can be installed by running the following commands:
```bash
git clone https://github.com/ChirikjianLab/LLMRL.git
cd LLMRL/
conda create -n LLMRL python=3.9
conda activate LLMRL
pip install -r requirements.txt
export OPENAI_API_KEY=#YOUR_OPENAI_API_KEY#
```
Then generate the subgoals and policies with the LLM:

```bash
python LLMRL/train_llm.py
```
* The state-goal-action triplets are cached in tras_dic_VirtualHome-v*.json; delete these files or comment out the related code if you would like to generate the subgoals and policies from scratch (see the sketch below).
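As a rough sketch of this caching behavior (the file name follows the pattern above, but the load/regenerate logic here is our assumption, not the repository's actual code):

```python
# Hypothetical sketch of consulting the triplet cache before querying the LLM;
# the actual cache handling lives in the LLMRL training scripts.
import json
import os

CACHE_PATH = "tras_dic_VirtualHome-v1.json"  # one cache file per environment variant

def load_cache(path: str) -> dict:
    """Return cached state-goal-action triplets, or {} so they are regenerated."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

cache = load_cache(CACHE_PATH)
# Deleting the JSON file (or always returning {} here) forces the subgoals and
# policies to be regenerated from scratch by the LLM on the next run.
```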
Then start training:

```bash
python LLMRL/train.py
```
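Conceptually, the cached subgoals can guide exploration by shaping the reward during training; the following is a minimal, hypothetical sketch of that idea (the bonus value and the subgoal_reached predicate are placeholders, not the repository's implementation):

```python
# Conceptual sketch of goal-based exploration guidance: grant an intrinsic
# bonus whenever the agent completes the current subgoal. SUBGOAL_BONUS and
# subgoal_reached are placeholders, not the repository's code.
SUBGOAL_BONUS = 1.0

def subgoal_reached(state, subgoal) -> bool:
    # Placeholder predicate; the real check depends on the VirtualHome state encoding.
    return False

def shaped_reward(env_reward: float, state, subgoals: list, idx: int):
    """Return (shaped reward, next subgoal index)."""
    if idx < len(subgoals) and subgoal_reached(state, subgoals[idx]):
        return env_reward + SUBGOAL_BONUS, idx + 1  # advance to the next subgoal
    return env_reward, idx
```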
After training completes, the training logs and trained models are saved in the /logs and /models directories. To visualize the learning curves, install rl_plotter via `pip install rl_plotter` and run:
```bash
cd logs/VirtualHome-v*
rl_plotter --show --avg_group --shaded_std --filename result --filters LLMPPO PPO --style default --smooth 7 --xlabel timestep --xkey timestep --ylabel reward --ykey reward
```