The demo for the ICRA 2025 paper Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-Horizon Task Decomposition

ChirikjianLab/LLMRL

Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-Horizon Task Decomposition

This is a demo implementation of the ICRA 2025 paper Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-Horizon Task Decomposition. LLMRL uses LLMs to decompose complex long-horizon tasks into subgoals and to generate policies that guide goal-based exploration and speed up training.

* The task environment is based on TWOSOME and ROMAN. The vanilla PPO implementation is based on PPO-Pytorch, which we modified to fit the proposed framework.

Environment Installation

The code has been tested on Ubuntu 20.04 and should also work on other distributions. Install the environment and dependencies by running the following commands:

git clone https://github.com/ChirikjianLab/LLMRL.git
cd LLMRL/
conda create -n LLMRL python=3.9
conda activate LLMRL
pip install -r requirements.txt

QuickStart

To run LLM-guided RL:

export OPENAI_API_KEY=#YOUR_OPENAI_API_KEY#
python LLMRL/train_llm.py
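
Because the LLM-guided run depends on the OPENAI_API_KEY variable exported above, a quick check before launching can catch an unset key or a placeholder that was never replaced. The following is a minimal sketch and not part of the repository; the function name api_key_problem and the PLACEHOLDER constant are hypothetical, and the check says nothing about whether the key is actually valid.

```python
import os

# Hypothetical constant: the placeholder text shown in the export command.
PLACEHOLDER = "#YOUR_OPENAI_API_KEY#"


def api_key_problem(env=os.environ):
    """Return a short description of the problem, or None if a key is present.

    This only inspects the environment; it does not contact the OpenAI API.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if key == PLACEHOLDER:
        return "OPENAI_API_KEY still holds the placeholder value"
    return None
```

Calling api_key_problem() with no arguments inspects the current process environment; passing a plain dict makes the check easy to try out without touching real settings.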

* The state-goal-action triplets are cached in tras_dic_VirtualHome-v*.json. Remove these files or comment out the related code if you would like to generate the subgoals and policies from scratch.
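
As an alternative to deleting the cache files outright, they can be renamed so that a later run can restore them. The sketch below is an assumption-laden helper, not code from the repository: the function name set_aside_triplet_caches is hypothetical, and because the location of the cache files is not stated here, the search root is a parameter that defaults to the current directory and is searched recursively.

```python
from pathlib import Path


def set_aside_triplet_caches(root=".", pattern="tras_dic_VirtualHome-v*.json"):
    """Rename every matching cache file to <name>.bak and return the new paths.

    `root` is an assumption: pass the directory that actually holds the caches.
    """
    moved = []
    # sorted() collects all matches first, so renaming does not disturb the scan.
    for cache in sorted(Path(root).rglob(pattern)):
        backup = cache.with_name(cache.name + ".bak")
        cache.replace(backup)  # overwrites an older backup with the same name
        moved.append(backup)
    return moved
```

Renamed files end in .json.bak and no longer match the pattern, so running the helper twice is harmless. To restore a cache, rename the file back to its original name.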

To run baseline RL:

python LLMRL/train.py

Evaluation

After training completes, the training logs and trained models are saved in the directories /logs and /models. To visualize the learning curves, install rl_plotter with pip install rl_plotter and run:

cd logs/VirtualHome-v*
rl_plotter --show --avg_group --shaded_std --filename result --filters LLMPPO PPO --style default --smooth 7 --xlabel timestep --xkey timestep --ylabel reward --ykey reward
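
To give a feel for what smoothing does to a noisy reward curve, here is a minimal sketch of a centered moving average. It is not rl_plotter's implementation, and it assumes that --smooth 7 denotes a window of 7 points; the function name moving_average is hypothetical.

```python
def moving_average(values, window=7):
    """Centered moving average; the window shrinks near both ends of the curve."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                # first index inside the window
        hi = min(len(values), i + half + 1)  # one past the last index
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A larger window flattens short-lived spikes in the reward but also blurs genuine changes, so it is worth comparing the smoothed curve against the raw one when reading the plots.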
