This repository contains the code for the paper "Efficient Reinforcement Learning via Large Language Model-based Search". The code builds on the PPO and A2C implementations for MiniGrid environments from the rl-starter-files repository.
Setting up the conda environment:
conda create -n llm_modulo_sparse_rl python=3.10 --file requirements.yml
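Then activate the environment before running any of the commands below:
conda activate llm_modulo_sparse_rl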
Training baseline models (PPO or A2C):
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000
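As a quick sanity check that the environment ID resolves before launching a long training run, you can instantiate it directly (a minimal sketch, assuming the `gymnasium` and `minigrid` packages that rl-starter-files relies on are installed):

```python
import gymnasium as gym
import minigrid  # noqa: F401 -- importing registers the MiniGrid-* environment IDs

# Same environment ID as in the training command above.
env = gym.make("MiniGrid-DoorKey-5x5-v0")
obs, info = env.reset(seed=0)
print(env.observation_space)  # Dict with 'image', 'direction', and 'mission' fields
env.close()
```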
Visualizing the agent's behavior:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
Evaluating the agent's performance:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
 
Note that you will need an OpenAI API key to run the experiments. The following commands run one layout of the DoorKey-5x5 environment; change the --env and --model arguments to run the experiments on other environments. See the bash scripts in the scripts directory for more examples.
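A common convention is to export the key as an environment variable before running the scripts (OPENAI_API_KEY is an assumption here; check the scripts if they expect a different variable or a config file):
export OPENAI_API_KEY="sk-..."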
Obtaining a plan for the relaxed problem:
./scripts/bash_scripts/run_llm_modulo_policy.sh
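For orientation, the core of such a step is a chat-completion call that asks the model for a high-level plan and parses the reply. The sketch below is illustrative only; the model name, prompt, and parsing are assumptions, not the script's actual implementation:

```python
# Minimal sketch of querying an LLM for a high-level plan.
# Model name and prompt are illustrative assumptions; see the bash script
# and the code it invokes for the actual prompting and verification loop.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; the scripts may use a different one
    messages=[
        {"role": "system", "content": "You are a planner for MiniGrid DoorKey."},
        {"role": "user", "content": "List the subgoals needed to reach the goal "
                                    "in a DoorKey room with a locked door."},
    ],
)
plan_text = response.choices[0].message.content
print(plan_text)
```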
Constructing and storing the reward shaping function:
./scripts/bash_scripts/run_store_reward_shaping_policy.sh
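One standard way a stored shaping function is applied at training time is potential-based reward shaping, F(s, s') = γΦ(s') − Φ(s). The sketch below is a generic illustration of that formulation, with a hypothetical potential standing in for whatever the stored function encodes; the repository's actual shaping function may differ in form:

```python
# Generic potential-based reward shaping (Ng et al., 1999), for orientation
# only. The potential here is a hypothetical stand-in, e.g. progress toward
# the next subgoal of the LLM plan.
GAMMA = 0.99

def potential(state: dict) -> float:
    """Hypothetical potential: number of plan subgoals completed so far."""
    return float(state.get("subgoals_completed", 0))

def shaped_reward(reward: float, state: dict, next_state: dict) -> float:
    # F(s, s') = gamma * phi(s') - phi(s), added on top of the env reward.
    return reward + GAMMA * potential(next_state) - potential(state)
```

Because the shaping term telescopes along trajectories, it densifies the sparse environment reward without changing which policy is optimal.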
Training the agent with the reward shaping function:
./scripts/bash_scripts/run_reward_shaped_rl_training.sh