This code reproduces results from the paper:
Stephen Kelly, Tatiana Voegerl, Wolfgang Banzhaf, and Cedric Gondro. Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning. Genetic Programming and Evolvable Machines, 2021.
For the latest versions of this project, please visit Creative Algorithms Lab's GitLab page.
This code is designed to run on Linux. On Windows, you can use the Windows Subsystem for Linux (WSL); see this tutorial for working with WSL in Visual Studio Code.
To install the required system packages, run the following from the tpg directory:
sudo xargs --arg-file requirements.txt apt install
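This invocation reads whitespace-separated package names from requirements.txt and passes them all to a single apt install command. A minimal demonstration of the pattern, with echo standing in for sudo apt install (the package names below are placeholders, not the actual contents of requirements.txt):

```shell
# Demo of the xargs pattern above: 'echo' stands in for 'sudo apt install',
# and the package names are placeholders, not the real requirements.txt.
printf 'build-essential\nscons\nlibopenmpi-dev\n' > /tmp/requirements_demo.txt
xargs --arg-file /tmp/requirements_demo.txt echo apt install
# prints: apt install build-essential scons libopenmpi-dev
```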
Note that MuJoCo must be downloaded and unpacked separately.
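Fetching and unpacking MuJoCo 3.2.2 might look like the sketch below. The release URL and tarball name are assumptions based on the naming scheme of MuJoCo's official GitHub releases, so verify them first; the download and extract lines are commented out here.

```shell
# Hedged sketch of fetching and unpacking MuJoCo 3.2.2. The release URL
# pattern is an assumption -- verify it on the official MuJoCo GitHub
# releases page before uncommenting the wget/tar lines.
MUJOCO_VER=3.2.2
TARBALL="mujoco-${MUJOCO_VER}-linux-x86_64.tar.gz"
# wget "https://github.com/google-deepmind/mujoco/releases/download/${MUJOCO_VER}/${TARBALL}"
# tar -xzf "${TARBALL}"
echo "expected archive: ${TARBALL}"
```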
To make the tpg scripts easy to access, add the appropriate folders to the $PATH environment variable. To do so, add the following to ~/.profile:
export TPG=<YOUR_PATH_HERE>/tpg
export PATH=$PATH:$TPG/scripts/plot
export PATH=$PATH:$TPG/scripts/run
export MUJOCO=<YOUR_PATH_TO_MUJOCO>/mujoco-3.2.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MUJOCO/lib/
Then run:
source ~/.profile
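To confirm the environment is set up correctly, a quick sanity check can help (this is just a sketch; it only reports what it finds and changes nothing):

```shell
# Optional sanity check after 'source ~/.profile': report whether each
# variable is set and whether the tpg helper scripts resolve via $PATH.
for var in TPG MUJOCO LD_LIBRARY_PATH; do
  eval "val=\${$var}"
  [ -n "$val" ] && echo "$var=$val" || echo "$var is not set"
done
if command -v tpg-run-mpi.sh >/dev/null 2>&1; then
  echo "tpg scripts found on PATH"
else
  echo "tpg scripts NOT on PATH (re-check ~/.profile, then 'source ~/.profile')"
fi
```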
To build, run the following from the tpg directory:
scons --opt
The folder tpg/experiment_directories/classic_control contains scripts to evolve policies for classic control tasks. Parameters are set in parameters.txt. The default settings will evolve a policy for the CartPole task.
To run an experiment using 4 parallel MPI processes, make tpg/experiment_directories/classic_control your working directory and run:
tpg-run-mpi.sh -n 4
Note that, at present, the number of MPI processes must be greater than the number of active tasks.
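As an illustration of this constraint, the smallest legal -n value is one more than the task count. NTASKS=3 below is a hypothetical value; set it to match the tasks enabled in parameters.txt.

```shell
# Illustration of the constraint above: -n must exceed the number of
# active tasks. NTASKS=3 is a hypothetical count, not a project default.
NTASKS=3
NPROCS=$(( NTASKS + 1 ))   # smallest value satisfying NPROCS > NTASKS
echo "minimum -n for $NTASKS active tasks: $NPROCS"
# prints: minimum -n for 3 active tasks: 4
```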
Generate classic_control_p0.pdf with various statistics:
tpg-plot-stats.sh
The first page will be a training curve that looks something like the plot below. A fitness of 500 indicates that the agent balanced the pole for 500 timesteps, thus solving the task.
Display an OpenGL animation of the single best policy interacting with the environment:
tpg-run-mpi.sh -m 1
Delete all checkpoints and output files:
tpg-cleanup.sh