!MOST OF THE CODE WAS NOT PRODUCED BY US!
As specified in the project proposal, we based the implementation of the Hierarchical Dreamer (HDreamer) on the publicly available DreamerV3 implementation from the original repository. The repository's author is Naoki Morihira, who re-implemented DreamerV3 (link to the paper) in PyTorch.
HDreamer is a hierarchical extension of the DreamerV3 architecture for model-based reinforcement learning. Inspired by HQ-VAE, HDreamer introduces multiple levels of discrete latent variables to disentangle low-level sensory information from high-level semantic structure. This aims to mitigate codebook collapse, a failure mode in VQ-based models where the discrete latent space is underutilized due to entangled representations.
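The hierarchical sequence model itself lives in `HierarchicalRSSM` (see the file list below). As a rough, hypothetical illustration of the core idea, the sketch below stacks two straight-through categorical latents, with the fine level conditioned on the coarse sample so the two levels specialize rather than compete for the same information. All names and sizes here are made up; this is not the actual HDreamer code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelDiscreteLatent(nn.Module):
    """Sketch: sample a coarse (semantic) level first, then condition the
    fine (sensory) level on it, so the two codebooks specialize."""

    def __init__(self, feat_dim=256, groups=8, classes=8):
        super().__init__()
        self.groups, self.classes = groups, classes
        self.coarse_logits = nn.Linear(feat_dim, groups * classes)
        # the fine level sees both the input features and the coarse sample
        self.fine_logits = nn.Linear(feat_dim + groups * classes, groups * classes)

    def _sample(self, logits):
        # straight-through one-hot sample, in the spirit of DreamerV3's
        # categorical latents
        probs = F.softmax(logits.view(-1, self.groups, self.classes), dim=-1)
        idx = torch.multinomial(probs.view(-1, self.classes), 1).squeeze(-1)
        sample = F.one_hot(idx, self.classes).float()
        sample = sample.view(-1, self.groups, self.classes)
        sample = sample + probs - probs.detach()  # straight-through gradient
        return sample.flatten(1)

    def forward(self, feat):
        z_high = self._sample(self.coarse_logits(feat))  # semantic level
        z_low = self._sample(self.fine_logits(torch.cat([feat, z_high], -1)))
        return z_high, z_low

latent = TwoLevelDiscreteLatent()
z_high, z_low = latent(torch.randn(4, 256))  # batch of 4 feature vectors
```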
The following files/classes/functions were created or modified by us:

- `HierarchicalRSSM` in `networks.py`. The main hierarchical component (the sequence model) of HDreamer. We also modified the `MultiEncoder` and `MLP` forward passes to make HDreamer work.
- `./envs/pendulum.py`. The inverted double pendulum adaptation to the Dreamer's code (MuJoCo InvertedDoublePendulum-v5).
- `./envs/minigrid.py`. The MiniGrid adaptation to the Dreamer's code (MiniGrid-Unlock-v0).
- HDreamer's architecture specifications in `configs.yaml`.
- `./test_masked_eval.py`. Ablation study testing the reward return difference.
- `./eval*`. Files for evaluating HDreamer experiments/training.
- `./togif.py`. Showcases HDreamer performance via a GIF.
- Code in `./dreamer.py` that enables launching multiple Dreamer experiments.
- The `p*.sh`, `minigrid*.sh` and `global_config.py` auxiliary files.
Only code produced by us is properly documented and commented. More code was created (such as a "Gumbel-Softmax differentiable multi-hot-encoded distribution" or a "global hidden state aligner") but was eventually discarded due to time constraints.
- `dreamer.py`: the main Dreamer file containing the Dreamer model and the code for launching training experiments
- `models.py`: contains the world model (sequence model, reward and continue predictors) and the imagination behaviour (actor and critic)
- `networks.py`: all the NN architectures (sequence model, hierarchical sequence model, encoder/decoder, MLPs/CNNs)
- `./logdir/`: please create this folder yourself, download the zip at this link containing the final checkpoints, and unzip it here
- `configs.yaml`: the individual Dreamer architectures reside here (ours are `minigrid*` and `pendulum_small*`)
- `./envs/`: contains all the available environments (ours are `minigrid.py` and `pendulum.py`)
- `eval.py`: used for evaluating the trained Dreamers and plotting graphs
- `eval_training.ipynb`: graphs for training
- `eval_dreamer.ipynb`: graphs for evaluation and testing the statistical difference
- `tools.py`: utility code; most importantly it contains the `tools.simulate` function, which is used to run the Dreamers for N episodes (see the sketch after this list)
- `test_masked_eval.py` and `decoder_eval.py`: used for the ablation studies (the decoder for testing the qualitative difference, the eval for the quantitative reward return difference)
- `global_config.py`: contains the global flags for tracking codebook usage and the ablation masks
- `reconstruct_minigrid_final.py`: a function that converts a MiniGrid observation to an image
- `togif.py`: renders the Dreamer's evaluation performance into a GIF
- `./presentation/`: contains figures for the report and the presentation
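To give an idea of what such an evaluation rollout involves, here is a purely illustrative sketch; the real logic, signature, and bookkeeping live in `tools.simulate` in `tools.py`, and the `agent.act` interface below is made up:

```python
import numpy as np

def rollout_episodes(agent, env, episodes):
    """Illustrative stand-in for an evaluation loop like tools.simulate:
    run the agent for a fixed number of episodes and average the returns."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        state, done, total = None, False, 0.0
        while not done:
            # hypothetical interface: act from the observation and the
            # agent's recurrent state
            action, state = agent.act(obs, state)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += float(reward)
        returns.append(total)
    return float(np.mean(returns))
```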
Get dependencies with Python 3.11 (make a virtual environment):

`pip install -r requirements.txt`
We did not want to force a particular PyTorch installation, so install the PyTorch version you need. We used:

`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126`
If you install a non-CUDA-capable build, you need to go to `configs.yaml` and change `device` to `"cpu"` (in the default settings).
To launch 8 training experiments, run:

`python dreamer.py --config pendulum[hnumber]_small`
where `[hnumber]` corresponds to the HDreamer level you want (1 is the vanilla Dreamer; 2 and 3 are possible). For MiniGrid:

`python dreamer.py --config minigrid[hnumber]`
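For example, to train the two-level HDreamer on MiniGrid-Unlock, the call would be:

`python dreamer.py --config minigrid2`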
The checkpoints with metrics will appear in `./logdir`.
To run the evaluation on our pretrained models, first download the checkpoints here (university account needed). Place the folders into `./logdir/*` and then you can launch:

`python eval.py`
If you want to evaluate a different environment, go to `eval.py` and change the `env` variable at the bottom to either `idp` or `minigrid`. The results will appear in `./data/eval_[env].csv`; you can use the Jupyter notebooks to analyze them.
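If you prefer not to use the notebooks, a minimal way to inspect such a results file could look like the sketch below (we only assume the CSV exists; check its actual header before relying on any column names):

```python
import pandas as pd

# load the evaluation results for the inverted double pendulum runs
df = pd.read_csv("./data/eval_idp.csv")
print(df.head())      # inspect the actual columns first
print(df.describe())  # summary statistics over the logged episodes
```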
To run the ablations on the same models, first download the checkpoints here (university account needed). Place the folders into `./models/*` and then you can launch either the qualitative (decoder reconstruction) ablation using:

`python decoder_eval.py --config [config_name] --checkpoint_dir [path_to_the_checkpoint_folder]`
or the quantitative ablation (reward return difference) using:

`python test_masked_eval.py --config [config_name] --checkpoint_dir [path_to_the_checkpoint_folder]`
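For example, assuming the checkpoint archive unpacks into a folder named after its config (the folder name here is made up; use whatever the archive actually contains), a call might look like:

`python decoder_eval.py --config pendulum2_small --checkpoint_dir ./models/pendulum2_small`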