
Go-Explore, exploration in Montezuma's Revenge

Filippo Ansalone, Reinforcement Learning 2023

Reimplementation of Go-Explore (https://arxiv.org/abs/1901.10995).
Environment: Montezuma's Revenge (https://www.gymlibrary.dev/environments/atari/montezuma_revenge/)
Task: first build trajectories by randomly exploring a deterministic environment, then learn a policy by imitating the best trajectory.
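The random exploration works because the Atari emulator state can be saved and restored exactly. A minimal check of that primitive (the env id, the old-style Gym step API, and the clone_state/restore_state calls on the unwrapped Atari env are assumptions about the Gym version in use):

import gym

env = gym.make("MontezumaRevengeDeterministic-v4")  # assumed env id
env.reset()
snapshot = env.unwrapped.clone_state()        # save the ALE emulator state
obs1, _, _, _ = env.step(0)                   # take a NOOP
env.unwrapped.restore_state(snapshot)         # rewind to the snapshot
obs2, _, _, _ = env.step(0)
assert (obs1 == obs2).all()                   # same state + action -> same result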

Installation

To install the needed packages:

pip install -r requirements.txt

Phase 1

To run phase 1 of the algorithm, execute:

python main.py --phase1
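For intuition, phase 1 follows the paper's go-then-explore loop: pick a cell from the archive, restore its emulator state, explore randomly, and archive any new cells reached. A minimal sketch, not this repo's exact code (the env id, the cell downsampling parameters, and the archive layout are assumptions):

import random
import cv2
import gym

env = gym.make("MontezumaRevengeDeterministic-v4")   # assumed env id

def cell_of(obs):
    # Downsample the frame to a coarse "cell" key, as in the paper
    # (11x8 pixels, 8 intensity levels; the exact parameters are assumptions).
    gray = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (11, 8))
    return (small // 32).tobytes()

obs = env.reset()
archive = {cell_of(obs): {"state": env.unwrapped.clone_state(), "traj": []}}

for _ in range(10000):
    key = random.choice(list(archive))               # uniform pick (--sameprob variant)
    entry = archive[key]
    env.unwrapped.restore_state(entry["state"])      # "go": return to the stored cell
    traj = list(entry["traj"])
    for _ in range(100):                             # "explore": random actions
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        traj.append(action)
        if done:
            break
        k = cell_of(obs)
        if k not in archive:                         # keep the first trajectory reaching a new cell
            archive[k] = {"state": env.unwrapped.clone_state(), "traj": list(traj)}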

By default, the algorithm samples cells and scores them as described in the paper. To sample cells with uniform probability instead, specify:

python main.py --phase1 --sameprob
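For reference, the default (non --sameprob) selection weights cells by how rarely they have been visited and chosen. A hedged sketch, where the coefficients follow the paper's count subscore and the bookkeeping fields (visits, chosen, chosen_since_new) are assumptions about this repo:

import random

def subscore(count, w=0.3, p=0.5, eps1=0.001, eps2=0.00001):
    # The paper's count subscore: rarely-used cells get higher weight.
    return w * (1.0 / (count + eps1)) ** p + eps2

def sample_cell(archive):
    # visits / chosen / chosen_since_new are assumed per-cell counters.
    keys = list(archive)
    weights = [subscore(archive[k]["visits"])
               + subscore(archive[k]["chosen"])
               + subscore(archive[k]["chosen_since_new"]) for k in keys]
    return random.choices(keys, weights=weights, k=1)[0]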

This process will produce the best trajectory found for each checkpoint (a point where a new reward is collected), saved as files named best_trajectory_<rew>_<dist>.npy.
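Each saved trajectory is a plain NumPy array, so it can be inspected directly; a quick check (the filename below is hypothetical):

import numpy as np

traj = np.load("best_trajectory_400_120.npy")  # hypothetical filename
print(traj.shape, traj.dtype)                  # e.g. one action per timestep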

Test phase 1

To test a trajectory produced by phase 1:

python main.py --test1 --trajectory <string>

You can optionally render the environment:

python main.py --test1 --trajectory <string> --render
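Conceptually, the test replays the stored actions in the same deterministic environment; a minimal sketch (the env id, the old-style Gym step API, and the filename are assumptions):

import numpy as np
import gym

env = gym.make("MontezumaRevengeDeterministic-v4")  # assumed env id
actions = np.load("best_trajectory_400_120.npy")    # hypothetical filename

obs = env.reset()
total = 0.0
for action in actions:
    obs, reward, done, _ = env.step(int(action))
    total += reward
    env.render()                                    # corresponds to the --render flag
    if done:
        break
print("trajectory return:", total)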

Phase 2

To run phase 2:

python main.py --phase2 --trajectory <string>

You can optionally add the following arguments:

python main.py --phase2 --startpoint <int> --maxtimesteps <int> --patience <int>

trajectory: filename of the trajectory to imitate
startpoint: the point in the trajectory from which robustification starts; if not specified, it defaults to the length of the trajectory minus 10
maxtimesteps: the maximum number of timesteps for each robustification iteration
patience: if set to -1, the behavior remains unchanged; if set to a positive integer, the maximum number of iterations without any improvement at a given point of the trajectory. When this limit is reached, the algorithm is repeated from the previous point of the trajectory (see the sketch below).

The process will terminate once the policy performs well from the initial state of the environment. As the paper notes, phase 2 is not guaranteed to converge to a solution, let alone an optimal one, for a given trajectory.
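For intuition, here is a hedged sketch of the backward robustification loop described above; train_policy is a hypothetical stand-in for the actual training step, and the success test and bookkeeping are assumptions, not this repo's exact logic:

def robustify(env, train_policy, trajectory, startpoint, max_timesteps, patience):
    point = startpoint                             # defaults to len(trajectory) - 10
    stalled = 0
    while point >= 0:                              # point 0 = initial state of the env
        # Replay the first `point` demo actions to place the agent on the trajectory.
        obs = env.reset()
        for a in trajectory[:point]:
            obs, _, _, _ = env.step(int(a))
        if train_policy(obs, max_timesteps):       # True once the policy succeeds from here
            point -= 1                             # success: move the start point backwards
            stalled = 0
        else:
            stalled += 1
            if patience > 0 and stalled >= patience:
                point += 1                         # no progress: retry from the previous point
                stalled = 0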

Test phase 2

To test a policy produced by phase 2:

python main.py --test2

If you renamed the produced policy:

python main.py --test2 --policy <string>

To start the simulation from a certain point in a trajectory, specify:

python main.py --test2 --trajectory <string> --startpoint <int>

You can optionally add the same --render flag as for test1.
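Conceptually, testing from a start point replays the demonstration up to that point and then hands control to the policy; a minimal sketch (the env id, the filename, the start point value, and the policy stand-in are assumptions):

import numpy as np
import gym

env = gym.make("MontezumaRevengeDeterministic-v4")  # assumed env id
actions = np.load("best_trajectory_400_120.npy")    # hypothetical filename
startpoint = 100                                    # hypothetical value

policy = lambda obs: env.action_space.sample()      # stand-in for the trained policy

obs = env.reset()
for a in actions[:startpoint]:                      # walk along the demonstration
    obs, _, _, _ = env.step(int(a))

done = False
while not done:                                     # hand control to the policy
    obs, reward, done, _ = env.step(policy(obs))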
