
Small university project for experimentally training and evaluating reinforcement learning agents in OpenAI Gym environments using DQN. The trained DQN agent wins an entire 21-round Atari Pong game. Includes CartPole, Acrobot, and more.


Reinforcement Learning Gym

Train and evaluate reinforcement learning agents in OpenAI Gym environments using Deep Q-learning (DQN).
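Two pieces at the heart of DQN are an experience replay buffer and an epsilon-greedy exploration schedule. As a rough, stdlib-only illustration of those two mechanisms (a sketch, not this repository's actual implementation), they might look like:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted automatically

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch, as in the original DQN setup
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal the exploration rate from eps_start down to eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Toy usage: fill the buffer with dummy transitions and draw a minibatch.
buf = ReplayBuffer(capacity=1000)
for t in range(32):
    buf.push((t, 0, 1.0, t + 1, False))
batch = buf.sample(8)
```

The hyperparameter names (`eps_start`, `decay_steps`, the buffer capacity) are illustrative; the repository's actual values live in config.py.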

Pong-v5 training graph

Environments

The following environments are implemented for training and evaluation:

Classic Control

  • CartPole
  • MountainCar
  • Acrobot

Atari

  • Pong
  • Breakout

More environments can be added, and hyperparameters can be tuned in config.py.
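For orientation, a config.py for a project like this typically gathers the DQN hyperparameters in one place. The layout below is hypothetical; every name and value is illustrative, not copied from the repository:

```python
# Hypothetical config.py layout -- names and values are illustrative only.
CONFIG = {
    "learning_rate": 1e-4,        # optimizer step size for the Q-network
    "gamma": 0.99,                # discount factor
    "batch_size": 32,             # replay minibatch size
    "replay_capacity": 100_000,   # max transitions kept in the buffer
    "eps_start": 1.0,             # initial exploration rate
    "eps_end": 0.05,              # final exploration rate
    "target_update_freq": 1_000,  # steps between target-network syncs
}
```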

Trained Models

The following trained models are included in this repository for evaluation:

Classic Control

  • CartPole-v1
  • Acrobot-v1

Atari

  • Pong-v0
  • Pong-v5

Each trained model can be found in the /models directory.

Demonstration videos of trained models can be found in the OpenAI Gym YouTube playlist.

Setup

For training and evaluating the OpenAI Gym environments with a GPU, the following setup was used:

  • Anaconda 2.11
  • CUDA 11.3
  • Python 3.10
  • PyTorch 1.11
  • OpenAI gym 0.24

Install

pip install -r requirements.txt
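Given the versions listed under Setup, the requirements.txt pins would look something like the fragment below. This is illustrative only; check the actual requirements.txt in the repository for the exact packages and versions:

```text
# Illustrative pins based on the Setup section, not the repository's real file
torch==1.11.*
gym[atari,accept-rom-license]==0.24.*
```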

Run

CartPole

CartPole-v1 video

CartPole-v0

Train

python src/train.py --env CartPole-v0 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env CartPole-v0 --num_eval_episodes <int> --is_render --is_record

CartPole-v1

Train

python src/train.py --env CartPole-v1 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env CartPole-v1 --num_eval_episodes <int> --is_render --is_record

Acrobot

Acrobot-v1 video

Acrobot-v1

Train

python src/train.py --env Acrobot-v1 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env Acrobot-v1 --num_eval_episodes <int> --is_render --is_record

MountainCar

MountainCar-v0

Train

python src/train.py --env MountainCar-v0 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env MountainCar-v0 --num_eval_episodes <int> --is_render --is_record

Pong

Pong-v5 video

Pong-v0

Train

python src/train.py --env Pong-v0 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env Pong-v0 --num_eval_episodes <int> --is_render --is_record

Pong-v4

Train

python src/train.py --env Pong-v4 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env Pong-v4 --num_eval_episodes <int> --is_render --is_record

Pong-v5

Train

python src/train.py --env ALE/Pong-v5 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env ALE/Pong-v5 --num_eval_episodes <int> --is_render --is_record

Breakout

Breakout-v0

Train

python src/train.py --env Breakout-v0 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env Breakout-v0 --num_eval_episodes <int> --is_render --is_record

Breakout-v4

Train

python src/train.py --env Breakout-v4 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env Breakout-v4 --num_eval_episodes <int> --is_render --is_record

Breakout-v5

Train

python src/train.py --env ALE/Breakout-v5 --eval_freq <int> --num_eval_episodes <int>

Evaluate

python src/evaluate.py --env ALE/Breakout-v5 --num_eval_episodes <int> --is_render --is_record