This project contains implementations of various deep reinforcement learning algorithms. Five algorithms have been implemented so far:
- REINFORCE[^1]
- Deep Q-Network (DQN)[^2]
- Double Duelling Deep Q-Network (3DQN)[^3][^4]
- Proximal Policy Optimization (PPO)[^5]
- Twin Delayed Deep Deterministic Policy Gradient (TD3)[^6]
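As a point of reference for the names above, here is a minimal PyTorch-style sketch of the two ideas combined in 3DQN: a dueling head plus double Q-learning targets. It is illustrative only; the class and function names are hypothetical, not this repository's code.

```python
import torch
from torch import nn


class DuelingQNet(nn.Module):
    """Dueling head (Wang et al. 2016): Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.advantage = nn.Linear(64, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.body(obs)
        adv = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + adv - adv.mean(dim=1, keepdim=True)


def double_q_target(online, target, rewards, next_obs, dones, gamma=0.99):
    """Double Q-learning target (Van Hasselt et al. 2016)."""
    with torch.no_grad():
        # The online network selects the next action...
        best = online(next_obs).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates it, which reduces overestimation.
        next_q = target(next_obs).gather(1, best).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```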
Here are some Gym environments that have been solved (or nearly solved) using the implemented algorithms.
![]() | ![]()
---|---
Acrobot-v1 with 3DQN | CartPole-v1 with REINFORCE

![]() | ![]()
---|---
LunarLanderContinuous-v2 with TD3 | BipedalWalker-v3 with TD3

![]() | ![]()
---|---
ALE/Pong-v5 with 3DQN | ALE/Breakout-v5 with 3DQN

![]() | ![]()
---|---
ALE/BeamRider-v5 with PPO | ALE/SpaceInvaders-v5 with PPO
Special thanks to fg91 and vwxyzjn, whose work helped me achieve the results presented for the Atari environments.
This project uses Python 3.10.12. Use the package manager pip to install the dependencies:
```bash
pip install -r requirements.txt
```
The executable files are `train.py`, `test.py`, and `mp4_to_gif.py`.

`train.py` trains a DRL agent using a specified algorithm on a specified Gym environment. The trained neural networks used by the agent are saved in the `files/` directory. An image displaying the sum of rewards for each episode, as well as the average sum of rewards over the last 100 episodes, is also saved in the `images/` directory. The parameters of `train.py` are:

- `-a`/`--algorithm`: the algorithm to use. Accepted strings are `3DQN`, `PPO`, and `TD3`.
- `-m`/`--module`: the Gym environment to train the algorithm on.
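For example, with the parameters above, training a TD3 agent on one of the environments shown earlier would look like this:

```bash
python train.py -a TD3 -m LunarLanderContinuous-v2
```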
`test.py` displays an episode of a specified environment using an agent previously trained with a specified algorithm. `-a` and `-m` serve the same purposes as for `train.py`. If the `--save` parameter is used, the episode is not displayed but is saved in the `rgb_array/` directory instead. `mp4_to_gif.py` can then be used to convert the video into a GIF file and save it in the `images/` directory. The `-a` and `-m` parameters are used by `mp4_to_gif.py` to name the GIF file.
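A typical record-and-convert sequence, using the flags described above, would be:

```bash
# Display one episode with the trained agent
python test.py -a TD3 -m LunarLanderContinuous-v2

# Record the episode instead of displaying it, then convert it to a GIF
python test.py -a TD3 -m LunarLanderContinuous-v2 --save
python mp4_to_gif.py -a TD3 -m LunarLanderContinuous-v2
```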
> [!WARNING]
> PPO only works with discrete action spaces for now.
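The limitation comes from the action-sampling head: discrete PPO samples actions from a `Categorical` distribution over the policy's logits, while continuous control would require a different head (e.g. a Gaussian). A minimal PyTorch-style sketch of the discrete case, with a hypothetical network and CartPole-like shapes:

```python
import torch
from torch import nn
from torch.distributions import Categorical

# Hypothetical discrete policy head: one logit per action.
policy_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
states = torch.randn(8, 4)            # batch of CartPole-like observations

logits = policy_net(states)           # shape: (batch, n_actions)
dist = Categorical(logits=logits)     # only defined for discrete action spaces
actions = dist.sample()
log_probs = dist.log_prob(actions)    # feeds the PPO probability ratio
```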
[^1]: Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with function approximation." Advances in Neural Information Processing Systems 12 (1999).
[^2]: Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[^3]: Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 30. No. 1. 2016.
[^4]: Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International Conference on Machine Learning. PMLR, 2016.
[^5]: Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
[^6]: Fujimoto, Scott, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." International Conference on Machine Learning. PMLR, 2018.