Deep Reinforcement Learning with OpenAI Gym (CartPole-v0)
Hanifi Enes Gül 115200085
Atakan Kaya 115200074
İbrahim Doğan 11512112
gym==0.12.1
Keras==2.2.4
matplotlib==3.0.3
numpy==1.16.3
tensorflow==1.13.1
This project is a Deep Q-Learning (reinforcement learning) project. Its purpose is to train a model that uses no human data, only the set of actions that can be taken and a specified target, and learns to reach that target successfully. Concretely, the goal is to play CartPole-v0, a game selected from the OpenAI Gym library, well enough to achieve high scores. Simply put, we are building an autonomous agent that can successfully play the game.
There are three main sub-areas of Machine Learning: Supervised Learning, Unsupervised Learning, and Reinforcement Learning (RL). The Reinforcement Learning technique is implemented in this project. (As of 2018, Reinforcement Learning was the Machine Learning category with the most academic papers written.)
In this project, the Reinforcement Learning methodology is used with the OpenAI Gym toolkit and Keras. The combination of RL and neural networks is called Deep Reinforcement Learning.
We create an agent that performs “actions” in an “environment” and receives various “rewards” depending on the “state” it is in when it performs the action.
As mentioned above, the game selected from the OpenAI Gym toolkit is CartPole-v0. This game is considered our “environment”.
The environment (CartPole-v0) comes with its own rules, actions, observations, etc., which are described in detail at the source below.
Source: https://github.com/openai/gym/wiki/CartPole-v0
As can be seen in the picture below, the agent performs an action in the environment. The environment observes this action and feeds back the updated state that the agent now occupies, together with the reward for taking that action. The environment is not known to the agent in advance; rather, the agent discovers it by taking successive steps in time. For example, at time t the agent, in state s_t, may take an action a. This results in a new state s_{t+1} and a reward r. The reward can be a positive real number, zero, or a negative real number. (In this environment, negative rewards do not occur.) The objective of the agent is to learn which state-dependent action to take so as to maximize its rewards.
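This interaction loop maps directly onto the Gym API. The sketch below shows one episode of that loop, with a random placeholder policy standing in for the trained agent:

```python
import gym

# One episode of the agent-environment loop described above.
env = gym.make("CartPole-v0")
state = env.reset()          # initial state s_0
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()                  # placeholder policy: random action a
    next_state, reward, done, info = env.step(action)   # environment returns s_{t+1} and r
    total_reward += reward
    state = next_state
env.close()
```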
As mentioned above, one of the aims is to create a neural network that can perform Q-Learning. The current state is required as input. A Q value is produced for each action in that specific state according to the Bellman equation: Q(s, a) = r + γ · max_a' Q(s', a'), where γ is the discount factor and s' is the next state.
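As a small worked sketch (the function name q_target and the default gamma=0.95 are illustrative choices, not necessarily the values in our code), the target value the network is trained toward can be computed as:

```python
import numpy as np

def q_target(reward, next_q_values, done, gamma=0.95):
    # Bellman target: Q(s, a) = r + gamma * max_a' Q(s', a').
    # If the episode has ended, there is no future reward to discount.
    if done:
        return reward
    return reward + gamma * np.max(next_q_values)
```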
Our implementation is the highlighted part below:
We build our model with this code block:
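A minimal sketch of such a model, assuming two Dense hidden layers (the layer sizes and learning rate here are illustrative and may differ from the exact ones in our code):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def build_model(state_size=4, action_size=2):
    # CartPole-v0 has a 4-dimensional state and 2 actions (push left / push right).
    model = Sequential()
    model.add(Dense(24, input_dim=state_size, activation="relu"))
    model.add(Dense(24, activation="relu"))
    model.add(Dense(action_size, activation="linear"))  # one Q value per action
    model.compile(loss="mse", optimizer=Adam(lr=0.001))
    return model

model = build_model()
```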
Keras has a built-in model.png generator:
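That generator is keras.utils.plot_model (it requires the pydot and graphviz packages to be installed):

```python
from keras.utils import plot_model

# Renders the layer graph of the model to model.png.
plot_model(model, to_file="model.png", show_shapes=True)
```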
When we run this, the output is:
To make it more understandable, we illustrated it as:
The training loop works as follows (a condensed code sketch follows this list):
- Each episode runs until the environment sets done=True.
- We keep score values to track the success of each episode.
- After applying the epsilon-greedy method, we take a step in the environment and receive a reward.
- We add the reward to the score.
- We use a replay memory to feed the model with randomly sampled mini-batches of size batch_size.
- The Q-Learning update with the Bellman equation is applied and the model is fitted.
- Lastly, to check the performance of the agent, we printed and plotted some values.
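The sketch below condenses those steps, assuming the env and model objects built above; the hyperparameters (gamma, the epsilon schedule, batch_size, EPISODES) are illustrative choices, not necessarily the exact values in our code:

```python
import random
from collections import deque
import numpy as np

memory = deque(maxlen=2000)                       # replay memory
gamma = 0.95                                      # discount factor
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
batch_size, EPISODES = 32, 100
scores = []

for episode in range(EPISODES):
    state = env.reset().reshape(1, -1)
    score, done = 0, False
    while not done:                               # run until the environment sets done=True
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if np.random.rand() <= epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(model.predict(state)[0])
        next_state, reward, done, _ = env.step(action)
        next_state = next_state.reshape(1, -1)
        memory.append((state, action, reward, next_state, done))
        state = next_state
        score += reward                           # add the reward to the score
    scores.append(score)
    print("episode: {}, score: {}, epsilon: {:.2f}".format(episode, score, epsilon))

    # Experience replay: fit the model on a random mini-batch from memory.
    if len(memory) >= batch_size:
        for s, a, r, s2, d in random.sample(memory, batch_size):
            target = r if d else r + gamma * np.max(model.predict(s2)[0])
            q_values = model.predict(s)
            q_values[0][a] = target               # Bellman update for the action taken
            model.fit(s, q_values, epochs=1, verbose=0)

    if epsilon > epsilon_min:
        epsilon *= epsilon_decay                  # gradually shift from exploring to exploiting
```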
As we can see in the graph, as the episode number increases, our agent's success also increases.
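A graph like that can be produced with matplotlib from the scores list collected above (a minimal sketch):

```python
import matplotlib.pyplot as plt

# Plot the per-episode scores to visualize learning progress.
plt.plot(scores)
plt.xlabel("Episode")
plt.ylabel("Score")
plt.title("CartPole-v0 DQN training scores")
plt.show()
```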
We implemented deep reinforcement learning with Keras on OpenAI Gym's CartPole-v0 environment. With some dynamic changes to the code, it can also work on other environments such as MountainCar-v0. We commented out the one-hot encoding part of the code; with the correct sizes, other neural network implementations, such as TFLearn's DNN, can also be used.
The agent starts succeeding early, around the 40th episode. It can still be improved, but it is quite successful.