
Project Title:

Deep Reinforcement Learning with OpenAI Gym (CartPole-v0)

Students:

Hanifi Enes Gül 115200085

Atakan Kaya 115200074

İbrahim Doğan 11512112

Dependencies (check requirements.txt)

gym==0.12.1

Keras==2.2.4

matplotlib==3.0.3

numpy==1.16.3

tensorflow==1.13.1

Abstract

This project is a deep Q-learning (deep reinforcement learning) project. Its purpose is to train a model that, without using any human-generated data, learns from only the available actions and a specified target, and reaches that target successfully. Concretely, the goal is for the agent to play the CartPole-v0 game from the OpenAI Gym library well enough to achieve high scores. Simply put, we are building an autonomous agent that can successfully play the game.

Introduction

There are three main sub-areas of machine learning: supervised learning, unsupervised learning, and reinforcement learning (RL). The reinforcement learning technique is implemented in this project. (As of 2018, reinforcement learning was the machine learning category with the most published academic papers.)

In this project, the reinforcement learning methodology is used together with the OpenAI Gym toolkit and Keras. The combination of RL and neural networks is called deep reinforcement learning.

Environment

We create an agent which performs “actions” in an “environment” and the agent receives various “rewards” depending on what “state” it is in when it performs the action.

As already mentioned above, the game selected from the OpenAI Gym toolkit is CartPole-v0. This game is considered our "environment".

The environment (CartPole-v0) comes with its own rules, actions, observations, etc., which are described in detail in the documentation linked below.


Documentation of Environment

Source: https://github.com/openai/gym/wiki/CartPole-v0
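
As a minimal sketch (using the gym 0.12 API listed in the dependencies), the environment can be created and inspected as follows; the state is a 4-dimensional vector (cart position, cart velocity, pole angle, pole velocity at tip) and there are 2 discrete actions (push left, push right):

```python
import gym

# Create the CartPole-v0 environment (gym 0.12 API).
env = gym.make("CartPole-v0")

# Observation: 4 continuous values
# (cart position, cart velocity, pole angle, pole velocity at tip).
print(env.observation_space)   # Box(4,)

# Actions: 2 discrete choices (0 = push left, 1 = push right).
print(env.action_space)        # Discrete(2)

# Each step the pole stays upright gives a reward of +1; an episode ends
# when the pole falls, the cart leaves the track, or 200 steps pass.
state = env.reset()
print(state.shape)             # (4,)
```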

Reinforcement Learning

In the standard agent–environment setup, the agent performs an action in the environment. The environment observes this action and feeds back the updated state the agent is now in, together with the reward for taking that action. The environment is not known to the agent in advance; instead, the agent discovers it by taking successive steps in time. For example, at time t the agent, in state s_t, may take an action a. This results in a new state s_{t+1} and a reward r. The reward can be a positive real number, zero, or a negative real number (in this environment, negative rewards do not occur). The objective of the agent is to learn which state-dependent action to take in order to maximize its rewards.
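
A minimal sketch of this agent–environment loop with the gym API (here with a random agent, only to illustrate the state/action/reward cycle, not our learned agent):

```python
import gym

env = gym.make("CartPole-v0")
state = env.reset()            # initial state s_0
done = False
total_reward = 0.0

while not done:
    action = env.action_space.sample()                  # the agent picks an action a
    next_state, reward, done, info = env.step(action)   # environment returns s_{t+1} and r
    total_reward += reward                               # CartPole rewards are never negative
    state = next_state

print("Episode reward:", total_reward)
```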

Keras + Reinforcement Learning

As mentioned, one of the aims is to create a neural network that can perform Q-learning. The current state is required as the input, and the network outputs a Q-value for each action available in that state; these Q-values are updated according to the Bellman equation.
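
For a transition (s_t, a_t, r_t, s_{t+1}), the target follows the standard Bellman update used in deep Q-learning, with discount factor γ (the same update appears in the training loop sketch further below; if the episode ended at step t, the target is just r_t):

    Q(s_t, a_t) ← r_t + γ · max_a' Q(s_{t+1}, a')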


We build our model as a small fully connected Keras network.
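
A minimal sketch of such a network, assuming a 4-dimensional state input and 2 actions (the exact layer sizes in the original script may differ):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def build_model(state_size=4, action_size=2, learning_rate=0.001):
    """Small fully connected network: state in, one Q-value per action out."""
    model = Sequential()
    model.add(Dense(24, input_dim=state_size, activation="relu"))
    model.add(Dense(24, activation="relu"))
    model.add(Dense(action_size, activation="linear"))
    model.compile(loss="mse", optimizer=Adam(lr=learning_rate))
    return model

model = build_model()
model.summary()
```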

Keras also has a built-in model diagram generator (keras.utils.plot_model), which writes the architecture to model.png.
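
A sketch of that call, assuming the model object from the sketch above (the pydot and graphviz packages must be installed):

```python
from keras.utils import plot_model

# Writes a diagram of the layer graph to model.png.
plot_model(model, to_file="model.png", show_shapes=True)
```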

(Figure: the generated model.png layer diagram, together with our simplified illustration of the network.)

The main training loop works as follows (a code sketch follows this list):

  • Each episode runs until the environment sets done=True.

  • We keep a score value so we can see how successful each episode was.

  • After applying the epsilon-greedy method, we take a step in the environment and receive a reward.

  • We add the reward to the score.

  • We feed the model from a replay memory, sampling random minibatches of batch_size transitions.

  • The Q-learning update with the Bellman equation is applied and the model is fitted to the resulting targets.

  • Lastly, to check how well the agent performs, we print and plot some values.
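
A minimal sketch of that loop, assuming build_model() from the model sketch above and hypothetical hyperparameters (gamma, epsilon schedule, batch_size); the original script may differ in details:

```python
import random
from collections import deque

import gym
import numpy as np

env = gym.make("CartPole-v0")
model = build_model(state_size=4, action_size=2)

memory = deque(maxlen=2000)                      # replay memory
gamma = 0.95                                     # discount factor
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
batch_size = 32
scores = []

for episode in range(200):
    state = env.reset().reshape(1, 4)
    score, done = 0.0, False

    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(model.predict(state)[0]))

        next_state, reward, done, _ = env.step(action)
        next_state = next_state.reshape(1, 4)
        score += reward

        # Store the transition in the replay memory.
        memory.append((state, action, reward, next_state, done))
        state = next_state

        # Train on a random minibatch once enough transitions are stored.
        if len(memory) >= batch_size:
            for s, a, r, s2, d in random.sample(memory, batch_size):
                target = r if d else r + gamma * np.max(model.predict(s2)[0])
                q_values = model.predict(s)
                q_values[0][a] = target          # Bellman update for the taken action
                model.fit(s, q_values, epochs=1, verbose=0)

    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    scores.append(score)
    print("episode:", episode, "score:", score, "epsilon:", round(epsilon, 3))
```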

Plots and Prints

As can be seen in the graph, the agent's score increases as the episode number increases.
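
A minimal plotting sketch, assuming the scores list collected in the training loop above:

```python
import matplotlib.pyplot as plt

# Score per episode; an upward trend means the agent is learning.
plt.plot(scores)
plt.xlabel("Episode")
plt.ylabel("Score")
plt.title("CartPole-v0 score per episode")
plt.savefig("scores.png")
plt.show()
```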

Discussion

We implemented deep reinforcement learning with Keras on OpenAI Gym's CartPole-v0 environment. With a few changes to the code it can also work on other environments, such as MountainCar-v0. We commented out the one-hot encoding part of the code; with the correct input and output sizes, other neural network implementations (for example a TFLearn DNN) could be used as well.

The agent starts to succeed early, around the 40th episode. There is room for improvement, but the result is quite good.
