The problem to be solved in this project consists of training a reinforcement learning agent to control a double-jointed robotic arm. The arm must reach a target location and stay within it, receiving a reward of +0.1 for every time step in which it maintains its position inside the target location.
The state space consists of 33 variables, including information such as the velocity and rotation of the arm, while the action space consists of 4 variables representing the torques applicable to the two arm joints. The problem is considered solved when the agent's average score over the last 100 episodes is equal to or greater than 30 points.
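As a minimal sketch of that criterion (not code taken from the notebook; the names below are illustrative), the rolling average can be tracked with a fixed-size window over the last 100 episode scores:

```python
from collections import deque

import numpy as np

# scores of the last 100 episodes
scores_window = deque(maxlen=100)

def is_solved(scores_window):
    """The problem counts as solved once the window holds 100 scores
    and their mean is at least 30 points."""
    return len(scores_window) == 100 and np.mean(scores_window) >= 30.0

# After each training episode:
# scores_window.append(episode_score)
# if is_solved(scores_window): stop training and save the weights
```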
The full training of the agent is implemented in the ContinuosControl.ipynb notebook. You can either view its last execution or run it yourself on a Jupyter server. To do so, follow these steps to set up the requirements:
- Create (and activate) a new environment with Python 3.6.

  - Linux or Mac:

    ```bash
    conda create --name drlnd python=3.6
    source activate drlnd
    ```

  - Windows:

    ```bash
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Follow the instructions in this repository to perform a minimal install of OpenAI gym.
- Install the dependencies in the `python/` folder.

  ```bash
  cd ./python/
  pip install .
  ```
- Create an IPython kernel for the `drlnd` environment.

  ```bash
  python -m ipykernel install --user --name drlnd --display-name "drlnd"
  ```
- Download the Unity Environment and unzip it inside the `solution/` directory (a sketch of how the notebook can load it follows this list). If you are using 64-bit Windows, you can skip this step, since the repository already contains the environment files. Otherwise, delete the environment files and replace them with the ones matching your OS.
- Before running the notebook, change the kernel to match the `drlnd` environment by using the drop-down `Kernel` menu.
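Once the environment files are in place, the notebook can load them through the `unityagents` package installed above. The sketch below is illustrative rather than a copy of the notebook, and it assumes a Linux build named `Reacher.x86_64` inside `solution/`; point `file_name` at the binary matching your OS:

```python
from unityagents import UnityEnvironment

# file_name is an assumption; use the environment binary matching your OS
env = UnityEnvironment(file_name='solution/Reacher.x86_64')

# get the default brain controlling the arm
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# reset the environment in training mode and inspect the state and action spaces
env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]
print('State size:', len(state))                        # 33 variables
print('Action size:', brain.vector_action_space_size)   # 4 torque values

env.close()
```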
The agent solves the problem in 68 episodes (averaging 30 points from episode 68 to episode 168). A detailed description of the implemented algorithm can be found in `report.pdf`. The weights of the actor network are stored in `actor.pth` and the weights of the critic network in `critic.pth`. Here's the agent's score history:
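As an illustration of how the stored actor weights could be reloaded, here is a hedged sketch; the `Actor` class below is a hypothetical architecture (the real one is described in `report.pdf`), so `load_state_dict` will only succeed if the saved model actually matches it:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Hypothetical actor network; the actual architecture is in report.pdf."""
    def __init__(self, state_size=33, action_size=4, hidden=(400, 300)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], action_size), nn.Tanh(),  # torques lie in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

actor = Actor()
# Loading only works if actor.pth was saved from a matching architecture
actor.load_state_dict(torch.load('actor.pth', map_location='cpu'))
actor.eval()
```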