The goal of this project is to train an agent to control a robotic arm so that it stays close to a floating ball.
A reward of +0.1 is given for each time step that the arm is within the region of the floating ball. The environment is considered solved when the agent earns an average reward greater than 30.0 over 100 consecutive episodes of 1000 steps each.
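For reference, here is a minimal sketch of how the solved check could be implemented, assuming per-episode scores are collected in a list (the names here are illustrative, not taken from the repo):

```python
import numpy as np
from collections import deque

def is_solved(episode_scores, window=100, threshold=30.0):
    """Return True once the mean score over the last `window` episodes exceeds `threshold`."""
    scores_window = deque(maxlen=window)
    for score in episode_scores:
        scores_window.append(score)
        if len(scores_window) == window and np.mean(scores_window) > threshold:
            return True
    return False
```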
The environment is provided by Unity, a company that specializes in building worlds that can be used for video game development, simulation, animation, and architecture/design. The following is the description of the state space and actions available to the agent:
The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm. Each action is a vector of four numbers, corresponding to the torque applied to the arm's two joints. Every entry in the action vector must be a number between -1 and 1.
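Because the environment expects every action component in [-1, 1], a policy's raw output is typically clipped before being passed to the environment. A minimal sketch using NumPy (the repo may instead clip inside the agent):

```python
import numpy as np

def to_valid_action(raw_action):
    """Clip a raw 4-dimensional action vector into the [-1, 1] range the environment expects."""
    return np.clip(np.asarray(raw_action, dtype=np.float32), -1.0, 1.0)

print(to_valid_action([1.7, -0.2, 0.05, -3.1]))  # -> [ 1.   -0.2   0.05 -1.  ]
```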
- Download the x64 Windows environment for the single agent from here.
- The code expects the `Reacher.exe` file to be located in the following directory of the repo: `./Reacher_Windows_x86_64_SingleAgent/Reacher.exe` (see the smoke-test sketch after this list).
- Create a new conda environment with the provided `requirements.txt` file, e.g. `conda create --name <env-name> --file requirements.txt` (replace `<env-name>` with a name of your choice).
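To verify the setup, a quick smoke test along these lines should work. It assumes `requirements.txt` installs the `unityagents` package, whose `UnityEnvironment` loader is used in the Udacity deep-reinforcement-learning projects; adapt if the repo loads the environment differently:

```python
from unityagents import UnityEnvironment

# Point at the downloaded executable, at the path the repo expects.
env = UnityEnvironment(file_name="./Reacher_Windows_x86_64_SingleAgent/Reacher.exe")
brain_name = env.brain_names[0]

# Reset once and confirm the sizes match the description above.
env_info = env.reset(train_mode=True)[brain_name]
print("observation size:", env_info.vector_observations.shape[1])       # expected: 33
print("action size:", env.brains[brain_name].vector_action_space_size)  # expected: 4
env.close()
```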
The code may be run using the command `python continuous_control.py <config.json file> [network_file.pth]`. The network file is an optional parameter; if provided, the network weights are loaded from it before the run.
Example: `python continuous_control.py config.json`
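As an illustration only (the actual `continuous_control.py` may differ), such an entry point could parse its arguments like this:

```python
import json
import sys

def parse_args(argv):
    """Read the required config path and the optional network checkpoint path."""
    if len(argv) < 2:
        sys.exit("usage: python continuous_control.py <config.json file> [network_file.pth]")
    with open(argv[1]) as f:
        config = json.load(f)
    network_file = argv[2] if len(argv) > 2 else None  # optional pre-trained weights
    return config, network_file

if __name__ == "__main__":
    config, network_file = parse_args(sys.argv)
    print("train_mode:", config.get("train_mode"), "| weights:", network_file)
```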
The arguments to the program are provided using a .json file. See the `utilities/config.py` file for the default parameters. Setting `train_mode: true` will train the agent; setting `train_mode: false` will simply run the agent for a single episode without training the networks.
```json
{
    "env_name": "reacher",
    "train_mode": true,
    "device": "cuda",
    "actor_learning_rate": 1e-4,
    "critic_learning_rate": 1e-4,
    "batch_size": 8,
    "gamma": 0.99,
    "tau": 1e-3
}
```
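The separate actor and critic learning rates together with `tau` suggest an actor-critic agent with target networks, such as DDPG (an assumption; the repo defines the actual algorithm). In that setting `gamma` is the reward discount factor and `tau` controls how slowly the target networks track the local networks. A PyTorch-style sketch of the soft update that `tau` governs:

```python
import torch.nn as nn

def soft_update(local_net, target_net, tau):
    """Soft update: theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for target_param, local_param in zip(target_net.parameters(), local_net.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)

# With tau = 1e-3 from the config, the target network drifts very slowly toward the local one.
local, target = nn.Linear(33, 4), nn.Linear(33, 4)
soft_update(local, target, tau=1e-3)
```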