
Udacity Collaboration and Competition Project 3

Introduction

The goal of this project is to train a pair of agents to play tennis. The agents must bounce a ball back and forth over a net. Hitting the ball over the net gives an agent a reward of +0.1. If an agent lets the ball hit the ground or hits it out of bounds, that agent receives a reward of -0.01. The longer the agents keep the ball in play, the higher the reward. Each episode's score is the maximum of the two agents' scores, and the environment is considered solved when the average of these episode scores over 100 consecutive episodes is greater than +0.5.
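The solved criterion above can be sketched in a few lines; `per_agent_scores` is an illustrative input (one pair of agent scores per episode), not a structure taken from the repo:

```python
from collections import deque

def is_solved(per_agent_scores, window=100, target=0.5):
    """Check the Tennis solved criterion: an episode's score is the
    maximum over the two agents, and the environment is solved when
    the mean of these maxima over the last `window` episodes
    exceeds `target`."""
    recent = deque(maxlen=window)
    for scores in per_agent_scores:   # one entry per episode
        recent.append(max(scores))    # episode score = max over both agents
    return len(recent) == window and sum(recent) / window > target
```

For example, 100 episodes in which the agents score (0.6, 0.7) would count as solved, while 50 such episodes would not, since the 100-episode window is not yet full.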

(Trained Agent: animation of the two trained agents rallying)

Environment Details

The environment is provided by Unity, a company that specializes in building worlds that can be used for video game development, simulation, animation, and architecture/design. The following is the description of the state space and actions available to the agent:

The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.

Dependencies

  1. Download the 64-bit Windows version of the Tennis environment from here.

  2. The code expects the Tennis.exe file to be located at "./Tennis_Windows_x86_64/Tennis.exe" within the repo.

  3. Create a new conda environment from the provided requirements.txt file, e.g. conda create --name <env_name> --file requirements.txt

Using the Code

The code is run with the command python play_tennis.py <config.json file> [network_file.pth]. The network file is an optional argument; when supplied, previously trained network weights are loaded before training or evaluation begins.
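A minimal sketch of how that command line might be parsed; the actual argument handling inside play_tennis.py may differ, and the names below are illustrative:

```python
import argparse

def parse_args(argv=None):
    # Mirrors: python play_tennis.py <config.json file> [network_file.pth]
    # A required positional config file plus an optional checkpoint.
    parser = argparse.ArgumentParser(description="Train or run the tennis agents")
    parser.add_argument("config", help="path to the .json configuration file")
    parser.add_argument("network_file", nargs="?", default=None,
                        help="optional .pth file with pretrained network weights")
    return parser.parse_args(argv)
```

With `nargs="?"`, the checkpoint argument simply comes back as `None` when omitted, so the caller can decide whether to load weights.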

Train/Run Mode

Example: python play_tennis.py config.json

The arguments to the program are provided using a .json file. See the utilities/config.py file for the default parameters. Setting train_mode: true will train the agents. Setting train_mode: false will simply run the agents for a single episode without training the networks.

{
    "env_name": "tennis",
    "train_mode": true,
    "device": "cuda",
    "actor_learning_rate": 1e-4,
    "critic_learning_rate": 1e-4,
    "batch_size": 32,
    "gamma": 0.99,
    "tau": 1e-3,
    "num_episodes": 5000
}
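Loading the config might look roughly like this, with values from the .json file overriding the defaults. The real default values live in utilities/config.py; the DEFAULTS dictionary below is an illustrative stand-in built from the example above:

```python
import json

# Illustrative defaults; the authoritative values are in utilities/config.py.
DEFAULTS = {
    "env_name": "tennis",
    "train_mode": True,
    "device": "cuda",
    "actor_learning_rate": 1e-4,
    "critic_learning_rate": 1e-4,
    "batch_size": 32,
    "gamma": 0.99,
    "tau": 1e-3,
    "num_episodes": 5000,
}

def load_config(path):
    """Read the .json file and overlay it on the defaults, so any key
    omitted from the file falls back to its default value."""
    with open(path) as f:
        overrides = json.load(f)
    return {**DEFAULTS, **overrides}
```

This way a config file only needs to list the parameters being changed; everything else keeps its default.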
