Udacity Navigation Project 1

Introduction

The goal of this project is to train an agent, using the Deep Q-Network (DQN) algorithm, to collect yellow bananas and avoid blue bananas in a large, square world. A reward of +1 is given for collecting a yellow banana, and a reward of -1 is given for collecting a blue banana. The gif below depicts a trained agent carrying out its task. The environment is considered solved when the agent achieves an average score of +13 over 100 consecutive episodes. See Report.pdf for more details on the algorithms I implemented, with varying degrees of success.

Trained Agent

Environment Details

The environment is provided by Unity, a company that specializes in building worlds for video game development, simulation, animation, and architecture/design. The state space and the actions available to the agent are described below:

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions (a sketch of one selection rule follows the list). Four discrete actions are available, corresponding to:

  • 0 - move forward.
  • 1 - move backward.
  • 2 - turn left.
  • 3 - turn right.
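
During training, actions are typically chosen with an epsilon-greedy rule over these four indices. The snippet below is a minimal sketch of that selection step, assuming a PyTorch Q-network with 37 inputs and 4 outputs; the function and variable names are illustrative and not taken from this repo.

import random
import torch

def select_action(q_network, state, eps):
    """Epsilon-greedy selection over the 4 discrete actions.

    state : array of 37 floats (velocity + ray-based perception)
    eps   : exploration probability in [0, 1]
    """
    if random.random() < eps:
        # Explore: choose one of the 4 actions uniformly at random.
        return random.randrange(4)
    # Exploit: choose the action with the highest estimated Q-value.
    state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)  # shape (1, 37)
    with torch.no_grad():
        q_values = q_network(state_t)                                   # shape (1, 4)
    return int(q_values.argmax(dim=1).item())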

Dependencies

If you are on 64-bit Windows, the repo already contains the correct environment. On other operating systems, download the environment for your platform from the list below.

  1. Download the environment from one of the links below. You need only select the environment that matches your operating system:

  2. The code expects the Banana.exe file to be located at the following path within the repo: ./Banana_Windows_x86_64/Banana.exe

  3. Create a new conda environment from the provided requirements.txt file, e.g. conda create --name <env_name> --file requirements.txt

Using the Code

The code may be run using the command python navigator.py <config.json file> for training mode, or python navigator.py <network .pth file> for running mode.
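
The mode is selected by the type of file passed on the command line. The sketch below shows one way such a dispatch could look; the actual logic inside navigator.py may differ, and train and run here are placeholder functions.

import sys

def train(config_path):
    """Placeholder for the training entry point (parses the .json config)."""
    print(f"Training with config: {config_path}")

def run(checkpoint_path):
    """Placeholder for the running entry point (loads the saved .pth network)."""
    print(f"Running agent from checkpoint: {checkpoint_path}")

if __name__ == "__main__":
    arg = sys.argv[1]
    if arg.endswith(".json"):
        train(arg)    # training mode
    elif arg.endswith(".pth"):
        run(arg)      # running mode
    else:
        raise SystemExit("Expected a .json config or a .pth network file")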

Training Mode

Example: python navigator.py config.json

The arguments to the navigator program are provided using a .json file, with the following structure:

{
    "base_name": "final_model",     # Base file name for the output networks.  For hyper-parameter tuning, each output network file is suffixed with an integer 0, 1, ..., k
    "is_dueling": false,            # Whether the dueling-DQN network architecture is used, or a simpler network
    "algorithms": [],               # List of 0 or more algorithm features, from this list: ["double_dqn", "prioritized_replay"].  The base behavior is plain DQN
    "replay_buffer_size": 1e5,      # Size of the replay buffer from which transitions are sampled
    "batch_size": 64,               # Number of transitions sampled per training batch
    "gamma": 0.99,                  # Reward discount factor
    "tau": 1e-3,                    # Soft-update interpolation factor for the target network
    "learning_rate": 1e-4,          # Optimizer learning rate
    "learn_every": 4,               # Number of environment steps between learning updates
    "num_episodes": 2000,           # Total number of episodes to run
    "max_time": 1000,               # Maximum number of time steps in an episode
    "eps_start": 1.0,               # Starting epsilon value for the epsilon-greedy policy
    "eps_end": 0.01,                # Final epsilon value
    "eps_decay": 0.995,             # Factor by which epsilon is multiplied after each episode
    "alpha": 0.5,                   # Priority exponent for prioritized replay.  alpha=0 samples transitions uniformly, while alpha=1 samples proportionally to the TD error
    "beta": 0.5                     # Importance-sampling weight exponent.  beta=0 turns off the weight correction, while beta=1 fully compensates for the bias of prioritized sampling
}
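
The alpha and beta parameters follow the usual prioritized-experience-replay scheme: alpha shapes the sampling distribution and beta scales the importance-sampling correction. The snippet below is a small sketch of those two formulas under their standard definitions; it is not lifted from this repo's replay-buffer code.

import numpy as np

def sample_probabilities(td_errors, alpha, eps=1e-5):
    """P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |TD error| + eps.

    alpha = 0 gives uniform sampling; alpha = 1 samples in direct
    proportion to each transition's error.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, beta):
    """w_i = (N * P(i))^(-beta), normalized by the largest weight.

    beta = 0 disables the correction; beta = 1 fully compensates for
    the bias introduced by non-uniform sampling.
    """
    n = len(probs)
    weights = (n * probs) ** (-beta)
    return weights / weights.max()

# Example: three transitions with different TD errors
probs = sample_probabilities(np.array([0.1, 0.5, 2.0]), alpha=0.5)
weights = importance_weights(probs, beta=0.5)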

Any parameter given as a list will be iterated over. For example, the following argument trains the agent with 4 different values of the learning rate.

"learning_rate": [1e-5, 1e-4, 1e-3, 1e-2],

Providing lists for multiple arguments executes training runs for the Cartesian product of the lists. For example, the arguments below lead to 12 different training runs (3 algorithm combinations x 4 learning rates), as shown in the sketch after the example. Note how the algorithms are specified as lists of 0 or more algorithmic features.

"algorithms": [
    [], ['double_dqn'], ['double_dqn', 'prioritized_replay']
], 
"learning_rate": [1e-5, 1e-4, 1e-3, 1e-2],

Running Mode

Example: python navigator.py final_model_0.pth

In this mode, the agent is initialized using a saved network file, so you can watch it collect bananas!
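
A minimal sketch of what loading a saved checkpoint could look like is shown below, assuming the networks are saved with PyTorch's state_dict convention and that QNetwork stands in for the agent's network class; both the class definition and its layer sizes are illustrative rather than taken from this repo.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative stand-in for the project's Q-network: 37 inputs, 4 actions."""
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size),
        )

    def forward(self, state):
        return self.net(state)

# Load the trained weights and switch to evaluation mode before
# stepping the agent greedily (epsilon = 0) through the environment.
q_network = QNetwork()
q_network.load_state_dict(torch.load("final_model_0.pth", map_location="cpu"))
q_network.eval()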
