
HitLML_PPO_RLHF

Introduction

This repository contains the code for a project from the Human-in-the-Loop Machine Learning course taught by Professor Eric Nalisnick at the University of Amsterdam. The goal of the project was to train PPO with RLHF on the CarRacing-v2 Gym environment. This repo extends work previously done by Xiaoteng Ma. The experiments conducted examine the impact of RLHF on the performance of PPO in the Car Racing environment, using both the k-wise and binary classification variants of the RLHF algorithm.

The defaults currently set in the main_train.py and main_test.py files are the parameters used in the experiments, unless stated otherwise in the accompanying paper. To run the code, run the main_train.py file several times, changing the parameters between runs. Once training is done, run the main_test.py file to plot the results on a single graph. main_test.py produces a scatter plot of the different test runs for each experiment and an error bar plot of the same results, and also prints the mean and standard deviation of the results for each experiment.
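A typical workflow therefore looks like the following sketch. The two entry points are the scripts named above; the flag shown is one of the defaults documented below, and which parameters to vary between runs depends on the experiment being reproduced:

    # Train one configuration per run, changing parameters between runs
    python main_train.py
    python main_train.py --batch-size 128    # example: a documented flag set explicitly

    # Once all training runs have finished, aggregate and plot the results
    python main_test.py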

Note: Some example outputs have been added to the output folders for reference. Please remove these if you wish to make use of the code.

Hyperparameters that were not changed:

Training (an example invocation using these flags is shown after this section):

  • --num-epochs: 1000
  • --gamma: 0.99 (PPO discount factor)
  • --action-repeat: 8 (PPO training hyperparameter)
  • --batch-size: 128 (PPO training hyperparameter for the number of samples per batch)
  • --num-states: 500

Testing (hardcoded in the main_test.py file, but can be changed if necessary):

  • Number of test runs per configuration: 100
  • Number of steps per test run: 500
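For reference, a training run with all of the fixed training hyperparameters above spelled out explicitly would look like this (a sketch, assuming main_train.py parses exactly the flags listed):

    python main_train.py --num-epochs 1000 --gamma 0.99 --action-repeat 8 --batch-size 128 --num-states 500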
