Active Reward Learning and Iterative Trajectory Improvement from Comparative Language Feedback (Journal Paper Under Review)
Authors: Eisuke Hirota*, Zhaojing Yang*, Ayano Hiranaka, Miru Jun, Jeremy Tien, Stuart Russell, Anca Dragan, Erdem Bıyık
First, create a virtual environment using venv or conda. With conda:
conda create -n lal python=3.8
conda activate lal
Next, clone the repo and install the requirements:
git clone https://github.com/USC-Lira/language_active_learning.git
cd language_active_learning/
pip install -r requirements.txt
There are 6 different methods to run:
- Comparison (non-active, random queries)
- Language (non-active, random queries)
- ActiveLanguage (active, information gain, linear nn)
- PurelyBayesian (active, information gain, Bayesian ML)
- BALD (active, information gain, deep ensemble)
- QbC (active, uncertainty, deep ensemble)
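As a rough illustration of what "active, information gain" means in the list above, the sketch below scores candidate queries by the mutual information between the human's answer and the reward weights, then picks the highest-scoring one. It is not the repo's implementation: it assumes a binary comparison answer under a logistic (Bradley-Terry-style) response model and a posterior represented by weight samples, and the names `information_gain`, `select_query`, `w_samples`, and `feature_diffs` are hypothetical.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def information_gain(w_samples, feature_diffs):
    """Estimate the information gain of each candidate comparison query.

    w_samples:     (M, d) samples of reward weights from the current posterior.
    feature_diffs: (Q, d) feature difference phi(A) - phi(B) for each query.
    Returns a (Q,) array: H[mean prediction] - mean[H[prediction]].
    """
    logits = feature_diffs @ w_samples.T          # (Q, M)
    probs = 1.0 / (1.0 + np.exp(-logits))         # P(A preferred | w), assumed logistic
    marginal = probs.mean(axis=1)                 # prediction averaged over the posterior
    return binary_entropy(marginal) - binary_entropy(probs).mean(axis=1)


def select_query(w_samples, feature_diffs):
    """Index of the candidate query with the largest estimated information gain."""
    return int(np.argmax(information_gain(w_samples, feature_diffs)))


# Two posterior samples that disagree along the first feature only.
w_samples = np.array([[5.0, 0.0], [-5.0, 0.0]])
# Query 0 probes the first feature; query 1 probes the second.
feature_diffs = np.array([[1.0, 0.0], [0.0, 1.0]])
ig = information_gain(w_samples, feature_diffs)
best = select_query(w_samples, feature_diffs)
```

In this toy case the samples disagree only about the first feature, so the query that probes it has the higher score. A random (non-active) baseline would instead draw the query index uniformly.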
Methods 3-5 in the list above (ActiveLanguage, PurelyBayesian, and BALD) use some form of approximate Bayesian inference or variational inference. While this repo provides code for several such algorithms, the paper uses only the Laplace approximation, for speed. The other algorithms provided are Metropolis-Hastings, Metropolis-within-Gibbs, and Expectation Propagation; using them may require refactoring the code.
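For readers unfamiliar with the Laplace approximation, here is a minimal sketch under stated assumptions: a logistic likelihood over binary preference labels, an isotropic Gaussian prior on the reward weights, and plain Newton steps without a line search. The function name `laplace_approximation` and its parameters are hypothetical and do not mirror the repo's code. The idea is to find the posterior mode (MAP estimate) and use the inverse Hessian of the negative log posterior at that point as the covariance of a Gaussian approximation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def laplace_approximation(X, y, prior_var=1.0, n_iters=50, tol=1e-10):
    """Gaussian approximation N(w_map, cov) to a logistic-model posterior.

    X: (N, d) inputs (e.g. feature differences), y: (N,) labels in {0, 1}.
    Assumes the prior w ~ N(0, prior_var * I).
    """
    n, d = X.shape
    w = np.zeros(d)
    prior_prec = np.eye(d) / prior_var

    def neg_log_post_hessian(w):
        p = sigmoid(X @ w)
        # X^T diag(p * (1 - p)) X + prior precision
        return (X.T * (p * (1 - p))) @ X + prior_prec

    # Newton's method on the negative log posterior (convex for this model).
    for _ in range(n_iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + prior_prec @ w
        step = np.linalg.solve(neg_log_post_hessian(w), grad)
        w = w - step
        if np.linalg.norm(step) < tol:
            break

    # Covariance is the inverse Hessian evaluated at the mode.
    cov = np.linalg.inv(neg_log_post_hessian(w))
    return w, cov


# Tiny example: the first feature is preferred twice, the second is rejected once.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, 0.0])
w_map, cov = laplace_approximation(X, y)
# Samples from the approximate posterior, usable by a sample-based acquisition.
rng = np.random.default_rng(0)
w_samples = rng.multivariate_normal(w_map, cov, size=100)
```

Compared with the sampling-based alternatives named above, this costs only a few linear solves per update, which is consistent with the speed motivation given for the paper's choice.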
To run the code, we suggest submitting any of the Slurm scripts, as follows:
/home/user/language_active_learning$ sbatch scripts/nn_active_rs.slurm