This is the codebase for our work Uncertainty-Aware Reward-Free Exploration with General Function Approximation.
This environment needs access to a GPU that can run CUDA 10.2 and cuDNN 8. To install all required dependencies, create an Anaconda environment by running:

```
conda env create -f conda_env.yml
```

Once the installation is complete, activate the environment with:

```
conda activate urlb
```
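Assuming the environment ships a CUDA-enabled PyTorch build (as in URLB, from which this codebase is adapted), you can sanity-check GPU access before launching any runs:

```
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```

This should print `True 10.2`; if it prints `False`, check your driver and CUDA installation first.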
We support the following exploration agents.

Agent | Command | Paper |
---|---|---|
dsquare | `agent=dsquare` | Our paper |
ICM | `agent=icm` | paper |
DIAYN | `agent=diayn` | paper |
APT(ICM) | `agent=icm_apt` | paper |
APS | `agent=aps` | paper |
SMM | `agent=smm` | paper |
RND | `agent=rnd` | paper |
Disagreement | `agent=disagreement` | paper |
We support the following domains.
Domain | Tasks |
---|---|
walker | `walker_stand`, `walker_walk`, `walker_run`, `walker_flip` |
quadruped | `quadruped_walk`, `quadruped_run`, `quadruped_stand`, `quadruped_jump` |
First, add execute permission to the run.sh script:

```
chmod +x run.sh
```

Then online pretraining and offline finetuning can be executed via the run.sh script:

```
./run.sh --domain "walker" --task "walker_walk" --agent "dsquare" --num_pretrain_frames 1000010 --device 0 --seed 0
```
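Under the hood, run.sh chains the online and offline phases. As a rough sketch (the entry-point names `pretrain.py`/`finetune.py` and the Hydra-style overrides follow URLB conventions and are assumptions, not guaranteed to match this repo):

```bash
#!/bin/bash
# Hypothetical sketch of what run.sh chains together; the actual
# script names and argument plumbing in this repo may differ.

# Phase 1: online, reward-free pretraining with the chosen agent.
CUDA_VISIBLE_DEVICES=0 python pretrain.py \
    agent=dsquare domain=walker num_train_frames=1000010 seed=0

# Phase 2: offline finetuning on the downstream task reward.
CUDA_VISIBLE_DEVICES=0 python finetune.py \
    agent=dsquare task=walker_walk seed=0
```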
The supported parameters are:

Parameter | Meaning | Values |
---|---|---|
domain | The environment in which the agent completes tasks | `walker`, `quadruped` |
task | The task for the agent to complete | For `walker`: `walker_stand`, `walker_walk`, `walker_run`, `walker_flip`. For `quadruped`: `quadruped_walk`, `quadruped_run`, `quadruped_stand`, `quadruped_jump` |
agent | The exploration algorithm for online pretraining | `dsquare`, `icm`, `diayn`, `icm_apt`, `aps`, `smm`, `rnd`, `disagreement` |
num_pretrain_frames | The total number of frames in the pretraining phase | 100010, 500010, 1000010, or any other integer |
device | The CUDA device for running the program | 0 to 7 |
seed | The random seed for running the program | Any integer |
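For example, a shorter pretraining run with RND on the quadruped domain, with all values drawn from the table above:

```
./run.sh --domain "quadruped" --task "quadruped_run" --agent "rnd" --num_pretrain_frames 500010 --device 1 --seed 42
```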
The collected trajectories will be stored under:

```
./online/<date>/<time>_<agent>_<task>
```
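URLB-derived codebases typically save one `.npz` archive per collected episode; if this repo follows that layout (the `buffer` subdirectory below is an assumption), you can count the collected episodes with:

```bash
# Hypothetical: count saved episodes for one walker_walk run
# (the buffer/*.npz layout follows URLB and is an assumption).
ls ./online/*/*_dsquare_walker_walk/buffer/*.npz | wc -l
```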
Logs for online pretraining and offline finetuning are stored in the `online` and `offline` folders, respectively. To launch TensorBoard, run:

```
tensorboard --logdir online
```

(or `tensorboard --logdir offline` for the finetuning logs).
The console output is also available, in the form:

```
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
```

A training entry decodes as:

- `F`: total number of environment frames
- `S`: total number of agent steps
- `E`: total number of episodes
- `L`: episode length
- `R`: episode return
- `FPS`: training throughput (frames per second)
- `T`: total training time
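If you tee the console output to a file, the learning curve can be scraped with standard tools. A small sketch (the log filename `train.log` is hypothetical):

```bash
# Extract (frame, return) pairs from a saved console log, e.g.
# produced via: ./run.sh ... | tee train.log   (hypothetical name)
grep '| train |' train.log \
  | sed -E 's/.*F: ([0-9]+).*R: ([0-9.+-]+).*/\1 \2/'
```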
The codebase was adapted from URLB.