Official code for the project Bridging the Gap Between Soft RL and GFlowNets.
Ayhan Suleymanzade, Zahra Bayramli
This repository is built upon the related work "GFlowNets as Entropy-Regularized RL".
- Create a conda environment:

```bash
conda create -n gflownet-rl python=3.10
conda activate gflownet-rl
```
- Install PyTorch with CUDA. For our experiments we used the following versions:

```bash
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

You can replace `pytorch-cuda=11.8` with `pytorch-cuda=XX.X` to match your version of CUDA.
- Install core dependencies:

```bash
pip install -r requirements.txt
```
The code for this part heavily utilizes the torchgfn library (https://github.com/GFNOrg/torchgfn).
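Since the environment logic is delegated to torchgfn, here is a minimal sketch of constructing the hypergrid environment directly; the `gfn.gym` module path and the `ndim`/`height` constructor arguments are assumptions based on recent torchgfn releases and may differ in the version pinned by `requirements.txt`:

```python
# Minimal sketch: building the hypergrid environment straight from torchgfn.
# Assumes a recent torchgfn release where HyperGrid lives under gfn.gym;
# the version pinned in requirements.txt may expose a different path.
from gfn.gym import HyperGrid

# 4-dimensional grid with 20 cells per dimension, matching the example run below.
env = HyperGrid(ndim=4, height=20)

# Discrete action space: one forward action per dimension plus a terminal exit action.
print(env.n_actions)  # -> 5
```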
Paths to configuration files (the experiments use the ml-collections library; see the sketch after this list):

- General configuration: `hypergrid/experiments/config/general.py`
- Algorithm: `hypergrid/experiments/config/algo.py`
- Environment: `hypergrid/experiments/config/hypergrid.py`
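In the run command below, the suffix after the colon in a config path (e.g. `general.py:3`) is forwarded by ml-collections to that file's `get_config` function as a string. A minimal sketch of such a parameterized config file, with illustrative field names rather than the repository's actual ones:

```python
import ml_collections

def get_config(config_string: str) -> ml_collections.ConfigDict:
    """Illustrative parameterized config: `config_string` is whatever
    follows the ':' on the command line (here, interpreted as the seed)."""
    config = ml_collections.ConfigDict()
    config.seed = int(config_string)  # e.g. general.py:3 -> config.seed == 3
    return config
```

Individual fields can also be overridden from the command line, which is how `--env.height 20` and `--env.ndim 4` work in the example run below.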
List of available algorithms:

- GFlowNet baselines: `db`, `tb`, `subtb` (from the torchgfn library);
- Soft RL baselines: `soft_dqn`, `munchausen_dqn`, `sac`;
- Our algorithms: `learnable_dqn`, `lambda_dqn`, `pcl_dqn`, `pcl_ms_dqn`.
Example of running an experiment on the environment with `height=20` and `ndim=4`, standard rewards, seed `3`, and the `lambda_dqn` algorithm:

```bash
python run_hypergrid_exp.py --general experiments/config/general.py:3 --env experiments/config/hypergrid.py:standard --algo experiments/config/algo.py:lambda_dqn --env.height 20 --env.ndim 4
```
Examples of running the TB, DB, and SubTB baselines for word length `k=8`:

```bash
python bitseq/run.py --objective tb --k 8 --learning_rate 0.002
python bitseq/run.py --objective db --k 8 --learning_rate 0.002
python bitseq/run.py --objective subtb --k 8 --learning_rate 0.002 --subtb_lambda 1.9
```
Example of running SoftDQN:

```bash
python bitseq/run.py --objective softdqn --m_alpha 0.0 --k 8 --learning_rate 0.002 --leaf_coeff 2.0
```

Example of running MunchausenDQN:

```bash
python bitseq/run.py --objective softdqn --m_alpha 0.15 --k 8 --learning_rate 0.002 --leaf_coeff 2.0
```

Example of running LearnableDQN:

```bash
python bitseq/run.py --objective learnable_dqn --m_alpha 0.0 --k 8 --learning_rate 0.002 --leaf_coeff 2.0
```

Example of running LambdaDQN:

```bash
python bitseq/run.py --objective lambda_dqn --m_alpha 0.0 --k 8 --learning_rate 0.002 --leaf_coeff 2.0 --lambda_dist "Gamma"
```

Example of running PCLDQN:

```bash
python bitseq/run.py --objective pcl_dqn --m_alpha 0.0 --k 8 --learning_rate 0.002 --leaf_coeff 2.0
```

Example of running PCL_MS_DQN:

```bash
python bitseq/run.py --objective pcl_ms_dqn --m_alpha 0.0 --k 8 --learning_rate 0.002 --leaf_coeff 2.0 --v_learning_rate 0.001
```
You can find all the results in the following directories:

- `/hypergrid/grid_results`
- `/bitseq/bitseq_results`

These directories correspond to the Hypergrid and Bit sequences experiments, respectively.
The columns in the Hypergrid results files are as follows:
- First Column: KL Divergence
- Second Column: L1 Distance
- Third Column: Timestep
The columns in the Bit Sequences results files are as follows:
- First Column: Number of Modes Captured
- Second Column: Spearman Correlation
- Third Column: Timestep
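To inspect these files programmatically, here is a short sketch; it assumes the results are plain whitespace-separated numeric columns and the filename is hypothetical, so adjust the path and delimiter to match the actual files:

```python
import numpy as np

# Hypothetical filename: point this at an actual file under hypergrid/grid_results.
results = np.loadtxt("hypergrid/grid_results/example_run.txt")

kl_divergence = results[:, 0]  # first column: KL divergence
l1_distance = results[:, 1]    # second column: L1 distance
timesteps = results[:, 2]      # third column: timestep

print(f"final KL divergence {kl_divergence[-1]:.4f} at timestep {int(timesteps[-1])}")
```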