This repository demonstrates how to:
- Generate Offline Data using a pretrained policy (e.g., from Hugging Face Hub).
- Train a Reinforcement Learning (RL) agent from offline data, with optional pretraining or fine-tuning using Behavioral Cloning (BC).
- `generate_offline_data.py`: Generates offline trajectories by rolling out a pretrained Stable-Baselines policy (e.g., from the Hugging Face Hub) and saves them as a `.pkl` file (sketched below).
- `train_bc.py`: Trains a behavioral cloning policy using the generated offline dataset.
- `train_rl.py`: Trains an RL agent (e.g., SAC, or a custom QAgent) using either the replay buffer alone or a combination of the replay buffer and BC.
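For orientation, the rollout in `generate_offline_data.py` amounts to something like the sketch below. It assumes `stable-baselines3`, `gymnasium`, and `huggingface_sb3` are installed; the Hub repo id and filename are illustrative placeholders, not the script's actual defaults.

```python
import os
import pickle

import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import SAC

# Download a pretrained SAC checkpoint from the Hugging Face Hub.
# repo_id and filename are placeholders; substitute a real expert checkpoint.
checkpoint = load_from_hub(repo_id="some-org/Humanoid-v5-SAC", filename="sac-humanoid.zip")
model = SAC.load(checkpoint)

env = gym.make("Humanoid-v5")
trajectories = []
for _ in range(100):  # number of episodes to roll out
    obs, _ = env.reset()
    episode, done = [], False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        episode.append((obs, action, reward, next_obs, terminated))
        obs, done = next_obs, terminated or truncated
    if sum(step[2] for step in episode) >= 8000:  # keep only high-return episodes
        trajectories.append(episode)

os.makedirs("release/data", exist_ok=True)
with open("release/data/humanoid-expert-v5.pkl", "wb") as f:
    pickle.dump(trajectories, f)
```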
Download the MuJoCo 2.1 binaries and extract them to:
~/.mujoco/mujoco210
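For example, on Linux x86_64 (archive from the official MuJoCo releases page):
mkdir -p ~/.mujoco
wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco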
conda create --name offline_rl python=3.9
conda activate offline_rl
Adjust the CUDA version if needed.
# Example: PyTorch 2.1 with CUDA 12.1
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
# Then install the rest
pip install -r requirements.txt
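To confirm that PyTorch was installed with working CUDA support:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"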
(Optional) Build the C++ extension in `common_utils`:
cd common_utils
make
cd ..
Update and source `set_env.sh` if it contains environment variables:
source set_env.sh
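If you need to write `set_env.sh` yourself, mujoco-py typically requires the MuJoCo libraries on the loader path. Illustrative contents (not the repository's actual file):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin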
Generate offline data from a pretrained policy (e.g., from Hugging Face Hub):
python generate_offline_data.py
By default:
- Uses a Farama-Minari SAC expert policy on Humanoid-v5.
- Filters out poor trajectories (e.g., those with episode return below 8000).
- Saves data to `release/data/humanoid-expert-v5.pkl` (you can sanity-check it as shown below).
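A quick way to inspect the generated file (the pickle layout shown is an assumption; adapt it to whatever `generate_offline_data.py` actually stores):

```python
import pickle

with open("release/data/humanoid-expert-v5.pkl", "rb") as f:
    trajectories = pickle.load(f)

print(f"episodes: {len(trajectories)}")
# Assumes each episode is a sequence of (obs, action, reward, next_obs, done) steps.
returns = [sum(step[2] for step in episode) for episode in trajectories]
print(f"returns: min={min(returns):.1f} max={max(returns):.1f}")
```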
You can modify:
- Environment name.
- Save location.
- Number of episodes and reward threshold.
Run:
python train_bc.py --config_path path/to/config.yaml
Example minimal config (`humanoid_bc.yaml`):
seed: 1
device: "cuda"
preload_datapath: "release/data/humanoid-expert-v5.pkl"
batch_size: 256
...
Checkpoints and logs will be saved to `save_dir`.
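Behavioral cloning is plain supervised regression on the dataset's (state, action) pairs. A minimal sketch of one training epoch, assuming a `loader` that yields observation/expert-action tensor batches (this is not the repository's actual trainer):

```python
import torch.nn as nn
import torch.nn.functional as F

def bc_epoch(policy: nn.Module, loader, optimizer, device: str = "cuda") -> None:
    """One behavioral-cloning epoch: regress predicted actions onto expert actions."""
    policy.train()
    for obs, expert_action in loader:
        obs, expert_action = obs.to(device), expert_action.to(device)
        loss = F.mse_loss(policy(obs), expert_action)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```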
Train a SAC or custom Q-learning agent from offline data:
python train_rl.py --config_path path/to/config.yaml
Example config options (comments indicate each field's likely role; check `train_rl.py` for the exact semantics):
preload_datapath: "release/data/humanoid-expert-v5.pkl"  # offline dataset to preload
preload_num_data: 100                                    # number of trajectories to load from it
bc_policy: "path/to/bc_policy_checkpoint.pt"             # BC checkpoint used for pretraining/regularization
mix_rl_rate: 0.5                                         # mixing ratio between RL and BC updates
pretrain_num_epoch: 10                                   # BC pretraining epochs before RL starts
pretrain_epoch_len: 10000                                # gradient steps per pretraining epoch
pretrain_only: 0                                         # 1 = stop after pretraining, skip RL
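One plausible reading of `mix_rl_rate` above is a convex blend of the RL actor objective with a BC regularizer; the actual semantics are defined in `train_rl.py`, so treat this as a sketch only:

```python
import torch

def mixed_actor_loss(rl_loss: torch.Tensor, bc_loss: torch.Tensor, mix_rl_rate: float) -> torch.Tensor:
    # mix_rl_rate = 1.0 would be pure RL; 0.0 would be pure behavioral cloning.
    return mix_rl_rate * rl_loss + (1.0 - mix_rl_rate) * bc_loss
```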
# 1. Generate offline data from the pretrained expert
python generate_offline_data.py

# 2. Train a BC policy on the generated data
python train_bc.py --config_path release/cfgs/mujoco_bc/humanoid_bc.yaml

# 3. Train an RL agent from the offline data
python train_rl.py --config_path release/cfgs/mujoco_rl/humanoid_rl.yaml \
    --preload_datapath release/data/humanoid-expert-v5.pkl \
    --preload_num_data 50 \
    --num_train_step 200000 \
    --use_wb 1

# Alternatively, pretrain the RL agent from a BC checkpoint first
python train_rl.py --config_path release/cfgs/mujoco_rl/humanoid_rl.yaml \
    --bc_policy path/to/bc_policy.pt \
    --pretrain_num_epoch 5 \
    --pretrain_epoch_len 10000