Install Miniconda3, then set up your environment:
```bash
conda create --name ddrl-project python=3.10
conda activate ddrl-project
```
On CUDA CIMS, use `/scratch` for installing packages and saving checkpoints. On Greene, use `/scratch/<NET-ID>` for the same.
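For checkpoints, a minimal sketch of pointing the learner at scratch storage (the `checkpoints_save_folder` argument and the `build_rocketsim_env` name are assumptions; check the actual `Learner` call in `train.py`):

```python
# Sketch only: checkpoints_save_folder is assumed to be the rlgym-ppo Learner
# argument for the checkpoint directory, and build_rocketsim_env stands in for
# the existing env-build function in train.py.
from rlgym_ppo import Learner

learner = Learner(
    build_rocketsim_env,
    checkpoints_save_folder="/scratch/<NET-ID>/checkpoints",  # use /scratch/... on CUDA CIMS
)
learner.learn()
```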
Ensure the pip you're using points to your conda environment:
```bash
which pip
pip install --cache-dir=/scratch -r requirements.txt
```
There are two training modes: single GPU and multi-GPU with DDP.
Before training, set the authorization key as an environment variable inside `train.py` or `train_ddp.py`.
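A minimal sketch, assuming the key in question is a Weights & Biases API key (adjust if your key is something else):

```python
# Sketch only: assumes the authorization key is a Weights & Biases API key.
# Place this near the top of train.py / train_ddp.py, before training starts.
import os

os.environ["WANDB_API_KEY"] = "<your-api-key>"  # placeholder; never commit a real key
```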
- CUDA CIMS: Use `screen` to run and detach (press `Ctrl+A`, then `D`)
- Greene: Use `sbatch` to submit jobs
```bash
cd train

# Single GPU
python train.py

# Multi-GPU DDP
python train_ddp.py
```
For now, prefer using single GPU (`train.py`).

To use `cuda:1`:

```python
Learner(device="cuda:1")
```
To run multi-GPU DDP, replace the installed `rlgym_ppo` package with the local `rlgym-ppo` copy in editable mode:

```bash
pip uninstall rlgym_ppo
cd rlgym-ppo
pip install -e .
```

Then run:

```bash
python train_ddp.py
```
Reward functions are located in the `rewards` directory. To use a specific reward, import it in `train.py` or `train_ddp.py`.
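A hypothetical sketch of wiring a reward into training; the module path, class name, and `build_rocketsim_env` are placeholders for whatever actually exists in `rewards/` and `train.py`:

```python
# Hypothetical names throughout; adapt to the actual files in the rewards directory.
from rewards.speed_toward_ball import SpeedTowardBallReward  # assumed module and class

def build_rocketsim_env():
    # ... existing obs builder, action parser, and terminal conditions ...
    reward_fn = SpeedTowardBallReward()  # swap in the reward you want to train with
    # pass reward_fn wherever the env constructor expects its reward function
    ...
```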
```bash
# Install Rust from https://www.rust-lang.org/tools/install

# Clone and build
git clone https://github.com/VirxEC/rlviser.git
cd rlviser
git checkout v0.7.17
rustup install nightly
cargo +nightly build --release -Z build-std=std --target aarch64-apple-darwin

# Copy executable
cp target/aarch64-apple-darwin/release/rlviser ../eval
```
Download the weights from the training cluster to your local machine, then change lines 73-74 in `visual_bot_match.py` to point to the policy weights checkpoint directory on your local machine.
```bash
cd eval
python visual_bot_match.py
```
Used for simulating matches between two bots to compare their performance based on cumulative rewards and goals scored.
```bash
cd eval
python simulate_bot_match.py
```
You can test the reward functions by playing manually as the blue agent.
Controls:

- `W`, `A`, `S`, `D`: Movement
- `Space`: Jump
- `Left Shift`: Boost
- `Q`, `E`: Roll
- `X`: Handbrake
```bash
cd eval
python human_match_improved.py
```
Challenge any of the bots by loading its agent weights and playing against it.
```bash
cd eval
python human_vs_bot.py
```
- Add metrics to `simulate_bot_match.py` for publishing to Weights & Biases (wandb); see the sketch below
- (Srivats) Send updated `Learner` script for checkpoint save/load with wandb only
- Brainstorm and implement new reward functions to improve performance
- Train using your custom reward
- Visualize or simulate your model every 500 steps on wandb; adjust reward weights based on insights
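For the metrics item, a minimal sketch of logging per-match results to wandb; the metric names, `num_episodes`, and `run_match` are placeholders for whatever `simulate_bot_match.py` already computes:

```python
# Sketch for simulate_bot_match.py; metric names and the match loop are placeholders.
import wandb

wandb.init(project="ddrl-project", name="bot-vs-bot-eval")  # assumed project/run names

for episode in range(num_episodes):  # num_episodes: assumed existing loop bound
    # run_match is a hypothetical helper standing in for the existing match loop
    blue_reward, orange_reward, blue_goals, orange_goals = run_match(episode)
    wandb.log({
        "eval/blue_cumulative_reward": blue_reward,
        "eval/orange_cumulative_reward": orange_reward,
        "eval/blue_goals": blue_goals,
        "eval/orange_goals": orange_goals,
    })

wandb.finish()
```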