Commit 98a8b30

Updates to latest RSL-RL v2.3.0 release (#2154)
# Description

This PR introduces multi-GPU training for the RSL-RL library and adds configuration options for symmetry and Random Network Distillation (RND). It is compatible only with RSL-RL v2.3.0 onwards, so the dependency is now pinned to that version. Fixes #2180

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
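In short, when `--distributed` is passed, each process is pinned to the GPU matching its local rank and the seeds are offset per rank so workers collect decorrelated rollouts. A condensed sketch of that logic follows (the full version is in the `train.py` and `benchmark_rsl_rl.py` diffs below; the helper function itself is hypothetical, the assignments mirror the diff):

```python
# Hypothetical helper condensing the multi-GPU handling added in this commit.
# `env_cfg`/`agent_cfg` are the Isaac Lab environment and RSL-RL runner configs,
# and `local_rank` comes from Isaac Lab's AppLauncher (app_launcher.local_rank).
def apply_distributed_overrides(env_cfg, agent_cfg, local_rank: int) -> None:
    env_cfg.sim.device = f"cuda:{local_rank}"  # simulate on this rank's GPU
    agent_cfg.device = f"cuda:{local_rank}"    # keep learner tensors on the same GPU
    # offset the seed per rank so parallel workers explore differently
    seed = agent_cfg.seed + local_rank
    env_cfg.seed = seed
    agent_cfg.seed = seed
```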
1 parent b663ad1 commit 98a8b30

13 files changed: 303 additions, 10 deletions


docs/source/_static/refs.bib

Lines changed: 21 additions & 0 deletions
@@ -154,3 +154,24 @@ @inproceedings{he2016deep
   pages={770--778},
   year={2016}
 }
+
+@InProceedings{schwarke2023curiosity,
+  title = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
+  author = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
+  booktitle = {Proceedings of The 7th Conference on Robot Learning},
+  pages = {2594--2610},
+  year = {2023},
+  volume = {229},
+  series = {Proceedings of Machine Learning Research},
+  publisher = {PMLR},
+  url = {https://proceedings.mlr.press/v229/schwarke23a.html},
+}
+
+@InProceedings{mittal2024symmetry,
+  author={Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
+  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
+  title={Symmetry Considerations for Learning Task Symmetric Robot Policies},
+  year={2024},
+  pages={7433-7439},
+  doi={10.1109/ICRA57147.2024.10611493}
+}

docs/source/features/multi_gpu.rst

Lines changed: 22 additions & 1 deletion
@@ -4,7 +4,7 @@ Multi-GPU and Multi-Node Training
 .. currentmodule:: isaaclab

 Isaac Lab supports multi-GPU and multi-node reinforcement learning. Currently, this feature is only
-available for RL-Games and skrl libraries workflows. We are working on extending this feature to
+available for RL-Games, RSL-RL and skrl libraries workflows. We are working on extending this feature to
 other workflows.

 .. attention::
@@ -57,6 +57,13 @@ To train with multiple GPUs, use the following command, where ``--nproc_per_node

         python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed

+    .. tab-item:: rsl_rl
+        :sync: rsl_rl
+
+        .. code-block:: shell
+
+            python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+
     .. tab-item:: skrl
         :sync: skrl

@@ -95,6 +102,13 @@ For the master node, use the following command, where ``--nproc_per_node`` repre

         python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed

+    .. tab-item:: rsl_rl
+        :sync: rsl_rl
+
+        .. code-block:: shell
+
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+
     .. tab-item:: skrl
         :sync: skrl

@@ -128,6 +142,13 @@ For non-master nodes, use the following command, replacing ``--node_rank`` with

         python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed

+    .. tab-item:: rsl_rl
+        :sync: rsl_rl
+
+        .. code-block:: shell
+
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+
     .. tab-item:: skrl
         :sync: skrl

docs/source/overview/reinforcement-learning/rl_frameworks.rst

Lines changed: 2 additions & 2 deletions
@@ -27,7 +27,7 @@ Feature Comparison
     - Stable Baselines3
   * - Algorithms Included
     - PPO, SAC, A2C
-    - PPO
+    - PPO, Distillation
     - `Extensive List <https://skrl.readthedocs.io/en/latest/#agents>`__
     - `Extensive List <https://github.com/DLR-RM/stable-baselines3?tab=readme-ov-file#implemented-algorithms>`__
   * - Vectorized Training
@@ -37,7 +37,7 @@ Feature Comparison
     - No
   * - Distributed Training
     - Yes
-    - No
+    - Yes
     - Yes
     - No
   * - ML Frameworks Supported

scripts/benchmarks/benchmark_rsl_rl.py

Lines changed: 22 additions & 0 deletions
@@ -31,6 +31,9 @@
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
 parser.add_argument("--seed", type=int, default=42, help="Seed used for the environment")
 parser.add_argument("--max_iterations", type=int, default=10, help="RL Policy training iterations.")
+parser.add_argument(
+    "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
+)
 parser.add_argument(
     "--benchmark_backend",
     type=str,
@@ -126,8 +129,27 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
     """Train with RSL-RL agent."""
     # parse configuration
     benchmark.set_phase("loading", start_recording_frametime=False, start_recording_runtime=True)
+    # override configurations with non-hydra CLI arguments
     agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)
     env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs
+    agent_cfg.max_iterations = (
+        args_cli.max_iterations if args_cli.max_iterations is not None else agent_cfg.max_iterations
+    )
+
+    # set the environment seed
+    # note: certain randomizations occur in the environment initialization so we set the seed here
+    env_cfg.seed = agent_cfg.seed
+    env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device
+
+    # multi-gpu training configuration
+    if args_cli.distributed:
+        env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
+        agent_cfg.device = f"cuda:{app_launcher.local_rank}"
+
+        # set seed to have diversity in different threads
+        seed = agent_cfg.seed + app_launcher.local_rank
+        env_cfg.seed = seed
+        agent_cfg.seed = seed

     # specify directory for logging experiments
     log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)

scripts/reinforcement_learning/rsl_rl/play.py

Lines changed: 17 additions & 4 deletions
@@ -5,6 +5,21 @@

 """Script to play a checkpoint if an RL agent from RSL-RL."""

+import platform
+from importlib.metadata import version
+
+if version("rsl-rl-lib") != "2.3.0":
+    if platform.system() == "Windows":
+        cmd = [r".\isaaclab.bat", "-p", "-m", "pip", "install", "rsl-rl-lib==2.3.0"]
+    else:
+        cmd = ["./isaaclab.sh", "-p", "-m", "pip", "install", "rsl-rl-lib==2.3.0"]
+    print(
+        f"Please install the correct version of RSL-RL.\nExisting version is: '{version('rsl-rl-lib')}'"
+        " and required version is: '2.3.0'.\nTo install the correct version, run:"
+        f"\n\n\t{' '.join(cmd)}\n"
+    )
+    exit(1)
+
 """Launch Isaac Sim Simulator first."""

 import argparse
@@ -120,11 +135,9 @@ def main():

     # export policy to onnx/jit
     export_model_dir = os.path.join(os.path.dirname(resume_path), "exported")
-    export_policy_as_jit(
-        ppo_runner.alg.actor_critic, ppo_runner.obs_normalizer, path=export_model_dir, filename="policy.pt"
-    )
+    export_policy_as_jit(ppo_runner.alg.policy, ppo_runner.obs_normalizer, path=export_model_dir, filename="policy.pt")
     export_policy_as_onnx(
-        ppo_runner.alg.actor_critic, normalizer=ppo_runner.obs_normalizer, path=export_model_dir, filename="policy.onnx"
+        ppo_runner.alg.policy, normalizer=ppo_runner.obs_normalizer, path=export_model_dir, filename="policy.onnx"
     )

     dt = env.unwrapped.physics_dt

scripts/reinforcement_learning/rsl_rl/train.py

Lines changed: 28 additions & 0 deletions
@@ -5,6 +5,21 @@

 """Script to train RL agent with RSL-RL."""

+import platform
+from importlib.metadata import version
+
+if version("rsl-rl-lib") != "2.3.0":
+    if platform.system() == "Windows":
+        cmd = [r".\isaaclab.bat", "-p", "-m", "pip", "install", "rsl-rl-lib==2.3.0"]
+    else:
+        cmd = ["./isaaclab.sh", "-p", "-m", "pip", "install", "rsl-rl-lib==2.3.0"]
+    print(
+        f"Please install the correct version of RSL-RL.\nExisting version is: '{version('rsl-rl-lib')}'"
+        " and required version is: '2.3.0'.\nTo install the correct version, run:"
+        f"\n\n\t{' '.join(cmd)}\n"
+    )
+    exit(1)
+
 """Launch Isaac Sim Simulator first."""

 import argparse
@@ -25,6 +40,9 @@
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
 parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment")
 parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.")
+parser.add_argument(
+    "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
+)
 # append RSL-RL cli arguments
 cli_args.add_rsl_rl_args(parser)
 # append AppLauncher cli args
@@ -90,6 +108,16 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
     env_cfg.seed = agent_cfg.seed
     env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device

+    # multi-gpu training configuration
+    if args_cli.distributed:
+        env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
+        agent_cfg.device = f"cuda:{app_launcher.local_rank}"
+
+        # set seed to have diversity in different threads
+        seed = agent_cfg.seed + app_launcher.local_rank
+        env_cfg.seed = seed
+        agent_cfg.seed = seed
+
     # specify directory for logging experiments
     log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)
     log_root_path = os.path.abspath(log_root_path)

source/isaaclab_rl/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.1.1"
+version = "0.1.2"

 # Description
 title = "Isaac Lab RL"

source/isaaclab_rl/docs/CHANGELOG.rst

Lines changed: 9 additions & 0 deletions
@@ -1,6 +1,15 @@
 Changelog
 ---------

+0.1.2 (2025-03-28)
+~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added symmetry and curiosity-based exploration configurations for RSL-RL wrapper.
+
+
 0.1.1 (2025-03-10)
 ~~~~~~~~~~~~~~~~~~

source/isaaclab_rl/isaaclab_rl/rsl_rl/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -17,4 +17,6 @@

 from .exporter import export_policy_as_jit, export_policy_as_onnx
 from .rl_cfg import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
+from .rnd_cfg import RslRlRndCfg
+from .symmetry_cfg import RslRlSymmetryCfg
 from .vecenv_wrapper import RslRlVecEnvWrapper

source/isaaclab_rl/isaaclab_rl/rsl_rl/rl_cfg.py

Lines changed: 26 additions & 1 deletion
@@ -8,6 +8,9 @@

 from isaaclab.utils import configclass

+from .rnd_cfg import RslRlRndCfg
+from .symmetry_cfg import RslRlSymmetryCfg
+

 @configclass
 class RslRlPpoActorCriticCfg:
@@ -19,6 +22,9 @@ class RslRlPpoActorCriticCfg:
     init_noise_std: float = MISSING
     """The initial noise standard deviation for the policy."""

+    noise_std_type: Literal["scalar", "log"] = "scalar"
+    """The type of noise standard deviation for the policy. Default is scalar."""
+
     actor_hidden_dims: list[int] = MISSING
     """The hidden dimensions of the actor network."""

@@ -72,6 +78,21 @@ class RslRlPpoAlgorithmCfg:
     max_grad_norm: float = MISSING
     """The maximum gradient norm."""

+    normalize_advantage_per_mini_batch: bool = False
+    """Whether to normalize the advantage per mini-batch. Default is False.
+
+    If True, the advantage is normalized over the entire collected trajectories.
+    Otherwise, the advantage is normalized over the mini-batches only.
+    """
+
+    symmetry_cfg: RslRlSymmetryCfg | None = None
+    """The symmetry configuration. Default is None, in which case symmetry is not used."""
+
+    rnd_cfg: RslRlRndCfg | None = None
+    """The configuration for the Random Network Distillation (RND) module. Default is None,
+    in which case RND is not used.
+    """
+

 @configclass
 class RslRlOnPolicyRunnerCfg:
@@ -99,7 +120,11 @@ class RslRlOnPolicyRunnerCfg:
     """The algorithm configuration."""

     clip_actions: float | None = None
-    """The clipping value for actions. If ``None``, then no clipping is done."""
+    """The clipping value for actions. If ``None``, then no clipping is done.
+
+    .. note::
+        This clipping is performed inside the :class:`RslRlVecEnvWrapper` wrapper.
+    """

     ##
     # Checkpointing parameters
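For illustration (not part of the commit): a task's RSL-RL agent configuration could opt into the new fields roughly as sketched below. Only names visible in this commit are used, and the `policy`/`algorithm` attributes are assumed to follow the runner config's usual layout. `RslRlSymmetryCfg` and `RslRlRndCfg` are default-constructed because their fields are not shown in this diff; a real setup would still need to populate them (e.g. the symmetry augmentation function expected by RSL-RL).

```python
# Hedged sketch: switching on the options introduced in this commit for an
# existing RSL-RL runner configuration. Sub-config fields are not part of this
# diff, so they are left at their defaults here.
from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlRndCfg, RslRlSymmetryCfg


def enable_new_rsl_rl_options(agent_cfg: RslRlOnPolicyRunnerCfg) -> RslRlOnPolicyRunnerCfg:
    agent_cfg.policy.noise_std_type = "log"                # new field on RslRlPpoActorCriticCfg
    agent_cfg.algorithm.normalize_advantage_per_mini_batch = True
    agent_cfg.algorithm.symmetry_cfg = RslRlSymmetryCfg()  # enable symmetry-based augmentation/loss
    agent_cfg.algorithm.rnd_cfg = RslRlRndCfg()            # enable curiosity via RND
    return agent_cfg
```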

0 commit comments