Commit 0959091

Resets cuda device after each app.update call (#2283)
# Description

Calling `app.update` may change the CUDA device that Isaac Lab previously set. This change forces the CUDA device back to the desired device after each `app.update` call made in `SimulationContext`, in `reset`, `step`, and `render`.

This fixes NCCL errors on distributed setups for certain environments (especially when rendering is enabled), which previously reported that different ranks were running on the same device.

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
1 parent 203955e commit 0959091
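
The pattern described in the commit message can be sketched without a GPU. This is a minimal illustration and not the Isaac Lab implementation: `call_and_restore_device`, `FakeCudaRuntime`, and `noisy_update` are hypothetical names, and the fake runtime only stands in for the per-process CUDA device state. With PyTorch installed, `torch.cuda.set_device` would be passed as the `set_device` argument.

```python
from __future__ import annotations

from typing import Callable


def call_and_restore_device(
    update: Callable[[], None],
    device: str,
    set_device: Callable[[str], None],
) -> None:
    """Run `update`, then re-select `device` if it is a CUDA device."""
    update()
    # Same guard as in the commit: CPU-only runs skip the reset.
    if "cuda" in device:
        set_device(device)


class FakeCudaRuntime:
    """Hypothetical stand-in for the per-process current CUDA device."""

    def __init__(self) -> None:
        self.current = "cuda:0"

    def set_device(self, device: str) -> None:
        self.current = device


runtime = FakeCudaRuntime()
runtime.set_device("cuda:1")  # e.g. rank 1 selects its own GPU


def noisy_update() -> None:
    # Simulates an update call that silently switches the process to GPU 0.
    runtime.current = "cuda:0"


call_and_restore_device(noisy_update, "cuda:1", runtime.set_device)
print(runtime.current)  # cuda:1
```

Passing the setter as an argument keeps the sketch runnable on machines without CUDA. The commit itself calls `torch.cuda.set_device(self.device)` directly inside each method.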

3 files changed: +23 -1 lines changed
source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]
 
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.36.5"
+version = "0.36.6"
 
 # Description
 title = "Isaac Lab framework for Robot Learning"

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 11 additions & 0 deletions
@@ -1,6 +1,17 @@
 Changelog
 ---------
 
+0.36.6 (2025-04-09)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Added call to set cuda device after each ``app.update()`` call in :class:`~isaaclab.sim.SimulationContext`.
+  This is now required for multi-GPU workflows because some underlying logic in ``app.update()`` is modifying
+  the cuda device, which results in NCCL errors on distributed setups.
+
+
 0.36.5 (2025-04-01)
 ~~~~~~~~~~~~~~~~~~~
 

source/isaaclab/isaaclab/sim/simulation_context.py

Lines changed: 11 additions & 0 deletions
@@ -452,6 +452,9 @@ def forward(self) -> None:
 
     def reset(self, soft: bool = False):
         super().reset(soft=soft)
+        # app.update() may be changing the cuda device in reset, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
         # enable kinematic rendering with fabric
         if self.physics_sim_view:
             self.physics_sim_view._backend.initialize_kinematic_bodies()
@@ -488,6 +491,10 @@ def step(self, render: bool = True):
         # step the simulation
         super().step(render=render)
 
+        # app.update() may be changing the cuda device in step, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
+
     def render(self, mode: RenderMode | None = None):
         """Refreshes the rendering components including UI elements and view-ports depending on the render mode.
 
@@ -527,6 +534,10 @@ def render(self, mode: RenderMode | None = None):
             self._app.update()
             self.set_setting("/app/player/playSimulations", True)
 
+        # app.update() may be changing the cuda device, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
+
     """
     Operations - Override (extension)
     """

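The failure mode this commit addresses, several ranks ending up on the same device, can be checked with a small helper. This is a hedged sketch and not part of the commit: `find_shared_devices` is a hypothetical name and the device lists are illustrative values. In a real distributed run each rank would report its own `torch.cuda.current_device()`, which is assumed here to have been gathered into a list indexed by rank.

```python
from __future__ import annotations

from collections import defaultdict


def find_shared_devices(devices_by_rank: list[int]) -> dict[int, list[int]]:
    """Map each device index used by more than one rank to the ranks using it.

    `devices_by_rank[r]` is the CUDA device index reported by rank `r`.
    """
    ranks_by_device: dict[int, list[int]] = defaultdict(list)
    for rank, device_index in enumerate(devices_by_rank):
        ranks_by_device[device_index].append(rank)
    # Keep only devices that are shared between ranks.
    return {dev: ranks for dev, ranks in ranks_by_device.items() if len(ranks) > 1}


# Illustrative: rank 1 was switched to GPU 0 without being reset.
print(find_shared_devices([0, 0]))  # {0: [0, 1]}
# Illustrative: each rank stays on its own GPU.
print(find_shared_devices([0, 1]))  # {}
```

An empty result means no two ranks share a device, which is the state the added `torch.cuda.set_device(self.device)` calls are meant to preserve.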