Description
Describe the bug
In the reward computation logic (the compute function), when a reward term has a weight of zero, the code skips computation and does not update self._step_reward for that term.
This is not an issue if a term's weight remains zero from initialization or is changed from zero to non-zero after the reward manager is created.
However, if a term's weight is first changed to a non-zero value and later set back to zero, self._step_reward retains stale (non-zero) values from earlier updates, since the loop skips updating it when the weight is zero.
Explicitly setting self._step_reward to zero when the weight is zero ensures correctness and prevents stale data.
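The behaviour described above, and the proposed fix, can be sketched on a toy stand-in. This is not the Isaac Lab implementation: ToyTermCfg, ToyRewardManager, the clear_on_zero flag, and run_scenario are hypothetical names, plain lists replace the tensors, and storing the weighted value in _step_reward is an assumption made only for illustration. Only the skip-on-zero-weight branch mirrors what the issue describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTermCfg:
    """Hypothetical stand-in for a reward term configuration."""

    func: Callable[[int], List[float]]  # returns one raw value per environment
    weight: float


class ToyRewardManager:
    """Toy manager that mirrors only the skip-on-zero-weight behaviour."""

    def __init__(self, term_cfgs: List[ToyTermCfg], num_envs: int, clear_on_zero: bool):
        self._term_cfgs = term_cfgs
        self._num_envs = num_envs
        self._clear_on_zero = clear_on_zero  # True enables the proposed fix
        self._reward_buf = [0.0] * num_envs
        # Indexed as _step_reward[env][term]; the real buffer is a tensor.
        self._step_reward = [[0.0] * len(term_cfgs) for _ in range(num_envs)]

    def compute(self, dt: float) -> List[float]:
        self._reward_buf = [0.0] * self._num_envs
        for idx, cfg in enumerate(self._term_cfgs):
            if cfg.weight == 0.0:
                if self._clear_on_zero:
                    # Proposed fix: overwrite the per-term entry before skipping.
                    for env in range(self._num_envs):
                        self._step_reward[env][idx] = 0.0
                continue
            raw = cfg.func(self._num_envs)
            for env in range(self._num_envs):
                weighted = raw[env] * cfg.weight
                self._reward_buf[env] += weighted * dt
                self._step_reward[env][idx] = weighted
        return self._reward_buf


def run_scenario(clear_on_zero: bool):
    """Weight goes 0 -> 2 -> 0; returns (per-term column, total reward)."""
    term = ToyTermCfg(func=lambda n: [1.0] * n, weight=0.0)
    manager = ToyRewardManager([term], num_envs=2, clear_on_zero=clear_on_zero)
    manager._term_cfgs[0].weight = 2.0
    manager.compute(dt=0.02)
    manager._term_cfgs[0].weight = 0.0
    total = manager.compute(dt=0.02)
    return [row[0] for row in manager._step_reward], total


stale_column, stale_total = run_scenario(clear_on_zero=False)
fixed_column, fixed_total = run_scenario(clear_on_zero=True)
print("without fix:", stale_column, stale_total)
print("with fix:   ", fixed_column, fixed_total)
```

In this sketch the total reward is identical in both runs, and only the per-term buffer differs, which is why the problem shows up in per-term visualization or logging rather than in training.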
Steps to reproduce
- Create or initialize a reward manager with a reward term having zero weight.
- Change the weight of the term dynamically to a non-zero value during runtime.
- Compute the reward and observe that self._step_reward is correctly updated with nonzero values.
- Change the term’s weight back to zero dynamically.
- Compute the reward again.
- Observe that self._step_reward still retains the previous nonzero value because the update is skipped when the weight is zero.
Example (pseudo-code):
# Assume reward manager initialized normally
reward_manager._term_cfgs[idx].weight = 2.0 # Set to non-zero
reward_manager.compute(dt=0.02)
print(reward_manager._step_reward[:, idx]) # --> Should show nonzero value
reward_manager._term_cfgs[idx].weight = 0.0 # Set back to zero
reward_manager.compute(dt=0.02)
print(reward_manager._step_reward[:, idx])
# Expected: 0.0
# Observed: still showing stale nonzero value
System Info
Describe the characteristic of your environment:
- Commit: 2e6946a
- Isaac Lab Version: 2.1.0
- Isaac Sim Version: 4.5
- OS: Ubuntu 22.04
- GPU: RTX 3060
- CUDA: 12.4
- GPU Driver: 550.120
Additional context
This bug was detected during live visualization of rewards in the IsaacSim GUI, where a curriculum reward setting was used.
The ManagerLiveVisualizer relies on _step_reward to report reward terms visually, and in this case, it incorrectly reported stale reward values when the weight was dynamically changed back to zero.
This issue affects any tool that uses _step_reward for visualization or logging.
Note that the total reward (_reward_buf) computation remains correct, since skipping a term with zero weight or explicitly adding zero yields the same result.
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.
- self._step_reward is explicitly set to zero for zero-weight terms
- Per-term step reward accurately reflects zero contribution when weight is zero
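One possible way to turn these criteria into a regression check is sketched below. The helper step_reward_after_zeroing and the StubManager class are hypothetical. The stub exists only so the snippet runs on its own, and the helper assumes the manager exposes _term_cfgs, compute(dt=...), and a row-iterable _step_reward, as in the pseudo-code earlier in this issue.

```python
from types import SimpleNamespace


def step_reward_after_zeroing(manager, idx, dt=0.02):
    """Drive the weight non-zero -> zero and return the per-env values for term idx."""
    manager._term_cfgs[idx].weight = 2.0
    manager.compute(dt=dt)
    manager._term_cfgs[idx].weight = 0.0
    manager.compute(dt=dt)
    # Iterating rows keeps this independent of the container type (assumption).
    return [float(row[idx]) for row in manager._step_reward]


class StubManager:
    """Hypothetical minimal manager: one term whose raw value is 1.0 in every env."""

    def __init__(self, num_envs=2, clear_on_zero=True):
        self._term_cfgs = [SimpleNamespace(weight=0.0)]
        self._step_reward = [[0.0] for _ in range(num_envs)]
        self._clear_on_zero = clear_on_zero

    def compute(self, dt):
        weight = self._term_cfgs[0].weight
        for row in self._step_reward:
            if weight != 0.0:
                row[0] = 1.0 * weight
            elif self._clear_on_zero:
                row[0] = 0.0  # behaviour required by the acceptance criteria


column = step_reward_after_zeroing(StubManager(num_envs=3), idx=0)
assert all(value == 0.0 for value in column), column
```

Running the same helper against a manager that skips the update (clear_on_zero=False in the stub) makes the assertion fail, which is the stale-value behaviour reported here.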