Skip to content

[Bug Report] Step reward retains stale values when weight is dynamically set back to zero #2391

Closed
@bikcrum

Description

@bikcrum

Describe the bug

In the reward computation logic (compute function), when a reward term has a weight of zero, the code skips computation and does not update self._step_reward for that term.
This is not an issue if a term's weight remains zero from initialization or is changed from zero to non-zero after the reward manager is created.
However, if a term's weight is first changed to non-zero and later back to zero, the self._step_reward would retain stale (nonzero) values from earlier updates, since the loop skips updating it when the weight is zero.

Explicitly setting self._step_reward to zero when the weight is zero ensures correctness and prevents stale data,

Steps to reproduce

  1. Create or initialize a reward manager with a reward term having zero weight.
  2. Change the weight of the term dynamically to a non-zero value during runtime.
  3. Compute the reward and observe that self._step_reward is correctly updated with nonzero values.
  4. Change the term’s weight back to zero dynamically.
  5. Compute the reward again.
  6. Observe that self._step_reward still retains the previous nonzero value because the update is skipped when the weight is zero.

Example (pseudo-code):

# Assume reward manager initialized normally
reward_manager._term_cfgs[idx].weight = 2.0  # Set to non-zero
reward_manager.compute(dt=0.02)

print(reward_manager._step_reward[:, idx])  # --> Should show nonzero value

reward_manager._term_cfgs[idx].weight = 0.0  # Set back to zero
reward_manager.compute(dt=0.02)

print(reward_manager._step_reward[:, idx])  
# Expected: 0.0
# Observed: still showing stale nonzero value

System Info

Describe the characteristic of your environment:

  • Commit: 2e6946a
  • Isaac Lab Version: 2.1.0
  • Isaac Sim Version: 4.5
  • OS: Ubuntu 22.04
  • GPU: RTX 3060
  • CUDA: 12.4
  • GPU Driver: 550.120

Additional context

This bug was detected during live visualization of rewards in the IsaacSim GUI, where a curriculum reward setting was used.
The ManagerLiveVisualizer relies on _step_reward to report reward terms visually, and in this case, it incorrectly reported stale reward values when the weight was dynamically changed back to zero.
This issue affects any tool that uses _step_reward for visualization or logging.

Note that the total reward (_reward_buf) computation remains correct, since skipping a term with zero weight or explicitly adding zero yields the same result.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.

  • self._step_reward is explicitly set to zero for zero-weight terms
  • Per-term step reward accurately reflects zero contribution when weight is zero

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions