[Bug Report] Step reward retains stale values when weight is dynamically set back to zero #2391
Open
4 tasks done
Labels: bug
Describe the bug
In the reward computation logic (the `compute` function), when a reward term has a weight of zero, the code skips the computation and does not update `self._step_reward` for that term.
This is not an issue if a term's weight remains zero from initialization, or if it is changed from zero to non-zero after the reward manager is created.
However, if a term's weight is first changed to non-zero and later set back to zero, `self._step_reward` retains stale (nonzero) values from earlier updates, because the loop skips the update when the weight is zero.
Explicitly setting `self._step_reward` to zero when the weight is zero ensures correctness and prevents stale data.
Steps to reproduce
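1. Create the reward manager with a term whose weight is zero.
2. Change the term's weight to a non-zero value: `self._step_reward` is correctly updated with nonzero values.
3. Change the term's weight back to zero: `self._step_reward` still retains the previous nonzero value because the update is skipped when the weight is zero.
Example (pseudo-code):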
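A minimal sketch of the sequence described above, using a hypothetical `ToyRewardManager` stand-in (single environment, plain floats, no time-step scaling) rather than the actual reward manager. Only `compute`, `_step_reward`, and `_reward_buf` are names taken from this report; `TermCfg`, `ToyRewardManager`, and the `alive` term are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TermCfg:
    """Hypothetical stand-in for a reward term configuration."""
    func: Callable[[], float]
    weight: float


class ToyRewardManager:
    """Simplified model of a compute loop that skips zero-weight terms."""

    def __init__(self, terms: Dict[str, TermCfg]):
        self._terms = terms
        self._reward_buf = 0.0
        self._step_reward = {name: 0.0 for name in terms}

    def compute(self) -> float:
        self._reward_buf = 0.0
        for name, cfg in self._terms.items():
            if cfg.weight == 0.0:
                # Term is skipped, so _step_reward[name] keeps its old value.
                continue
            value = cfg.func() * cfg.weight
            self._reward_buf += value
            self._step_reward[name] = value
        return self._reward_buf


terms = {"alive": TermCfg(func=lambda: 1.0, weight=0.0)}
manager = ToyRewardManager(terms)

# 1. Weight is zero from initialization: nothing is reported.
manager.compute()
step_initial = manager._step_reward["alive"]

# 2. Weight becomes non-zero: the per-term value is updated.
terms["alive"].weight = 2.0
manager.compute()
step_active = manager._step_reward["alive"]

# 3. Weight goes back to zero: the total is zero, the per-term value is stale.
terms["alive"].weight = 0.0
total_after = manager.compute()
step_stale = manager._step_reward["alive"]

print(step_initial, step_active, step_stale, total_after)
```

In this toy model the total returned by `compute` drops back to zero in step 3, while the per-term entry still holds the value written in step 2.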
System Info
Describe the characteristic of your environment:
Additional context
This bug was detected during live visualization of rewards in the IsaacSim GUI, where a curriculum reward setting was used.
The `ManagerLiveVisualizer` relies on `_step_reward` to report reward terms visually, and in this case it incorrectly reported stale reward values when the weight was dynamically changed back to zero.
This issue affects any tool that uses `_step_reward` for visualization or logging.
Note that the total reward (`_reward_buf`) computation remains correct, since skipping a term with zero weight and explicitly adding zero yield the same result.
Checklist
Acceptance Criteria
Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.
`self._step_reward` is explicitly set to zero for zero-weight terms
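One way the acceptance criterion could be met is sketched below, assuming the loop structure described in this report. The function `compute_rewards` and its dictionary-based arguments are hypothetical simplifications (single environment, plain floats), not the reward manager's actual signature. The only change from the skip-only behaviour is that the per-term entry is cleared before the term is skipped.

```python
from typing import Callable, Dict, Tuple

# Each term is (function, weight); this layout is assumed for illustration.
Term = Tuple[Callable[[], float], float]


def compute_rewards(terms: Dict[str, Term], step_reward: Dict[str, float]) -> float:
    """Return the total reward and refresh step_reward for every term."""
    total = 0.0
    for name, (func, weight) in terms.items():
        if weight == 0.0:
            # Clear the per-term value instead of leaving the old one in place.
            step_reward[name] = 0.0
            continue
        value = func() * weight
        total += value
        step_reward[name] = value
    return total


# Stale entry left over from an earlier step in which the weight was 2.0.
step_reward = {"alive": 2.0}
total = compute_rewards({"alive": (lambda: 1.0, 0.0)}, step_reward)
print(step_reward["alive"], total)
```

The total is unaffected by the extra assignment, because a zero-weight term contributes nothing either way. Only the per-term value seen by visualization or logging changes.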