Description
In DQNAgent, specifically in its step method, termination and truncation do not appear to be distinguished in the way the Gymnasium documentation recommends (https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/).
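The line done = terminated or truncated treats termination and truncation identically. If this done flag is what gets stored for training, an episode cut off by a time limit is recorded as if it had reached a true terminal state.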
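A minimal sketch of how the two flags could be kept apart, assuming the agent stores transitions as tuples. The Transition type and the handle_step helper are hypothetical names for illustration, not part of DQNAgent: the idea is that only terminated goes into the stored transition, while terminated or truncated is still used to decide when to reset the environment.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """Hypothetical replay-buffer entry."""
    obs: object
    action: int
    reward: float
    next_obs: object
    terminated: bool  # True only for a real terminal state (no bootstrapping)


def handle_step(obs, action, next_obs, reward, terminated, truncated):
    """Hypothetical helper showing what a step method could do with both flags."""
    # Store `terminated` (not `terminated or truncated`) for the learning target.
    transition = Transition(obs, action, reward, next_obs, terminated)
    # Either flag ends the episode, so the environment still has to be reset.
    episode_over = terminated or truncated
    return transition, episode_over


# A time-limit cut-off: the episode ends, but the stored transition is not terminal.
t, over = handle_step(obs=0, action=1, next_obs=2, reward=1.0,
                      terminated=False, truncated=True)
print(over, t.terminated)  # prints: True False
```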
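Furthermore, the _compute_dqn_loss method computes the target as:
    mask = 1 - done
    target = (reward + self.gamma * next_q_value * mask).to(self.device)
This does not account for truncation: for a truncated transition, mask is 0, so the bootstrapped term self.gamma * next_q_value is dropped even though the next state is not terminal. Following the Gymnasium guide, the mask should be derived from terminated alone.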
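To make the effect concrete, here is a sketch of the target computation using NumPy arrays in place of the PyTorch tensors in the original (the arithmetic is the same). The function name dqn_target is hypothetical, and it assumes a terminated flag is available per transition instead of done.

```python
import numpy as np


def dqn_target(reward, next_q_value, terminated, gamma=0.99):
    """Hypothetical DQN target that bootstraps through truncation."""
    # mask is 0 only for true terminal states; truncated transitions keep it at 1.
    mask = 1.0 - terminated.astype(np.float64)
    return reward + gamma * next_q_value * mask


reward = np.array([1.0, 1.0])
next_q_value = np.array([10.0, 10.0])
# First transition: true terminal state. Second: truncated by a time limit.
terminated = np.array([True, False])
done = np.array([True, True])  # what `terminated or truncated` would give

print(dqn_target(reward, next_q_value, terminated, gamma=0.5))  # [1. 6.]
print(dqn_target(reward, next_q_value, done, gamma=0.5))        # [1. 1.]
```

With the done-based mask, the truncated transition loses its bootstrapped value (1.0 instead of 6.0), which is the bias the Gymnasium guide warns about.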