Not handling time limits #70

@carlos-UPC-AI

In the DQNAgent, specifically in the step method, termination and truncation do not appear to be distinguished properly, contrary to the recommendation in the Gymnasium documentation at https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/.

The line done = terminated or truncated treats termination and truncation identically.

Furthermore, in the _compute_dqn_loss method, the code lines:

    mask = 1 - done
    target = (reward + self.gamma * next_q_value * mask).to(self.device)

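do not seem to account specifically for truncation. Because done is also set when an episode is truncated, the mask zeroes the bootstrap term for time-limit cutoffs, even though the next state's value should still be used in that case.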
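
One possible fix, sketched below in plain Python rather than with the repository's tensors, is to store only terminated in the replay buffer and use the combined flag solely to decide when to reset the environment. The helper names transition_flags and td_target are hypothetical and do not exist in the repository; this is an illustration of the idea from the Gymnasium tutorial, not a patch against the actual code.

```python
def transition_flags(terminated: bool, truncated: bool) -> tuple[bool, bool]:
    """Split the environment signals into (stored_done, episode_over).

    stored_done is what would go into the replay buffer and control
    bootstrapping; episode_over only tells the training loop when to
    call env.reset(). Both names are hypothetical.
    """
    return terminated, terminated or truncated


def td_target(reward: float, next_q_value: float, stored_done: bool,
              gamma: float = 0.99) -> float:
    """One-step TD target; the bootstrap term is dropped only on true termination."""
    mask = 1.0 - float(stored_done)
    return reward + gamma * next_q_value * mask


# A transition cut off by the time limit: truncated, but not terminated.
stored_done, episode_over = transition_flags(terminated=False, truncated=True)

# The episode still ends for the training loop, yet the target keeps
# the discounted next-state value because stored_done is False.
target = td_target(reward=1.0, next_q_value=10.0,
                   stored_done=stored_done, gamma=0.5)
```

With this split, a truncated transition still bootstraps from next_q_value, while a genuinely terminal one falls back to the reward alone.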
