Regarding logic for first done indexes #17

@victor-psiori

Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
In train.py, I don't understand the logic for computing ind_first_done in the following line:

ind_first_done = padding_mask.long().argmin(0) + 1 # will be index of first 1 in each column

After going through the loss calculations and the learn function, where ind_first_done is used, I think this line should instead read:

ind_first_done = padding_mask.long().argmax(0) + 1

I think so because, according to the comments, ind_first_done denotes the final index in each trajectory. The inline comment also says "index of first 1 in each column", which is what argmax returns on a 0/1 mask, not argmin.
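To make the difference concrete, here is a minimal sketch in plain Python rather than PyTorch. It rests on several assumptions that are not confirmed by the repository: that padding_mask has shape (time, batch), that argmin and argmax return the first occurrence on ties, and that the mask follows one of the two conventions labelled below. The masks and the helpers first_argmin and first_argmax are hypothetical and only mimic the reduction over dimension 0.

```python
def first_argmin(column):
    """Index of the first minimal value (first-occurrence tie-breaking)."""
    return min(range(len(column)), key=column.__getitem__)


def first_argmax(column):
    """Index of the first maximal value (first-occurrence tie-breaking)."""
    return max(range(len(column)), key=column.__getitem__)


# Hypothetical mask of shape (time, batch): rows are time steps,
# columns are trajectories.
# Convention A (assumed): 1 = real step, 0 = padding after the episode ended.
mask_a = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
]
# Convention B (assumed): the inverse, 0 = real step, 1 = padding.
mask_b = [[1 - value for value in row] for row in mask_a]

# Transpose so that each tuple is one trajectory (one column of the mask).
columns_a = list(zip(*mask_a))
columns_b = list(zip(*mask_b))

argmin_plus_one_a = [first_argmin(c) + 1 for c in columns_a]
argmax_plus_one_a = [first_argmax(c) + 1 for c in columns_a]
argmin_plus_one_b = [first_argmin(c) + 1 for c in columns_b]
argmax_plus_one_b = [first_argmax(c) + 1 for c in columns_b]

print("A: argmin + 1 =", argmin_plus_one_a)  # [4, 3, 1]
print("A: argmax + 1 =", argmax_plus_one_a)  # [1, 1, 1]
print("B: argmin + 1 =", argmin_plus_one_b)  # [1, 1, 1]
print("B: argmax + 1 =", argmax_plus_one_b)  # [4, 3, 1]
```

Under convention A, argmin locates the first 0 in each column, while under convention B, argmax locates the first 1, so which call is right depends on how padding_mask is built. In both cases a trajectory with no padding at all (the third column) yields 1, because every entry ties and index 0 is returned.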

Could you kindly explain the logic used for the mentioned snippet?
