Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
In train.py, I don't understand the logic for calculating ind_first_done in the following line:
adaptive-transformers-in-rl/train.py
Line 1240 in 6f75366
| ind_first_done = padding_mask.long().argmin(0) + 1 # will be index of first 1 in each column |
After going through the loss calculations and the learn function where ind_first_done is used, I feel the line above should instead be:
ind_first_done = padding_mask.long().argmax(0) + 1
I think so because, according to the comments, ind_first_done denotes the final index in each trajectory.
Could you kindly explain the logic used for the mentioned snippet?
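To make the question concrete, here is a minimal sketch in plain Python of what the two variants would return for a single toy column. The helpers first_argmin and first_argmax are hypothetical stand-ins that return the first matching index, and both mask conventions shown are assumptions, since I am not sure which one padding_mask actually follows in train.py.

```python
def first_argmin(column):
    """Index of the first minimal value in the column (hypothetical helper)."""
    return min(range(len(column)), key=lambda i: column[i])


def first_argmax(column):
    """Index of the first maximal value in the column (hypothetical helper)."""
    return max(range(len(column)), key=lambda i: column[i])


# Toy trajectory of 6 time steps: steps 0-3 are real, steps 4-5 are padding.
# Assumption A: the mask is 1 on padded steps.
padded_is_one = [0, 0, 0, 0, 1, 1]
# Assumption B: the mask is 1 on real (non-padded) steps.
valid_is_one = [1, 1, 1, 1, 0, 0]

results = {
    "A_argmin_plus_1": first_argmin(padded_is_one) + 1,  # 1
    "A_argmax_plus_1": first_argmax(padded_is_one) + 1,  # 5
    "B_argmin_plus_1": first_argmin(valid_is_one) + 1,   # 5
    "B_argmax_plus_1": first_argmax(valid_is_one) + 1,   # 1
}

for name, value in results.items():
    print(name, value)
```

Under assumption A only argmax lands next to the first padded step, while under assumption B only argmin does, so the answer seems to depend on which convention padding_mask uses. Note that for a column with no padding at all (all values equal), both helpers return index 0.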