Regarding logic for first done indexes #17

Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
In `train.py`, I don't understand the logic for calculating `ind_first_done` in the following line:  
https://github.com/jerrodparker20/adaptive-transformers-in-rl/blob/6f75366b78998fb1d8755acd2d851c461c82ee75/train.py#L1240

After going through the loss calculations and the `learn` function, where `ind_first_done` is used, I think the line linked above (https://github.com/jerrodparker20/adaptive-transformers-in-rl/blob/6f75366b78998fb1d8755acd2d851c461c82ee75/train.py#L1240) should instead be:
`ind_first_done = padding_mask.long().argmax(0) + 1`
My reasoning is that, according to the comments, `ind_first_done` denotes the final index in each trajectory.
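
To make the question concrete, here is a minimal sketch of what I would expect the proposed expression to compute. It is plain Python standing in for the tensor call, and it rests on assumptions I have not verified against the repository: that `padding_mask` has shape `(T, B)` (time steps by trajectories) and is true at steps where an episode is done, and that `argmax(0)` returns the index of the first maximal value along the time axis. The helper name `first_true_index` and the example mask are hypothetical.

```python
def first_true_index(mask):
    """Plain-Python stand-in for `mask.long().argmax(0)` on a (T, B) mask.

    For each column (trajectory), return the index of the first maximal
    value along the time axis. This is an illustration, not the repo's code.
    """
    num_steps, batch_size = len(mask), len(mask[0])
    result = []
    for b in range(batch_size):
        column = [int(mask[t][b]) for t in range(num_steps)]
        # list.index returns the first position of the maximum,
        # which is the tie-breaking behaviour assumed for argmax here.
        result.append(column.index(max(column)))
    return result


# Hypothetical mask: rows are time steps, columns are trajectories.
padding_mask = [
    [False, False, False],
    [False, True,  False],
    [True,  False, False],
    [False, True,  False],
]

first_done = first_true_index(padding_mask)    # [2, 1, 0]
ind_first_done = [i + 1 for i in first_done]   # [3, 2, 1]
print(ind_first_done)
```

One thing this sketch makes visible: the third trajectory never finishes, yet it still gets `ind_first_done = 1`, the same value a trajectory finishing at step 0 would get. If the mask can be all false for a trajectory, that case would presumably need separate handling.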

Could you kindly explain the logic behind the current line?
