Is this algorithm suitable for off-policy methods? #15

@dbsxdbsx

Description

I just finished reading your paper, and I noticed that it is an on-policy method.
I am wondering whether anyone has tested it with an RL method that uses a replay buffer.
As far as I know, for off-policy methods with a recurrent structure (LSTM, GRU, attention, or Transformer), if the hidden state is stored together with a sample (s, a, r, s'), that hidden state becomes stale after prolonged training, because the network weights have changed since it was recorded. Is this issue solved by the adaptive transformer?
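To make the staleness concern concrete, here is a minimal sketch with a toy recurrent cell. The names (`ToyCell`, `ReplayBuffer`) and the "burn-in" mitigation (recomputing the hidden state from a stored observation prefix, as popularized by R2D2) are illustrative assumptions, not something from the paper or this repository:

```python
# Toy illustration of the stale-hidden-state problem when a replay buffer
# stores hidden states alongside transitions. ToyCell and ReplayBuffer are
# hypothetical stand-ins, not APIs from this repo.
import random
from collections import deque

class ToyCell:
    """Stand-in recurrent cell: h' = 0.5*h + 0.5*w*s, with a 'learnable' w."""
    def __init__(self, w=1.0):
        self.w = w
    def step(self, h, s):
        return 0.5 * h + 0.5 * self.w * s

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.data = deque(maxlen=capacity)
    def add(self, transition):
        self.data.append(transition)

cell = ToyCell(w=1.0)
buf = ReplayBuffer()

# Collect a short trajectory, storing the hidden state alongside each sample.
h = 0.0
for t in range(10):
    s = float(t)
    buf.add({"s": s, "h_stored": h})
    h = cell.step(h, s)

# Later, training has updated the parameters, so stored hidden states no
# longer match what the *current* network would produce on the same prefix.
cell.w = 2.0
fresh_h = 0.0
for t in range(5):
    fresh_h = cell.step(fresh_h, float(t))

stored_h = buf.data[5]["h_stored"]   # recorded under the old weights
print(stored_h, fresh_h)             # stored_h != fresh_h: the stored state is stale

# A common mitigation ("burn-in", as in R2D2): store a short observation
# prefix and recompute the hidden state with the current weights at
# training time instead of trusting the stored one.
burn_in = [d["s"] for d in list(buf.data)[:5]]
h_recomputed = 0.0
for s in burn_in:
    h_recomputed = cell.step(h_recomputed, s)
assert h_recomputed == fresh_h       # recomputed state matches current network
```

This only demonstrates why naively replaying stored hidden states drifts from the current policy's representations; whether the adaptive transformer sidesteps the issue is exactly the question being asked.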
