Description
The following code takes the probability of the argmax action from the current actor's logits and then diffs it against the log prob of the action recorded from the last actor model iteration. Should we instead gather the probability of the action stored in old_actions rather than just taking the max, so that we are comparing the probability of the same action under the two policy iterations?
```python
# get action log prob
actions_prob = (
    torch.softmax(actions_logits, dim=-1).max(dim=-1).values
)
```
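A minimal sketch of the suggested alternative, assuming actions_logits has shape (batch, num_actions), old_actions holds the token/action ids sampled by the previous policy, and old_action_log_prob is the log prob recorded at sampling time (tensor names and shapes here are illustrative, not the repo's actual variables):

```python
import torch

# Hypothetical shapes for illustration: batch of 4, 10 possible actions.
actions_logits = torch.randn(4, 10)           # current policy logits
old_actions = torch.randint(0, 10, (4,))      # actions sampled by the previous policy
old_action_log_prob = torch.randn(4)          # log probs recorded when sampling

# Gather the log prob of the *same* action the old policy took,
# instead of taking the max over the current policy's distribution.
log_probs = torch.log_softmax(actions_logits, dim=-1)
action_log_prob = log_probs.gather(-1, old_actions.unsqueeze(-1)).squeeze(-1)

# PPO importance ratio pi_new(a|s) / pi_old(a|s) for the identical action a.
ratio = (action_log_prob - old_action_log_prob).exp()
```

With the max-based version, the numerator and denominator of the ratio can refer to different actions, which breaks the importance-sampling interpretation of the PPO objective; gathering by old_actions keeps both terms tied to the same action.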