Question about Issue 1 #2

@RyanRizzo96

I am struggling to understand your reasoning here:

Issue - The paper states that the number of sequences of actions should be 2^N. But I could only find the one sequence of right actions and N other sequences that terminate by the wrong action and the number of transitions in the replay memory to be (N(N+1)/2 + N)

Can you show how this holds for a simple case such as N = 3?

Here is mine:

[Image: IMG_AD65C9CCFDD9-1]

This will form our replay memory. In total, there will be (N*(N+1)/2 + N) transitions in the list.

This also doesn't match what the paper reports. According to the paper:

The replay memory contains all the relevant experience (the total number of transitions is 2^(n+1) - 2)
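
To make the gap concrete, here is a minimal brute-force sketch comparing the two counts. It rests on assumptions of mine, not on anything confirmed by the repository or the paper: a wrong action (encoded as 0) ends the episode, and the paper's figure comes from executing every one of the 2^N action strings until termination and storing every transition, duplicates included. The function names are hypothetical.

```python
from itertools import product


def steps_until_termination(actions):
    """Number of transitions produced by one action string.

    Assumption: action 0 is the 'wrong' action and ends the episode;
    action 1 is the 'right' action and moves one state forward.
    """
    steps = 0
    for action in actions:
        steps += 1
        if action == 0:
            break
    return steps


def count_all_action_strings(n):
    """Run all 2**n action strings and keep every transition (duplicates included)."""
    return sum(steps_until_termination(seq) for seq in product((0, 1), repeat=n))


def count_distinct_episodes(n):
    """Keep each distinct episode once: N failing episodes plus the one success."""
    episodes = {seq[:steps_until_termination(seq)] for seq in product((0, 1), repeat=n)}
    return sum(len(episode) for episode in episodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            n,
            count_all_action_strings(n), 2 ** (n + 1) - 2,
            count_distinct_episodes(n), n * (n + 1) // 2 + n,
        )
```

Under these assumptions, N = 3 gives 14 transitions when all 8 action strings are executed (matching 2^(n+1) - 2) and 9 transitions when only the 4 distinct episodes are kept (matching N*(N+1)/2 + N). So the two numbers may simply differ in whether repeated prefixes are stored more than once.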

In the paper they show that returning from state N to state 1 can give a reward of either 1 (green arrow) or 0 (dashed red arrow). How did you decide to implement this?

[Image]
