Question about Issue 1

I am struggling to understand your reasoning here:

> Issue - The paper states that the number of sequences of actions should be 2^N. But I could only find the one sequence of right actions and N other sequences that terminate by the wrong action and the number of transitions in the replay memory to be (N(N+1)/2 + N)

Can you show how this holds for a simple case such as N = 3?

Here is mine:

![IMG_AD65C9CCFDD9-1](https://user-images.githubusercontent.com/31866965/75117376-20cfbd80-5671-11ea-85fe-5dc7d3d57bf6.jpeg)


> This will form our replay memory. In total, there will be (N*(N+1)/2 + N) transitions in the list.

This also doesn't match what the paper reports. According to the paper:

> The replay memory contains all the relevant experience (the total number of transitions is 2^(n+1) - 2)
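
To make the gap concrete for N = 3, here is a minimal Python sketch that evaluates both counts. It rests on two assumptions of mine, not on anything confirmed by the paper or the repo: that the paper's figure counts every action sequence of length 1 to N over two actions (`R` = right, `W` = wrong), and that the issue's figure counts episodes that stop at the first wrong action. The function names are hypothetical.

```python
from itertools import product


def count_all_prefixes(n):
    """Count every action sequence of length 1..n over two actions.

    Assumed reading of the paper's figure: 2 + 4 + ... + 2^n = 2^(n+1) - 2.
    """
    return sum(len(list(product("RW", repeat=k))) for k in range(1, n + 1))


def count_early_termination(n):
    """Count transitions when an episode stops at the first wrong action.

    Assumed reading of the issue's figure: n episodes made of k-1 right
    actions followed by one wrong action (k = 1..n), plus the single
    all-right episode of length n.
    """
    episodes = [("R",) * (k - 1) + ("W",) for k in range(1, n + 1)]
    episodes.append(("R",) * n)
    return sum(len(ep) for ep in episodes)


n = 3
print(count_all_prefixes(n), 2 ** (n + 1) - 2)            # both sides of the paper's formula
print(count_early_termination(n), n * (n + 1) // 2 + n)   # both sides of the issue's formula
```

Under these assumptions the two readings already disagree at N = 3 (14 versus 9 transitions), which is the discrepancy I am asking about.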

In the paper they show that returning from state N to state 1 can either give a reward of 1 (green arrow) or 0 (dashed red arrow). How did you decide to implement this?

![image](https://user-images.githubusercontent.com/31866965/75116801-43aba300-566c-11ea-8c2d-b9b20ca71e08.png)



