Hi,
First I have to say: great work! Without your repository I wouldn't have been able to get first results so quickly. =)
I took your algorithm together with your maze implementation. Unfortunately I had to change a few things, since the maze code is a bit older and does not include all of your newer changes. After getting rid of all errors, I wasn't able to train anything for a long time, until I realized that a negative reward of <= -1 does not work at all.
To verify this, I initialized the network randomly and forced a single agent to always run directly into the wall during training. What I expected is that the probability of every action converges to 0.33, except for the one that leads the agent into the wall, which should go to 0.
But I observed a different behavior: the action that leads the agent into the wall converges to zero as expected, but one of the other actions converges to one and the remaining two go to zero as well.
As far as I can tell, it is always the action with the highest probability right after initialization that gets emphasized during this training process.
I can't imagine that this is intended behavior, or am I wrong?
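For reference, here is a toy sketch of what I mean (a plain softmax policy over the four maze actions with an unbaselined REINFORCE-style update; this is not your actual code and ignores any value baseline or entropy term your implementation may use). With the wall action forced and a constant negative reward, the remaining action with the largest initial probability keeps growing fastest:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def forced_wall_run(reward, steps=20000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    logits = rng.normal(scale=0.5, size=4)   # random init, 4 maze actions
    wall_action = 0                          # the agent is forced to take this action
    for _ in range(steps):
        probs = softmax(logits)
        # gradient of log pi(wall_action) w.r.t. the logits: one_hot - probs
        grad_log_pi = -probs
        grad_log_pi[wall_action] += 1.0
        # REINFORCE-style ascent on reward * log pi(a); a negative reward pushes
        # the forced action down and every other logit up in proportion to its
        # current probability ("rich get richer")
        logits += lr * reward * grad_log_pi
    return softmax(logits)

print(forced_wall_run(reward=-1.0))   # one of the non-wall actions ends up near 1
```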
When I clip the reward to a minimum of -0.9, it works fine!
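Roughly what I mean by clipping (a minimal sketch with hypothetical names, not the repository's actual code):

```python
import numpy as np

def clip_reward(reward, low=-0.9, high=1.0):
    # keep the reward strictly above -1 before it enters the training update
    return float(np.clip(reward, low, high))

# e.g. right after the environment step:
# reward = clip_reward(raw_reward)
```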
With that change, my agent finds a path with minimal cost, even though the learning diverges again after several thousand steps.
I guess this could also be an issue related to the same problem.
Regards, Martin