
Update algorithm does not work correctly? #44


Hi,

first of all, great work! Without your repository I wouldn't have been able to get first results so quickly =)

I took your algorithm together with your maze implementation. Unfortunately I had to change some things, since the maze code is a bit older and does not include all of your newer additions. After getting rid of all errors, I wasn't able to train anything for a long time, until I recognized that a negative reward of <= -1 does not work at all.

What I did to prove this: I initialized the network randomly and forced a single agent to always run directly into the wall during training. What I expected is that the probability of every action goes to 0.33, except for the one that leads the agent directly into the wall, which should go to 0.
But I observed a different behavior. The action that leads the agent into the wall converges to zero as expected, but one of the other actions converges to one while the remaining two converge to zero as well.
As far as I can tell, it is always the action with the highest probability directly after initialization that gets emphasized during this training process.
I can't imagine that this is intended behavior, or am I wrong?
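
To make the probe concrete, here is a rough standalone sketch of what I mean. This is not your repository's code; the tiny linear policy, the learning rate, and the plain REINFORCE-style loss are simplifications I picked just for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny softmax policy over 4 actions for one fixed
# maze state. None of these names come from the repository.
policy = nn.Linear(4, 4)
optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)

state = torch.ones(4)   # dummy fixed state
wall_action = 0         # the action that always runs into the wall
reward = -1.0           # the forced action's reward

for step in range(5000):
    probs = torch.softmax(policy(state), dim=-1)
    # REINFORCE-style update for the single forced action: loss = -r * log pi(a|s)
    loss = -reward * torch.log(probs[wall_action])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(policy(state), dim=-1))
# In this simplified setup the wall action's probability drops towards 0 as
# expected, but the non-wall action that started with the highest probability
# is pushed towards 1 -- similar to what I see in the full training.
```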
When I clip the reward to a minimum of -0.9 (snippet below), it works fine!
My agent finds a path with minimal cost, even though learning diverges again after several thousand steps.
I guess this could also be an issue related to the same problem.
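
For reference, the clipping itself is nothing more than this (the variable name is a placeholder, not the one used in the maze environment):

```python
# Clamp the per-step reward so it never reaches -1 (placeholder name).
reward = max(reward, -0.9)
```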

Regards, Martin
