
# FlappyBird PyTorch-1.11-Reinforcement-Learning DDQN

This project is based on https://github.com/PacktPublishing/PyTorch-1.x-Reinforcement-Learning-Cookbook/tree/master/Chapter09

The usual DQN tends to overestimate the Q-values of possible actions in a given state. This would not cause any problems if all actions were overestimated equally, but once one particular action becomes overestimated, it is more likely to be selected on the next iteration, which makes it difficult for the agent to explore the environment uniformly and find the right policy.
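As a quick sketch of why this happens (the notation here is illustrative, not taken from the original project): the standard DQN target uses the same network both to select and to evaluate the best next action,

$$y = r + \gamma \max_{a'} Q(s', a'; \theta),$$

and a max taken over noisy Q-estimates is biased upward, since $\mathbb{E}\big[\max_{a'} \hat{Q}(s', a')\big] \ge \max_{a'} \mathbb{E}\big[\hat{Q}(s', a')\big]$. Whichever action happens to be overestimated wins the max and gets reinforced.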

The trick to overcoming this is quite simple: we separate action selection from generation of the target Q-value. To do this, we keep two separate networks, hence the name Double Q-Learning. The main (online) network selects the action, and the target network generates the Q-value for that action. To synchronize the networks, we copy the weights from the main network to the target network every so many training steps (usually about 10k).
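A minimal PyTorch sketch of this target computation is below. The network architecture, `GAMMA`, and `SYNC_EVERY` are illustrative assumptions, not the exact values used in this repo:

```python
import copy
import torch
import torch.nn as nn

GAMMA = 0.99          # discount factor (assumed value)
SYNC_EVERY = 10_000   # copy online weights to the target net every 10k steps

# Hypothetical Q-network; the real project defines its own architecture.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(online_net)
target_net.eval()

def ddqn_targets(rewards, next_states, dones):
    """Double-DQN targets: the online net selects the action,
    the target net evaluates it."""
    with torch.no_grad():
        # 1) action selection with the online (main) network
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # 2) value estimation with the target network
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    # zero out the bootstrap term on terminal transitions
    return rewards + GAMMA * next_q * (1.0 - dones)

# Periodic synchronization inside the training loop:
# if step % SYNC_EVERY == 0:
#     target_net.load_state_dict(online_net.state_dict())
```

With a plain DQN, step 1 and step 2 would both use the target network, so the same noisy estimates that pick the action also score it; splitting the two roles across networks is what removes the upward bias.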


## About

In this project, the agent learns to play FlappyBird using the DDQN model.
