This project is based on https://github.com/PacktPublishing/PyTorch-1.x-Reinforcement-Learning-Cookbook/tree/master/Chapter09
The usual DQN tends to overestimate the Q-values of potential actions in a given state. This wouldn't be a problem if all actions were overestimated equally, but once one particular action becomes overestimated, it is more likely to be selected in the next iteration, making it difficult for the agent to explore the environment uniformly and find the right policy.
The trick to overcoming this is quite simple: we separate action selection from generation of the target Q-value. To do this, we use two separate networks, hence the name Double Q-Learning. The main (online) network selects the action, and the target network generates the Q-value for that action. To keep the networks synchronized, we copy the weights from the main network to the target network every fixed number of training steps (usually about 10k).
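As a rough illustration, here is a minimal PyTorch sketch of the Double DQN target computation and the periodic weight copy. It is not code from the repository above; the names (`build_net`, `online_net`, `target_net`, `double_dqn_targets`, `SYNC_EVERY`, `gamma`) are illustrative assumptions.

```python
# Minimal sketch of Double DQN target computation and network syncing.
import copy

import torch
import torch.nn as nn


def build_net(n_states: int, n_actions: int, n_hidden: int = 64) -> nn.Module:
    """A simple fully connected Q-network (illustrative)."""
    return nn.Sequential(
        nn.Linear(n_states, n_hidden),
        nn.ReLU(),
        nn.Linear(n_hidden, n_actions),
    )


def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network evaluates that action's Q-value."""
    with torch.no_grad():
        # Action selection with the main (online) network
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Bootstrapped target; dones masks out terminal transitions
        return rewards + gamma * next_q * (1.0 - dones)


# Two separate networks: the target starts as a copy of the online one.
online_net = build_net(n_states=4, n_actions=2)
target_net = copy.deepcopy(online_net)

SYNC_EVERY = 10_000  # training steps between weight copies (assumed value)

# Inside the training loop, after each optimization step:
#     step += 1
#     if step % SYNC_EVERY == 0:
#         target_net.load_state_dict(online_net.state_dict())
```

The key design point is that the argmax over actions and the evaluation of that action use different sets of weights, which breaks the self-reinforcing overestimation of vanilla DQN.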