this project was created to familiarize myself with reinforcement learning (deep-q, double-deep-q, deep-q-from-demonstrations) and to have a reference point for future investigations. the code is mostly annotated with source material taken from the related books, various sources around the web, and my own explanations aimed at building intuition.
for a more detailed and easier-to-follow explanation of the algorithm, look at pipe-riders.ipynb and dqfd_agent.py
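as a quick taste of what sets dqfd apart from vanilla dqn, here is a minimal sketch of the large-margin supervised loss it applies to demonstration transitions (tensor names and the margin value are illustrative, not copied from dqfd_agent.py):

```python
import torch

def large_margin_loss(q_values, expert_actions, margin=0.8):
    """DQfD supervised loss: J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    q_values:       (batch, n_actions) Q-values for demonstration states
    expert_actions: (batch,) actions the expert actually took
    margin:         l(a_E, a), positive for a != a_E and zero for a == a_E
    """
    margins = torch.full_like(q_values, margin)
    margins.scatter_(1, expert_actions.unsqueeze(1), 0.0)  # no margin at the expert action
    augmented_max = (q_values + margins).max(dim=1).values
    expert_q = q_values.gather(1, expert_actions.unsqueeze(1)).squeeze(1)
    return (augmented_max - expert_q).mean()
```

in dqfd this term is only computed on demonstration samples and is added to the usual 1-step and n-step td losses (plus l2 regularization).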
python .\main-dqfd-record.py # record demonstration
python .\main-dqfd.py # train
python .\main-dqfd-test.py # inference - live environment - remember to set correct checkpoint paths
starting with an exploration rate (
- average episode length: 1434 steps
the average episode length suggested there was still room for improvement, so the agent was left running for an additional 1m steps
starting with an exploration rate (
expert demonstration data can be found here: https://mega.nz/folder/W15TQCLJ#6kABEerwaZyU9nqHxR8pyQ
the trained agent can be found here: https://mega.nz/folder/ipZDGaIb#Bn1NXlWoSvRhUp0IjnCjTw
for a more detailed and easier-to-follow explanation of the algorithm, look at pipe-riders.ipynb and ddq_agent.py
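the core double deep q-learning trick in one place: the online network picks the next action, the target network evaluates it, which reduces the overestimation bias of plain dqn. a minimal sketch (names are illustrative, not the ones used in ddq_agent.py):

```python
import torch

@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    # action selection with the online network ...
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # ... but evaluation with the slower-moving target network
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones.float())
```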
python .\main-ddq.py # train
python .\main-ddq-test.py # inference - live environment - remember to set correct checkpoint paths
the basic deep q-learning agent could not make any noticeable progress with this (small) number of training steps due to the complexity of the game
- address the reward signal's flaws
- add a reward for actually avoiding a barrier
- or provide a slight penalty when the agent is close to a barrier to reinforce the need for caution (see the reward-shaping sketch after this list)
- this might be tricky due to the choice of emulation environment (can't extract too much from a browser-based emulator)
- add a reward for collecting time bonuses (green boxes)
- tracking a health increase can be tricky when the health bar is already full
- experiment with different parameters for the cnn (kernel size, stride, etc.)
- since we don't have exact time-steps, the input images have large variance; specific values can make the cnn too sensitive and keep it from learning effectively (e.g. a small kernel could fail to pick up the underlying pattern when input variance is high). another way to go about this is to add max-pooling layers so the network is less sensitive to the exact positions of barriers (see the cnn sketch after this list)
- try he initialization instead of xavier-glorot (also shown in the cnn sketch below)
- delayed learning signal - if the agent only receives a penalty when it hits a barrier, it may struggle to connect its previous actions to that outcome. this can make it difficult for the agent to understand which specific actions led to a negative result, hindering effective learning. the only signal it can go by is the features extracted by the convolutions, which might not be enough. [update] n-step learning should have completely addressed this issue (see the n-step return sketch after this list)
- irregular frame capture might interfere with q-value approximation / feature extraction
- because of the timeout-based frame-advancing, the captured frames are irregular, i.e. frames captured in one episode might not overlap with frames of another episode even when they should refer to the same position
- this might end up helping the neural network generalize better because it introduces variability to the cnn input as well as the output. one downside might be the need for a more complex fully-connected setup
- implementing better frame-advancing -> would require decompiling the game or hooking deeper into the emulator
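below is a minimal sketch of the proximity-penalty idea from the reward list above. the threshold and penalty scale are made up for illustration, and the distance to the nearest barrier would still have to be estimated from the captured frame (or from emulator internals), which is exactly the tricky part mentioned earlier:

```python
def shaped_reward(base_reward, distance_to_barrier, danger_zone=20.0, max_penalty=0.1):
    """Add a small penalty that grows as the agent gets closer to a barrier.

    distance_to_barrier (e.g. in pixels) has to be estimated from the captured
    frame or from emulator internals - that estimation is the hard part here.
    """
    if distance_to_barrier < danger_zone:
        closeness = (danger_zone - distance_to_barrier) / danger_zone  # 0 (far) .. 1 (touching)
        return base_reward - max_penalty * closeness
    return base_reward
```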
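a sketch of what the max-pooling and he-initialization ideas could look like in pytorch. the layer sizes assume 84x84 grayscale frame stacks and a 5-action space, which is an assumption for illustration rather than the setup used by the current agents:

```python
import torch.nn as nn

class PooledQNetwork(nn.Module):
    """Conv net where max-pooling adds some translation invariance,
    so the exact pixel position of a barrier matters less."""

    def __init__(self, in_channels=4, n_actions=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 84x84 -> 42x42
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 42x42 -> 21x21
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 21x21 -> 10x10
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 10 * 10, 512),
            nn.ReLU(),
            nn.Linear(512, n_actions),
        )
        # he (kaiming) initialization instead of xavier/glorot,
        # which tends to pair better with relu activations
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.head(self.features(x))
```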
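and a minimal, standalone version of the n-step return mentioned in the delayed-learning-signal note - it shows how a barrier-hit penalty a few steps in the future still reaches the current update (this is not the buffer code used by the agents):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, done=False):
    """R = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * bootstrap_value

    rewards:         the next n rewards following the transition
    bootstrap_value: max_a Q(s_{t+n}, a) from the target network
    done:            True if the episode ended within these n steps
    """
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        ret += (gamma ** len(rewards)) * bootstrap_value
    return ret

# a -1 barrier hit three steps ahead still shows up in the current target:
# n_step_return([0.0, 0.0, -1.0], bootstrap_value=0.0, done=True) -> -0.9801
```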
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
# or
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install scikit-image opencv-python
pip install numpy pandas plotly matplotlib torch-lucent torchinfo pillow
pip install keyboard selenium webdriver-manager pytest-playwright
python -m cProfile -o .\main-dqfd.generated.prof .\main-dqfd.py
python -m pip install snakeviz
python -m snakeviz .\main-dqfd.generated.prof