davidgru/snakeQlearning

Idea

  • Train an agent to play snake
  • Give the agent a full view of the board and let it figure out the key features itself
  • Q-function is approximated by neural network
  • Use target network and replay memory
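
Below is a minimal sketch of these two stabilization tricks; the class and method names are illustrative, not the ones used in deep_q_learning.py.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

# The target network is a periodically frozen copy of the online Q-network,
# used to compute the bootstrap target r + gamma * max_a' Q_target(s', a'), e.g.:
# target_net.load_state_dict(q_net.state_dict())  # every N training steps
```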

How to run this?

  1. Create a virtual environment.
python3 -m venv env
  2. Activate the virtual environment.
source env/bin/activate
  3. Install the required libraries.
pip install -r requirements.txt
  4. Test the pretrained model.
python test.py ./models/gregor.pt
  5. Train a model yourself.
python deep_q_learning.py

or

python policy_search.py

MDP

State Space

A state is a 10x10 grid where:

  • empty is represented by -0.1
  • body is represented by 1
  • head is represented by 2
  • food is represented by 3
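
For illustration, a state with these values could be built like this (the helper name and arguments are assumptions, not the repository's actual code):

```python
import numpy as np

def encode_state(body_cells, head_cell, food_cell, size=10):
    grid = np.full((size, size), -0.1, dtype=np.float32)  # empty cells
    for r, c in body_cells:
        grid[r, c] = 1.0        # body
    grid[head_cell] = 2.0       # head
    grid[food_cell] = 3.0       # food
    return grid
```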

Action Space

There are 4 actions:

  • left represented by 0
  • right represented by 1
  • up represented by 2
  • down represented by 3
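
For illustration, the indices could map to (row, column) moves like this; the sign convention (row 0 at the top of the board) is my assumption:

```python
ACTION_DELTAS = {
    0: (0, -1),   # left
    1: (0, 1),    # right
    2: (-1, 0),   # up
    3: (1, 0),    # down
}
```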

Reward Function

  • r(s,a) = -2, if the action results in a crash into the wall or the snake's own body
  • r(s,a) = 1, if the agent eats food
  • r(s,a) = -0.01, in any other case
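
A sketch of this reward function; crashed and ate_food are hypothetical flags that the environment would report after a step:

```python
def reward(crashed, ate_food):
    if crashed:
        return -2.0     # hit the wall or the snake's own body
    if ate_food:
        return 1.0
    return -0.01        # small step penalty for every other move
```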

Notes

Network

  • Use CNN to take advantage of spatial structure
  • Without padding, snake struggles to see food at the edge
  • Unsure about the right network size
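
One way such a network could look in PyTorch; the layer sizes are guesses rather than the repository's actual architecture, the only point being the padding that keeps edge cells in view:

```python
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # padding keeps the 10x10 shape
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 10 * 10, n_actions),           # one Q-value per action
        )

    def forward(self, x):
        # x: (batch, 1, 10, 10) board tensor
        return self.net(x)
```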

Exploration Strategy

  • Epsilon-greedy doesn't work well because it samples actions that lead to certain death, so the snake never gets long
  • Softmax works better
  • TODO: Find good temperature decay rate
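
A sketch of softmax (Boltzmann) action selection; the decay schedule in the comment is a placeholder, not the tuned rate the TODO asks for:

```python
import numpy as np

def softmax_action(q_values, temperature):
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return np.random.choice(len(q_values), p=probs)

# Example placeholder schedule: multiplicative decay toward a floor.
# temperature = max(0.05, temperature * 0.999)
```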

Results

I stopped training after 27 hours on my GeForce MX250, which amounted to 450 epochs of 20 games each, so 9000 games in total. I plotted the maximum, average, and minimum score that gregor achieved in each epoch, as well as a cumulative score that adds up the score at every time step. You can run gregor with python test.py models/gregor.pt

(Figure: gregor learning curve)

The agent wasn't quite able to reach an average score of 30/99 consistently. In a very good game, it reaches a score of 50/99.

I think the main problem is that the agent has no memory, therefore it does not know how the snake's tail moves, which hinders it in avoiding its own body.

I trained a second model on a 6x6 grid where the body does not have a constant value. The body values decrease linearly from 1 to 0: the longer the body remains at a position, the higher the value. In theory, the snake then has all the information it needs to beat the game. On this smaller board the agent beats the game 45% of the time, which I take as a win. You can run the model with python test.py models/6x6.pt.
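
A sketch of one possible reading of this encoding, where the segment about to vanish (the tail) gets the lowest value; the decay direction, helper name, and exact scaling are my assumptions, not taken from the repository:

```python
import numpy as np

def encode_body(body_from_tail_to_head, grid):
    n = len(body_from_tail_to_head)
    for i, (r, c) in enumerate(body_from_tail_to_head):
        grid[r, c] = (i + 1) / n   # tail -> 1/n, segment behind the head -> 1
    return grid

grid = np.full((6, 6), -0.1, dtype=np.float32)
encode_body([(2, 2), (2, 3), (3, 3)], grid)   # tail at (2, 2)
```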
