- Train an agent to play snake
- Give the agent a full view of the board and let it figure out the key features itself
- The Q-function is approximated by a neural network
- Use a target network and replay memory
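The replay memory mentioned above can be sketched as follows. This is a minimal illustrative version, not the repository's actual implementation: transitions are stored in a fixed-size buffer and sampled uniformly so that training batches are decorrelated in time.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

The target network is then a periodically-updated copy of the Q-network, used to compute the bootstrap targets so they do not chase a moving estimate.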
- Create a virtual environment: `python3 -m venv env`
- Activate the virtual environment: `source env/bin/activate`
- Install the required libraries: `pip install -r requirements.txt`
- Test the pretrained model: `python test.py ./models/gregor.pt`
- Train yourself: `python deep_q_learning.py` or `python policy_search.py`
A state is a 10x10 grid where:

- empty is represented by `-0.1`
- body is represented by `1`
- head is represented by `2`
- food is represented by `3`
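The state encoding above can be built like this. A minimal sketch, assuming the snake is given as a head-first list of cells (the actual environment code may differ):

```python
import numpy as np

# Cell values from the state description above
EMPTY, BODY, HEAD, FOOD = -0.1, 1.0, 2.0, 3.0

def encode_state(snake, food, size=10):
    """Build the state grid.

    `snake` is a list of (row, col) cells, head first;
    `food` is a single (row, col) cell.
    """
    grid = np.full((size, size), EMPTY, dtype=np.float32)
    for r, c in snake:
        grid[r, c] = BODY
    head_r, head_c = snake[0]
    grid[head_r, head_c] = HEAD  # head overwrites the body value
    grid[food] = FOOD
    return grid
```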
There are 4 actions:

- left, represented by `0`
- right, represented by `1`
- up, represented by `2`
- down, represented by `3`
The reward function:

- r(s,a) = -2, if the action results in a crash with the wall or the body
- r(s,a) = 1, if the agent ate food
- r(s,a) = -0.01, in any other scenario
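The three cases above translate directly into code. A sketch with hypothetical flag names (`crashed`, `ate_food`), not the repository's actual function:

```python
def reward(crashed, ate_food):
    """Reward for one step, per the cases above."""
    if crashed:
        return -2.0       # crashing into the wall or the body
    if ate_food:
        return 1.0        # eating food
    return -0.01          # small step penalty in any other scenario
```

The small negative step reward discourages the agent from wandering indefinitely without seeking food.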
- Use a CNN to take advantage of the spatial structure
- Without padding, the snake struggles to see food at the edges
- Unsure about the right size
- Epsilon-greedy doesn't work well because it samples actions that lead to certain death, so the snake never grows long
- Softmax (Boltzmann) exploration works better
- TODO: Find a good temperature decay rate
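Softmax exploration as contrasted with epsilon-greedy above can be sketched like this: actions are sampled from a Boltzmann distribution over the Q-values, so actions with very low Q-values (certain death) become exponentially unlikely instead of being picked uniformly at random. The function name and signature are illustrative, not from the repository:

```python
import numpy as np

def softmax_action(q_values, temperature):
    """Sample an action from a Boltzmann distribution over Q-values.

    High temperature -> near-uniform exploration;
    low temperature  -> near-greedy action selection.
    """
    z = np.asarray(q_values, dtype=np.float64) / temperature
    z -= z.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return np.random.choice(len(probs), p=probs)
```

Decaying the temperature over training (the open TODO above) would gradually shift the policy from exploration toward exploitation.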
I stopped training after 27 hours on my GeForce MX250, which equated to 450 epochs of 20 games each, so 9000 games in total. I plotted the maximum, average, and minimum score that gregor achieved in each epoch, as well as a cumulative score where the score at every time step is accumulated. You can run gregor with `python test.py models/gregor.pt`.
The agent wasn't quite able to reach an average score of 30/99 consistently. In a very good game, it reaches a score of 50/99.

I think the main problem is that the agent has no memory, therefore it does not know how the snake's tail moves, which hinders it in avoiding its own body.

I trained a second model on a 6x6 grid where the body does not have a constant value. Instead, the body values decrease linearly from 1 to 0: the longer a segment will remain at its position, the higher its value. In theory, the snake then has all the information needed to beat the game. On this smaller board, the agent beats the game 45% of the time, which I take as a win. You can run the model with `python test.py models/6x6.pt`.
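One way to realize the decaying body encoding described above is the following sketch. The exact scaling is an assumption (the head segment, which stays longest, gets value 1; the tail, about to vacate its cell, gets the smallest value), not necessarily the repository's formula:

```python
import numpy as np

def encode_body(snake, size=6):
    """Encode the snake's body with linearly decaying values.

    `snake` is a list of (row, col) cells, head first. The head segment
    will occupy its cell the longest and gets value 1; values decrease
    linearly toward the tail, which leaves its cell next.
    """
    grid = np.zeros((size, size), dtype=np.float32)
    n = len(snake)
    for i, (r, c) in enumerate(snake):
        grid[r, c] = (n - i) / n  # head: 1.0, tail: 1/n
    return grid
```

With this encoding, a single frame tells the agent when each occupied cell will become free, which is exactly the information a memoryless agent is otherwise missing.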