Ruglio/TicTacToe

In this project I put my study of reinforcement learning into practice, specifically a temporal-difference method: Q-learning with an epsilon-greedy policy. The aim is simply to implement it in order to properly understand the relationship between the environment (the game) and the agent (the player). For that reason, the parameters of the model (epsilon, learning rate, and decay) were not fine-tuned. As a consequence, the model is not as good as it could be, and we will win a few more matches :)
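To make the method concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. It is not the code from this repository: the names `choose_action` and `q_update` and the values of `ALPHA`, `GAMMA`, and `EPSILON` are assumptions made for illustration. The text above does not say whether "decay" means an epsilon schedule or a discount factor, so the sketch simply uses a fixed epsilon and a discount factor `GAMMA`.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters; the README does not state the values used.
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # probability of taking a random (exploratory) action

# Q-table: maps (state, action) to an estimated value, defaulting to 0.0.
Q = defaultdict(float)


def choose_action(state, actions, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(state, action, reward, next_state, next_actions):
    """Apply one temporal-difference (Q-learning) update and return the new value."""
    # Value of the best follow-up action; 0.0 if next_state is terminal.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
    return Q[(state, action)]
```

The agent calls `choose_action` to pick a move, observes the reward and the next state from the environment, and then calls `q_update` to move its estimate a fraction `ALPHA` of the way toward the observed target.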

The computer learned how to win by playing against itself for 5,000,000 games, which is quite a lot. Through trial and error, it discovered the moves that are usually recommended. The result can be seen below and played on my personal website. You can try to beat it.
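A self-play loop of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the repository's training code: both players share one Q-table, and the value of a non-terminal move is taken as the negative of the opponent's best value in the resulting position (a negamax-style variant of the Q-learning target). The function names, the reward scheme (1 for a win, 0 for a draw), and the hyperparameter defaults are all hypothetical.

```python
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty squares."""
    return [i for i, cell in enumerate(board) if cell == " "]


def train(episodes, alpha=0.3, gamma=1.0, epsilon=0.2, seed=0):
    """Self-play Q-learning; returns a table mapping (board, move) to a value."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        board, player = (" ",) * 9, "X"
        while True:
            moves = legal_moves(board)
            # Epsilon-greedy move selection for the player to move.
            if rng.random() < epsilon:
                move = rng.choice(moves)
            else:
                move = max(moves, key=lambda m: Q[(board, m)])
            nxt = board[:move] + (player,) + board[move + 1:]
            if winner(nxt) == player:
                target, done = 1.0, True      # the mover wins
            elif not legal_moves(nxt):
                target, done = 0.0, True      # board full: draw
            else:
                # The opponent moves next, so their best value is our loss.
                target = -gamma * max(Q[(nxt, m)] for m in legal_moves(nxt))
                done = False
            Q[(board, move)] += alpha * (target - Q[(board, move)])
            if done:
                break
            board, player = nxt, ("O" if player == "X" else "X")
    return Q
```

For example, `Q = train(50_000)` followed by `[Q[((" ",) * 9, m)] for m in range(9)]` lists the learned values of the nine opening moves. The sketch uses far fewer games than the 5,000,000 mentioned above, so its values are only rough estimates.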

The color of each square represents the corresponding Q-factor (the value of that action) learned by the model. A green square means that the move is a promising one and could let you win the match. A red square, on the other hand, is more likely a blunder.
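One simple way to turn a Q-value into such a color is a linear red-to-green scale, sketched below. The function name `q_to_hex`, the assumed value range of -1 to 1, and the exact color scale are assumptions for illustration; the scale actually used on the website is not described here.

```python
def q_to_hex(q, limit=1.0):
    """Map a Q-value in [-limit, limit] to a hex color from red to green."""
    # Clamp out-of-range values, then rescale to t in [0, 1]:
    # t = 0 is the worst value (pure red), t = 1 the best (pure green).
    clamped = max(-limit, min(limit, q))
    t = (clamped + limit) / (2 * limit)
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


# Hypothetical Q-values for the nine squares of a board.
example_values = [0.8, -0.5, 0.1, 0.0, 1.0, -1.0, 0.3, -0.2, 0.6]
colors = [q_to_hex(q) for q in example_values]
```

Each entry of `colors` can then be used as the background color of the matching square in the HTML or CSS of the board.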

The project has also been interesting and educational because it helped me better understand how to use Flask, a Python web framework that helps build a web app by linking the back end and the front end. This makes it possible to show off the logic of the game through a more appealing and user-friendly interface built with HTML and CSS.

[Image: ttt]

About

Application of Reinforcement Learning to play TicTacToe
