This repository contains two different approaches to building an AI for the classic game of Tic-Tac-Toe:
- Reinforcement Learning with Deep Q-Network (RL-DQN)
- Deep Neural Network (DNN) with Minimax and Alpha-Beta Pruning
Each approach is implemented in a separate folder, and both include a Flask-based web interface for playing against the AI.
```
tic-tac-toe-RL-DQN/
├── create_model.py    # Script to create and train the RL-DQN model.
├── app.py             # Flask application for the web interface.
├── templates/
│   └── index.html     # HTML file for the web interface.
└── static/
    ├── style.css      # CSS for styling the web application.
    └── script.js      # JavaScript for game functionality.
```

```
tic-tac-toe-DNN/
├── create_model.py    # Script to create and train the DNN model.
├── app.py             # Flask application for the web interface.
├── templates/
│   └── index.html     # HTML file for the web interface.
└── static/
    ├── style.css      # CSS for styling the web application.
    └── script.js      # JavaScript for game functionality.
```
- Environment: The `TicTacToeEnv` class simulates the Tic-Tac-Toe game, handling the game state, moves, and rewards.
- DQN Agent: The `DQNAgent` class implements the Deep Q-Network, which learns to play by interacting with the environment. It uses a neural network to approximate the Q-value function, which estimates the expected future reward of each action.
- Training: The agent is trained over multiple episodes, playing games and updating its Q-values based on the rewards received. It uses an epsilon-greedy strategy to balance exploration and exploitation (a sketch of this action-selection step follows this list).
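For concreteness, here is a minimal sketch of the epsilon-greedy action selection such an agent typically performs. The class and attribute names below are illustrative assumptions, not the exact code in `create_model.py`.

```python
import random
import numpy as np

class DQNAgentSketch:
    """Illustrative epsilon-greedy agent; not the repository's exact implementation."""

    def __init__(self, model, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
        self.model = model              # Keras network: 9-cell board in, 9 Q-values out
        self.epsilon = epsilon          # probability of playing a random (exploratory) move
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay

    def act(self, board):
        """Choose a move for a flat board of 9 cells (0 = empty, 1 = agent, -1 = opponent)."""
        legal_moves = [i for i, cell in enumerate(board) if cell == 0]
        # Exploration: with probability epsilon, play a random legal move.
        if random.random() < self.epsilon:
            return random.choice(legal_moves)
        # Exploitation: ask the network for Q-values and take the best legal move.
        q_values = self.model.predict(np.array([board]), verbose=0)[0]
        return max(legal_moves, key=lambda m: q_values[m])

    def decay_epsilon(self):
        # Shift gradually from exploration toward exploitation as training progresses.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
```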
- Minimax with Alpha-Beta Pruning: The `minimax` function evaluates the best possible move by exploring the game tree. Alpha-Beta pruning cuts off branches that cannot affect the result, reducing the number of nodes evaluated (a compact sketch follows this list).
- DNN Model: The `create_model` function defines a deep neural network that takes the current board state as input and predicts the best move. The model is trained on a dataset of game states and the corresponding optimal moves generated with the Minimax algorithm.
- Training: The model is trained on a dataset of 5000 game states, with 20% of the data reserved for testing. Dropout layers help prevent overfitting, and training uses the Adam optimizer.
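As a rough illustration of the game-tree search, here is a compact minimax with Alpha-Beta pruning for a 3×3 board, assuming the board is a list of 9 cells with 1 for the maximizing player, -1 for the minimizing player, and 0 for empty. The actual `minimax` in this repository may use different conventions (for example, depth-aware scores).

```python
import math

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player, alpha=-math.inf, beta=math.inf):
    """Minimax value of `board` with `player` (1 = maximizer, -1 = minimizer) to move."""
    w = winner(board)
    if w != 0:
        return w
    moves = [i for i, cell in enumerate(board) if cell == 0]
    if not moves:
        return 0  # draw
    best = -math.inf if player == 1 else math.inf
    for m in moves:
        board[m] = player
        score = minimax(board, -player, alpha, beta)
        board[m] = 0
        if player == 1:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:
            break  # prune: the opponent would never let the game reach this branch
    return best
```

Choosing a move is then a matter of trying each legal move and keeping the one whose child position gets the best returned score.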
- The RL-DQN model is trained over 1000 episodes. During each episode, the agent plays the game and updates its Q-values based on the rewards received. The agent stores experiences in a replay buffer and samples from it to update the model (see the training-step sketch below).
- The training process balances exploration (random moves) and exploitation (moves chosen from the current Q-values). The epsilon value decreases over time, reducing exploration as the agent learns.
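The experience-replay update described above might look roughly like the following sketch; the buffer size, discount factor, and batch size are assumptions for illustration rather than the repository's exact hyperparameters.

```python
from collections import deque
import random
import numpy as np

replay_buffer = deque(maxlen=2000)   # assumed capacity; oldest experiences are discarded
gamma = 0.95                         # discount factor for future rewards
batch_size = 32

def remember(state, action, reward, next_state, done):
    """Store one transition so it can be replayed later."""
    replay_buffer.append((state, action, reward, next_state, done))

def replay(model):
    """Sample past transitions and move their Q-values toward the Bellman targets."""
    if len(replay_buffer) < batch_size:
        return
    for state, action, reward, next_state, done in random.sample(replay_buffer, batch_size):
        target = reward
        if not done:
            # Bootstrapped estimate of the best achievable future reward.
            target += gamma * np.max(model.predict(np.array([next_state]), verbose=0)[0])
        q_values = model.predict(np.array([state]), verbose=0)[0]
        q_values[action] = target
        model.fit(np.array([state]), np.array([q_values]), epochs=1, verbose=0)
```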
- The DNN model is trained on a dataset of 5000 game states, with 20% of the data reserved for testing. The model is trained for 120 epochs using the Adam optimizer (a sketch of such a network follows below).
- The dataset is generated with the Minimax algorithm and Alpha-Beta pruning, so the model learns from optimal moves. Dropout layers are used to prevent overfitting.
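For reference, a network with the properties described above (dropout, Adam, nine outputs for the nine board cells) could be defined as in the sketch below. The layer sizes and dropout rates are illustrative guesses; only the epoch count and the train/test split come from the description.

```python
import tensorflow as tf

def create_model_sketch():
    """Illustrative DNN: 9-cell board state in, a distribution over the 9 cells out."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(9,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),                     # dropout against overfitting
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(9, activation="softmax"),   # one score per board cell
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training roughly as described: 5000 Minimax-labelled states, 20% held out, 120 epochs.
# model = create_model_sketch()
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=120)
```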
- Navigate to the `tic-tac-toe-RL-DQN` folder.
- Run the `create_model.py` script to train the RL-DQN model:
  ```bash
  python create_model.py
  ```
- Once the model is trained, run the Flask application:
  ```bash
  python app.py
  ```
- Open your web browser and go to `http://127.0.0.1:5000/` to play against the AI.
- Navigate to the `tic-tac-toe-DNN` folder.
- Run the `create_model.py` script to train the DNN model:
  ```bash
  python create_model.py
  ```
- Once the model is trained, run the Flask application:
  ```bash
  python app.py
  ```
- Open your web browser and go to `http://127.0.0.1:5000/` to play against the AI.
- Python 3.x
- TensorFlow 2.x
- Flask
- NumPy
- Numba (for the DNN approach)
You can install the required packages using pip:
```bash
pip install tensorflow flask numpy numba
```
- RL-DQN:
  - Reinforcement Learning-based AI that learns to play Tic-Tac-Toe by interacting with the environment.
  - Uses a Deep Q-Network to approximate the Q-value function.
  - Epsilon-greedy strategy balances exploration and exploitation.
- DNN:
  - Deep Neural Network-based AI that predicts the best move, trained on a dataset of optimal moves generated by the Minimax algorithm.
  - Uses Alpha-Beta pruning to generate the training data efficiently.
  - Dropout layers prevent overfitting during training.
- Web Interface:
  - Both models come with a Flask-based web interface that lets you play against the AI in your browser.
  - The interface has a simple, intuitive design, displaying the game board and showing the AI's moves in real time.
This repository provides two different approaches to building an AI for Tic-Tac-Toe, each with its own strengths and weaknesses. The RL-DQN approach is more flexible and adapts its play as it learns from experience, while the DNN approach is more efficient at inference time because it learns directly from optimal moves precomputed with Minimax. Both models play at a high level and provide a challenging opponent for human players.