first try on an existing environment
minihex https://github.com/FirefoxMetzger/minihex (after the final decision on the algorithm, implement the environment)
implement a CNN https://github.com/harbecke/HexHex
Monte Carlo Tree Search (MCTS) to simulate game moves and explore the state space efficiently; balance exploration and exploitation using the Upper Confidence bounds applied to Trees (UCT) algorithm
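
A minimal sketch of the UCT selection rule mentioned above, assuming each child node stores only a win count and a visit count. The names `uct_score` and `select_child` and the default exploration constant sqrt(2) are illustrative assumptions, not taken from any of the linked repositories.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCT value of a child: mean reward plus an exploration bonus."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children, parent_visits, c=math.sqrt(2)):
    """children maps move -> (wins, visits); return the move with the highest UCT score."""
    return max(children, key=lambda m: uct_score(*children[m], parent_visits, c))

# Hypothetical statistics: "a" has the better win rate, "b" is barely explored.
stats = {"a": (60, 100), "b": (1, 2)}
greedy_move = select_child(stats, parent_visits=102, c=0.0)  # pure exploitation
uct_move = select_child(stats, parent_visits=102)            # exploration bonus included
```

With c = 0 the rule is purely greedy and picks the move with the best win rate; with a positive c the rarely visited move gets a large bonus and is selected instead, which is the exploration/exploitation balance the note refers to.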
Self-play data generation https://github.com/jseppanen/azalea: collect training data from self-play games, including states, actions, and outcomes
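
The data collection step above could look roughly like the following sketch. It assumes a tiny self-written Hex board and a uniformly random move policy as a placeholder for the MCTS-guided policy; `has_won` and `self_play_game` are hypothetical helpers and do not reflect the API of minihex or azalea.

```python
import random

# The six neighbours of a cell on a rhombus-shaped Hex board.
NEIGHBORS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]

def has_won(board, player):
    """Player 1 connects top to bottom, player -1 connects left to right (depth-first search)."""
    n = len(board)
    if player == 1:
        stack = [(0, c) for c in range(n) if board[0][c] == 1]
    else:
        stack = [(r, 0) for r in range(n) if board[r][0] == -1]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        if (player == 1 and r == n - 1) or (player == -1 and c == n - 1):
            return True
        for dr, dc in NEIGHBORS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < n and 0 <= cc < n and (rr, cc) not in seen
                    and board[rr][cc] == player):
                seen.add((rr, cc))
                stack.append((rr, cc))
    return False

def self_play_game(size=3, rng=None):
    """Play one game with random moves; return a list of (state, action, outcome) samples."""
    rng = rng or random.Random()
    board = [[0] * size for _ in range(size)]
    player, history = 1, []
    while True:
        # Hex cannot end in a draw, so a winner is found before the board runs out of moves.
        moves = [(r, c) for r in range(size) for c in range(size) if board[r][c] == 0]
        move = rng.choice(moves)
        state = tuple(tuple(row) for row in board)  # immutable snapshot before the move
        history.append((state, player, move))
        board[move[0]][move[1]] = player
        if has_won(board, player):
            winner = player
            break
        player = -player
    # Outcome is +1 or -1 from the perspective of the player who made the move.
    return [(s, a, 1 if p == winner else -1) for s, p, a in history]

samples = self_play_game(size=3, rng=random.Random(0))
```

Each sample pairs the board before a move with the move itself and the final result for the player who made it, which is the (state, action, outcome) format the note describes.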
general sources: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html