Hey,
First of all, great project, I'm glad to see it succeeded! I'm working on a similar project: training an A2C model to play Yamb (a variation of Yahtzee played with 6 dice, of which you keep 5, and with different categories). Since we're dealing with a pretty similar problem, I was wondering if you could spare a couple of minutes to answer some of my questions about how you got the agent to learn.
Sorry for opening an issue, I just didn't know how else to message you :)
Regards,
Aleksandar