The goal of the YAMS Reinforcement Learning (RL) project is to develop an agent capable of playing Yahtzee optimally using RL algorithms. The agent aims to maximize its total score over 10 games by learning which dice to keep and which to re-roll. The RL model is evaluated on its ability to choose optimal actions given the game's state and on its learning progress.
Each game of Yahtzee is divided into thirteen rounds. During each round, a player rolls five dice. The player can re-roll the dice up to two more times and may choose which specific dice to re-roll. After completing the rolls, the player must assign the final roll to one of the thirteen categories on their score-sheet. The score for each round is determined by how well the dice match the chosen category. Once a category is selected, it cannot be used again for the remainder of the game. The game ends when all categories are filled, and the player with the highest total score wins.
For example, imagine a player rolls 1, 2, 2, 3, and 5 on the first roll. The player decides to re-roll the 3 and 5, obtaining a 2 and a 4. On the final re-roll, the player re-rolls the 4 and gets another 2, resulting in a final roll of 1, 2, 2, 2, 2. The player then assigns this roll to the "Twos" category, where the score is the sum of all dice showing a 2. In this case, the score would be 2 + 2 + 2 + 2 = 8 points for that round.
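To make this scoring rule concrete in code, here is a minimal sketch of an upper-section scorer; the function name `score_upper_category` and its signature are illustrative and not taken from the project code:

```python
def score_upper_category(roll, face):
    """Score an upper-section category: the sum of all dice showing `face`.

    `roll` is a list of die values, e.g. [1, 2, 2, 2, 2];
    `face` is the category's target value (1 for Ones, 2 for Twos, ...).
    """
    return sum(d for d in roll if d == face)

# Final roll from the example above: the "Twos" category scores 2 + 2 + 2 + 2 = 8.
assert score_upper_category([1, 2, 2, 2, 2], face=2) == 8
```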
The agent’s objective is to maximize the total score after 10 games. The performance measures include:
- Total Score: The cumulative score obtained over the 10 games.
- Efficiency: The ability to make optimal decisions after each roll.
- Number of Dice Kept: Indicates the strategy adopted by the agent.
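A minimal evaluation loop matching these measures could look like the sketch below; `agent.play_game()` is a hypothetical method standing in for one complete game played by the agent under evaluation:

```python
import statistics

def evaluate(agent, n_games=10):
    """Play `n_games` complete games and report the performance measures above."""
    scores = [agent.play_game() for _ in range(n_games)]  # one final score per game
    return {
        "total": sum(scores),
        "max": max(scores),
        "min": min(scores),
        "mean": statistics.mean(scores),
    }
```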
The environment consists of two parts: the full game environment and the turn environment.
The game environment defines the rules of the Yahtzee game, manages turns and episodes, and assigns rewards based on the actions taken.
Main attributes:
- `self.dices`, `self.faces`, `self.turns`: parameters of the game.
- `self.S`: describes the point categories in the game (sum of faces, three of a kind, pair, chance, etc.).
Key methods:
- `get_action`: calculates the possible rewards for a given state.
- `play_episode`: simulates a full game episode, including dice rolls and selected actions.
- `choose_action` and `choose_random_action`: select actions either strategically (via Q-values) or randomly.
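Putting these attributes and methods together, the game environment can be pictured roughly as the skeleton below; the constructor arguments and default values are assumptions for illustration, and only the attribute and method names come from the project:

```python
class GameEnvironment:
    """Full Yahtzee game: rules, turns, episodes, and rewards."""

    def __init__(self, dices=5, faces=6, turns=13):
        self.dices = dices   # number of dice rolled each turn
        self.faces = faces   # sides per die
        self.turns = turns   # number of rounds (category choices) per game
        self.S = []          # point categories (sum of faces, three of a kind, pair, chance, ...)

    def get_action(self, state):
        """Calculate the possible rewards for a given state."""
        raise NotImplementedError

    def play_episode(self, agent):
        """Simulate a full game episode: dice rolls plus the agent's selected actions."""
        raise NotImplementedError

    def choose_action(self, state, Q):
        """Select an action strategically, via its Q-value."""
        raise NotImplementedError

    def choose_random_action(self, state):
        """Select a legal action uniformly at random."""
        raise NotImplementedError
```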
The turn environment models a single game round, managing the state of the dice, possible actions, and transitions between states.
Main attributes:
- `self.dices`, `self.faces`: number of dice and sides per die.
- `self.Roll`, `self.Roll_P`: possible dice states and their probabilities.
- `self.S`: list of initial dice states.
- `self.Aa`: list of all possible actions.
Key methods:
- `get_states`: generates all possible dice states and their probabilities.
- `get_actions_from_state`: determines the possible actions for a given state.
- `get_actions_list`: generates a global list of all possible actions for all states.
- `One_step_backward`: updates state values and Q-values through a dynamic learning step.
- `choose_best_action`: selects the action with the highest Q-value for a given state.
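At its core, `get_actions_from_state` enumerates which dice can be kept (the rest being re-rolled). The sketch below captures that idea independently of the project's exact data structures; the function name `keep_subsets` is illustrative:

```python
from itertools import combinations

def keep_subsets(roll):
    """All distinct 'keep' actions for a roll: every sub-multiset of the dice.

    Each action is the sorted tuple of dice kept; the remaining dice are re-rolled.
    Keeping every die corresponds to stopping the turn early.
    """
    actions = set()
    for k in range(len(roll) + 1):
        for combo in combinations(roll, k):
            actions.add(tuple(sorted(combo)))
    return sorted(actions)

# For the roll [1, 2, 2] the keep-actions are (), (1,), (2,), (1, 2), (2, 2), (1, 2, 2).
print(keep_subsets([1, 2, 2]))
```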
The project features several agents that employ different reinforcement learning techniques, including:
- Random Agent: Chooses actions randomly.
- Greedy Agent: Selects the action with the highest immediate reward.
- Monte Carlo Agent: Learns from complete episodes by averaging returns from different states.
- Q-learning Agent: Uses Q-values to make decisions based on past experiences.
- SARSA Agent: Similar to Q-learning but updates Q-values using the action taken in the next step.
- Perceptron Q-learning Agent: Combines Q-learning with a perceptron model for function approximation.
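Several of these agents rely on an epsilon-greedy action-selection rule: explore with probability epsilon, otherwise exploit the current Q-values. A minimal sketch, assuming Q is a dictionary keyed by (state, action) pairs with unseen pairs defaulting to zero:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.2):
    """Return a random action with probability `epsilon`, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)                               # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))       # exploit
```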
| Property | Description |
|---|---|
| Observability | The environment is fully observable. The agent can see the value of all dice after each roll. |
| Determinism | The environment is stochastic due to the random nature of dice rolls. |
| Dynamics | The environment is sequential, as decisions (such as which dice to keep or re-roll) affect future outcomes. |
| Discreteness | The game is discrete, with each action occurring in defined steps (roll, keep, re-roll). |
| Autonomy | The agent is autonomous, making decisions based on its observations, without external interference. |
| Multi-agent | This is a single-agent environment, where only one player interacts with the environment. |
- Maximum score: 186
- Minimum score: 82
- Average score: 119.01

- Greedy Agent Level 1: It never re-rolls the dice after the first roll and chooses the action that immediately maximizes its score.
- Greedy Agent Level 2: It can re-roll once and chooses the action that maximizes the expected score after one re-roll.
- Greedy Agent Level 3: It can re-roll twice and chooses the action that maximizes the expected score after two re-rolls.
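The "expected score after one re-roll" used by Level 2 can be obtained by averaging over the equally likely outcomes of the re-rolled dice. The sketch below illustrates the idea; `best_immediate_score` is a hypothetical scoring function that returns the best category score available for a final roll:

```python
from itertools import product

def expected_score_after_reroll(kept, n_reroll, best_immediate_score, faces=6):
    """Expected best score when `kept` dice are held and `n_reroll` dice are re-rolled.

    Each re-rolled die lands on 1..faces with equal probability, so the expectation
    is the average of `best_immediate_score` over all faces**n_reroll outcomes.
    """
    outcomes = list(product(range(1, faces + 1), repeat=n_reroll))
    total = sum(best_immediate_score(list(kept) + list(extra)) for extra in outcomes)
    return total / len(outcomes)
```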
- SARSA Update:
  Q(s, a) ← Q(s, a) + α [ r + γ Q(s', a') − Q(s, a) ]
  where a' is the next action chosen by the policy in the next state s'.
- Q-Learning Update:
  Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
  Q-Learning uses the best possible Q-value without considering the agent's current policy.
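The two update rules differ only in their bootstrap target, which the sketch below makes explicit; Q is assumed to be a dictionary keyed by (state, action) pairs, with alpha and gamma as listed in the hyperparameters further down:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.01, gamma=0.9):
    """On-policy target: bootstrap on the action a' actually chosen in s'."""
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.01, gamma=0.9):
    """Off-policy target: bootstrap on the best Q-value available in s'."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
```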
Reference: "Reinforcement Learning for Solving Yahtzee", https://web.stanford.edu/class/aa228/reports/2018/final75.pdf

- Number of dice: 5
- Number of faces per die: 6
- Epsilon (exploration): 0.2
- Alpha (learning rate): 0.01
- Gamma (discount factor): 0.9
- Maximum number of turns per game: 7
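Expressed as a simple configuration dictionary (the variable and key names are illustrative), these settings read:

```python
CONFIG = {
    "n_dice": 5,      # number of dice
    "n_faces": 6,     # faces per die
    "epsilon": 0.2,   # exploration rate
    "alpha": 0.01,    # learning rate
    "gamma": 0.9,     # discount factor
    "max_turns": 7,   # maximum number of turns per game
}
```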

This project is licensed under the MIT License - see the LICENSE file for details.