YAMS: Reinforcement Learning Project

Objective

The goal of the YAMS Reinforcement Learning (RL) project is to develop an agent capable of playing the Yahtzee game optimally using RL algorithms. The agent will aim to maximize its total score after playing 10 games, learning how to make the best decisions about which dice to keep and which to re-roll. The RL model will be evaluated on its ability to choose optimal actions based on the game’s state and its learning progress.

Each game of Yahtzee is divided into thirteen rounds. During each round, a player rolls five dice. The player can re-roll the dice up to two more times and may choose which specific dice to re-roll. After completing the rolls, the player must assign the final roll to one of the thirteen categories on their score-sheet. The score for each round is determined by how well the dice match the chosen category. Once a category is selected, it cannot be used again for the remainder of the game. The game ends when all categories are filled, and the player with the highest total score wins.

Example of a Round

For example, imagine a player rolls a 1, 2, 2, 3, and 5 on their first roll. The player decides to re-roll the 3 and 5, obtaining a 2 and a 4. On the second re-roll, the player re-rolls the 4 and gets another 2, resulting in a final roll of 1, 2, 2, 2, 2. The player then assigns this roll to the "Twos" category, where the score is the sum of all dice that show a 2. In this case, the score would be 2 + 2 + 2 + 2 = 8 points for that round.
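To make the scoring rule concrete, here is a minimal Python sketch of an upper-section scoring function; the name score_upper_category is illustrative and not taken from the repository:

```python
def score_upper_category(roll, face):
    """Sum of all dice showing `face` (the "Twos" category uses face=2)."""
    return sum(d for d in roll if d == face)

final_roll = [1, 2, 2, 2, 2]
print(score_upper_category(final_roll, face=2))  # -> 8, as in the example above
```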


Components

Performance Measure

The agent’s objective is to maximize the total score after 10 games. The performance measures include:

  • Total Score: The cumulative score obtained over the 10 games.
  • Efficiency: The ability to make optimal decisions after each roll.
  • Number of Dice Kept: Indicates the strategy adopted by the agent.

Environment

The environment consists of two parts: the full game environment and the turn environment.

Game Environment

The game environment defines the rules of the Yahtzee game, manages turns and episodes, and assigns rewards based on the actions taken.

Main Attributes:

  • self.dices, self.faces, self.turns: Parameters of the game.
  • self.S: Describes the point categories in the game (sum of faces, three of a kind, pair, chance, etc.).

Key Methods:

  • get_action: Calculates the possible rewards for a given state.
  • play_episode: Simulates a full game episode, including dice rolls and selected actions.
  • choose_action and choose_random_action: Select actions either strategically (via Q-values) or randomly.
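As a rough illustration of how these attributes and methods fit together, here is a minimal, self-contained sketch of a full-game environment. The placeholder scoring and the helper roll_dice are assumptions made for illustration, not the repository's code:

```python
import random

class GameEnvironment:
    """Illustrative sketch only; not the repository's actual implementation."""

    def __init__(self, dices=5, faces=6, turns=13):
        self.dices = dices   # number of dice
        self.faces = faces   # sides per die
        self.turns = turns   # rounds per game
        self.S = ["sum_of_faces", "three_of_a_kind", "pair", "chance"]  # categories

    def roll_dice(self, n):
        """Roll n dice uniformly at random."""
        return [random.randint(1, self.faces) for _ in range(n)]

    def get_action(self, state):
        """Enumerate (category, reward) pairs for `state` (placeholder scoring)."""
        return [(category, sum(state)) for category in self.S]

    def choose_random_action(self, actions):
        """Baseline behaviour: pick any legal action uniformly."""
        return random.choice(actions)

    def play_episode(self, policy=None):
        """Simulate one game: roll, score one category per turn, sum the rewards."""
        total_score = 0
        for _ in range(self.turns):
            state = tuple(sorted(self.roll_dice(self.dices)))
            actions = self.get_action(state)
            choose = policy or self.choose_random_action
            _, reward = choose(actions)
            total_score += reward
        return total_score

print(GameEnvironment().play_episode())  # one full game under a random policy
```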

Turn Environment

The turn environment models a single game round, managing the state of the dice, possible actions, and transitions between states.

Main Attributes:

  • self.dices, self.faces: Number of dice and sides per die.
  • self.Roll, self.Roll_P: Possible dice states and their probabilities.
  • self.S: List of initial dice states.
  • self.Aa: List of all possible actions.

Key Methods:

  • get_states: Generates all possible dice states and their probabilities.
  • get_actions_from_state: Determines the possible actions for a given state.
  • get_actions_list: Generates a global list of all possible actions for all states.
  • One_step_backward: Updates state values and Q-values through a dynamic learning step.
  • choose_best_action: Selects the action with the highest Q-value for a given state.
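The combinatorics behind get_states and get_actions_from_state can be sketched as follows, under the simplifying assumption that a state is an unordered combination of dice faces and an action is the subset of dice indices to keep:

```python
from itertools import chain, combinations, combinations_with_replacement

def get_states(dices=5, faces=6):
    """All unordered dice states (the order of the dice does not matter)."""
    return list(combinations_with_replacement(range(1, faces + 1), dices))

def get_actions_from_state(state):
    """A 'keep' action is any subset of dice indices; the rest are re-rolled."""
    indices = range(len(state))
    return list(chain.from_iterable(combinations(indices, k)
                                    for k in range(len(state) + 1)))

states = get_states()
print(len(states))                             # 252 unordered states for 5 six-sided dice
print(len(get_actions_from_state(states[0])))  # 32 possible keep-subsets of 5 dice
```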

Agents

The project features several agents that employ different reinforcement learning techniques, including:

  • Random Agent: Chooses actions randomly.
  • Greedy Agent: Selects the action with the highest immediate reward.
  • Monte Carlo Agent: Learns from complete episodes by averaging returns from different states.
  • Q-learning Agent: Uses Q-values to make decisions based on past experiences.
  • SARSA Agent: Similar to Q-learning but updates Q-values using the action taken in the next step.
  • Perceptron Q-learning Agent: Combines Q-learning with a perceptron model for function approximation.
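The configuration reported in the conclusion includes an exploration rate epsilon = 0.2, which suggests epsilon-greedy action selection for the learning agents. Here is a minimal sketch with a Q-table stored as a dictionary keyed by (state, action); the function name is illustrative:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.2):
    """With probability epsilon pick a random action; otherwise exploit the best Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)                           # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit
```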

Properties of the Environment

  • Observability: The environment is fully observable; the agent can see the value of all dice after each roll.
  • Determinism: The environment is stochastic due to the random nature of dice rolls.
  • Dynamics: The environment is sequential, as decisions (such as which dice to keep or re-roll) affect future outcomes.
  • Time Complexity: The game is discrete, with each action occurring in defined steps (roll, keep, re-roll).
  • Autonomy: The agent is autonomous, making decisions based on its observations, without external interference.
  • Multi-agent: This is a single-agent environment, where only one player interacts with the environment.

Results

1. Random Agent

  • Maximum score: 186
  • Minimum score: 82
  • Average score: 119.01

2. Greedy Agents

  • Greedy Agent Level 1: It never re-rolls the dice after the first roll and chooses the action that immediately maximizes its score.
  • Greedy Agent Level 2: It can re-roll once and chooses the action that maximizes the expected score after one re-roll (see the sketch after this list).
  • Greedy Agent Level 3: It can re-roll twice and chooses the action that maximizes the expected score after two re-rolls.
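A minimal sketch of the expectation behind the level-2 strategy, assuming the "Chance" category (sum of all dice) as the scoring rule; the function name and keep-index convention are illustrative assumptions:

```python
def expected_chance_score(roll, keep, faces=6):
    """Expected sum of the dice after re-rolling every die whose index is not in `keep`."""
    kept_sum = sum(roll[i] for i in keep)
    n_reroll = len(roll) - len(keep)
    return kept_sum + n_reroll * (faces + 1) / 2   # a single re-rolled die averages 3.5

roll = [1, 2, 2, 3, 5]
# Keep the 1 and the two 2s (indices 0, 1, 2); re-roll the 3 and the 5:
print(expected_chance_score(roll, keep=(0, 1, 2)))   # 5 + 2 * 3.5 = 12.0
```

A level-2 agent would evaluate this expectation for every keep-subset and every category, then pick the combination with the highest value.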

3. Q-learning and SARSA Agent

  • SARSA Update:
    Q(s, a) ← Q(s, a) + α [ r + γ Q(s', a') - Q(s, a) ]
    where a' is the action chosen by the current policy in the next state s'.
  • Q-Learning Update:
    Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') - Q(s, a) ]
    Q-learning bootstraps from the best available Q-value in s', regardless of which action the agent's current policy would take. Both updates are sketched in code after this list.
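A minimal sketch of both updates against a Q-table stored as a Python dictionary, using the learning rate and discount factor reported in the conclusion; the function names are illustrative:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.01, gamma=0.9):
    """On-policy: bootstrap from the action a' actually chosen in s'."""
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.01, gamma=0.9):
    """Off-policy: bootstrap from the best available action in s'."""
    best = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
```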

4. Perceptron Q-learning Agent

Reference: https://web.stanford.edu/class/aa228/reports/2018/final75.pdf (paper: Reinforcement Learning for Solving Yahtzee).
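In the spirit of the referenced paper, the perceptron agent replaces the Q-table with a parametric approximation Q(s, a) ≈ w · φ(s, a). The sketch below uses a hypothetical hand-crafted feature vector (per-face counts, dice sum, and a numeric action index); these feature choices are assumptions, not the repository's code:

```python
import numpy as np

def features(state, action_index, n_faces=6):
    """Hypothetical feature vector for a (state, action) pair."""
    counts = np.bincount(np.asarray(state), minlength=n_faces + 1)[1:]  # per-face counts
    return np.concatenate([counts, [sum(state), action_index]]).astype(float)

def perceptron_q_update(w, s, a, r, s_next, next_actions, alpha=0.01, gamma=0.9):
    """One TD step on the weights of the linear approximation Q(s, a) = w . phi(s, a)."""
    phi = features(s, a)
    q_next = max((w @ features(s_next, a2) for a2 in next_actions), default=0.0)
    td_error = r + gamma * q_next - w @ phi
    return w + alpha * td_error * phi
```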


Conclusion

The experiments were run with the following configuration:

  • Number of dice: 5
  • Number of faces per die: 6
  • Epsilon (exploration): 0.2
  • Alpha (learning rate): 0.01
  • Gamma (discount factor): 0.9
  • Maximum number of turns per game: 7
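For reference, these settings could be collected into a single configuration dictionary (a sketch; the repository may organize them differently):

```python
config = {
    "n_dice": 5,      # number of dice
    "n_faces": 6,     # sides per die
    "epsilon": 0.2,   # exploration rate
    "alpha": 0.01,    # learning rate
    "gamma": 0.9,     # discount factor
    "max_turns": 7,   # maximum number of turns per game
}
```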

License

This project is licensed under the MIT License - see the LICENSE file for details.
