The environment is a Mario grid world in which Mario navigates a 4x4 grid, moving up, down, left, or right. Coin bags represent rewards (+4, +7) and bombs represent penalties (-3, -5). Mario starts in the top-left cell and must reach the bottom-right cell, which awards a trophy reward of +10. The objective is to maximize the total score by collecting rewards and avoiding penalties along the way.
[Figure: Mario grid world with coin bags as positive rewards and bombs as negative rewards]
State space: {S1 = (0,0), S2 = (0,1), ..., S16 = (3,3)}
Initial state: S1 = (0,0); Goal state: S16 = (3,3)
Action space: {Up, Down, Right, Left} => {3, 2, 0, 1}
Reward values: {-3, -5, +4, +7, +10}
Objective: Reach the goal state with maximum cumulative reward
[Visualization of the grid world and the agent's movements]
To ensure safety, agent movements are confined to the 4x4 grid using np.clip, so a move off the edge leaves the agent in place. The state space is defined explicitly, the reset_state variable returns the agent to the initial state, and constraints are enforced on actions so that invalid moves are never executed.
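The confinement described above can be sketched as follows. This is a minimal illustration, not the report's actual code: `clip_move` and `MOVES` are hypothetical names, and the action encoding follows the mapping given earlier (Right=0, Left=1, Down=2, Up=3).

```python
import numpy as np

GRID_SIZE = 4  # 4x4 grid, per the environment description

# Hypothetical (row, col) deltas for each action: Right=0, Left=1, Down=2, Up=3
MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}

def clip_move(state, action):
    """Apply an action and confine the result to the grid with np.clip."""
    d_row, d_col = MOVES[action]
    row = int(np.clip(state[0] + d_row, 0, GRID_SIZE - 1))
    col = int(np.clip(state[1] + d_col, 0, GRID_SIZE - 1))
    return (row, col)
```

With this scheme, moving Up from (0,0) simply keeps the agent at (0,0) rather than raising an error.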
SARSA (State-Action-Reward-State-Action)
- Update rule: Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)]
- Key features: on-policy, TD learning, control algorithm
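The SARSA update above can be written as a one-step function. This is a sketch under the assumption that Q is stored as a NumPy array indexed by (state, action); the function name and default α, γ values are illustrative.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA step: the target uses a', the action actually taken next (on-policy)."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Because the target depends on the next action chosen by the behavior policy, SARSA learns the value of the policy it is actually following.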
Q-learning
- Update rule: Q(s,a) ← Q(s,a) + α[r + γ max_{a'} Q(s',a') − Q(s,a)]
- Key features: off-policy, TD learning, control algorithm
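For contrast, the Q-learning step differs only in the target: it maximizes over next actions instead of using the one the policy took. Again a sketch with illustrative names, assuming the same (state, action)-indexed array.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: the target uses the greedy next action (off-policy)."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

This max makes Q-learning off-policy: it learns the greedy policy's values even while exploring.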
[Plots for SARSA: Total rewards per episode, Epsilon decay]
[Plots for Q-learning: Total rewards per episode, Epsilon decay]
[Plots for SARSA and Q-learning on test data]
[Comparison of SARSA and Q-learning performance]
SARSA, Setup 1: Tuning γ
- γ = 0.8
- γ = 0.4
- γ = 1.0
SARSA, Setup 2: Tuning decay_rate
- decay_rate = 0.8
- decay_rate = 0.6
- decay_rate = 0.3
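The decay_rate values above control how quickly exploration is reduced. A common scheme, assumed here based on the report's epsilon-decay plots (the actual schedule is not shown), multiplies epsilon by decay_rate each episode and floors it at a minimum; the function name and defaults are illustrative.

```python
def decayed_epsilons(eps_start=1.0, decay_rate=0.8, eps_min=0.01, n_episodes=5):
    """Multiplicative epsilon decay: eps_t = max(eps_min, eps_start * decay_rate**t)."""
    return [max(eps_min, eps_start * decay_rate ** t) for t in range(n_episodes)]
```

Smaller decay_rate values (e.g. 0.3) collapse exploration within a few episodes, while values near 1.0 keep the agent exploring for much longer.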
Q-learning, Setup 1: Tuning γ
- γ = 0.8
- γ = 0.4
- γ = 0.1
Q-learning, Setup 2: Tuning decay_rate
- decay_rate = 0.8
- decay_rate = 0.6
- decay_rate = 0.3
After tuning, the best-performing hyperparameters for both SARSA and Q-learning were found to be γ = 0.99 and decay_rate = 0.995.