RL agent learns to “shop smart” in a grid world, balancing rewards and dirtiness.
A compact reinforcement‑learning project.
An agent moves on a 2D grid with multiple stores. Each store yields a multi‑attribute reward (bread, milk, eggs) and a dirt penalty.
Using Q‑learning with decaying ε‑greedy exploration, the agent learns a policy that trades off product quality and store cleanliness. The training loop tracks the Reward Prediction Error (RPE) and visualizes learning as a live state‑value heatmap.
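Concretely, the update is the standard tabular Q‑learning rule, and the RPE here corresponds to the TD error of that update. A minimal sketch of the idea (the dict-backed table and these exact names are assumptions, not necessarily how `bandit_game.py` structures it):

```python
# Tabular Q-learning on a dict keyed by (state, action).
# The TD error `delta` is the Reward Prediction Error (RPE) the project tracks.
ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT")

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    delta = reward + gamma * best_next - Q.get((state, action), 0.0)  # RPE
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * delta
    return delta

Q = {}
delta = q_update(Q, (0, 0), "RIGHT", reward=1.0, next_state=(1, 0))
```

`alpha` and `gamma` correspond to the `ALPHA` and `GAMMA` keys in the configuration tables below.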
- Grid world with randomly placed stores (default: 5)
- Hidden average rewards per product (bread, milk, eggs)
- Dirt level per store (penalty)
- Discrete actions: `UP`, `DOWN`, `LEFT`, `RIGHT`
- Q‑learning with ε‑greedy decay
- RPE tracking and per‑episode averages
- Live heatmap of learned state values; final view after training
- Saves `results.csv` (metrics) and `q_table.pkl` (learned values)
A clear, compact example of multi‑attribute decision‑making in RL.
It showcases core ideas—Q‑learning, exploration vs. exploitation, and RPE—while integrating multiple utilities (products) and a cost (dirt) in a single value signal.
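As a sketch of what folding several utilities and a cost into one scalar reward might look like (the Gaussian sampling and the field names are assumptions, not the script's exact formula; see the non‑Gaussian item on the roadmap below):

```python
import random

DIRT_PENALTY_SCALE = 2.0  # weight of the dirt cost (matches the config key below)

def store_reward(store):
    # One scalar reward: noisy per-product utilities minus a scaled dirt cost.
    products = sum(random.gauss(mu, 0.5) for mu in store["means"].values())
    return products - DIRT_PENALTY_SCALE * store["dirt"]

shop = {"means": {"bread": 1.0, "milk": 0.8, "eggs": 1.2}, "dirt": 0.3}
print(round(store_reward(shop), 2))
```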
Install dependencies in a virtual environment:

```bash
python -m venv .venv
# Windows: .venv\Scripts\activate
source .venv/bin/activate
pip install -r requirements.txt
```

Requires Python 3.9+.
Run the training script:

```bash
python bandit_game.py
```

A `pygame` window opens and training runs for the configured number of episodes. At the end, the script shows a final view of the value heatmap and then an RPE plot.
Tune parameters near the top of `bandit_game.py` (grouped and commented).
**Training**

| Key | Default |
|---|---|
| `EPISODES` | 730 |
| `MAX_STEPS_PER_EPISODE` | 100 |
| `ALPHA` | 0.1 |
| `GAMMA` | 0.9 |
**Exploration**

| Key | Default |
|---|---|
| `EPSILON_START` | 0.9 |
| `EPSILON_MIN` | 0.05 |
| `EPSILON_DECAY_RATE` | 0.005 |
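One plausible reading of these keys is exponential per‑episode decay floored at `EPSILON_MIN` (a sketch; the script's exact schedule may differ):

```python
import math

EPSILON_START, EPSILON_MIN, EPSILON_DECAY_RATE = 0.9, 0.05, 0.005

def epsilon_at(episode):
    # Decay exploration exponentially, never dropping below the floor.
    return max(EPSILON_MIN, EPSILON_START * math.exp(-EPSILON_DECAY_RATE * episode))
```

Under this assumed rule, ε reaches the 0.05 floor around episode 580 of the default 730.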
**Environment**

| Key | Default |
|---|---|
| `GRID_W`, `GRID_H` | 20, 15 |
| `N_STORES` | 5 |
| `DIRT_PENALTY_SCALE` | 2.0 |
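For intuition, scattering stores on the grid under these defaults could look like the following (the placement logic is an assumption):

```python
import random

GRID_W, GRID_H, N_STORES = 20, 15, 5

def place_stores(rng=random):
    # Pick N_STORES distinct cells on the GRID_W x GRID_H grid.
    cells = [(x, y) for x in range(GRID_W) for y in range(GRID_H)]
    return rng.sample(cells, N_STORES)

print(place_stores())
```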
Outputs:

- `results.csv` – episode metrics (e.g., average RPE, episode reward)
- `q_table.pkl` – pickled dict of `(state, action) -> Q‑value`
- Matplotlib window – RPE learning curve
Quick‑look (Python):

```python
import csv
import pickle

with open('q_table.pkl', 'rb') as f:
    Q = pickle.load(f)

with open('results.csv') as f:
    print(next(csv.reader(f)))  # header row
```
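Building on the quick look, the same `Q` dict can be collapsed into state values (the heatmap's input) and a greedy policy (this assumes keys are `(state, action)` pairs, as listed above):

```python
from collections import defaultdict

# Group Q-values by state, then take maxima per state.
by_state = defaultdict(dict)
for (state, action), q in Q.items():
    by_state[state][action] = q

V = {s: max(qs.values()) for s, qs in by_state.items()}          # state values
policy = {s: max(qs, key=qs.get) for s, qs in by_state.items()}  # greedy actions
```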
Project layout:

```
bandit_game.py    # main script (training, visualization, saving)
requirements.txt  # pygame, matplotlib
LICENSE           # MIT
README.md         # this file
```
Roadmap:

- CLI flags (`argparse`) for episode/epsilon grid search
- Deterministic seeding option for reproducibility
- Per‑product store distributions (non‑Gaussian)
- Export of policy/value heatmaps as images
- Unit tests for reward and transition functions
MIT — see `LICENSE`.