
Commit a106204

Adding Boltzmann Model and WolfSheep Model to Mesa_RL (#197)
* Seeding RL Folder
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Formatting Corrections
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Re-formatting
* Reformatting
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Minor corrections
* Minor corrections
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Adding 2 more examples
* Formatting Code
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Improvements
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 9897884 commit a106204

File tree

14 files changed: +886 -0 lines changed


rl/boltzmann_money/README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Balancing Wealth Inequality

This folder showcases how to solve the Boltzmann wealth model with Proximal Policy Optimization (PPO) from Stable-Baselines3.

## Key Features

- Boltzmann Wealth Model: Agents with varying wealth navigate a grid, aiming to minimize inequality as measured by the Gini coefficient.
- PPO Training: A PPO agent is trained toward this goal, receiving sparse rewards based on improvements in the Gini coefficient and a large terminal reward for reaching low inequality (see the evaluation sketch after this section).
- Mesa Data Collection and Visualization: The Mesa data collector tracks Gini values during training, allowing for real-time visualization.
- Visualization Script: Visualize the trained agent's behavior with Mesa's visualization tools, which show agent movement and Gini values on the grid. You can run the `server.py` file to test it with the pre-trained model.

## Model Behaviour

Because Stable-Baselines3 controls all agents with the same policy weights, the agents learn to move towards a corner of the grid. This brings them together, allowing money to be exchanged between them and maximizing the reward.

<p align="center">
<img src="ppo_agent.gif" width="500" height="400">
</p>
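As a quick way to exercise the trained policy outside the Mesa server, a minimal evaluation loop is sketched below. It assumes training has already been run via `train.py`, which saves the policy as `ppo_money_model`; the deterministic flag and single-episode loop are illustrative choices, not part of the committed code.

```python
from model import NUM_AGENTS, BoltzmannWealthModelRL
from stable_baselines3 import PPO

# Load the policy saved by train.py (file name taken from its model.save call).
model = PPO.load("ppo_money_model")

# Build the environment with the same dimensions used during training.
env = BoltzmannWealthModelRL(N=NUM_AGENTS, width=NUM_AGENTS, height=NUM_AGENTS)
obs, _ = env.reset()

done = False
while not done:
    # One MultiDiscrete action: one of five moves per agent.
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Plot the Gini coefficient collected over the episode.
env.visualize()
```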

rl/boltzmann_money/model.py

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
"""
This code implements a multi-agent model called MoneyModel using the Mesa library.
The model simulates the distribution of wealth among agents in a grid environment.
Each agent has a randomly assigned wealth and can move to neighboring cells.
Agents can also give money to other agents in the same cell if they have greater wealth.
The model is trained by a scientist who believes in an equal society and wants to minimize the Gini coefficient, which measures wealth inequality.
The model is trained using the Proximal Policy Optimization (PPO) algorithm from the stable-baselines3 library.
The trained model is saved as "ppo_money_model".
"""

import random

import gymnasium
import matplotlib.pyplot as plt

# Import mesa
import mesa

# Import necessary libraries
import numpy as np
import seaborn as sns
from mesa_models.boltzmann_wealth_model.model import (
    BoltzmannWealthModel,
    MoneyAgent,
    compute_gini,
)

NUM_AGENTS = 10


# Define the agent class
class MoneyAgentRL(MoneyAgent):
    def __init__(self, unique_id, model):
        super().__init__(unique_id, model)
        self.wealth = np.random.randint(1, NUM_AGENTS)

    def move(self, action):
        empty_neighbors = self.model.grid.get_neighborhood(
            self.pos, moore=True, include_center=False
        )

        # Define the movement deltas
        moves = {
            0: (1, 0),  # Move right
            1: (-1, 0),  # Move left
            2: (0, -1),  # Move up
            3: (0, 1),  # Move down
            4: (0, 0),  # Stay in place
        }

        # Get the delta for the action, defaulting to (0, 0) if the action is invalid
        dx, dy = moves.get(int(action), (0, 0))

        # Calculate the new position and wrap around the grid
        new_position = (
            (self.pos[0] + dx) % self.model.grid.width,
            (self.pos[1] + dy) % self.model.grid.height,
        )

        # Move the agent if the new position is in empty_neighbors
        if new_position in empty_neighbors:
            self.model.grid.move_agent(self, new_position)

    def take_money(self):
        # Get all agents in the same cell
        cellmates = self.model.grid.get_cell_list_contents([self.pos])
        if len(cellmates) > 1:
            # Choose a random agent from the cellmates
            other_agent = random.choice(cellmates)
            if other_agent.wealth > self.wealth:
                # Transfer money from other_agent to self
                other_agent.wealth -= 1
                self.wealth += 1

    def step(self):
        # Get the action for the agent
        action = self.model.action_dict[self.unique_id]
        # Move the agent based on the action
        self.move(action)
        # Take money from other agents in the same cell
        self.take_money()


# Define the model class
class BoltzmannWealthModelRL(BoltzmannWealthModel, gymnasium.Env):
    def __init__(self, N, width, height):
        super().__init__(N, width, height)
        # Define the observation and action space for the RL model
        # The observation space is the wealth of each agent and their position
        self.observation_space = gymnasium.spaces.Box(low=0, high=10 * N, shape=(N, 3))
        # The action space is a MultiDiscrete space with 5 possible actions for each agent
        self.action_space = gymnasium.spaces.MultiDiscrete([5] * N)
        self.is_visualize = False

    def step(self, action):
        self.action_dict = action
        # Perform one step of the model
        self.schedule.step()
        # Collect data for visualization
        self.datacollector.collect(self)
        # Compute the new Gini coefficient
        new_gini = compute_gini(self)
        # Compute the reward based on the change in Gini coefficient
        reward = self.calculate_reward(new_gini)
        self.prev_gini = new_gini
        # Get the observation for the RL model
        obs = self._get_obs()
        if self.schedule.time > 5 * NUM_AGENTS:
            # Terminate the episode if the model has run for a certain number of timesteps
            done = True
            reward = -1
        elif new_gini < 0.1:
            # Terminate the episode if the Gini coefficient is below a certain threshold
            done = True
            reward = 50 / self.schedule.time
        else:
            done = False
        info = {}
        truncated = False
        return obs, reward, done, truncated, info

    def calculate_reward(self, new_gini):
        if new_gini < self.prev_gini:
            # Compute the reward based on the decrease in Gini coefficient
            reward = (self.prev_gini - new_gini) * 20
        else:
            # Penalize for increase in Gini coefficient
            reward = -0.05
        self.prev_gini = new_gini
        return reward

    def visualize(self):
        # Visualize the Gini coefficient over time
        gini = self.datacollector.get_model_vars_dataframe()
        g = sns.lineplot(data=gini)
        g.set(title="Gini Coefficient over Time", ylabel="Gini Coefficient")
        plt.show()

    def reset(self, *, seed=None, options=None):
        if self.is_visualize:
            # Visualize the Gini coefficient before resetting the model
            self.visualize()
        super().reset()
        self.grid = mesa.space.MultiGrid(self.grid.width, self.grid.height, True)
        self.schedule = mesa.time.RandomActivation(self)
        for i in range(self.num_agents):
            # Create MoneyAgentRL instances and add them to the schedule
            a = MoneyAgentRL(i, self)
            self.schedule.add(a)
            x = self.random.randrange(self.grid.width)
            y = self.random.randrange(self.grid.height)
            self.grid.place_agent(a, (x, y))
        self.prev_gini = compute_gini(self)
        return self._get_obs(), {}

    def _get_obs(self):
        # The observation is the wealth of each agent and their position
        obs = []
        for a in self.schedule.agents:
            obs.append([a.wealth, *list(a.pos)])
        return np.array(obs)
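The reward above is driven by `compute_gini`, which is imported from `mesa_models.boltzmann_wealth_model.model` rather than defined in this file. Purely for reference, a minimal sketch of a Gini coefficient computed over agent wealth (an illustration, not the imported implementation) could look like this:

```python
def compute_gini_sketch(model):
    """Illustrative Gini coefficient over agent wealth; not the imported compute_gini."""
    wealths = sorted(agent.wealth for agent in model.schedule.agents)
    n = len(wealths)
    total = sum(wealths)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum of sorted wealth values (standard Gini formula).
    rank_weighted = sum((i + 1) * w for i, w in enumerate(wealths))
    return (2 * rank_weighted) / (n * total) - (n + 1) / n
```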

rl/boltzmann_money/ppo_agent.gif

389 KB

rl/boltzmann_money/server.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
import os

import mesa
from mesa.visualization.ModularVisualization import ModularServer
from mesa.visualization.modules import ChartModule
from model import BoltzmannWealthModelRL
from stable_baselines3 import PPO


# Modify the MoneyModel class to take actions from the RL model
class MoneyModelRL(BoltzmannWealthModelRL):
    def __init__(self, N, width, height):
        super().__init__(N, width, height)
        model_path = os.path.join(
            os.path.dirname(__file__), "..", "model", "boltzmann_money.zip"
        )
        self.rl_model = PPO.load(model_path)
        self.reset()

    def step(self):
        # Collect data
        self.datacollector.collect(self)

        # Get the observations, which are the wealth of each agent and their position
        obs = self._get_obs()

        action, _states = self.rl_model.predict(obs)
        self.action_dict = action
        self.schedule.step()


# Define the agent portrayal with different colors for different wealth levels
def agent_portrayal(agent):
    if agent.wealth > 10:
        color = "purple"
    elif agent.wealth > 7:
        color = "red"
    elif agent.wealth > 5:
        color = "orange"
    elif agent.wealth > 3:
        color = "yellow"
    else:
        color = "blue"

    portrayal = {
        "Shape": "circle",
        "Filled": "true",
        "Layer": 0,
        "Color": color,
        "r": 0.5,
    }
    return portrayal


if __name__ == "__main__":
    # Define a grid visualization
    grid = mesa.visualization.CanvasGrid(agent_portrayal, 10, 10, 500, 500)

    # Define a chart visualization
    chart = ChartModule(
        [{"Label": "Gini", "Color": "Black"}], data_collector_name="datacollector"
    )

    # Create a modular server
    server = ModularServer(
        MoneyModelRL, [grid, chart], "Money Model", {"N": 10, "width": 10, "height": 10}
    )
    server.port = 8521  # The default
    server.launch()

rl/boltzmann_money/train.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
import argparse

from model import NUM_AGENTS, BoltzmannWealthModelRL
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback


def rl_model(args):
    # Create the environment
    env = BoltzmannWealthModelRL(N=NUM_AGENTS, width=NUM_AGENTS, height=NUM_AGENTS)
    eval_env = BoltzmannWealthModelRL(N=NUM_AGENTS, width=NUM_AGENTS, height=NUM_AGENTS)
    eval_callback = EvalCallback(
        eval_env, best_model_save_path="./logs/", log_path="./logs/", eval_freq=5000
    )
    # Define the PPO model
    model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./logs/")

    # Train the model
    model.learn(total_timesteps=args.stop_timesteps, callback=[eval_callback])

    # Save the model
    model.save("ppo_money_model")


if __name__ == "__main__":
    # Define the command line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--stop-timesteps",
        type=int,
        default=NUM_AGENTS * 100,
        help="Number of timesteps to train.",
    )
    args = parser.parse_args()
    rl_model(args)
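Training can also be resumed from the saved checkpoint. A minimal continuation sketch, assuming the default `ppo_money_model` save name used above, is:

```python
from model import NUM_AGENTS, BoltzmannWealthModelRL
from stable_baselines3 import PPO

# Rebuild the environment and reattach it to the previously saved policy.
env = BoltzmannWealthModelRL(N=NUM_AGENTS, width=NUM_AGENTS, height=NUM_AGENTS)
model = PPO.load("ppo_money_model", env=env)

# Continue training for additional timesteps and overwrite the checkpoint.
model.learn(total_timesteps=NUM_AGENTS * 100)
model.save("ppo_money_model")
```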

rl/wolf_sheep/README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Collaborative Survival: Wolf-Sheep Predation Model

This project demonstrates the use of the RLlib library to implement Multi-Agent Reinforcement Learning (MARL) in the classic Wolf-Sheep predation problem. The environment details can be found in the Mesa project's GitHub repository [here](https://github.com/projectmesa/mesa-examples/tree/main/examples/wolf_sheep).

## Key Features

**RLlib and Multi-Agent Learning**:
- **Library Utilized**: The project leverages the RLlib library to concurrently train two independent PPO (Proximal Policy Optimization) agents.
- **Agents**:
  - **Wolf**: A predatory agent that survives by eating sheep.
  - **Sheep**: A prey agent that survives by eating grass.
  - **Grass**: Grass is eaten by sheep and regrows over time.

**Input and Observation Space**:
- **Observation Grid**: Each agent's policy receives a 10x10 grid centered on itself as input.
- **Grid Details**: The grid encodes the presence of the other entities (wolves, sheep, and grass) within it.
- **Agent's Energy Level**: The agent's current energy level is also included in the observation.

**Action Space**:
- **Action Space**: The action is the ID of the neighboring tile to which the agent wants to move.

**Behavior and Training Outcomes**:
- **Optimal Behavior**:
  - **Wolf**: Learns to move towards the nearest sheep.
  - **Sheep**: Learns to run away from wolves and is attracted to grass.
- **Density Variations**: You can vary the densities of sheep and wolves to observe different outcomes.

By leveraging RLlib and multi-agent learning, this project provides insights into the dynamics of predator-prey relationships and optimal behavior strategies in a simulated environment. A sketch of how the observation and action spaces described above might be declared follows the preview below.

<p align="center">
<img src="resources/wolf_sheep.gif" width="500" height="400">
</p>
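The wolf-sheep environment code itself is not part of this commit. Purely as an illustration of the observation and action spaces described in the README above, a gymnasium-style declaration for a single agent might look like the following; the names, bounds, and layer layout are assumptions, not the project's implementation:

```python
import gymnasium as gym
import numpy as np

# Illustrative space definitions for one agent (wolf or sheep); not the project's code.
VIEW_SIZE = 10    # 10x10 observation window centered on the agent
NUM_LAYERS = 3    # one presence layer each for wolves, sheep, and grass (assumed)
MAX_ENERGY = 50   # upper bound on agent energy, purely illustrative

observation_space = gym.spaces.Dict(
    {
        # Presence of wolves, sheep, and grass in each cell of the local window.
        "grid": gym.spaces.Box(
            low=0, high=1, shape=(VIEW_SIZE, VIEW_SIZE, NUM_LAYERS), dtype=np.float32
        ),
        # The agent's current energy level.
        "energy": gym.spaces.Box(low=0, high=MAX_ENERGY, shape=(1,), dtype=np.float32),
    }
)

# The action is the ID of the neighboring tile to move to (8 Moore neighbors, assumed).
action_space = gym.spaces.Discrete(8)
```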
