About • Requirements • Train Agents • Execute Trained Agents • Cite
A Multi-Agent Reinforcement Learning Simulated Farm Environment (MARL-SFE).
MARL-SFE is a virtual 2D-grid farm for Multi-Agent Reinforcement Learning training. Inspired by DeepMind's RL benchmark on Atari games, this environment is meant to serve as a training ground for MARL algorithms, where agents cooperatively try to collect crops in a simulated farm. Additionally, this repository contains the results of training agents with the state-of-the-art IPPO and MADDPG algorithms.
- 🚶🏼‍♂️ Agents can move: $\{up, \ down, \ left, \ right, \ idle\}$
- 👨🏼‍🌾 Agents harvest crops by landing on the same grid cell as a crop
- 🧅 Onions take 2 timesteps to be harvested
- 🥕 Carrots require 2 agents harvesting simultaneously
- 👀 Agents get a partial observation of the state (environment) as their field of view
- ⚠️ Pesticides are sprayed, reducing the spawn rates and time-to-live for every crop
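The mechanics above can be illustrated with a short interaction loop. This is only a minimal sketch assuming a PettingZoo-style parallel API; the class name `FarmEnv`, its import path, and its constructor arguments are made up for illustration and do not correspond to the repository's actual interface:

```python
# Hypothetical sketch of driving one episode of the farm environment.
# `FarmEnv`, its import path, and its constructor arguments are assumptions,
# as is the PettingZoo-style parallel API (reset/step with per-agent dicts).
from environment.farm_env import FarmEnv  # hypothetical import path

env = FarmEnv(grid_size=(10, 10), num_agents=2)   # assumed constructor
observations, infos = env.reset(seed=0)           # each agent gets a partial view

while env.agents:  # loop until every agent is terminated or truncated
    # each agent picks one of the 5 discrete actions: up, down, left, right, idle
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
```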
Independent Proximal Policy Optimization (IPPO) is an algorithm from "Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?" by Christian Schroeder de Witt et al. (2020). It is based on Trust Region Policy Optimization (TRPO) and standard Proximal Policy Optimization (PPO).
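In IPPO, each agent $i$ learns only from its own local observations and optimizes the standard PPO clipped surrogate objective independently of the other agents. Written out compactly (the notation below is chosen here, not quoted from the paper):

$$
L^{\mathrm{CLIP}}_i(\theta_i) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r^i_t(\theta_i)\,\hat{A}^i_t,\ \operatorname{clip}\!\left(r^i_t(\theta_i),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}^i_t\right)\right],
\qquad
r^i_t(\theta_i) = \frac{\pi_{\theta_i}(a^i_t \mid o^i_t)}{\pi_{\theta_i^{\mathrm{old}}}(a^i_t \mid o^i_t)}
$$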
To adapt and implement IPPO into MARL-SFE, this project has made use of the RL library skrl. Refer to its documentation here.
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is an algorithm from "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" by Ryan Lowe et al. (2017). It builds on Deterministic Policy Gradient (DPG) methods, extending DDPG with a centralized critic per agent while keeping execution decentralized.
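During training, each agent's critic conditions on the joint observations and actions of all agents, while each actor only needs its own observation at execution time. The resulting per-agent policy gradient (notation chosen here) is:

$$
\nabla_{\theta_i} J(\mu_i) =
\mathbb{E}_{x, a \sim \mathcal{D}}\!\left[
\nabla_{\theta_i}\mu_i(a_i \mid o_i)\,
\nabla_{a_i} Q^{\mu}_i\!\left(x, a_1, \ldots, a_N\right)\Big|_{a_i=\mu_i(o_i)}
\right]
$$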
To adapt and implement MADDPG into MARL-SFE, this project has made use of the RL library AgileRL. Refer to its documentation here.
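As a concrete illustration of the centralized critic idea, here is a generic PyTorch sketch. It is not the AgileRL implementation used in this repository; the layer sizes and names are made up:

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q_i(x, a_1, ..., a_N): scores the joint observations and actions of all agents."""

    def __init__(self, obs_dim: int, act_dim: int, n_agents: int, hidden: int = 128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)  # concatenate everyone's obs and actions
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_actions: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents * obs_dim), all_actions: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))
```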
Several rounds of training and hyperparameter tuning of MADDPG and IPPO have led to the results displayed in the following graph, which shows the mean episodic reward obtained during training; the shaded regions indicate the variability around the mean.
It is encouraged to use a Python virtual environment to manage dependencies (for this project, Miniconda was used, with Python 3.12.8). To train and execute the environment with the algorithms used, the list of Python dependencies can be found in `libs/`.
For IPPO (in `libs/`):
pip install -r ippo.txt
For MADDPG (in `libs/`):
pip install -r maddpg.txt
This project uses Python module-based imports, so please execute the following commands from the project's root directory.
The training scripts are found in `algorithms/MADDPG` as `MADDPG_MARL-SFE_train.py` and in `algorithms/IPPO` as `IPPO_MARL-SFE_train.py`. These files manage the training loop and are where hyperparameters are defined (a generic sketch of such a loop is shown after the commands below). To execute a training run, do the following:
For IPPO:
python -m algorithms.IPPO.IPPO_MARL-SFE_train
For MADDPG:
python -m algorithms.MADDPG.MADDPG_MARL-SFE_train
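For orientation, a training script along these lines typically declares its hyperparameters at the top and then alternates between collecting experience and updating the agents. The outline below is only a sketch under assumed names (`FarmEnv` from the earlier sketch, `make_agent`, `act`, `remember`, `learn`); it is not the repository's actual training code:

```python
# Generic shape of a MARL training script; every name here is an assumption
# for illustration and does not correspond to the repository's scripts.
hyperparams = {"episodes": 5000, "lr": 3e-4, "gamma": 0.99, "batch_size": 64}

env = FarmEnv(grid_size=(10, 10), num_agents=2)                  # hypothetical env (see sketch above)
agents = {name: make_agent(env, lr=hyperparams["lr"]) for name in env.possible_agents}

for episode in range(hyperparams["episodes"]):
    observations, _ = env.reset()
    while env.agents:                                            # roll out one episode
        actions = {name: agents[name].act(observations[name]) for name in env.agents}
        next_obs, rewards, terminations, truncations, _ = env.step(actions)
        for name in actions:                                     # store each agent's transition
            agents[name].remember(observations[name], actions[name],
                                  rewards[name], next_obs.get(name))
        observations = next_obs
    for agent in agents.values():                                # gradient update(s) after the rollout
        agent.learn(gamma=hyperparams["gamma"], batch_size=hyperparams["batch_size"])
```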
Trained agents are already provided, and the execution scripts to watch the agents 'play' are found in `algorithms/MADDPG` as `MADDPG_MARL-SFE_execute.py` and in `algorithms/IPPO` as `IPPO_MARL-SFE_execute.py`. For the already trained agents to run properly, make sure you do not change the structure of the neural networks defined in the scripts, since the saved weights only load into matching architectures (see the note after the commands below). If you wish to run the already trained agents, provided in `runs/torch`, run the execution scripts like so:
For IPPO:
python -m algorithms.IPPO.IPPO_MARL-SFE_execute
For MADDPG:
python -m algorithms.MADDPG.MADDPG_MARL-SFE_execute
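The reason the network structure must stay unchanged is that PyTorch checkpoints store parameter tensors keyed by layer names and shapes, and `load_state_dict` fails if the module being loaded into does not match them. A minimal illustration (the layer sizes and the file name below are placeholders, not the repository's actual values):

```python
import torch
import torch.nn as nn

# The module must be defined exactly as it was when the checkpoint was saved:
# same layers, same sizes, same order.
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5))

# "agent_policy.pt" is a placeholder name, not an actual file in runs/torch
state_dict = torch.load("runs/torch/agent_policy.pt", map_location="cpu")
policy.load_state_dict(state_dict)   # raises an error if shapes or layer names differ
policy.eval()                        # inference mode for watching the agents play
```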
If you wish to reference this project, a BibTeX entry is provided below:
@software{MARL-SFE,
  author  = {Gines Moratalla},
  license = {MIT},
  title   = {{Multi-Agent Reinforcement Learning Simulated Farm Environment (MARL-SFE)}},
  url     = {https://github.com/ginesmoratalla/MARL-SFE}
}