About • Requirements • Train Agents • Execute Trained Agents • Cite
A Multi-Agent Reinforcement Learning Simulated Farm Environment (MARL-SFE).
MARL-SFE is a virtual 2D-grid farm for Multi-Agent Reinforcement Learning training. Inspired by DeepMind's RL benchmark on Atari games, this environment is meant to serve as a training ground for MARL algorithms, where agents cooperatively try to collect crops in a simulated farm. Additionally, this repository contains the results of training agents with the state-of-the-art IPPO and MADDPG algorithms.
- 🚶🏼‍♂️ Agents can move: $\{up, \ down, \ left, \ right, \ idle\}$
- 👨🏼‍🌾 Agents harvest crops by landing on the same grid cell as a crop
- 🧅 Onions take 2 timesteps to be harvested
- 🥕 Carrots require 2 agents harvesting simultaneously
- 👀 Agents get a partial observation of the state (environment) as their field of view
- ⚠️ Pesticides are sprayed, reducing the spawn rates and time-to-live for every crop
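The mechanics above can be illustrated with a short interaction loop. This is only a minimal sketch assuming a PettingZoo-style parallel API; the class name `FarmEnv`, its import path, and its constructor arguments are made up for illustration and do not correspond to the repository's actual interface:

```python
# Hypothetical sketch of driving one episode of the farm environment.
# `FarmEnv`, its import path, and its constructor arguments are assumptions,
# as is the PettingZoo-style parallel API (reset/step with per-agent dicts).
from environment.farm_env import FarmEnv  # hypothetical import path

env = FarmEnv(grid_size=(10, 10), num_agents=2)   # assumed constructor
observations, infos = env.reset(seed=0)           # each agent gets a partial view

while env.agents:  # loop until every agent is terminated or truncated
    # each agent picks one of the 5 discrete actions: up, down, left, right, idle
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
```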
Independent Proximal Policy Optimization (IPPO) is an algorithm from "Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?" by Christian Schroeder de Witt et al. (2020). It is based on Trust Region Policy Optimization (TRPO) and standard Proximal Policy Optimization (PPO).
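In IPPO, each agent $i$ learns only from its own local observations and optimizes the standard PPO clipped surrogate objective independently of the other agents. Written out compactly (the notation below is chosen here, not quoted from the paper):

$$
L^{\mathrm{CLIP}}_i(\theta_i) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r^i_t(\theta_i)\,\hat{A}^i_t,\ \operatorname{clip}\!\left(r^i_t(\theta_i),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}^i_t\right)\right],
\qquad
r^i_t(\theta_i) = \frac{\pi_{\theta_i}(a^i_t \mid o^i_t)}{\pi_{\theta_i^{\mathrm{old}}}(a^i_t \mid o^i_t)}
$$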
To adapt and implement IPPO into MARL-SFE, this project has made use of the RL library skrl. Refer to its documentation here.
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is an algorithm from "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" by Ryan Lowe et al. (2017). It builds on Deterministic Policy Gradient (DPG) methods, extending DDPG with a centralized critic per agent while keeping execution decentralized.
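During training, each agent's critic conditions on the joint observations and actions of all agents, while each actor only needs its own observation at execution time. The resulting per-agent policy gradient (notation chosen here) is:

$$
\nabla_{\theta_i} J(\mu_i) =
\mathbb{E}_{x, a \sim \mathcal{D}}\!\left[
\nabla_{\theta_i}\mu_i(a_i \mid o_i)\,
\nabla_{a_i} Q^{\mu}_i\!\left(x, a_1, \ldots, a_N\right)\Big|_{a_i=\mu_i(o_i)}
\right]
$$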
To adapt and implement MADDPG into MARL-SFE, this project has made use of the RL library AgileRL. Refer to its documentation here.
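As a concrete illustration of the centralized critic idea, here is a generic PyTorch sketch. It is not the AgileRL implementation used in this repository; the layer sizes and names are made up:

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q_i(x, a_1, ..., a_N): scores the joint observations and actions of all agents."""

    def __init__(self, obs_dim: int, act_dim: int, n_agents: int, hidden: int = 128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)  # concatenate everyone's obs and actions
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_actions: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents * obs_dim), all_actions: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))
```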
Several rounds of training and hyperparameter tuning of MADDPG and IPPO have led to the results displayed in the following graph, which shows the mean episodic reward obtained during training; the shaded regions indicate the variability around the mean.
It is encouraged to use a Python virtual environment to manage dependencies (for this project, Miniconda was used, with Python 3.12.8). To train and execute the environment with the algorithms used, the list of Python dependencies can be found in `libs/`.
For IPPO (in `libs/`):
pip install -r ippo.txt
For MADDPG (in `libs/`):
pip install -r maddpg.txt
This project uses Python module-based imports, so please execute the following commands from the project's root directory.
The training scripts are found in `algorithms/MADDPG` as `MADDPG_MARL-SFE_train.py` and in `algorithms/IPPO` as `IPPO_MARL-SFE_train.py`. These files manage the training loop and are where hyperparameters are defined (a generic sketch of such a loop is shown after the commands below). To execute a training run, do the following:
For IPPO:
python -m algorithms.IPPO.IPPO_MARL-SFE_train
For MADDPG:
python -m algorithms.MADDPG.MADDPG_MARL-SFE_train
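For orientation, a training script along these lines typically declares its hyperparameters at the top and then alternates between collecting experience and updating the agents. The outline below is only a sketch under assumed names (`FarmEnv` from the earlier sketch, `make_agent`, `act`, `remember`, `learn`); it is not the repository's actual training code:

```python
# Generic shape of a MARL training script; every name here is an assumption
# for illustration and does not correspond to the repository's scripts.
hyperparams = {"episodes": 5000, "lr": 3e-4, "gamma": 0.99, "batch_size": 64}

env = FarmEnv(grid_size=(10, 10), num_agents=2)                  # hypothetical env (see sketch above)
agents = {name: make_agent(env, lr=hyperparams["lr"]) for name in env.possible_agents}

for episode in range(hyperparams["episodes"]):
    observations, _ = env.reset()
    while env.agents:                                            # roll out one episode
        actions = {name: agents[name].act(observations[name]) for name in env.agents}
        next_obs, rewards, terminations, truncations, _ = env.step(actions)
        for name in actions:                                     # store each agent's transition
            agents[name].remember(observations[name], actions[name],
                                  rewards[name], next_obs.get(name))
        observations = next_obs
    for agent in agents.values():                                # gradient update(s) after the rollout
        agent.learn(gamma=hyperparams["gamma"], batch_size=hyperparams["batch_size"])
```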
Trained agents are already provided, and the execution scripts to watch the agents 'play' are found in `algorithms/MADDPG` as `MADDPG_MARL-SFE_execute.py` and in `algorithms/IPPO` as `IPPO_MARL-SFE_execute.py`. For the already trained agents to run properly, make sure you do not change the structure of the neural networks defined in the scripts, since the saved weights only load into matching architectures (see the note after the commands below). If you wish to run the already trained agents, provided in `runs/torch`, run the execution scripts like so:
For IPPO:
python -m algorithms.IPPO.IPPO_MARL-SFE_execute
For MADDPG:
python -m algorithms.MADDPG.MADDPG_MARL-SFE_execute
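The reason the network structure must stay unchanged is that PyTorch checkpoints store parameter tensors keyed by layer names and shapes, and `load_state_dict` fails if the module being loaded into does not match them. A minimal illustration (the layer sizes and the file name below are placeholders, not the repository's actual values):

```python
import torch
import torch.nn as nn

# The module must be defined exactly as it was when the checkpoint was saved:
# same layers, same sizes, same order.
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5))

# "agent_policy.pt" is a placeholder name, not an actual file in runs/torch
state_dict = torch.load("runs/torch/agent_policy.pt", map_location="cpu")
policy.load_state_dict(state_dict)   # raises an error if shapes or layer names differ
policy.eval()                        # inference mode for watching the agents play
```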
If you wish to reference this project, a BibTeX entry is provided below:
@software{MARL-SFE,
  author  = {Gines Moratalla},
  license = {MIT},
  title   = {{Multi-Agent Reinforcement Learning Simulated Farm Environment (MARL-SFE)}},
  url     = {https://github.com/ginesmoratalla/MARL-SFE}
}