
Multi-Agent Reinforcement Learning for Complex Cooperative and Competitive Systems

A comprehensive framework for advanced multi-agent reinforcement learning research, implementing state-of-the-art algorithms for complex cooperative, competitive, and mixed scenarios.

Overview

This project provides a sophisticated multi-agent reinforcement learning (MARL) framework that enables research and development of intelligent agent systems capable of learning complex coordinated behaviors. The framework implements cutting-edge MARL algorithms including MADDPG, QMIX, IQL, and MAPPO, with support for both centralized and decentralized training paradigms.

Key objectives include:

  • Enabling research in emergent multi-agent behaviors
  • Providing scalable implementations of state-of-the-art MARL algorithms
  • Supporting both cooperative and competitive multi-agent scenarios
  • Facilitating curriculum learning and complex environment design
  • Offering comprehensive evaluation and visualization tools

System Architecture

The framework follows a modular architecture that separates environment simulation, agent policies, learning algorithms, and evaluation utilities. The core system workflow operates as follows:


Environment Observation → Agent Policy Networks → Action Selection → Environment Step
       ↑                                                                  ↓
    Replay Buffer ← Experience Storage ← Reward Calculation ← Next Observation
       ↓
Algorithm Update (Policy Gradients, Q-learning, etc.) → Model Improvement

The architecture supports both decentralized execution (where agents make decisions based on local observations) and centralized training (where additional global information may be used during learning).
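To make the split concrete, here is a minimal sketch of the two network roles, assuming simple fully connected architectures (the framework's actual networks live in core/policy_network.py and core/value_network.py and may differ): the actor sees only its own agent's observation, while the critic sees the joint observation-action vector.

import torch
import torch.nn as nn

class DecentralizedActor(nn.Module):
    """Maps one agent's local observation to its action output."""
    def __init__(self, obs_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, local_obs):
        return self.net(local_obs)

class CentralizedCritic(nn.Module):
    """Scores the joint state-action: concatenates all observations and actions."""
    def __init__(self, obs_dim, action_dim, num_agents, hidden=64):
        super().__init__()
        joint_dim = num_agents * (obs_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_actions):
        # all_obs: (batch, num_agents * obs_dim); all_actions: (batch, num_agents * action_dim)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))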

Technical Stack

  • Deep Learning Framework: PyTorch 1.9+
  • Environment Simulation: OpenAI Gym
  • Numerical Computing: NumPy
  • Visualization: Matplotlib, Seaborn
  • Development Tools: pytest, black, flake8

Mathematical Foundation

The framework implements several advanced MARL algorithms based on rigorous mathematical foundations:

Multi-Agent Deep Deterministic Policy Gradient (MADDPG)

MADDPG extends DDPG to multi-agent settings using centralized critics. The objective function for agent $i$ is:

$J(\theta_i) = \mathbb{E}_{\mathbf{o} \sim \mathcal{D}}\left[Q_i^{\phi}(\mathbf{o}, a_1, \ldots, a_N)\big|_{a_i = \mu_{\theta_i}(o_i)}\right]$

where the centralized action-value function $Q_i^{\phi}$ takes all agents' actions as input, and agent $i$'s own action is produced by its deterministic policy $\mu_{\theta_i}$ so that the gradient can flow through the actor.
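In code, this objective becomes an actor loss in which agent $i$'s buffered action is replaced by its current policy output. The sketch below is illustrative only: it assumes continuous actions and the centralized critic shape from the architecture sketch above, and the names are not the repo's confirmed API.

import torch

# Hypothetical MADDPG actor update for agent i (continuous actions assumed).
# obs_batch: list of per-agent observation tensors sampled from the replay buffer D;
# act_batch: list of per-agent action tensors sampled alongside them.
def maddpg_actor_loss(i, actors, critic_i, obs_batch, act_batch):
    actions = [a.detach() for a in act_batch]   # other agents' actions stay fixed
    actions[i] = actors[i](obs_batch[i])        # a_i = mu_{theta_i}(o_i), differentiable
    joint_obs = torch.cat(obs_batch, dim=-1)
    joint_act = torch.cat(actions, dim=-1)
    # Maximize Q_i  =>  minimize its negation
    return -critic_i(joint_obs, joint_act).mean()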

QMIX

QMIX employs monotonic value function factorization so that greedy per-agent action selection remains consistent with maximizing the joint action-value:

$\frac{\partial Q_{tot}}{\partial Q_a} \geq 0 \quad \forall a \in A$

The mixing network ensures this monotonicity constraint while allowing rich representation of the joint action-value function.
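A minimal sketch of such a mixing network, following the construction in Rashid et al. (this is an assumption about the structure of algorithms/qmix.py, not its verified contents): hypernetworks generate the mixing weights from the global state, and taking their absolute value enforces the monotonicity constraint above.

import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Monotonically mixes per-agent Q-values into Q_tot."""
    def __init__(self, num_agents, state_dim, embed_dim=32):
        super().__init__()
        # Hypernetworks: produce mixing weights/biases from the global state
        self.hyper_w1 = nn.Linear(state_dim, num_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))
        self.num_agents, self.embed_dim = num_agents, embed_dim

    def forward(self, agent_qs, state):
        # agent_qs: (batch, num_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        # abs() keeps all mixing weights non-negative => dQ_tot/dQ_a >= 0
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.num_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot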

Multi-Agent PPO (MAPPO)

MAPPO extends PPO to multi-agent settings with the clipped objective:

$L^{CLIP}(\theta) = \mathbb{E}[\min(r_t(\theta)\hat{A}_t, \text{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t)]$

where $r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}$ and $\hat{A}_t$ is the advantage estimate.
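The clipped objective translates almost line-for-line into code. A minimal sketch, working with log-probabilities for numerical stability (the actual loss in algorithms/mappo.py may additionally include entropy and value terms):

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Clipped surrogate objective L^CLIP, negated for gradient descent."""
    ratio = torch.exp(log_probs - old_log_probs)           # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()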

Features

  • Multiple Environment Types: Cooperative navigation, predator-prey, and custom mixed scenarios
  • Advanced Algorithms: MADDPG, QMIX, IQL, MAPPO with centralized and decentralized variants
  • Flexible Architecture: Support for both cooperative and competitive multi-agent learning
  • Comprehensive Training: Experience replay, prioritized replay, curriculum learning (see the buffer sketch after this list)
  • Sophisticated Analysis: Cooperation metrics, emergence analysis, performance evaluation
  • Visualization Tools: Learning curves, agent performance, action distributions
  • Modular Design: Easy extension with new algorithms and environments
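
As an example of the training machinery, a uniform replay buffer in the spirit of core/experience_replay.py might look like the following simplified sketch (the repo's version also advertises prioritized replay, which would add per-transition priorities):

import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, actions, rewards, next_obs, dones):
        self.buffer.append((obs, actions, rewards, next_obs, dones))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)
        # Returns tuples of obs, actions, rewards, next_obs, dones
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)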

Installation

Follow these steps to set up the environment and install dependencies:


# Clone the repository
git clone https://github.com/mwasifanwar/multi_agent_rl.git
cd multi_agent_rl

# Create a virtual environment (recommended)
python -m venv marl_env
source marl_env/bin/activate  # On Windows: marl_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .

# Verify installation
python -c "import multi_agent_rl; print('Installation successful!')"

Usage / Running the Project

Basic Training Example


# Train cooperative agents using MADDPG
python main.py --mode cooperative --algorithm MADDPG --episodes 1000 --agents 3

# Train a competitive predator-prey scenario
python main.py --mode competitive --episodes 1500

# Run mixed cooperative-competitive training
python main.py --mode mixed --episodes 2000

Advanced Usage with Custom Configuration


import torch
from multi_agent_rl.core import CooperativeNavigationEnv
from multi_agent_rl.algorithms import MADDPG

# Initialize environment and algorithm
env = CooperativeNavigationEnv(num_agents=3, state_dim=20, action_dim=5)
algorithm = MADDPG(num_agents=3, state_dim=20, action_dim=5)

# Training loop
for episode in range(1000):
    observations = env.reset()
    episode_rewards = {f'agent_{i}': 0.0 for i in range(3)}

    for step in range(200):
        actions = algorithm.select_actions(observations, training=True)
        next_observations, rewards, dones, infos = env.step(actions)

        # Store experience and update
        # ... (complete training logic)

        observations = next_observations

        if all(dones.values()):
            break

    # Periodic evaluation and model saving
    if episode % 100 == 0:
        print(f"Episode {episode}, Total Reward: {sum(episode_rewards.values()):.2f}")

Configuration / Parameters

Algorithm Hyperparameters

  • Learning Rates: Actor (0.001), Critic (0.001)
  • Discount Factor (γ): 0.95-0.99
  • Replay Buffer Size: 10,000-20,000 experiences
  • Batch Size: 256-512
  • Exploration: ε-greedy with decay from 1.0 to 0.01
  • Target Network Update (τ): 0.01 for soft updates (see the sketch after this list)
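
The soft target update is the standard Polyak average $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$; a minimal sketch with the default $\tau$ above:

import torch

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.01):
    """Polyak-average online parameters into the target network."""
    for t_param, param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * param)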

Environment Parameters

  • State Dimensions: 10-25 depending on scenario complexity
  • Action Space: Discrete (4-6 actions) or continuous
  • Episode Length: 200-300 steps
  • Agent Count: 2-6 agents for different scenarios

Folder Structure


multi_agent_rl/
├── core/                          # Core framework components
│   ├── __init__.py
│   ├── multi_agent_env.py         # Base environment class
│   ├── agent_manager.py           # Agent coordination and management
│   ├── policy_network.py          # Neural network architectures
│   ├── value_network.py           # Value function approximators
│   └── experience_replay.py       # Replay buffer implementations
├── algorithms/                    # MARL algorithm implementations
│   ├── __init__.py
│   ├── maddpg.py                  # MADDPG algorithm
│   ├── qmix.py                    # QMIX algorithm
│   ├── iql.py                     # Independent Q-learning
│   └── mappo.py                   # Multi-agent PPO
├── environments/                  # Custom environment implementations
│   ├── __init__.py
│   ├── cooperative_navigation.py  # Cooperative multi-agent navigation
│   └── predator_prey.py           # Competitive predator-prey scenario
├── utils/                         # Utility functions and tools
│   ├── __init__.py
│   ├── training_utils.py          # Training helpers and curriculum learning
│   ├── evaluation_utils.py        # Performance metrics and analysis
│   └── visualization_utils.py     # Plotting and visualization
├── examples/                      # Example training scripts
│   ├── __init__.py
│   ├── cooperative_training.py    # Cooperative scenario examples
│   ├── competitive_training.py    # Competitive scenario examples
│   └── mixed_training.py          # Mixed scenario examples
├── requirements.txt               # Python dependencies
├── setup.py                       # Package installation script
└── main.py                        # Main entry point

Results / Experiments / Evaluation

Performance Metrics

The framework includes comprehensive evaluation metrics:

  • Total Episode Reward: Sum of all agents' rewards
  • Cooperation Score: Measures coordination between agents (0-1 scale)
  • Fairness Index: Measures how equally reward is distributed across agents (see the sketch after this list)
  • Action Coordination: Temporal alignment of agent actions
  • Success Rate: Task completion frequency
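
As one concrete instance (an assumption; evaluation_utils.py may define the metric differently), the fairness index could be Jain's index, which equals 1 when all agents earn equal reward and falls toward 1/N as a single agent dominates:

import numpy as np

def jain_fairness_index(rewards):
    """Jain's index: (sum r)^2 / (N * sum r^2), in [1/N, 1]."""
    r = np.asarray(rewards, dtype=float)
    denom = len(r) * np.sum(r ** 2)
    return float(np.sum(r) ** 2 / denom) if denom > 0 else 1.0

print(jain_fairness_index([1.0, 1.0, 1.0]))  # 1.0  -- perfectly equal
print(jain_fairness_index([3.0, 0.0, 0.0]))  # ~0.33 -- one agent takes all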

Experimental Results

Typical training performance across different scenarios:

  • Cooperative Navigation: Agents learn to efficiently cover targets with 80-95% success rate
  • Predator-Prey: Predators develop coordinated hunting strategies with 70-85% capture rate
  • Mixed Scenarios: Teams learn both cooperative and competitive behaviors simultaneously

Emergent Behavior Analysis

The framework enables analysis of complex emergent behaviors:

  • Role Specialization: Agents spontaneously develop specialized roles
  • Communication Patterns: Implicit communication through action sequences
  • Strategy Evolution: Progressive development of sophisticated multi-step strategies

References / Citations

  1. Lowe, R., et al. "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments." NeurIPS 2017. arXiv:1706.02275
  2. Rashid, T., et al. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning." ICML 2018. arXiv:1803.11485
  3. Yu, C., et al. "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games." NeurIPS 2022. arXiv:2103.01955
  4. Tan, M. "Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents." ICML 1993.
  5. Foerster, J., et al. "Counterfactual Multi-Agent Policy Gradients." AAAI 2018. arXiv:1705.08926

Acknowledgements

This project builds upon foundational research in multi-agent reinforcement learning and leverages several open-source libraries:

  • PyTorch team for the deep learning framework
  • OpenAI for the Gym environment interface
  • The multi-agent RL research community for algorithm development
  • Contributors to the numerical computing and visualization ecosystems

✨ Author

M Wasif Anwar
AI/ML Engineer | Effixly AI

LinkedIn Email Website GitHub



⭐ Don't forget to star this repository if you find it helpful!
