This is a collection of Multi-Agent Reinforcement Learning (MARL) papers with code. I have selected some relatively important papers with open-source code and categorized them by time and method, building on Chen Hao's "Multi-Agent Reinforcement Learning Papers with Code"; thanks to him.
I will keep updating this collection as I study the field further.
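- Classic papers
- Other papers
- Surveys
- Environments
- Multi-objective optimization
- Credit assignment
- Multi-task
- Communication
- Emergence
- Opponent modeling
- Game theory
- Hierarchical
- Roles
- Large scale
- Ad hoc teamwork
- Evolutionary algorithms
- Team training
- Curriculum learning
- Mean field
- Transfer learning
- Meta-learning
- Fairness
- Reward search
- Sparse reward
- Graph neural networks
- Model-based
- Neural architecture search (NAS)
- Safe learning
- Single-agent to multi-agent
- Action space
- Diversity
- Distributed training, distributed execution
- Offline multi-agent reinforcement learning
- Adversarial
- Imitation learning
- Training data
- Optimizers
- Uncategorized
- Acknowledgements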
- A Survey and Critique of Multiagent Deep Reinforcement Learning
- An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective
- Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
- A Review of Cooperative Multi-Agent Deep Reinforcement Learning
- Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
- A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
- Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications
- A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
- [If multi-agent learning is the answer, what is the question?](https://ai.stanford.edu/people/shoham/www%20papers/LearningInMAS.pdf)
- Multiagent learning is not the answer. It is the question
- Is multiagent deep reinforcement learning the answer or the question? A brief survey. (Note: "A Survey and Critique of Multiagent Deep Reinforcement Learning" is an updated version of this paper by the same authors.)
- Evolutionary Dynamics of Multi-Agent Learning: A Survey
- (Worth reading although they're not recent reviews.)

Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Open-source multi-objective RL algorithm library; multi-objective RL environments (MO-Gymnasium); single-policy and multi-policy methods | https://github.com/LucasAlegre/morl-baselines | | 2023 | |
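
Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation | Algorithmic framework; unknown preferences; linear preferences; outperforms scalarized MORL algorithms; infers hidden preferences; Bellman equation; parameterized policy representation | https://github.com/RunzheYang/MORL | NeurIPS'19 | 2019 | |
Lexicographic Multi-Objective Reinforcement Learning | Lexicographic multi-objective problems: objectives with explicit priorities, limited resources, multi-stage tasks; scalability; practical applications; policy-based; value-based; single-agent | https://github.com/lrhammond/lmorl | | 2022 | |
Multi-objective Conflict-based Search for Multi-agent Path Finding. Subdimensional Expansion for Multi-objective Multi-agent Path Finding. | Pareto-optimal solution set; conflict-based search | https://github.com/wonderren/public_pymomapf | | 2021 | |
Unifying All Species: LLM-based Hyper-Heuristics for Multi-objective Optimization | TSP; multi-objective optimization | | | 2024 | |
Multi-objective Evolution of Heuristic Using Large Language Model | TSP, BPP; multi-objective | | | 2024 | |
Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning | Gradient projection; multi-agent | | | 2024 | |
PA2D-MORL: Pareto Ascent Directional Decomposition Based Multi-Objective Reinforcement Learning | Pareto policy set; flexibility; adaptability; stable results | | | 2024 | |
Multi-Objective Deep Reinforcement Learning Optimisation in Autonomous Systems | Simultaneous optimization of multiple objectives; adaptive server configuration | https://github.com/JuanK120/RL_EWS | | 2024 | |
A practical guide to multi-objective reinforcement learning and planning | Linear scalarization to a single objective vs. sweeping the weight space; storing vector-valued Q-values (Q-learning); each policy corresponds to a different combination of objectives (Pareto Q-learning); dynamically adjusting preferences during learning (Q-steering) | | | 2021 | |
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning | Balancing individual and collective goals; two-stage curriculum learning (smooth transition); multi-goal multi-agent policy gradient (credit assignment mechanism); improved learning efficiency | https://github.com/011235813/cm3 | | 2021 | |
MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning | Weight vector: balances the importance of different objectives; mixing network: integrates the local information of all agents to coordinate cooperation | | | | |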
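
Several entries in the table above mention linear scalarization, where a vector of per-objective Q-values is collapsed into one number by a weighted sum. The following is only a minimal sketch of that idea. The Q-table values, the weight vectors, and the helper names `scalarize` and `greedy_action` are illustrative assumptions and are not taken from any of the listed papers or repositories.

```python
from typing import Sequence


def scalarize(q_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector-valued Q-value into a scalar by a weighted sum."""
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_table: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the action with the highest scalarized Q-value."""
    scores = [scalarize(q, weights) for q in q_table]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical Q-vectors for one state: 3 actions, 2 objectives each.
q_table = [
    [1.0, 0.0],  # action 0: good for objective 0 only
    [0.0, 1.0],  # action 1: good for objective 1 only
    [0.6, 0.6],  # action 2: a compromise between both objectives
]

print(greedy_action(q_table, [0.9, 0.1]))  # -> 0 (objective 0 dominates)
print(greedy_action(q_table, [0.5, 0.5]))  # -> 2 (balanced preference)
print(greedy_action(q_table, [0.1, 0.9]))  # -> 1 (objective 1 dominates)
```

Changing the weight vector changes which action looks best, which is why a fixed scalarization yields one policy per preference, while multi-policy methods try to cover many preferences at once.
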

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
COMA: Counterfactual Multi-Agent Policy Gradients | | https://github.com/oxwhirl/pymarl | AAAI | 2018
LICA: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning | | https://github.com/mzho7212/LICA | NIPS | 2020
Evaluating Memory and Credit Assignment in Memory-Based RL | Decoupling Memory from Credit Assignment | https://github.com/twni2016/Memory-RL | | 2023

Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning | Multi-Task RL; Sparse Reward; ExpEnv: MINIGRID | | | | |
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning | ExpEnv: MetaWorld | | | 2024 | |
Elastic Decision Transformer | Offline RL; trajectory stitching; Multi-Task | | | 2023 | |
Learning to Modulate pre-trained Models in RL | Multi-task learning; continual learning; fine-tuning | | | 2023 | |
Multi Task RL Baselines | | https://github.com/facebookresearch/mtrl | | | |
A PyTorch Library for Multi-Task Learning | | https://github.com/median-research-group/LibMTL | | 2024 | |
Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data | Offline data from limited sources; MARL; cross-task coordination; generalization to unseen tasks | https://github.com/LAMDA-RL/ODIS | | 2023 | |
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning | Avoids retraining on new tasks; multi-task offline MARL algorithm; reconstruct observations -> evaluate fixed actions + variable actions -> regularize conservative actions; from a limited set of small-scale source tasks -> strong multi-task generalization | | | 2025 | |

Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
Responsive Regulation of Dynamic UAV Communication Networks Based on Deep Reinforcement Learning | UAV communication networks; dynamic regulation; asynchronous DDPG; Python | https://github.com/ducmngx/DDPG-UAV-Efficiency | | 2021 | |
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels | Distributed multi-agent RL; information sharing; quantum channels; faster convergence, better performance, lower computational burden | https://github.com/news-vt/eqmarl | | 2025 | |
Learning to Communicate Through Implicit Communication Channels | Implicit communication protocol (ICP) framework; more efficient | | | 2024 | |
Scaling Large Language Model-based Multi-Agent Collaboration | LLM; collaborative emergence; communication patterns; directed acyclic graph; 1000 agents | https://github.com/OpenBMB/ChatDev | | 2024 | |

Paper | Code | Accepted at | Year
---|---|---|---
Bayesian Opponent Exploitation in Imperfect-Information Games | | IEEE Conference on Computational Intelligence and Games | 2018
LOLA: Learning with Opponent-Learning Awareness | | AAMAS | 2018
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems | | | 2020
Stable Opponent Shaping in Differentiable Games | | | 2018
Opponent Modeling in Deep Reinforcement Learning | https://github.com/hhexiy/opponent | ICML | 2016
Game Theory-Based Opponent Modeling in Large Imperfect-Information Games | | AAMAS | 2011
Agent Modelling under Partial Observability for Deep Reinforcement Learning | | NIPS | 2021

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles | | https://github.com/TonghanWang/ROMA | ICML | 2020
RODE: Learning Roles to Decompose Multi-Agent Tasks | | https://github.com/TonghanWang/RODE | ICLR | 2021
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing | | https://github.com/uoe-agents/seps | ICML | 2021
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Meta-programming framework: assembly-line paradigm; agent roles; task decomposition into subtasks; more coherent solutions | https://github.com/geekan/MetaGPT | | 2023
- Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms
- Combining evolution and deep reinforcement learning for policy search: a survey
- Deep reinforcement learning versus evolution strategies: A comparative survey
- A survey on evolutionary reinforcement learning algorithms
- Reinforcement learning versus evolutionary computation: A survey on hybrid algorithms
- Evolutionary computation and the reinforcement learning problem
- Evolutionary reinforcement learning: A survey

Paper | Code | Accepted at | Year
---|---|---|---
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning | | Nature | 2019

Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
Mean Field Multi-Agent Reinforcement Learning | | | ICML | 2018 | |
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning | | | The World Wide Web Conference | 2019 | |
Bayesian Multi-type Mean Field Multi-agent Imitation Learning | | | NIPS | 2020 | |
Bridging mean-field games and normalizing flows with trajectory regularization | Connects mean-field games with normalizing flows; Python | https://github.com/Whalefishin/MFG_NF | | 2023 | |
| | Python | https://github.com/hsvgbkhgbv/Mean-field-Fictitious-Play-in-Potential-Games | | | |
MFGLib: A Library for Mean Field Games | Library | https://github.com/radar-research-lab/MFGLib | | 2023 | |
APAC-Net: Alternating the population and agent control via two neural networks to solve high-dimensional stochastic mean field games | Solves stochastic mean-field games; 100 dimensions | https://github.com/atlin23/apac-net | | 2021 | |

Paper | Code | Accepted at | Year
---|---|---|---
A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems | | Journal of Artificial Intelligence Research | 2019
Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning | | | 2020

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning | | | ICML | 2021
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments | | | | 2017
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Meta-programming framework: assembly-line paradigm; agent roles; task decomposition into subtasks; more coherent solutions | https://github.com/geekan/MetaGPT | | 2023

Paper | Code | Accepted at | Year
---|---|---|---
FEN: Learning Fairness in Multi-Agent Systems | | NIPS | 2019
Fairness in Multiagent Resource Allocation with Dynamic and Partial Observations | | AAMAS | 2018
Fairness in Multi-agent Reinforcement Learning for Stock Trading | | | 2019

Paper | Code | Accepted at | Year
---|---|---|---
EITI/EDTI: Influence-Based Multi-Agent Exploration | https://github.com/TonghanWang/EITI-EDTI | ICLR | 2020
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning | | ICML | 2021
Centralized Model and Exploration Policy for Multi-Agent | | | 2021
REMAX: Relational Representation for Multi-Agent Exploration | | AAMAS | 2022

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems | | | NIPS | 2021
Individual Reward Assisted Multi-Agent Reinforcement Learning | | https://github.com/MDrW/ICML2022-IRAT | ICML | 2022
Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning | Multi-Task RL; Sparse Reward; ExpEnv: MINIGRID | | |

Paper | Code | Accepted at | Year
---|---|---|---
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping | | | 2020

Paper | Code | Accepted at | Year
---|---|---|---
MANAS: Multi-Agent Neural Architecture Search | | | 2019

Paper | Code | Accepted at | Year
---|---|---|---
MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding | | | 2019
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman | | | 2019

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
Deep Reinforcement Learning in Parameterized Action Space | | | | 2015
DMAPQN: Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces | | | IJCAI | 2019
H-PPO: Hybrid actor-critic reinforcement learning in parameterized action space | | | IJCAI | 2019
P-DQN: Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space | | | | 2018
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning | Avoids retraining on new tasks; multi-task offline MARL algorithm; reconstruct observations -> evaluate fixed actions + variable actions -> regularize conservative actions; from a limited set of small-scale source tasks -> strong multi-task generalization | | | 2025

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems | | | AAMAS | 2021
Q-DPP: Multi-Agent Determinantal Q-Learning | | https://github.com/QDPP-GitHub/QDPP | ICML | 2020
Diversity is All You Need: Learning Skills without a Reward Function | | | | 2018
Modelling Behavioural Diversity for Learning in Open-Ended Games | | | ICML | 2021
Diverse Agents for Ad-Hoc Cooperation in Hanabi | | | CoG | 2019
Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning | | | IJCAI | 2020
Quantifying environment and population diversity in multi-agent reinforcement learning | | | | 2021
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning | REINFORCE algorithm; combinatorial optimization problems; diverse trajectories; Python | https://github.com/yd-kwon/POMO | | 2021
HIQL: Offline Goal-Conditioned RL with Latent States as Actions | Hierarchical Goal-Conditioned RL; Offline Reinforcement Learning; Value Function Estimation | | | 2023

Paper | Code | Accepted at | Year
---|---|---|---
Networked Multi-Agent Reinforcement Learning in Continuous Spaces | | IEEE Conference on Decision and Control | 2018
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning | | NIPS | 2019
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents | | ICML | 2018

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks | | | | 2021
Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning | | | NIPS | 2021
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization | | https://github.com/LanqingLi1993/FOCAL-ICLR | | 2020
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization | Distribution shift | | |
Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data | Offline data from limited sources; MARL; cross-task coordination; generalization to unseen tasks | https://github.com/LAMDA-RL/ODIS | | 2023
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning | Avoids retraining on new tasks; multi-task offline MARL algorithm; reconstruct observations -> evaluate fixed actions + variable actions -> regularize conservative actions; from a limited set of small-scale source tasks -> strong multi-task generalization | | | 2025

Paper | Code | Accepted at | Year
---|---|---|---
Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems | | | 2022
Distributed Multi-Agent Deep Reinforcement Learning for Robust Coordination against Noise | | | 2022
On the Robustness of Cooperative Multi-Agent Reinforcement Learning | | IEEE Security and Privacy Workshops | 2020
Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning | | CVPR workshop | 2022
Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient | | AAAI | 2019
Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations | | NIPS Deep Reinforcement Learning Workshop | 2018
Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods | | | 2021

Paper | Code | Accepted at | Year
---|---|---|---
Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems | | | 2022

Paper | Code | Accepted at | Year
---|---|---|---
Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning | | CVPR workshop | 2022

Paper | Key Words | Code | Accepted at | Year | Others
---|---|---|---|---|---
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale | Pathfinding; no communication, no heuristics; imitation learning; partially observable; Python | https://github.com/CognitiveAISystems/MAPF-GPT | | 2025 | |
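
Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning | Data scarcity; inter-agent interaction data; diffusion model; synthesizes high-quality multi-agent datasets; sparse attention mechanism | https://github.com/fyqqyf/INS | | 2025
Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data | Offline data from limited sources; MARL; cross-task coordination; generalization to unseen tasks | https://github.com/LAMDA-RL/ODIS | | 2023
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning | Avoids retraining on new tasks; multi-task offline MARL algorithm; reconstruct observations -> evaluate fixed actions + variable actions -> regularize conservative actions; from a limited set of small-scale source tasks -> strong multi-task generalization | | | 2025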

Paper | Key Words | Code | Accepted at | Year
---|---|---|---|---
Conformal Symplectic Optimization for Stable Reinforcement Learning | RL-specific neural network optimizer (RAD); performance reaches 2.5x that of the Adam optimizer; score improved by 155.1% | https://github.com/TobiasLv/RAD | | 2025

Chen, Hao, Multi-Agent Reinforcement Learning Papers with Code