InfoPG: Mutual Information Maximizing Policy Gradient

This repository provides the full implementation for the paper "Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming," published at the International Conference on Learning Representations (ICLR) 2022.

Authors: Sachin Konan*, Esmaeil Seraj*, Matthew Gombolay

* Co-first authors. These authors contributed equally to this work.

Full paper (arXiv): https://arxiv.org/pdf/2201.08484.pdf

Installation Instructions:

  1. Download Anaconda
  2. conda env create --file marl.yml
  3. cd PettingZoo
  4. conda activate marl
  5. python setup.py install
  6. Follow the StarCraft Multi-Agent Challenge (SMAC) installation instructions here: https://github.com/oxwhirl/smac
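
For convenience, steps 2-5 as a single shell session. This is a minimal sketch: it assumes conda is already on your PATH, and StarCraft II plus the SMAC maps still require the separate manual install linked above.

    # Steps 2-5 above, run back to back; assumes Anaconda is installed and on PATH.
    conda env create --file marl.yml   # build the "marl" environment from the repo's spec
    cd PettingZoo
    conda activate marl
    python setup.py install            # install the bundled PettingZoo into the env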

Run PistonBall:

  1. cd pistonball
  2. To Execute Experiments:
    1. MOA: python test_piston_ball.py -method moa
    2. InfoPG: python test_piston_ball.py -method infopg -k [K_LEVELS]
    3. Adv. InfoPG: python test_piston_ball.py -method infopg_adv -k [K_LEVELS]
    4. Consensus Update: python test_piston_ball.py -method consensus
    5. Standard A2C: python test_piston_ball.py -method a2c
  3. To Execute PR2-AC Experiments:
    1. cd ../pr2-ac/pistonball/
    2. python distributed_pistonball_train.py -batch 4 -workers [NUM CPUS]
    3. Results will be saved in experiments/pistonball/[DATETIME OF RUN]/
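
Putting the flags together, the sketch below sweeps all five PistonBall methods in one go; k=2 is only an illustrative stand-in for [K_LEVELS].

    # Hypothetical sweep over the five methods listed above (k=2 is illustrative).
    cd pistonball
    for method in a2c consensus moa; do
        python test_piston_ball.py -method "$method"
    done
    for method in infopg infopg_adv; do
        python test_piston_ball.py -method "$method" -k 2
    done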

Run Fraud (Byzantine Experiments):

  1. MOA: python batch_pistoncase_moa_env.py
  2. InfoPG: python batch_pistoncase_infopg_env.py
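
To compare the two afterward, the sketch below simply chains them and tees stdout to log files. It assumes you run from the directory containing both scripts; the .log filenames are our own choice, not produced by the scripts.

    # Run both Byzantine (fraud) baselines in sequence; the .log names are arbitrary.
    python batch_pistoncase_moa_env.py    | tee fraud_moa.log
    python batch_pistoncase_infopg_env.py | tee fraud_infopg.log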

Run Pong:

  1. cd pong
  2. To Execute MOA Experiments:
    1. cd pong_moa
    2. MOA: python distributed_pong_moa_train.py -batch 16 -workers [NUM CPUS]
    3. Results will be saved in experiments/pong/[DATETIME OF RUN]/
  3. To Execute PR2-AC Experiments:
    1. cd ../pr2-ac/pong/
    2. python distributed_pong_train.py -batch 16 -workers [NUM CPUS]
    3. Results will be saved in experiments/pong/[DATETIME OF RUN]/
  4. To Execute Other Experiments:
    1. InfoPG: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic
    2. Adv. InfoPG: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal
    3. Consensus Update: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus
    4. Standard A2C: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal
    5. Results will be saved in experiments/pong/[DATETIME OF RUN]/
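
As a concrete example, an InfoPG run with the placeholders filled in; a sketch where k=2 stands in for [K_LEVELS] and $(nproc) supplies [NUM CPUS], starting from the repository root.

    cd pong
    # InfoPG with 2 levels of iterated reasoning, one worker per CPU core.
    python distributed_pong_train.py -batch 16 -workers "$(nproc)" -k 2 -adv info -critic
    # Outputs accumulate under experiments/pong/<datetime of run>/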

Run Walker:

  1. cd walker
  2. To Execute MOA Experiments:
    1. cd walker_moa
    2. MOA: python distributed_walker_train_moa.py -batch 16 -workers [NUM CPUS]
    3. Results will be saved in experiments/walker_moa/[DATETIME OF RUN]/
  3. To Execute PR2-AC Experiments:
    1. cd ../pr2-ac/walker/
    2. python distributed_walker_train.py -batch 16 -workers [NUM CPUS]
    3. Results will be saved in experiments/walker/[DATETIME OF RUN]/
  4. To Execute Other Experiments:
    1. InfoPG: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic
    2. Adv. InfoPG: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal
    3. Consensus Update: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus
    4. Standard A2C: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal
    5. Results will be saved in experiments/walker/[DATETIME OF RUN]/
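
Similarly for Walker, a sketch of an Adv. InfoPG run with example placeholder values (k=3 for [K_LEVELS], $(nproc) for [NUM CPUS]), starting from the repository root.

    cd walker
    # Adv. InfoPG with a 3-level reasoning depth; k=3 is illustrative only.
    python distributed_walker_train.py -batch 16 -workers "$(nproc)" -k 3 -adv normal
    # Outputs accumulate under experiments/walker/<datetime of run>/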

Run Starcraft:

  1. cd starcraft
  2. To Execute MOA Experiments:
    1. cd moa
    2. MOA: python distributed_starcraft_train_moa.py -batch 128 -workers [NUM CPUS] -positive_rewards
    3. Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
  3. To Execute PR2-AC Experiments:
    1. cd ../pr2-ac/starcraft/
    2. python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS]
    3. Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
  4. To Execute Other Experiments:
    1. InfoPG: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic -positive_rewards
    2. Adv. InfoPG: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv normal -positive_rewards
    3. Consensus Update: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -consensus -positive_rewards
    4. Standard A2C: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -positive_rewards
    5. Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
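
For example, a Consensus Update run on StarCraft with 8 as a placeholder worker count; this sketch assumes the script name distributed_starcraft_train.py (as in the commands above) and starts from the repository root.

    cd starcraft
    # Consensus-update baseline (k=0) with positive reward shaping enabled.
    python distributed_starcraft_train.py -batch 128 -workers 8 -k 0 -adv normal -consensus -positive_rewards
    # Outputs accumulate under experiments/starcraft/<datetime of run>/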
