In this project, we tackle a multi-agent control problem with deep reinforcement learning. We use an implementation of the Proximal Policy Optimization (PPO) algorithm tailored to Unity's Tennis environment (continuous action space, multi-agent).
Notes: This project shares a significant portion of code with my other Continuous Control project.
Demo: `table_tennis_demo.mp4`
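At the core of the agent is PPO's clipped surrogate objective. The sketch below illustrates that loss in PyTorch; the tensor names and the clipping coefficient are illustrative assumptions, not the exact code in `src/ppo_agent.py`.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (returned as a loss to minimize).

    new_log_probs : log-probabilities of the taken actions under the current policy
    old_log_probs : log-probabilities recorded when the actions were sampled
    advantages    : advantage estimates, e.g. from GAE
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Unclipped and clipped surrogate terms
    surrogate = ratio * advantages
    surrogate_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the pessimistic (minimum) surrogate, so the loss is its negation
    return -torch.min(surrogate, surrogate_clipped).mean()
```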
This environment features two agents controlling rackets to volley a ball over a net. An agent receives a reward of +0.1 each time the ball crosses the net, and a penalty of -0.01 for every miss, which encourages the agents to keep the ball in play.
State space:
- Dimension: 8
- Details: The variables correspond to the position coordinates and the velocity of both the ball and the racket.

Action space:
- Type: Continuous
- Dimension: 2
- Actions: The agent can move towards and away from the net, and jump.
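A minimal sketch of inspecting these spaces through the `unityagents` wrapper is shown below, assuming the standard Udacity brain-based API and `Tennis.app` in the project root:

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset in training mode and look at the observations for both agents
env_info = env.reset(train_mode=True)[brain_name]
states = env_info.vector_observations

print("Number of agents:", len(env_info.agents))         # expected: 2
print("Observation length per agent:", states.shape[1])  # 8 variables per frame (frames may be stacked)
print("Action size:", brain.vector_action_space_size)    # expected: 2 (continuous)

env.close()
```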
The task is episodic. The environment is considered solved when the moving average over the last 100 episodes (taking the maximum score of the two agents in each episode) reaches +0.5.
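As an illustration, the solve check can be written as follows; `episode_scores_per_agent` is a hypothetical list holding, for each finished episode, one score per agent (it is not a variable from `src/train.py`):

```python
from collections import deque

import numpy as np

scores_window = deque(maxlen=100)  # max score of the two agents, per episode

for episode, agent_scores in enumerate(episode_scores_per_agent, start=1):
    # The episode score is the maximum over the two agents' scores
    scores_window.append(np.max(agent_scores))

    if len(scores_window) == 100 and np.mean(scores_window) >= 0.5:
        print(f"Solved in {episode} episodes "
              f"(100-episode moving average: {np.mean(scores_window):.3f})")
        break
```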
- Python Environment: Ensure Python (>= 3.6) is installed. Due to package dependencies, it is advised to set up the `drlnd` kernel. A comprehensive setup guide is available here. Activation command: `source activate drlnd`
- Weights & Biases Integration: For real-time performance tracking, analysis, and hyperparameter tuning, Weights & Biases is integrated. Registration might be required for first-time users (a minimal logging sketch follows the installation steps below).
- Clone the repository to access the latest features and implementations: `git clone https://github.com/naoufal51/CollabCompet.git`
- Ensure `Tennis.app` aligns with your OS specifications.
- Transition to the project's root directory and install the pertinent dependencies: `cd CollabCompet`, then `pip3 install -r requirements.txt`
- Synchronize Weights & Biases with your unique API key to facilitate seamless experiment tracking: `export WANDB_API_KEY=<your_wandb_api_key>`
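Once the key is exported, experiment tracking boils down to initializing a run and logging metrics. The sketch below is only an illustration: the project name, config values, and logged fields are assumptions, not necessarily what `src/train.py` uses.

```python
import wandb

# wandb picks up WANDB_API_KEY from the environment if it is already exported
wandb.init(project="CollabCompet", config={"clip_eps": 0.2, "learning_rate": 3e-4})

# Inside the training loop, log whatever should be tracked per episode
wandb.log({"episode_score": 0.35, "moving_average_100": 0.21})

wandb.finish()
```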
Dependencies:
- `numpy`: For numerical operations and data manipulation.
- `torch`: Deep learning framework used for modeling and optimization.
- `unityagents`: Interface and utilities for Unity-based environments.
- `wandb`: Performance tracking and visualization tool.
- `matplotlib`: For creating static, animated, and interactive visualizations.
Initiate the training sequence with: `python src/train.py`
During training, performance metrics are systematically logged via `wandb` and stored under the `results` directory. Once the environment is solved or the episode budget is exhausted, the model weights are archived in `results/weights`.
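For reference, archiving and restoring weights with `torch` looks roughly like this; `model` and the exact file name are placeholders (the real files follow the `actor_critic*.pth` pattern under `results/weights`):

```python
import torch

# Save after convergence (assuming `model` is the actor-critic network; file name is hypothetical)
torch.save(model.state_dict(), "results/weights/actor_critic_final.pth")

# Restore later for evaluation
model.load_state_dict(torch.load("results/weights/actor_critic_final.pth"))
model.eval()
```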
Post-training, the model's efficacy can be gauged via `python src/evaluate.py`. The evaluation metrics, including a histogram of the score distribution, are consolidated under `results`.
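The score histogram can also be reproduced by hand with `numpy` and `matplotlib`; the file names below are placeholders matching the `scores*.npy` / `scores*.png` patterns in `results`, not the exact names written by `src/evaluate.py`:

```python
import numpy as np
import matplotlib.pyplot as plt

scores = np.load("results/scores_eval.npy")  # hypothetical file name

plt.hist(scores, bins=20)
plt.xlabel("Episode score (max over the two agents)")
plt.ylabel("Count")
plt.title("Evaluation score distribution")
plt.savefig("results/scores_eval_hist.png")  # hypothetical file name
```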
For an insightful drill-down into the performance metrics, open the notebook: `jupyter notebook results_visualisation.ipynb`
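For a quick look outside the notebook, the training curve can also be reconstructed directly from the saved arrays; the file name below is an assumed match for the `scores*.npy` pattern:

```python
import numpy as np
import matplotlib.pyplot as plt

scores = np.load("results/scores_train.npy")  # hypothetical file name
window = 100

# 100-episode moving average, mirroring the solve criterion
moving_avg = np.convolve(scores, np.ones(window) / window, mode="valid")

plt.plot(scores, alpha=0.3, label="episode score")
plt.plot(range(window - 1, len(scores)), moving_avg, label="100-episode moving average")
plt.axhline(0.5, linestyle="--", label="solve threshold (+0.5)")
plt.xlabel("Episode")
plt.ylabel("Score")
plt.legend()
plt.show()
```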
```
├── README.md
├── Tennis.app
├── python
├── results
│   ├── scores*.npy
│   ├── scores*.png
│   └── weights
│       └── actor_critic*.pth
├── results_visualisation.ipynb
├── src
│   ├── actor_critic_model.py
│   ├── evaluate.py
│   ├── ppo_agent.py
│   ├── train.py
│   └── utils.py
├── requirements.txt
└── report.md
```
This project draws inspiration from:
- Udacity DRLND.
- stable-baselines3
You can freely use the code present in the repo.
For modules provided by Udacity DRLND, check their repository: DRLND.