This repository contains the code and resources for the paper "Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games" by Alejandro Sánchez Roncero, Yixi Cai, Olov Andersson, and Petter Ögren. Our code builds on OmniDrones.
This project provides a reinforcement learning (RL) framework and neural-network controllers for 1v1 quadrotor pursuit–evasion games. We leverage body-rate control commands and an Asynchronous Multi-Stage Population-Based (AMSPB) training loop to achieve agile, high-speed maneuvers and robust adversarial performance.
Key highlights:
- Body-rate policies that command roll, pitch, and yaw rates together with collective thrust, exploiting the full quadrotor dynamics.
- AMSPB training alternates learning between the pursuer and the evader while sampling opponents from a growing population of past and current policies, mitigating catastrophic forgetting and ensuring monotonic improvement.
- High-fidelity simulation in NVIDIA Isaac Sim (4.1.0) with realistic quadrotor dynamics at 62.5 Hz.
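The alternating population-based scheme described above can be sketched as follows. This is a minimal illustration of the idea, not the released implementation: the `AMSPB` class, the `train_stage` callback, and uniform opponent sampling are all assumptions for the sketch.

```python
import random


class AMSPB:
    """Sketch of an Asynchronous Multi-Stage Population-Based loop.

    Training stages alternate between pursuer and evader. At each stage,
    the current learner trains against opponents sampled from a growing
    population of frozen past and current policies, which mitigates
    catastrophic forgetting against older strategies.
    """

    def __init__(self, pursuer, evader):
        # Each population starts with the initial policy for that role.
        self.populations = {"pursuer": [pursuer], "evader": [evader]}

    def sample_opponent(self, role):
        # Uniform sampling over the opponent population; the paper's
        # actual sampling distribution may differ.
        return random.choice(self.populations[role])

    def run(self, num_stages, train_stage):
        for stage in range(num_stages):
            learner_role = "pursuer" if stage % 2 == 0 else "evader"
            opponent_role = "evader" if learner_role == "pursuer" else "pursuer"
            learner = self.populations[learner_role][-1]
            opponent = self.sample_opponent(opponent_role)
            # train_stage runs RL updates and returns a frozen snapshot
            # of the improved learner, which joins the population.
            snapshot = train_stage(learner, opponent)
            self.populations[learner_role].append(snapshot)
```

Appending rather than replacing snapshots is what lets later stages still be evaluated against earlier opponents, which underpins the monotonic-improvement argument.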
- Operating System: Linux (tested on Ubuntu 24.04)
- Python: 3.8 or 3.9
- Conda: for environment management
- OmniDrones (requires NVIDIA Omniverse and Isaac Sim 4.1.0)
- Install the Omniverse Launcher following NVIDIA's instructions: https://docs.isaacsim.omniverse.nvidia.com/4.1.0/installation/install_workstation.html#isaac-sim-app-install-workstation
- Download Isaac Sim 4.1.0 via the Launcher and move it into your Omniverse package folder:

  ```shell
  mv ~/Downloads/IsaacSim-4.1.0 ~/.local/share/ov/pkg
  ```
Create and activate the conda environment first, then clone this repository, install OmniDrones, and install the remaining dependencies:

```shell
# Create and activate the environment
conda create -n amspb_env python=3.9 -y
conda activate amspb_env

# Clone this repository
git clone https://github.com/yourusername/AMSPB_PEG.git
cd AMSPB_PEG

# Install OmniDrones (adjust the path to your OmniDrones checkout)
cd ~/Omnidrones
pip install -e .
cd -

# Install the remaining dependencies
pip install -r requirements.txt
pip install --upgrade tensordict==0.3.2 torchrl==0.3.1
```
Troubleshooting: if you encounter the error

```
TypeError: ArticulationView.get_world_poses() got an unexpected keyword argument 'usd'
```

follow the guide: https://omnidrones.readthedocs.io/en/latest/troubleshooting.html
To train and evaluate our quadrotor pursuit–evasion policies, follow these steps. The training scripts are located in `scripts/experiments/`; you can configure each run via the corresponding YAML config file.
- Train the pursuer:

  ```shell
  python scripts/experiments/train_pursuer.py
  ```

- Train the evader:

  ```shell
  python scripts/experiments/train_evader.py
  ```
By default, the scripts use the configurations from our paper. To modify hyperparameters or environment settings, edit the YAML files directly.
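For orientation, a training config might look roughly like the following. Every key shown here is a hypothetical placeholder; consult the actual YAML files in `scripts/experiments/` for the real schema and values.

```yaml
# Hypothetical example -- not the real config schema.
env:
  num_envs: 4096            # parallel simulation environments
  max_episode_length: 500   # steps per episode
algo:
  lr: 3.0e-4                # learning rate
  gamma: 0.99               # discount factor
seed: 0
```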
If you want to log experiments online with Weights & Biases (W&B), add the following parameters:

```shell
--wandb.project <project_name> --wandb.entity <entity_name>
```
To evaluate trained policies without further training, use the benchmark script in `scripts/benchmark/`:

```shell
python scripts/benchmark/benchmark_pursuer.py
```
Edit the benchmark config file to set evaluation parameters (e.g., number of episodes, seed, evaluation opponents).
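Conceptually, such a benchmark rolls out fixed (non-learning) policies and aggregates an outcome metric such as the capture rate. The sketch below illustrates this; the `evaluate` function, the gym-style `reset()`/`step()` interface, and the `info["captured"]` flag are assumptions for illustration, not the benchmark script's actual API.

```python
def evaluate(env, pursuer_policy, evader_policy, num_episodes=100):
    """Roll out frozen policies head-to-head and report the capture rate.

    Assumes a gym-style environment: reset() returns per-agent
    observations, step() returns (obs, reward, done, info), and
    info["captured"] flags a successful pursuit at episode end.
    """
    captures = 0
    for _ in range(num_episodes):
        obs, done, info = env.reset(), False, {}
        while not done:
            actions = {
                "pursuer": pursuer_policy(obs["pursuer"]),
                "evader": evader_policy(obs["evader"]),
            }
            obs, reward, done, info = env.step(actions)
        captures += int(info.get("captured", False))
    return captures / num_episodes
```

Fixing the number of episodes and the random seed in the config keeps such capture-rate estimates comparable across different opponent checkpoints.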
The adversarial co-training pipeline (AMSPB) implementation will be released soon. Stay tuned for updates.
If you use this code in your research, please cite the paper:
```bibtex
@article{roncero2025learned,
  title={Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games},
  author={Roncero, Alejandro Sanchez and Andersson, Olov and Ogren, Petter},
  journal={arXiv preprint arXiv:2506.02849},
  year={2025}
}
```
This project is released under the MIT License. See LICENSE for details.
Alejandro Sánchez Roncero — alesr@kth.se