This repository contains the implementation of TACO (Target-and-Command-Oriented Reinforcement Learning), a novel framework for achieving general acrobatic flight control of Micro Aerial Vehicles (MAVs). The framework enables high-speed, high-accuracy circular flights and continuous multi-flips through a unified reinforcement learning approach.
TACO addresses two critical limitations in existing aerobatic flight control methods:
- Task Flexibility: Unlike traditional methods restricted to specific maneuver trajectories, TACO supports online parameter adjustments and handles diverse aerobatic tasks within a unified framework.
- Sim-to-Real Transfer: Our spectral normalization method with input-output rescaling enhances policy smoothness, independence, and symmetry, enabling zero-shot sim-to-real deployment.
- Unified Framework: Single framework handles multiple aerobatic maneuvers (hover, circle, flip)
- Online Parameter Adjustment: Real-time modification of flight parameters during execution
- Zero-shot Sim-to-Real: Training techniques that bridge the sim-to-real gap, letting policies trained in simulation deploy directly on hardware
- High Performance: Achieves 4.2 rad/s angular velocity (1.6× faster than previous work) with a 70° tilt angle
- Continuous Multi-flips: Stable 14+ continuous flips without altitude loss or stabilization pauses
The TACO framework consists of three main components:
- TACO RL Framework: Unified state design and reward functions for different maneuvers
- Simulation Environment: High-fidelity MAV model with motor dynamics and aerodynamics
- Real MAV Platform: Hardware implementation with onboard inference
The unified state representation includes:
- Task-oriented state (14D): Relative position/orientation to target, task flags, commands
- MAV-oriented state (8D): Body velocity, angular velocity, altitude, battery voltage
- Context-oriented state (4D): Previous action for temporal consistency
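The three groups above can be sketched as one 26-D observation vector. The field names and internal ordering below are illustrative assumptions, not the repository's exact layout:

```python
import numpy as np

# Hypothetical assembly of the 26-D unified state (14 + 8 + 4);
# sub-field splits (e.g. 3-D position, 6-D orientation) are assumptions.
def build_state(rel_pos, rel_orient, task_flags, commands,
                body_vel, ang_vel, altitude, voltage, prev_action):
    task_state = np.concatenate([rel_pos, rel_orient, task_flags, commands])  # 14-D
    mav_state = np.concatenate([body_vel, ang_vel, [altitude], [voltage]])    # 8-D
    context_state = np.asarray(prev_action)                                   # 4-D
    return np.concatenate([task_state, mav_state, context_state])             # 26-D

state = build_state(
    rel_pos=np.zeros(3), rel_orient=np.zeros(6),  # e.g. two rotation-matrix columns
    task_flags=np.zeros(3), commands=np.zeros(2),
    body_vel=np.zeros(3), ang_vel=np.zeros(3), altitude=1.0, voltage=16.0,
    prev_action=np.zeros(4),
)
assert state.shape == (26,)
```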
- Network: L-layer fully connected network with spectral normalization
- Algorithm: PPO with 4096 parallel environments
- Properties: Temporal/spatial smoothness, independence, symmetry
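A minimal sketch of such an actor in PyTorch, pairing spectral normalization with a fixed per-layer output gain (the "input-output rescaling" idea); layer sizes and the gain value mirror the `--lipschitz_para` flag but are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ScaledSNLinear(nn.Module):
    """Spectrally normalized linear layer followed by a constant gain.

    Each layer's Lipschitz constant is bounded by `gain`, so a stack of
    L such layers (with 1-Lipschitz activations) is bounded by gain**L.
    """
    def __init__(self, in_dim, out_dim, gain):
        super().__init__()
        self.linear = nn.utils.spectral_norm(nn.Linear(in_dim, out_dim))
        self.gain = gain

    def forward(self, x):
        return self.gain * self.linear(x)

def make_actor(obs_dim=26, act_dim=4, hidden=256, gain=4.0):
    # ELU is 1-Lipschitz, so the gains alone control the overall bound.
    return nn.Sequential(
        ScaledSNLinear(obs_dim, hidden, gain), nn.ELU(),
        ScaledSNLinear(hidden, hidden, gain), nn.ELU(),
        ScaledSNLinear(hidden, act_dim, gain),
    )

actor = make_actor()
action = actor(torch.zeros(1, 26))  # one 26-D unified state in, 4 motor commands out
```

Bounding the Lipschitz constant this way is what yields the smoothness properties listed above: nearby states cannot map to wildly different actions.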
- Python 3.8+
- PyTorch
- IsaacGym
- CUDA (for GPU acceleration)
Please follow the official IsaacGym installation instructions.
python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=pos --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&
python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=rotate --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&
python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=flip --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&
python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=mix --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&
python train_fpv_asymmetry_ppo.py --train_mode=testmodel --load_task_mode=pos --load_time=05-23-02-57
- Objective: Fly to and hover at desired position with specified yaw
- Command: None
- Performance: Precise position and attitude control
- Objective: Rotate around center point with specified speed and radius
- Command: Tangential velocity (adjustable online)
- Performance: 1.2m radius at 5m/s, 4.2 rad/s angular velocity
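As a quick consistency check (a back-of-the-envelope calculation, not code from the repository), the reported angular velocity follows directly from the circle command:

```python
# CIRCLE-task sanity check: angular velocity = tangential velocity / radius.
v = 5.0   # tangential velocity command (m/s)
r = 1.2   # circle radius (m)
omega = v / r
print(f"{omega:.2f} rad/s")  # ~4.17 rad/s, consistent with the reported 4.2 rad/s
```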
- Objective: Perform continuous flips around x-axis
- Command: Remaining flip angle (in radians) to complete
- Performance: 14+ continuous flips, stable fix-point execution
- CIRCLE Task: Achieved 1.2m radius at 5m/s with 70° tilt angle
- FLIP Task: Completed 14 continuous flips in 6.6s
- Command Tracking: Superior performance compared to MPC controllers
Our spectral normalization method demonstrates:
- Spatial Smoothness: Smooth action transitions across state space
- Independence: Unrelated actions remain unchanged during task execution
- Symmetry: Symmetric states produce symmetric actions
- Temporal Smoothness: Continuous action sequences over time
IsaacGymEnvs/
├── isaacgymenvs/                         # Main package directory
│   ├── tasks/                            # Task implementations
│   │   ├── control/                      # Control-related modules
│   │   │   ├── task_reward.py            # Reward function implementations
│   │   │   ├── thrust_dynamics.py        # Thrust and motor dynamics
│   │   │   ├── angvel_control.py         # Angular velocity controller
│   │   │   ├── battery_dynamics.py       # Battery model
│   │   │   ├── fpv_dynamics.py           # FPV drone dynamics
│   │   │   └── logger.py                 # Logging utilities
│   │   ├── base/                         # Base task classes
│   │   └── fpv_asymmetry.py              # Main TACO environment implementation
│   ├── cfg/                              # Configuration files
│   │   ├── Fpv_asymmetry_PPO_pos.yaml    # POS task configuration
│   │   ├── Fpv_asymmetry_PPO_rotate.yaml # CIRCLE task configuration
│   │   └── Fpv_asymmetry_PPO_flip.yaml   # FLIP task configuration
│   └── utils/                            # Utility functions
│       ├── torch_jit_utils.py            # PyTorch JIT utilities
│       ├── utils.py                      # General utilities
│       └── dr_utils.py                   # Domain randomization utilities
├── train/                                # Training scripts
│   ├── train_fpv_asymmetry_ppo.py        # Main training script
│   ├── start_train.sh                    # Training start script
│   └── stop_train.sh                     # Training stop script
├── algorithms/                           # RL algorithm implementations
│   ├── ppo_asymmetry.py                  # PPO algorithm with asymmetry handling
│   ├── nets_asymmetry.py                 # Neural network architectures
│   └── buffer_asymmetry.py               # Experience buffer implementation
├── assets/                               # 3D models and assets
├── docs/                                 # Documentation
└── setup.py                              # Package setup file
If you find this work useful, please cite our paper:
@inproceedings{yin2025taco,
  title={TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning},
  author={Yin, Zikang and Zheng, Canlun and Guo, Shiliang and Wang, Zhikun and Zhao, Shiyu},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025},
  note={Accepted}
}