implementation of "TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning"

yinzikang/TACO

TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning

This repository contains the implementation of TACO (Target-and-Command-Oriented Reinforcement Learning), a novel framework for achieving general acrobatic flight control of Micro Aerial Vehicles (MAVs). The framework enables high-speed, high-accuracy circular flights and continuous multi-flips through a unified reinforcement learning approach.

🎯 Overview

TACO addresses two critical limitations in existing aerobatic flight control methods:

  1. Task Flexibility: Unlike traditional methods restricted to specific maneuver trajectories, TACO supports online parameter adjustments and handles diverse aerobatic tasks within a unified framework.
  2. Sim-to-Real Transfer: Our spectral normalization method with input-output rescaling enhances policy smoothness, independence, and symmetry, enabling zero-shot sim-to-real deployment.

✨ Key Features

  • Unified Framework: Single framework handles multiple aerobatic maneuvers (hover, circle, flip)
  • Online Parameter Adjustment: Real-time modification of flight parameters during execution
  • Zero-shot Sim-to-Real: Spectral normalization with input-output rescaling lets policies trained in simulation run on the real MAV without fine-tuning
  • High Performance: Achieves 4.2 rad/s angular velocity (1.6× faster than previous work) with 70° tilt angle
  • Continuous Multi-flips: Stable 14+ continuous flips without altitude loss or stabilization pauses

🏗️ System Architecture

The TACO framework consists of three main components:

  1. TACO RL Framework: Unified state design and reward functions for different maneuvers
  2. Simulation Environment: High-fidelity MAV model with motor dynamics and aerodynamics
  3. Real MAV Platform: Hardware implementation with onboard inference

State Design

The unified state representation includes:

  • Task-oriented state (14D): Relative position/orientation to target, task flags, commands
  • MAV-oriented state (8D): Body velocity, angular velocity, altitude, battery voltage
  • Context-oriented state (4D): Previous action for temporal consistency
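
The three state groups above add up to a 26-dimensional policy input (14 + 8 + 4). The following is a minimal sketch of how such a vector could be assembled and validated. The class name `TacoState`, the field names, and the concatenation order are assumptions for illustration and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import List

# Dimensions as listed in the State Design section.
TASK_DIM, MAV_DIM, CONTEXT_DIM = 14, 8, 4


@dataclass
class TacoState:
    """Hypothetical container for the three state groups."""
    task: List[float]     # relative position/orientation to target, task flags, commands
    mav: List[float]      # body velocity, angular velocity, altitude, battery voltage
    context: List[float]  # previous action

    def to_vector(self) -> List[float]:
        """Validate each group's size and concatenate (order is an assumption)."""
        groups = (
            ("task", self.task, TASK_DIM),
            ("mav", self.mav, MAV_DIM),
            ("context", self.context, CONTEXT_DIM),
        )
        for name, values, dim in groups:
            if len(values) != dim:
                raise ValueError(f"{name} state must have {dim} entries, got {len(values)}")
        return [*self.task, *self.mav, *self.context]


state = TacoState(task=[0.0] * TASK_DIM, mav=[0.0] * MAV_DIM, context=[0.0] * CONTEXT_DIM)
obs = state.to_vector()
print(len(obs))  # 26
```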

Training Method

  • Network: L-layer fully connected network with spectral normalization
  • Algorithm: PPO with 4096 parallel environments
  • Properties: Temporal/spatial smoothness, independence, symmetry
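
To illustrate what spectral normalization does to a fully connected network, here is a dependency-free sketch that estimates each layer's largest singular value by power iteration and rescales the weights so the product of the per-layer norms equals a chosen bound. With 1-Lipschitz activations, that product upper-bounds the Lipschitz constant of the whole network. This is a generic sketch, not the repository's implementation: the helper names, the toy weights, and the even split of the bound across layers are assumptions, and the input-output rescaling step is not shown. How `--lipschitz_para` maps onto this bound is also an assumption.

```python
import math
import random


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def spectral_norm(w, iters=100, seed=0):
    """Estimate the largest singular value of w by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w[0]]
    wt = [list(col) for col in zip(*w)]  # transpose
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    return math.sqrt(sum(x * x for x in matvec(w, v)))


def normalize_layer(w, bound):
    """Rescale w so that its spectral norm equals `bound`."""
    sigma = spectral_norm(w)
    scale = bound / sigma if sigma > 0.0 else 1.0
    return [[scale * a for a in row] for row in w]


# Assumed network-level bound, split evenly over the layers.
network_bound = 4.0
layers = [
    [[3.0, 0.0], [0.0, 1.0]],
    [[0.5, 2.0], [1.0, -1.0]],
    [[1.0, 1.0]],
]
per_layer_bound = network_bound ** (1.0 / len(layers))
normalized = [normalize_layer(w, per_layer_bound) for w in layers]

product = 1.0
for w in normalized:
    product *= spectral_norm(w)
print(f"product of layer norms: {product:.3f}")  # approximately the network bound
```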

🚀 Installation

Prerequisites

  • Python 3.8+
  • PyTorch
  • IsaacGym
  • CUDA (for GPU acceleration)

Setup

Please refer to the IsaacGym installation instructions.

Training

python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=pos --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&

python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=rotate --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&

python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=flip --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&

python train_fpv_asymmetry_ppo.py --train_mode=train --task_mode=mix --lenObservations=1 --lenStates=5 --use_actor_encoder=False --use_critic_encoder=True --critic_encoder_type=LSTM --rotor_response_time=0.017 --delay_time=20 --lipschitz_para=4&
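
The four training commands differ only in `--task_mode`, so they can be generated from one shared set of flags. The sketch below builds the argument lists and prints them without launching anything. The helper name `build_command` is hypothetical; the flags and values are the ones shown above.

```python
import shlex

# Flags shared by all four training commands listed above.
COMMON_FLAGS = {
    "lenObservations": "1",
    "lenStates": "5",
    "use_actor_encoder": "False",
    "use_critic_encoder": "True",
    "critic_encoder_type": "LSTM",
    "rotor_response_time": "0.017",
    "delay_time": "20",
    "lipschitz_para": "4",
}
TASK_MODES = ("pos", "rotate", "flip", "mix")


def build_command(task_mode):
    """Return the argument list for one training run (hypothetical helper)."""
    if task_mode not in TASK_MODES:
        raise ValueError(f"unknown task mode: {task_mode}")
    args = ["--train_mode=train", f"--task_mode={task_mode}"]
    args += [f"--{key}={value}" for key, value in COMMON_FLAGS.items()]
    return ["python", "train_fpv_asymmetry_ppo.py", *args]


commands = [build_command(mode) for mode in TASK_MODES]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.Popen` to start the run in the background, which is what the trailing `&` does in a shell.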

Evaluation

python train_fpv_asymmetry_ppo.py --train_mode=testmodel --load_task_mode=pos --load_time=05-23-02-57

🎮 Supported Tasks

1. POS (Hover) Task

  • Objective: Fly to and hover at a desired position with a specified yaw
  • Command: None
  • Performance: Precise position and attitude control

2. CIRCLE Task

  • Objective: Circle around a center point with a specified speed and radius
  • Command: Tangential velocity (adjustable online)
  • Performance: 1.2m radius at 5m/s, 4.2 rad/s angular velocity
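
The reported figures can be cross-checked with basic circular-motion kinematics: angular velocity is tangential speed divided by radius. The sketch below also computes the centripetal acceleration and the tilt angle of an idealized point mass. That tilt estimate ignores aerodynamic drag and tracking error, so it need not match the measured tilt reported elsewhere in this README; the value of `g` and the point-mass model are assumptions.

```python
import math

radius_m = 1.2    # circle radius from the CIRCLE task figures
speed_mps = 5.0   # tangential speed from the CIRCLE task figures
g = 9.81          # standard gravity in m/s^2 (assumed)

omega = speed_mps / radius_m            # angular velocity, rad/s
a_c = speed_mps ** 2 / radius_m         # centripetal acceleration, m/s^2
tilt_deg = math.degrees(math.atan2(a_c, g))  # idealized tilt, drag ignored

print(f"angular velocity: {omega:.2f} rad/s")  # 4.17, i.e. about 4.2
print(f"centripetal acceleration: {a_c:.1f} m/s^2 ({a_c / g:.2f} g)")
print(f"idealized tilt: {tilt_deg:.1f} deg")
```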

3. FLIP Task

  • Objective: Perform continuous flips around the x-axis
  • Command: Remaining flip angle to complete, in radians
  • Performance: 14+ continuous flips, stable fixed-point execution

📊 Experimental Results

Real-world Performance

  • CIRCLE Task: Achieved 1.2m radius at 5m/s with 70° tilt angle
  • FLIP Task: Completed 14 continuous flips in 6.6s
  • Command Tracking: Superior performance compared to MPC controllers
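
For a sense of scale, the FLIP result can be converted into an average rotation rate. The sketch below assumes each flip is a full 2π rotation about the x-axis. Under that assumption the total angle is also what a "remaining flip angle" command would start from for a 14-flip run.

```python
import math

num_flips = 14     # from the FLIP task result
duration_s = 6.6   # from the FLIP task result

total_angle_rad = num_flips * 2.0 * math.pi    # total rotation, assuming 2*pi per flip
mean_rate_rad_s = total_angle_rad / duration_s  # average angular rate
flips_per_second = num_flips / duration_s

print(f"total angle: {total_angle_rad:.2f} rad")
print(f"mean rate: {mean_rate_rad_s:.2f} rad/s ({math.degrees(mean_rate_rad_s):.0f} deg/s)")
print(f"flips per second: {flips_per_second:.2f}")
```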

Sim-to-Real Transfer

Our spectral normalization method demonstrates:

  • Spatial Smoothness: Smooth action transitions across state space
  • Independence: Unrelated actions remain unchanged during task execution
  • Symmetry: Symmetric states produce symmetric actions
  • Temporal Smoothness: Continuous action sequences over time
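
Two of these properties can be checked numerically on any policy. The sketch below is illustrative only: `toy_policy` is a hypothetical stand-in (a bias-free linear map followed by tanh), not the trained TACO network, and the sign-flip mirror and the roughness metric are assumptions rather than the paper's definitions.

```python
import math


def toy_policy(state):
    """Hypothetical stand-in policy: bias-free linear layer followed by tanh."""
    weights = [[0.5, -0.2], [0.1, 0.3]]
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def temporal_roughness(actions):
    """Mean squared difference between consecutive actions (lower is smoother)."""
    diffs = [
        sum((a - b) ** 2 for a, b in zip(cur, prev))
        for prev, cur in zip(actions, actions[1:])
    ]
    return sum(diffs) / len(diffs)


def symmetry_error(policy, state, state_signs, action_signs):
    """Largest gap between policy(mirror(state)) and mirror(policy(state))."""
    mirrored_state = [s * sign for s, sign in zip(state, state_signs)]
    expected = [a * sign for a, sign in zip(policy(state), action_signs)]
    actual = policy(mirrored_state)
    return max(abs(x - y) for x, y in zip(actual, expected))


# A slowly varying state trajectory should give slowly varying actions.
trajectory = [[0.01 * t, -0.02 * t] for t in range(50)]
actions = [toy_policy(s) for s in trajectory]
roughness = temporal_roughness(actions)

# For this odd toy policy, negating the state negates the action.
error = symmetry_error(toy_policy, [0.3, -0.7], [-1, -1], [-1, -1])

print(f"temporal roughness: {roughness:.2e}")
print(f"symmetry error: {error:.2e}")
```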

📁 Project Structure

IsaacGymEnvs/
├── isaacgymenvs/                    # Main package directory
│   ├── tasks/                       # Task implementations
│   │   ├── control/                 # Control-related modules
│   │   │   ├── task_reward.py       # Reward function implementations
│   │   │   ├── thrust_dynamics.py   # Thrust and motor dynamics
│   │   │   ├── angvel_control.py    # Angular velocity controller
│   │   │   ├── battery_dynamics.py  # Battery model
│   │   │   ├── fpv_dynamics.py      # FPV drone dynamics
│   │   │   └── logger.py            # Logging utilities
│   │   ├── base/                    # Base task classes
│   │   └── fpv_asymmetry.py         # Main TACO environment implementation
│   ├── cfg/                         # Configuration files
│   │   ├── Fpv_asymmetry_PPO_pos.yaml      # POS task configuration
│   │   ├── Fpv_asymmetry_PPO_rotate.yaml   # CIRCLE task configuration
│   │   └── Fpv_asymmetry_PPO_flip.yaml     # FLIP task configuration
│   └── utils/                       # Utility functions
│       ├── torch_jit_utils.py       # PyTorch JIT utilities
│       ├── utils.py                 # General utilities
│       └── dr_utils.py              # Domain randomization utilities
├── train/                           # Training scripts
│   ├── train_fpv_asymmetry_ppo.py   # Main training script
│   ├── start_train.sh               # Training start script
│   └── stop_train.sh                # Training stop script
├── algorithms/                      # RL algorithm implementations
│   ├── ppo_asymmetry.py             # PPO algorithm with asymmetry handling
│   ├── nets_asymmetry.py            # Neural network architectures
│   └── buffer_asymmetry.py          # Experience buffer implementation
├── assets/                          # 3D models and assets
├── docs/                            # Documentation
└── setup.py                         # Package setup file

📚 Citation

If you find this work useful, please cite our paper:

@inproceedings{yin2025taco,
  title={TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning},
  author={Yin, Zikang and Zheng, Canlun and Guo, Shiliang and Wang, Zhikun and Zhao, Shiyu},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025},
  note={Accepted}
}
