A state-of-the-art Deep Reinforcement Learning system for controlling model rocket attitude using Thrust Vector Control (TVC). This project implements a multi-algorithm ensemble with transformer networks, hierarchical RL, and physics-informed neural networks, trained in a realistic PyBullet physics simulation with comprehensive anti-reward-hacking measures.
- Multi-Algorithm Ensemble: PPO + SAC + TD3 with intelligent algorithm selection
- Transformer Networks: Attention-based policies for temporal dependencies
- Hierarchical RL: Automatic skill discovery and decomposition
- Physics-Informed: Domain knowledge integration in neural networks
- Curiosity-Driven: Intrinsic motivation for efficient exploration
- Anti-Reward-Hacking: Real mission success detection with landing criteria
- Safety Constraints: Control Barrier Functions for safe operation
- Real-time Monitoring: Comprehensive reward hacking detection
- Adaptive Curriculum: Progressive learning with 6 difficulty stages
- Key Features
- Technical Deep Dive
- Simulation Environment
- DRL Agent: Soft Actor-Critic
- State, Action, and Reward
- Project Structure
- Getting Started
- Prerequisites
- Installation & Setup
- Training the Agent
- Evaluating the Policy
- Deploying to a Microcontroller
- Acknowledgements & References
- High-Fidelity 3D Physics: A digital twin built with PyBullet simulates rigid body dynamics, gravity, variable mass, and TVC motor thrust. RocketPy can be integrated for accurate aerodynamic force modeling.
- Sim-to-Real via Domain Randomization: To ensure the policy is robust, the simulation randomizes key physical parameters on every run: rocket mass, center of gravity (CG), motor thrust curves, sensor noise (IMU), and actuator delay/response.
- Standard Gymnasium Interface: The environment adheres to the modern Gymnasium API, making it compatible with a wide range of DRL libraries and algorithms (see the usage sketch after this list).
- State-of-the-Art DRL Agent: Employs Soft Actor-Critic (SAC), an off-policy algorithm known for its sample efficiency and stability, making it ideal for complex physics tasks.
- Shaped Reward Function: A carefully designed reward function guides the agent to maintain vertical stability, minimize angular velocity (reduce oscillations), and use control inputs efficiently.
- End-to-End Workflow: Provides a complete pipeline from training and evaluation to model quantization and deployment.
- MCU-Ready Deployment: The trained policy is optimized using Post-Training Quantization (8-bit) and converted into a C-array via TensorFlow Lite for Microcontrollers (TFLM) for fast, on-device inference.
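For the MCU deployment step above, `scripts/export_tflm.py` is the intended entry point in this repository. Purely as an illustration, post-training int8 quantization with the TensorFlow Lite converter looks roughly like the sketch below; it assumes the trained policy has already been exported as a TensorFlow SavedModel, and the paths, input shape, and calibration data are placeholders.

```python
# Sketch: post-training int8 quantization for TFLM (illustrative only).
# Assumes the trained policy has already been exported as a TensorFlow
# SavedModel; the paths, input shape, and calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_states(num_samples: int = 200):
    # Yield example state vectors [qx, qy, qz, qw, wx, wy, wz] so the
    # converter can calibrate the int8 quantization ranges.
    for _ in range(num_samples):
        yield [np.random.uniform(-1.0, 1.0, size=(1, 7)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("models/policy_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_states
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("models/policy_int8.tflite", "wb") as f:
    f.write(converter.convert())

# The .tflite flatbuffer can then be embedded as a C array for TFLM, e.g.:
#   xxd -i models/policy_int8.tflite > policy_model.h
```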
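Because the environment follows the Gymnasium API, it can be driven with the familiar reset/step loop. The sketch below is illustrative only: the `TVCRocketEnv` name is an assumption (the actual class lives in `env/`), and a built-in continuous-control task is used as a stand-in so the snippet runs anywhere.

```python
# Sketch: the standard Gymnasium reset/step loop the environment supports.
import gymnasium as gym

# from env import TVCRocketEnv   # hypothetical import; replace with the real one
# env = TVCRocketEnv()
env = gym.make("Pendulum-v1")    # stand-in env with a continuous action space

obs, info = env.reset(seed=0)
for _ in range(500):
    action = env.action_space.sample()  # a trained SAC policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```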
The core of this project is the simulated environment. A robust policy can only be learned if the simulation it's trained in is a close-enough approximation of reality.
- Physics Engine: PyBullet is used for its fast and stable rigid body dynamics simulation.
- State Representation: The rocket's state is observed at each time step. The state vector includes:
  - Attitude (Quaternion): `[qx, qy, qz, qw]`, a 4D representation that avoids gimbal lock.
  - Angular Velocity: `[ω_x, ω_y, ω_z]`, critical for damping rotational motion.
- Imperfections: Real-world rockets are not perfect. Domain randomization introduces controlled chaos, forcing the agent to learn a policy that can handle variations in hardware and conditions, which is the key to bridging the "sim-to-real" gap.
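To make this concrete, a domain-randomization hook typically resamples the physical parameters at every episode reset. The sketch below is illustrative: the parameter names, ranges, and the `sample_rocket_params` helper are assumptions, not the values defined in `env/`.

```python
# Sketch: domain randomization applied at every episode reset.
# The dataclass fields and ranges are illustrative assumptions; the real
# environment in env/ defines its own parameters and distributions.
import random
from dataclasses import dataclass

@dataclass
class RocketParams:
    mass_kg: float
    cg_offset_m: float        # axial center-of-gravity shift
    thrust_scale: float       # multiplier on the nominal thrust curve
    imu_noise_std: float      # noise on gyro readings (rad/s)
    actuator_delay_s: float   # TVC servo response delay

def sample_rocket_params(rng: random.Random) -> RocketParams:
    """Draw a new set of physical parameters for one episode."""
    return RocketParams(
        mass_kg=rng.uniform(0.55, 0.75),
        cg_offset_m=rng.uniform(-0.02, 0.02),
        thrust_scale=rng.uniform(0.9, 1.1),
        imu_noise_std=rng.uniform(0.0, 0.01),
        actuator_delay_s=rng.uniform(0.0, 0.03),
    )

# Inside the environment's reset(), something like
#   self.params = sample_rocket_params(self.rng)
# would then be used to rebuild the PyBullet body and sensor models.
```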
We use Soft Actor-Critic (SAC) for its balance of exploration and exploitation.
- Entropy Maximization: SAC's objective is to maximize not only the cumulative reward but also the entropy of its policy. This encourages broader exploration, preventing the agent from settling into a suboptimal local minimum and making it more resilient to perturbations.
- Actor-Critic Architecture:
- Actor (Policy): An MLP that takes the rocket's state as input and outputs the parameters (mean and standard deviation) of a Gaussian distribution for each action (gimbal pitch, gimbal yaw). The actual action is then sampled from this distribution.
- Critic (Value Function): One or more MLPs that learn to estimate the expected future reward from a given state-action pair. This estimate is used to "criticize" and improve the actor's policy.
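For intuition, a SAC-style squashed-Gaussian actor can be sketched as below. This is not the network defined in `agent/`; the layer sizes, state dimension, and clamping bounds are illustrative assumptions.

```python
# Sketch of a SAC-style squashed-Gaussian actor: state in, tanh-squashed
# action out. Hidden sizes and bounds are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, state_dim: int = 7, action_dim: int = 2, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        mean = self.mean(h)
        log_std = self.log_std(h).clamp(-20, 2)   # keep std numerically sane
        dist = torch.distributions.Normal(mean, log_std.exp())
        raw_action = dist.rsample()                # reparameterized sample
        action = torch.tanh(raw_action)            # squash to [-1, 1] gimbal command
        # log-prob correction for the tanh squashing (used in the SAC loss)
        log_prob = dist.log_prob(raw_action) - torch.log1p(-action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)
```

The `tanh` squashing maps raw Gaussian samples into the bounded gimbal range, and the corrected log-probability is the term that SAC's entropy-regularized objective consumes.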
The interaction between the agent and environment is defined by these three components:
- State Space (Observations): `s_t = [qx, qy, qz, qw, ω_x, ω_y, ω_z]`
- Action Space (Control Inputs): `a_t = [gimbal_pitch_angle, gimbal_yaw_angle]` (continuous values, typically scaled to `[-1, 1]`).
- Reward Function (`r_t`): The agent's behavior is shaped by the reward `r_t` it receives at each step. Our function is a sum of several components:
  - Attitude Reward: `R_attitude = exp(-k_angle * θ_total^2)`, where `θ_total` is the total angle off vertical. This provides a strong, smooth incentive to stay upright.
  - Angular Velocity Penalty: `R_ang_vel = -k_vel * ||ω||^2`. This penalizes rapid spinning and encourages smooth, damped control.
  - Control Effort Penalty: `R_action = -k_action * ||a||^2`. This penalizes large, aggressive gimbal movements, promoting efficiency.
  - Termination Penalty: A large negative reward (e.g., -100) is given if the rocket tilts beyond a failure threshold (e.g., 20°), immediately ending the episode.
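Putting those components together, the per-step reward can be computed roughly as in the sketch below; the gain values `k_*` are placeholders standing in for the tuned values in the repo's config, while the 20° threshold and -100 penalty mirror the examples above.

```python
# Sketch of the shaped reward described above; the gains K_* are
# illustrative placeholders, not the tuned values used in training.
import numpy as np

K_ANGLE, K_VEL, K_ACTION = 5.0, 0.05, 0.01
TILT_LIMIT_RAD = np.deg2rad(20.0)
TERMINATION_PENALTY = -100.0

def step_reward(theta_total: float, omega: np.ndarray, action: np.ndarray):
    """Return (reward, terminated) for one control step.

    theta_total: total angle off vertical, in radians.
    omega:       body angular velocity [wx, wy, wz] in rad/s.
    action:      commanded gimbal angles, scaled to [-1, 1].
    """
    if theta_total > TILT_LIMIT_RAD:
        return TERMINATION_PENALTY, True

    r_attitude = np.exp(-K_ANGLE * theta_total**2)        # stay upright
    r_ang_vel = -K_VEL * float(np.dot(omega, omega))      # damp rotation
    r_action = -K_ACTION * float(np.dot(action, action))  # small gimbal moves
    return r_attitude + r_ang_vel + r_action, False
```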
```
tvc-ai/
├── agent/                     # DRL agent implementation (SAC algorithm, networks)
├── env/                       # Rocket physics simulation (Gymnasium environment)
├── models/                    # Saved model checkpoints and exported TFLM models
├── scripts/                   # High-level scripts for training, evaluation, etc.
│   ├── train.py
│   ├── evaluate.py
│   └── export_tflm.py
├── config/                    # Configuration files (Hydra-based)
├── tests/                     # Test suite and benchmarks
├── utils/                     # Utility functions and helpers
├── verify_installation.py
├── setup.py                   # Automated setup script
├── requirements.txt           # Full dependencies
├── requirements-minimal.txt   # Core dependencies only
├── requirements-dev.txt       # Development dependencies
├── .gitignore
└── README.md
```
- Python 3.8-3.11 (3.10.11 recommended)
- 8GB+ RAM (16GB recommended for training)
- CUDA-capable GPU (optional but recommended for faster training)
- ~2-5GB disk space (depending on installation type)
Option 1: Automated Setup (Recommended)

```bash
# Clone the repository
git clone <repository-url>
cd TVC-AI

# Run automated setup
python setup.py
```

Option 2: Manual Installation

```bash
# Clone the repository
git clone <repository-url>
cd TVC-AI

# Choose your installation type:

# Minimal (core functionality only - ~2GB)
pip install -r requirements-minimal.txt

# Full (all features - ~4GB, recommended)
pip install -r requirements.txt

# Development (includes testing tools - ~5GB)
pip install -r requirements.txt -r requirements-dev.txt

# Verify installation
python verify_installation.py
```
- Verify Installation: `python verify_installation.py`
- Train Your First Model: `python scripts/train.py`
- Monitor Training: `tensorboard --logdir logs/`
- Evaluate Trained Model: `python scripts/evaluate.py --model_path models/best_model.pth`
Common Issues:
- PyBullet installation fails: Install Visual C++ redistributables on Windows.
- CUDA out of memory: Use CPU training: `python scripts/train.py device=cpu`
- Package conflicts: Use a virtual environment:

```bash
python -m venv tvc_env
source tvc_env/bin/activate  # On Windows: tvc_env\Scripts\activate
pip install -r requirements.txt
```
For detailed usage instructions, see USAGE.md.