Skip to content

AegisFL is a cloud-native, privacy-preserving federated learning platform. It uses TensorFlow Federated, Differential Privacy, and Secure Aggregation to train models across decentralized clients, ensuring HIPAA/GDPR compliance with cost-optimized Kubernetes deployment and real-time monitoring.

License

Notifications You must be signed in to change notification settings

Ankan-Das/AegisFL

Repository files navigation

🛡️ AegisFL

Privacy-Preserving Federated Learning at Scale — Secure, Compliant, and Cloud-Native.

AegisFL is a non-production-ready federated learning platform designed to train models across thousands of decentralized clients while ensuring strict data privacy. It combines TensorFlow Federated with Differential Privacy, Secure Aggregation, and TLS-encrypted communication to protect user data and meet regulatory standards.

banner

✨ Features

  • Federated Averaging & Optimizers — Scalable to 1,000+ clients with non-IID handling.
  • Differential Privacy — Configurable ε-δ budgets with privacy accounting.
  • Secure Aggregation — Encrypted updates, masking protocols, and penetration-tested security.
  • Cloud-Native Deployment — Docker, Kubernetes (AWS EKS), Helm chart packaging.
  • Cost-Optimized Scaling — Spot instances + autoscaling reduce training costs by up to 35%.
  • Real-Time Monitoring — Prometheus + Grafana dashboards for loss, accuracy, and privacy budget.

📦 Tech Stack

  • Languages: Python
  • Frameworks: TensorFlow Federated, TensorFlow Privacy
  • Security: TLS, Differential Privacy, Secure Aggregation
  • Infrastructure: Docker, Kubernetes, AWS EKS, SageMaker
  • Ops: Prometheus, Grafana, Terraform, Helm

🚀 Quick Start

Prerequisites

Setup Environment

# Clone and setup in one command
git clone <your-repo-url>
cd federated-learning-platform
./setup_environment.sh

# Activate environment
conda activate tff

# Validate setup
python validate_environment.py

Run Notebooks

jupyter notebook

📖 Detailed setup guide: See SETUP.md for alternative installation methods and troubleshooting.

📜 License

MIT — See LICENSE for details.


Federated Learning Platform – Prerequisite Knowledge Guide

Welcome! This README walks you through the background topics you should feel comfortable with before you start building the Privacy‑Preserving Federated Learning Platform. Each section explains why the knowledge matters to the project and points you to high‑quality resources so you can level‑up fast.


1. Machine‑Learning Fundamentals

Why it matters  The core of federated learning is still vanilla supervised ML: you train, validate, and test a model; you worry about bias, variance, and generalisation. If these basics are shaky, debugging FL will be impossible.

Topics to cover

  • Train/validation/test splits, cross‑validation
  • Loss functions & optimisation (SGD, Adam)
  • Bias–variance trade‑off
  • Evaluation metrics (accuracy, precision/recall, ROC‑AUC)

Recommended resources

Type Resource
Book Hands‑On Machine Learning with Scikit‑Learn, Keras & TensorFlow – Aurélien Géron (Ch. 1–4)
MOOC Coursera – Andrew Ng’s Machine Learning
Article Google Developers – Machine Learning Crash Course

2. Deep Learning & Computer Vision

Why it matters  Our reference implementation targets CIFAR‑10; you will likely use CNNs and modern optimisers. Knowing how they behave centrally will help you reason about federated convergence and accuracy drop‑offs.

Topics to cover

  • Convolutional neural networks (LeNet, ResNet)
  • Regularisation (dropout, weight decay, data augmentation)
  • Learning‑rate schedules & early stopping

Recommended resources

Type Resource
Course Stanford CS231n lecture notes + videos
Book Deep Learning – Goodfellow, Bengio, Courville (Ch. 9)
Tutorial PyTorch Vision or TensorFlow Keras CIFAR‑10 example

3. Federated Learning Theory

Why it matters  This project orchestrates hundreds of decentralised clients. You need to understand FedAvg, client selection, straggler handling, and secure aggregation to design a robust coordinator.

Topics to cover

  • FedAvg algorithm & convergence guarantees
  • Client sampling strategies
  • Secure aggregation protocols
  • Handling unbalanced & non‑IID data

Recommended resources

Type Resource
Paper McMahan et al., Communication‑Efficient Learning of Deep Networks from Decentralized Data (2017)
Survey Kairouz et al., Advances and Open Problems in Federated Learning (2021)
Docs TensorFlow Federated tutorials (federated_avg, secure_aggregation)

4. Differential Privacy (DP)

Why it matters  We add noise to model updates so that training is provably privacy‑preserving. Understanding ε, δ and the privacy accountant lets you pick sane budgets and justify them to stakeholders.

Topics to cover

  • Formal DP definition (ε‑δ)
  • Moments/Rényi accountants
  • Gaussian & Laplace mechanisms
  • User‑level vs record‑level DP

Recommended resources

Type Resource
Book Dwork & Roth, The Algorithmic Foundations of Differential Privacy (free PDF)
Library TensorFlow Privacy tutorials (dp_fedavg_keras.py)
Article Google AI Blog – Differential Privacy for Everyone

5. Distributed Systems & Networking

Why it matters  Your coordinator must talk to thousands of clients over the network, deal with retries, and aggregate gradients efficiently. Concepts like back‑pressure and idempotent RPCs are essential.

Topics to cover

  • gRPC (streaming, deadlines, retries)
  • TLS basics (handshake, certificates)
  • Load‑balancing & service discovery
  • Fault tolerance, idempotency

Recommended resources

Type Resource
Book Martin Kleppmann, Designing Data‑Intensive Applications (Ch. 11)
Guide gRPC Official Docs – Concepts & Performance Best Practices
Article Cloudflare Blog – TLS 1.3, explained

6. Containerisation & Orchestration

Why it matters  We ship the coordinator in a Docker image and run it on Kubernetes (EKS). You need to know how to build minimal images, mount secrets, and write Helm charts that scale training jobs on demand.

Topics to cover

  • Writing efficient Dockerfiles (multi‑stage builds)
  • Kubernetes primitives: Deployments, Jobs, CronJobs, Secrets, ConfigMaps
  • Helm templating & chart structure
  • Cluster autoscaling & spot instances

Recommended resources

Type Resource
Book Nigel Poulton, The Kubernetes Book
Tutorial Docker Official – Best Practices for Dockerfiles
Guide Helm Docs – Chart Development Tips and Tricks

7. Cloud & MLOps

Why it matters  AWS gives us managed GPUs (SageMaker) and elastic control‑planes (EKS). Good MLOps practice keeps experiments reproducible and costs under control.

Topics to cover

  • SageMaker training jobs & endpoints
  • EKS node groups & IAM/OIDC roles
  • Infrastructure‑as‑Code with Terraform
  • Monitoring with Prometheus & Grafana

Recommended resources

Type Resource
Workshop AWS – Machine Learning Operations (MLOps) on AWS workshop
Book Mark Treveil et al., Introducing MLOps (O’Reilly)
Article Google Cloud – MLOps: Continuous delivery and automation pipelines in machine learning

8. Python Ecosystem & Coding Practices

Why it matters  Clean, type‑safe Python makes your FL pipeline maintainable. Idiomatic testing and formatting catch bugs before they hit prod.

Topics to cover

  • Virtual environments (venv/poetry)
  • Type checking with mypy & pydantic
  • Testing with pytest & fixtures
  • Formatting & linting (black, ruff)

Recommended resources

Type Resource
Book Brett Slatkin, Effective Python (2nd ed.)
Guide PyPI – pytest Quick Start
Tutorial Real Python – Python Type Checking

9. Security Engineering

Why it matters  FL only helps privacy if the whole stack is secure: encrypted transport, authenticated clients, secrets management, and continuous vulnerability scans.

Topics to cover

  • Threat modelling & STRIDE
  • mTLS & token‑based authentication
  • Secret management (AWS Secrets Manager, KMS)
  • Static & dynamic security scanning (Trivy, Snyk, OWASP ZAP)

Recommended resources

Type Resource
Book Evan Gilman & Doug Barney, Zero Trust Networks
Cheat Sheet OWASP – Docker Security Cheat Sheet
Tool Doc Trivy – Scanning Container Images for Vulnerabilities

How to Use This Guide

  1. Scan the table of contents and mark the sections you’re weakest in.
  2. Allocate study blocks (e.g., two weekends per major topic).
  3. Build small demos after each section (e.g., run a TensorFlow Privacy example; write a Helm chart).
  4. Return to the project repo and start wiring the pieces together.

Happy hacking! When in doubt, open an issue or start a discussion—collaboration is at the heart of federated learning.

About

AegisFL is a cloud-native, privacy-preserving federated learning platform. It uses TensorFlow Federated, Differential Privacy, and Secure Aggregation to train models across decentralized clients, ensuring HIPAA/GDPR compliance with cost-optimized Kubernetes deployment and real-time monitoring.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published