adampetrovic/home-ops

Home Operations Repository

A comprehensive Kubernetes-native home infrastructure platform

🚀 GitOps • 🔒 Security-First • 🤖 Fully Automated


📖 Overview

This repository contains the complete infrastructure-as-code (IaC) configuration for my home operations platform. Built on modern cloud-native principles, it demonstrates enterprise-grade practices scaled down for home use, featuring:

  • 🏗️ Kubernetes-Native Architecture: Built on Talos Linux for immutable infrastructure
  • ⚡ GitOps Workflow: Managed by Flux CD for declarative, Git-driven deployments
  • 🔐 Zero-Trust Security: Comprehensive authentication, authorization, and secrets management
  • 🤖 Full Automation: From hardware provisioning to application deployment
  • 📊 Complete Observability: Metrics, logs, traces, and alerting across the stack
  • 🏠 Smart Home Integration: IoT, automation, and media management platform

🎯 Core Principles

  • Infrastructure as Code: Everything defined declaratively in Git
  • GitOps: Git as the single source of truth for cluster state
  • Security by Design: Zero-trust networking, encrypted secrets, automated updates
  • Cloud-Native: Kubernetes-first, microservices architecture
  • Observability: Comprehensive monitoring and alerting
  • Automation: Minimal manual intervention required

🏗️ Infrastructure

Cluster Architecture

The platform runs on a high-availability Kubernetes cluster powered by Talos Linux:

| Component | Details |
|---|---|
| OS | Talos Linux v1.10.5 - Immutable, API-driven Linux |
| Kubernetes | v1.33.3 - Latest stable Kubernetes |
| CNI | Cilium - eBPF-based networking and security |
| Nodes | 4x Control Plane (no dedicated workers) |
| High Availability | Virtual IP, distributed etcd, automated failover |

🖥️ Hardware Specifications

| Device | Count | CPU | Cores | RAM | OS Disk | Data Disk | Purpose |
|---|---|---|---|---|---|---|---|
| Intel NUC12WSHi7 | 2 | i7-1265P | 12 (16 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Control Plane |
| Intel NUC11PAHi7 | 1 | i7-1165G7 | 4 (8 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Control Plane |
| Intel NUC11PAHi7 | 1 | i7-1165G7 | 4 (8 threads) | 64GB | 1TB SSD | 1TB NVMe | Kubernetes Worker Node |
| Minisforum MS-01 | 1 | i9-13900H | 14 (20 threads) | 96GB | 1TB NVMe | 2TB NVMe | Kubernetes Worker Node |
| Synology RS1219+ | 1 | Atom C2538 | - | 4GB | - | 6×16TB | NAS Storage |
| Synology DVA1622 | 1 | Atom C3508 | - | 4GB | - | 2×4TB | NVR/Security Cameras |
| UniFi UXG-Pro | 1 | - | - | - | - | - | Gateway/Router |
| UniFi US-48-500W | 1 | - | - | - | - | - | 48-Port PoE Switch |
| APC SMC1000I-2UC | 1 | - | - | - | - | - | UPS Power Management |

🌐 Network Topology

  • Management VLAN (VLAN 80): 10.0.80.0/21 - Kubernetes nodes
  • Trusted VLAN (VLAN 10): 10.0.10.0/24 - Home devices, secondary k8s interfaces
  • Cluster Networking:
    • Pod CIDR: 10.69.0.0/16
    • Service CIDR: 10.96.0.0/16
    • LoadBalancer VIP: 10.0.80.99
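
As a sanity check on the address plan above, the following minimal Python sketch uses the standard-library `ipaddress` module to confirm that the listed ranges do not overlap and that the LoadBalancer VIP falls inside the Management VLAN. The dictionary labels and helper names are illustrative assumptions, not names used by the repository.

```python
import ipaddress
from itertools import combinations

# Address ranges copied from the topology list; the labels are illustrative.
NETWORKS = {
    "management_vlan": ipaddress.ip_network("10.0.80.0/21"),
    "trusted_vlan": ipaddress.ip_network("10.0.10.0/24"),
    "pod_cidr": ipaddress.ip_network("10.69.0.0/16"),
    "service_cidr": ipaddress.ip_network("10.96.0.0/16"),
}
LOADBALANCER_VIP = ipaddress.ip_address("10.0.80.99")


def overlapping_pairs(networks):
    """Return the label pairs whose address ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def vip_in_management(vip=LOADBALANCER_VIP):
    """Check that the VIP is inside the Management VLAN range."""
    return vip in NETWORKS["management_vlan"]


if __name__ == "__main__":
    print("overlaps:", overlapping_pairs(NETWORKS))
    print("VIP in management VLAN:", vip_in_management())
```

A check like this could run in CI before network settings are changed, so that a new VLAN or CIDR that collides with an existing range is caught early.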

🚀 Applications

The platform hosts 60+ applications across multiple categories:

🤖 AI & Machine Learning

  • Ollama - Local LLM inference server
  • Open WebUI - Modern ChatGPT-like interface for Ollama

🏠 Home Automation

  • Home Assistant - Comprehensive home automation platform
  • ESPHome - ESP8266/ESP32 device management
  • Zigbee2MQTT - Zigbee device bridge
  • Mosquitto - MQTT message broker
  • Frigate - AI-powered network video recorder
  • go2rtc - Real-time streaming server
  • TeslaMate - Tesla vehicle data logging and analytics
  • Fernwood Booker - Custom multi-tenant appointment booking system

📺 Media Management

🛠️ Productivity & Tools

🗄️ Database & Storage

🔐 Security & Authentication

🌐 Networking & DNS

📊 Observability & Monitoring

  • Prometheus - Metrics collection and alerting
  • Grafana - Metrics visualization and dashboards
  • Loki - Log aggregation and analysis
  • Vector - Log collection and routing
  • InfluxDB - Time-series database
  • UnPoller - UniFi metrics collection

💾 Storage Management

  • Rook-Ceph - Distributed block and object storage
  • OpenEBS - Local persistent volumes
  • VolSync - Volume backup and synchronization
  • Snapshot Controller - Volume snapshot management

⚙️ System Services


🏛️ Architecture

GitOps Workflow

graph TD
    A[Developer] -->|Git Push| B[GitHub Repository]
    B -->|Webhook| C[Flux CD]
    C -->|Pull Changes| B
    C -->|Apply Manifests| D[Kubernetes Cluster]
    D -->|Sync Status| C
    E[Renovate Bot] -->|Dependency Updates| B
    F[External Secrets] -->|Fetch Secrets| G[1Password]
    F -->|Create K8s Secrets| D

Flux CD continuously monitors the Git repository and automatically applies changes to the cluster:

  1. Source Controller - Monitors Git repositories and Helm charts
  2. Kustomize Controller - Applies Kustomize configurations
  3. Helm Controller - Manages Helm releases
  4. Image Automation - Automatically updates container images
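
The reconcile-to-Git idea behind this workflow can be illustrated with a toy model. The sketch below is not Flux's implementation or API; it only shows how a controller might diff a desired state (from Git) against the live cluster state and derive the actions needed to converge. The `reconcile` and `apply_actions` functions and the sample data are hypothetical.

```python
def reconcile(desired, actual):
    """Compute the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # This toy model always prunes resources that are no longer in Git.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual):
    """Apply the actions to a copy of `actual` and return the new state."""
    new_state = dict(actual)
    for verb, name in actions:
        if verb == "delete":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


if __name__ == "__main__":
    git_state = {"grafana": {"image": "grafana:2"}, "loki": {"image": "loki:1"}}
    cluster_state = {"grafana": {"image": "grafana:1"}, "old-app": {"image": "x:1"}}
    plan = reconcile(git_state, cluster_state)
    print(plan)
    print(apply_actions(plan, git_state, cluster_state) == git_state)
```

Running the same diff again after the actions are applied yields an empty plan, which is the property that makes repeated reconciliation safe.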

Security Architecture

graph TD
    A[Internet] -->|HTTPS| B[Cloudflare]
    B -->|Cloudflare Tunnel| C[Ingress Controller]
    A -->|HTTPS| C
    C -->|mTLS| D[Authelia]
    D -->|LDAP Auth| E[LLDAP]
    D -->|Authorized| F[Application]
    G[External Secrets] -->|API| H[1Password Connect]
    G -->|K8s Secrets| F
  • Zero-Trust Network: All traffic encrypted and authenticated
  • Multi-Factor Authentication: TOTP, WebAuthn, and Duo support
  • Secrets Management: Encrypted at rest with SOPS, fetched from 1Password
  • Certificate Management: Automated TLS with Let's Encrypt
  • Network Policies: Microsegmentation with Cilium

Storage Strategy

graph TD
    A[Applications] -->|RWO Volumes| B[Rook-Ceph RBD]
    A -->|RWX Volumes| C[Rook-Ceph FS]
    A -->|Local Volumes| D[OpenEBS LocalPV]
    B -->|Backup| E[VolSync]
    C -->|Backup| E
    E -->|S3| F[MinIO/Cloudflare R2]
    G[NAS] -->|NFS| A
  • Distributed Storage: Rook-Ceph across all nodes for redundancy
  • Local Storage: OpenEBS for high-performance local volumes
  • Network Storage: NFS mounts from Synology NAS
  • Backup Strategy: VolSync for automated volume backups to S3-compatible storage
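
The volume routing in the storage diagram can be summarized as a small lookup. This is an illustrative sketch only: the `pick_backend` helper is hypothetical, and the backend labels are taken from the diagram, not from actual StorageClass names in the repository.

```python
# Backend labels come from the storage diagram; they are not StorageClass names.
BACKENDS = {
    "RWO": "Rook-Ceph RBD",
    "RWX": "Rook-Ceph FS",
    "local": "OpenEBS LocalPV",
}
# In the diagram, only the Ceph-backed volumes have a backup edge to VolSync.
BACKED_UP = {"Rook-Ceph RBD", "Rook-Ceph FS"}


def pick_backend(volume_kind):
    """Return (backend, backed_up_by_volsync) for a volume kind."""
    try:
        backend = BACKENDS[volume_kind]
    except KeyError:
        raise ValueError(f"unknown volume kind: {volume_kind!r}") from None
    return backend, backend in BACKED_UP


if __name__ == "__main__":
    for kind in BACKENDS:
        print(kind, "->", pick_backend(kind))
```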

Networking Deep Dive

  • CNI: Cilium with eBPF for high-performance networking
  • Load Balancing: MetalLB for bare-metal LoadBalancer services
  • Ingress: Dual NGINX controllers (internal/external) with TLS termination
  • DNS: AdGuard Home for network-wide filtering; Cloudflare for both internal and external DNS management
  • Multi-Homing: Multus CNI for additional network interfaces (IoT VLAN access)

🔧 Operations & Automation

Task Automation

The repository includes comprehensive Taskfile automation:

# Cluster operations
task talos:generate           # Generate Talos configuration
task talos:apply              # Apply Talos configuration  
task talos:bootstrap          # Bootstrap new cluster
task talos:fetch-kubeconfig   # Generate Talos kubeconfig
task talos:upgrade            # Upgrade Talos on a node (requires: node=<ip>)
task talos:upgrade-rollout    # Rolling Talos upgrade on all nodes
task talos:upgrade-k8s        # Upgrade Kubernetes version (requires: node=<ip> to=<version>)
task talos:reboot-node        # Reboot node (requires: IP=<ip>)
task talos:nuke               # Reset nodes to maintenance mode (DESTRUCTIVE!)

# Volume backup operations  
task volsync:check            # Check volsync repo (requires: app=<name>)
task volsync:debug            # Debug restic (requires: app=<name>)
task volsync:list             # List snapshots (requires: app=<name>)
task volsync:unlock           # Unlock restic repository (requires: app=<name>)
task volsync:snapshot         # Create snapshot (requires: app=<name>)
task volsync:restore          # Restore from snapshot (requires: app=<name>)
task volsync:cleanup          # Delete volume populator PVCs

# Kubernetes operations
task k8s:delete-failed-pods   # Delete pods with failed status

Upgrade Procedures

  • Talos OS: Rolling upgrades via task talos:upgrade node=<ip>
  • Kubernetes: Coordinated upgrades following compatibility matrix
  • Applications: Automated via Renovate bot + Flux CD
  • Full documentation: See docs/UPGRADE.md

Disaster Recovery

Complete cluster rebuild capability:

  1. Hardware Reset: PXE boot into Talos maintenance mode
  2. Cluster Bootstrap: Automated via task talos:bootstrap
  3. Backup Restoration: VolSync automatically restores from last snapshots
  4. Full documentation: See docs/RESTORE.md

📁 Repository Structure

📁 kubernetes/
├── 📁 apps/              # Application deployments organized by namespace
│   ├── 📁 ai/            # AI/ML applications (ollama, open-webui)
│   ├── 📁 automation/    # Home automation stack
│   ├── 📁 cert-manager/  # Certificate management
│   ├── 📁 database/      # Database services
│   ├── 📁 default/       # Default namespace apps (atuin, memos, etc.)
│   ├── 📁 external-secrets/ # Secrets management with 1Password
│   ├── 📁 flux-system/   # Flux operator and instance
│   ├── 📁 kube-system/   # Core cluster services (cilium, metrics, etc.)
│   ├── 📁 media/         # Media management applications
│   ├── 📁 network/       # Networking and DNS services
│   ├── 📁 observability/ # Monitoring and logging
│   ├── 📁 openebs-system/ # OpenEBS storage
│   ├── 📁 rook-ceph/     # Rook-Ceph distributed storage
│   ├── 📁 security/      # Authentication and security
│   ├── 📁 storage/       # MinIO object storage
│   └── 📁 volsync-system/ # Volume backup services
├── 📁 components/        # Reusable Kustomize components
│   ├── 📁 common/        # Common configurations
│   └── 📁 volsync/       # VolSync components
└── 📁 flux/              # Flux system configuration
    ├── 📁 cluster/       # Cluster-wide configurations
    └── 📁 vars/          # Cluster settings and secrets

📁 talos/                 # Talos Linux configuration
├── 📁 clusterconfig/     # Generated cluster configs
└── 📁 patches/           # Configuration patches
    ├── 📁 controller/    # Controller-specific patches
    └── 📁 global/        # Global patches

📁 bootstrap/             # Initial cluster bootstrapping
├── helmfile.yaml         # Helmfile for bootstrapping
└── resources.yaml.j2     # Template for resources

📁 scripts/               # Helper scripts
└── 📁 lib/               # Script libraries

📁 docs/                  # Documentation
├── RESTORE.md            # Disaster recovery procedures
└── UPGRADE.md            # Upgrade procedures

📁 .taskfiles/            # Task automation scripts
├── 📁 Kubernetes/        # Kubernetes tasks
├── 📁 Talos/             # Talos tasks and scripts
└── 📁 VolSync/           # VolSync tasks and templates

Taskfile.yaml             # Main task definitions

Application Organization

Each application follows a consistent structure:

app-name/
├── app/                     # Application manifests
│   ├── helmrelease.yaml     # Helm chart configuration
│   ├── kustomization.yaml   # Kustomize configuration
│   ├── externalsecret.yaml  # Secret management (if needed)
│   └── configs/             # Additional config files (optional)
└── ks.yaml                  # Flux Kustomization

🚀 Getting Started

Prerequisites

  • Hardware: Minimum 4x bare-metal servers or VMs with 16GB+ RAM
  • Network: VLAN-capable switch and router/firewall
  • DNS: Domain name with Cloudflare DNS management
  • Secrets: 1Password account for secrets management
  • Tools: talosctl, kubectl, flux, task, age (for SOPS)

Quick Start

  1. Fork this repository and customize for your environment
  2. Configure secrets: Set up SOPS age key and 1Password Connect
  3. Prepare hardware: Install Talos Linux on your nodes
  4. Bootstrap cluster:
    cd kubernetes/bootstrap/talos
    task talos:bootstrap
  5. Install Flux CD:
    task flux:github-deploy-key
    task flux:bootstrap
  6. Monitor deployment: Applications will automatically deploy via GitOps

Configuration Areas

Key files to customize for your environment:

  • kubernetes/bootstrap/talos/talconfig.yaml - Hardware and network configuration
  • kubernetes/flux/vars/cluster-settings.yaml - Cluster-wide configuration
  • kubernetes/flux/vars/cluster-secrets.sops.yaml - Encrypted secrets

☁️ Cloud Dependencies

| Service | Purpose | Cost |
|---|---|---|
| 1Password | Secrets management via External Secrets | ~$100/year |
| Cloudflare | DNS, CDN, and secure tunnels | Free |
| GitHub | Source control and CI/CD | Free |
| Total | | ~$8/month |
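
The monthly total follows from spreading the one paid subscription over twelve months. A quick check of the arithmetic, using the approximate figures from the table (the `monthly_total` helper is illustrative):

```python
# Approximate yearly costs in USD, taken from the table above.
YEARLY_COST_USD = {"1Password": 100, "Cloudflare": 0, "GitHub": 0}


def monthly_total(yearly_costs):
    """Average monthly cost across all services."""
    return sum(yearly_costs.values()) / 12


if __name__ == "__main__":
    print(f"~${monthly_total(YEARLY_COST_USD):.2f}/month")
```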

🀝 Community & Inspiration

This repository builds upon the excellent work of the k8s-at-home community. Special thanks to:


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


⭐ If you find this repository helpful, please consider giving it a star!

πŸ› Report Bug β€’ πŸ’‘ Request Feature β€’ πŸ’¬ Discussions
