This repository contains a complete cloud infrastructure project where I implemented live VM disk migration and real-time system monitoring using Microsoft Azure, Prometheus, Grafana, and Node Exporter.
📝 Project completed 3–4 weeks ago | 📤 Documented now for public sharing
Feature | Description |
---|---|
🧠 Goal | Monitor VM health and simulate live migration |
☁️ Cloud Platform | Microsoft Azure |
📊 Monitoring Stack | Prometheus + Grafana + Node Exporter |
💻 VMs Used | VM1 (Monitor), VM2 (Target), VM3 (Migrated) |
⚙️ Automation Tools | Azure CLI, Bash, YAML |
VM1 (Prometheus + Grafana) <-- monitors -- VM2 (Node Exporter)
↑
[Disk attached from VM3]
az vm create --resource-group MyResourceGroup --name VM1 --image Ubuntu2204 ...
az vm create --name VM2 ...
az vm create --name VM3 ...
- Installed via
apt
- Configured scrape targets
- Created Grafana dashboards (CPU, RAM, Disk, Uptime)
az vm deallocate --name VM3
az snapshot create --resource-group MyResourceGroup --name VM3_Snapshot --source <VM3_DISK_ID>
az disk create --resource-group MyResourceGroup --name MigratedDisk --source VM3_Snapshot
az vm disk attach --resource-group MyResourceGroup --vm-name VM2 --name MigratedDisk
wget https://github.com/prometheus/node_exporter/releases/download/v*/node_exporter-*.linux-amd64.tar.gz
./node_exporter &
groups:
- name: example_alert
rules:
- alert: HighCPUUsage
expr: rate(node_cpu_seconds_total{mode="user"}[1m]) > 0.7
for: 1m
labels:
severity: critical
annotations:
summary: "High CPU usage detected"
description: "CPU usage > 80% for over 1 min."
stress-ng --cpu 2 --timeout 60s
📸 Screenshots (I’ve detailed every step, command, and dashboard in my full Medium article with screenshots.)
Component | Description |
---|---|
Azure Portal | 3 VMs provisioned |
Prometheus UI | Alerts visible at /alerts |
Grafana Dashboard | CPU, RAM, Disk panels for VM1 & VM2 |
CLI | Disk migration commands |
- CPU Usage:
rate(node_cpu_seconds_total{mode="user"}[5m])
- Memory %:
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
- Disk I/O:
rate(node_disk_bytes_read_total[5m])
- Uptime:
node_time_seconds - node_boot_time_seconds
- Performed disk-based VM migration using Azure CLI
- Configured Prometheus to collect metrics across nodes
- Built real-time Grafana dashboards
- Triggered alerts and visualized usage during stress
This was a test/study project built on Azure Student Subscription. Please ensure proper shutdown of all services to avoid billing.
Open to feedback, collaboration, or discussion around DevOps, observability, or cloud-native projects!