CatOps

Ultra-lightweight server monitoring with real-time alerts. One command to install, zero configuration needed.

Monitor standalone servers or entire Kubernetes clusters with Telegram alerts and optional web dashboard at catops.app.

# Install in seconds
curl -sfL https://get.catops.io/install.sh | bash

🚀 Features

Core Monitoring:

System metrics (CPU, Memory, Disk, Network, I/O)
Process monitoring with resource usage
Real-time Telegram alerts
Optional web dashboard
Cross-platform (Linux, macOS, Kubernetes)
Ultra-lightweight (~15MB binary, ~128MB RAM)

Deployment Options:

Standalone: Monitor individual servers (Linux/macOS)
Kubernetes: Monitor entire clusters with DaemonSet

Alerting:

Telegram bot integration with remote commands
Configurable thresholds (CPU, Memory, Disk)
Intelligent spike detection (sudden spikes, gradual rises, anomalies)
Alert deduplication (no spam)
Interactive Telegram buttons (Acknowledge, Silence)
Instant notifications with detailed context

Management:

Background service with auto-start
Simple CLI commands
Automatic updates
User-level installation (no root required)

⚡ Quick Start

Standalone Server

# 1. Install
curl -sfL https://get.catops.io/install.sh | bash

# 2. Check status
catops status

# 3. Enable web dashboard (optional)
catops auth login YOUR_TOKEN

Kubernetes Cluster

# 1. Get your token from https://catops.app/setup

# 2. Install with Helm
helm install catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --create-namespace \
  --set auth.token=YOUR_TOKEN

# 3. Verify installation
kubectl get pods -n catops-system

That's it! Your servers/nodes appear in the dashboard within 60 seconds.

🛠️ Installation

Standalone Servers

Requirements:

Linux (systemd) or macOS (launchd)
AMD64 or ARM64 architecture
No root privileges required

One-Command Install:

curl -sfL https://get.catops.io/install.sh | bash

Operating Modes:

Local Mode (default): Metrics collected locally, view with catops status
Cloud Mode: Telegram alerts + Web dashboard at catops.app

Enable Cloud Mode for Telegram alerts:

# 1. Install CatOps
curl -sfL https://get.catops.io/install.sh | bash

# 2. Get token from https://catops.app
catops auth login YOUR_TOKEN

# 3. Configure Telegram bot on catops.app dashboard

From Source (Developers):

git clone https://github.com/mfhonley/catops.git
cd catops
go build -o catops ./cmd/catops
./catops --version

Kubernetes Clusters

Requirements:

Kubernetes 1.19+
Helm 3.0+
metrics-server installed

Quick Install:

# Basic installation (core metrics only)
helm install catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --create-namespace \
  --set auth.token=YOUR_TOKEN

With Prometheus (Extended Metrics):

# Includes pod labels, owner info, 200+ metrics
helm install catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --create-namespace \
  --set auth.token=YOUR_TOKEN \
  --set prometheus.enabled=true \
  --set kubeStateMetrics.enabled=true

Verify Installation:

# Check pods are running
kubectl get pods -n catops-system

# Check logs
kubectl logs -n catops-system -l app.kubernetes.io/name=catops --tail=20

📖 Detailed Kubernetes guide: See Kubernetes Guide below

📊 Usage

Basic Commands

Monitoring:

catops status              # Show current metrics
catops processes           # Top processes by resource usage
catops restart             # Restart monitoring service

Configuration:

catops config show                              # Show current config
catops set cpu=80 mem=85 disk=90                # Set alert thresholds
catops set spike=30 gradual=15                  # Set spike detection sensitivity
catops set anomaly=4.0                          # Set anomaly detection (std deviations)
catops set renotify=120                         # Set re-notification interval (minutes)

Service Management:

catops autostart enable    # Enable auto-start on boot
catops autostart disable   # Disable auto-start
catops autostart status    # Check auto-start status

System:

catops update              # Check for updates and install
catops uninstall           # Remove CatOps completely
catops --version           # Show version

Telegram Alerts (Cloud Mode)

Setup:

Enable Cloud Mode: catops auth login YOUR_TOKEN
Configure Telegram on catops.app dashboard
Connect your Telegram account
Start receiving personal alerts

Interactive Alert Buttons:

When connected to catops.app, alerts include interactive buttons:

🔴 CPU Spike Detected

server-prod-01
📈 5.2% → 35.8% (+30.6% spike)

[Acknowledge] [Silence ▼] [Details]

Acknowledge - Mark alert as seen, stop re-notifications
Silence - Mute alerts for 30m/1h/2h/24h (useful for maintenance)
Details - Open web dashboard for full metrics and history

Alert Types:

Threshold - Metric exceeded configured limit (CPU > 80%)
Sudden Spike - Rapid increase detected (5% → 40% in 15 seconds)
Gradual Rise - Sustained increase over time (15% → 32% over 5 minutes)
Anomaly - Statistical outlier (configurable, default: 3.0σ from average)
- Uses standard deviation to detect unusual values
- Adjust with catops set anomaly=4.0 for less sensitivity
Recovery - Alert resolved, metrics back to normal

Cloud Mode (Web Dashboard)

Enable web dashboard at catops.app:

# 1. Get token from https://catops.app (Profile → Generate Auth Token)

# 2. Login
catops auth login YOUR_AUTH_TOKEN

# 3. Verify
catops auth info

Benefits:

✅ Real-time metrics accessible from anywhere
✅ Historical data and trends
✅ Multi-server monitoring
✅ Team collaboration
✅ Mobile-friendly interface

Operation Modes:

Local Mode (default): Metrics collected locally, no alerts, works offline
Cloud Mode: Telegram alerts + Web dashboard + Multi-server monitoring

☸️ Kubernetes Guide

Installation

Prerequisites Check:

# Check Kubernetes version (need 1.19+)
kubectl version --short

# Check Helm installed (need 3.0+)
helm version --short

# Verify metrics-server is working
kubectl top nodes

Install metrics-server (if needed):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# For Docker Desktop, allow insecure TLS:
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Install CatOps:

# Get your auth token from https://catops.app/setup

# Install with Helm
helm install catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --create-namespace \
  --set auth.token=YOUR_TOKEN \
  --set prometheus.enabled=true \
  --set kubeStateMetrics.enabled=true

What Gets Monitored

Node Metrics (per node):

CPU, Memory, Disk usage
Network I/O, HTTPS connections
Pod count and status
System information (OS, uptime)

Pod Metrics (per pod):

CPU/Memory usage
Restart count, container count
Pod phase (Running/Pending/Failed)
Namespace, labels, owner info (with Prometheus)

Cluster Metrics:

Total nodes / Ready nodes
Total pods / Running / Pending / Failed
Cluster health percentage

Managing CatOps

Stop Monitoring (Temporary):

# Option 1: Stop only CatOps connector (Prometheus continues)
kubectl delete daemonset catops -n catops-system

# Option 2: Stop everything including Prometheus
kubectl scale deployment catops-prometheus-server --replicas=0 -n catops-system
kubectl scale deployment catops-kube-state-metrics --replicas=0 -n catops-system
kubectl delete daemonset catops-prometheus-node-exporter -n catops-system
kubectl delete daemonset catops -n catops-system

# Resume later
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --reuse-values

Reduce Resource Usage:

# Disable Prometheus (saves ~500 MB RAM)
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --reuse-values \
  --set prometheus.enabled=false

# Reduce collection frequency (saves ~40% CPU)
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --reuse-values \
  --set collection.interval=120  # Collect every 2 minutes instead of 1

Update:

# Update to latest version
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --reuse-values

# Update to specific version
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --version 0.2.7 \
  --namespace catops-system \
  --reuse-values

Uninstall:

# Remove CatOps
helm uninstall catops -n catops-system

# Delete namespace
kubectl delete namespace catops-system

Resource Consumption

Basic Configuration (without Prometheus):

Per node: 128-256 MB RAM, 0.1-0.2 CPU
3-node cluster: ~300-800 MB RAM total

With Prometheus:

Per node: 128-256 MB RAM, 0.1-0.2 CPU
Prometheus server: 256-512 MB RAM, 0.1-0.5 CPU
3-node cluster: ~1-1.5 GB RAM total

Recommended for:

Basic: Small clusters, Docker Desktop, dev environments
With Prometheus: Production, full observability, when you need labels/owner info

Common Kubernetes Issues

Pods not starting:

# Check pod status
kubectl get pods -n catops-system

# Check logs
kubectl logs -n catops-system -l app.kubernetes.io/name=catops --tail=50

# Common fix: Verify metrics-server is working
kubectl top nodes

Metrics not appearing in dashboard:

# Verify auth token is correct
kubectl get secret catops -n catops-system -o jsonpath='{.data.auth-token}' | base64 -d

# Check network connectivity
kubectl exec -n catops-system $(kubectl get pod -n catops-system -l app.kubernetes.io/name=catops -o name | head -1) -- \
  wget -O- https://api.catops.io/health

📖 Advanced Kubernetes topics: See docs/KUBERNETES_ADVANCED.md

🔧 Configuration

Configuration file: ~/.catops/config.yaml

Example:

# Cloud Mode (Set automatically via 'catops auth login')
auth_token: "your_auth_token"
server_id: "507f1f77bcf86cd799439011"

# Alert Thresholds
cpu_threshold: 80.0
mem_threshold: 85.0
disk_threshold: 90.0

# Alert Sensitivity (Advanced)
sudden_spike_threshold: 30.0      # Alert on CPU/Memory changes > 30%
gradual_rise_threshold: 15.0      # Alert on sustained increases > 15%
anomaly_threshold: 4.0            # Alert on statistical anomalies > 4σ (std deviations)
alert_renotify_interval: 120      # Re-notify every 2 hours (minutes)

Edit configuration:

# Via CLI commands (recommended)
catops set cpu=80 mem=85 disk=90

# Configure alert sensitivity
catops set spike=30 gradual=15 anomaly=4.0 renotify=120

# Or edit file directly
nano ~/.catops/config.yaml

Alert Sensitivity Settings

Control how often you receive alerts:

For production servers (less noise):

catops set cpu=80 spike=30 gradual=15 anomaly=4.0 renotify=120
catops restart

For critical servers (more sensitive):

catops set cpu=70 spike=20 gradual=10 anomaly=2.5 renotify=60
catops restart

Understanding Alert Sensitivity Parameters

1. `spike` - Sudden Spike Threshold

Default: 20% | Range: 0-100%

Detects rapid changes in CPU/Memory usage within a short time window (15-30 seconds).

How it works:

Compares current value vs 5 previous values
Triggers if change exceeds threshold percentage
Example: CPU jumps from 5% to 35% = 30% spike

When to adjust:

Too many spike alerts? Increase to 30-40%
Missing sudden issues? Decrease to 15%

Real-world scenarios:

spike=20%: Detects most sudden problems (memory leaks, attacks)
spike=30%: Production servers (filters minor fluctuations)
spike=40%: Dev/staging (only extreme cases)

2. `gradual` - Gradual Rise Threshold

Default: 10% | Range: 0-100%

Detects sustained increases over a longer period (5 minutes).

How it works:

Analyzes trend over 20 data points (5 minutes at 15s intervals)
Triggers if cumulative rise exceeds threshold
Example: CPU steadily climbs 15% → 20% → 25% → 30% = 15% gradual rise

When to adjust:

Too many gradual alerts? Increase to 15-20%
Want early warning? Keep at 10%

Real-world scenarios:

gradual=10%: Catch slow memory leaks early
gradual=15%: Production (ignore normal growth patterns)
gradual=20%: Servers with expected load variations

3. `anomaly` - Statistical Anomaly Threshold

Default: 3.0σ | Range: 1.0-10.0 standard deviations

Detects values that are statistically unusual compared to historical average.

How it works:

Calculates average (μ) and standard deviation (σ) over 5 minutes
Measures how many σ current value deviates from average
Triggers if deviation exceeds threshold

Mathematical explanation:

Current value: 25%
Historical avg (μ): 8%
Std deviation (σ): 4%

Deviation = |25 - 8| / 4 = 4.25σ
If anomaly=4.0 → Alert triggered (4.25 > 4.0)
If anomaly=5.0 → No alert (4.25 < 5.0)

When to adjust:

Too many anomaly alerts for small changes? Increase to 4.0-5.0σ
Missing unusual patterns? Decrease to 2.0-2.5σ
Getting alerts like "13.2% (3.6σ)"? Your threshold is 3.0σ, increase to 4.0σ

Statistical context:

1σ: ~68% of values fall within ±1σ (very common)
2σ: ~95% of values fall within ±2σ (common)
3σ: ~99.7% of values fall within ±3σ (rare, default)
4σ: ~99.99% of values fall within ±4σ (very rare)
5σ: Extreme outlier (once in 1.7 million events)

Real-world scenarios:

anomaly=2.5σ: Critical servers (catch everything unusual)
anomaly=3.0σ: Balanced default
anomaly=4.0σ: Production (reduce noise)
anomaly=5.0σ: Dev/staging (only extreme anomalies)

4. `renotify` - Re-notification Interval

Default: 60 minutes | Range: Any positive integer

How often to resend alert if problem persists.

How it works:

After initial alert, wait N minutes before sending same alert again
Only applies to unacknowledged, active alerts
Stops if alert is acknowledged or resolved

When to adjust:

Alert fatigue? Increase to 120-240 minutes
Need frequent reminders? Keep at 60 minutes
Critical systems? Decrease to 30 minutes

Real-world scenarios:

renotify=30: Critical infrastructure (frequent reminders)
renotify=60: Balanced default
renotify=120: Production (less spam)
renotify=240: Dev/staging (occasional reminders)

Quick Reference Table

Scenario	spike	gradual	anomaly	renotify
Critical production	20%	10%	2.5σ	60min
Standard production	30%	15%	4.0σ	120min
Dev/Staging	40%	20%	5.0σ	240min
High traffic (expected spikes)	35%	20%	4.5σ	120min
Low traffic (stable)	25%	12%	3.5σ	90min

How Alert Types Interact

Priority order (highest to lowest):

Sudden Spike - Immediate danger (memory leak, attack)
Gradual Rise - Growing problem (slow leak, increasing load)
Anomaly - Unusual pattern (statistical outlier)
Threshold - Limit exceeded (only sent if NO spikes/anomalies)

Why threshold alerts might not appear:

If CPU constantly fluctuates by small amounts, anomaly alerts fire
Increase anomaly threshold to reduce sensitivity
This allows threshold alerts to be sent when CPU > configured limit

🔍 Troubleshooting

Standalone Issues

CatOps not starting:

# Check status
catops status

# Force cleanup and restart
catops force-cleanup
catops restart

# Check logs (Linux)
journalctl -u catops --since "10 minutes ago"

# Check logs (macOS)
tail -f ~/Library/Logs/catops.log

Telegram alerts not working:

# Verify Cloud Mode is enabled
catops auth info

# Check Telegram is configured on catops.app dashboard
# Connect your Telegram account in Settings

# Restart daemon
catops restart

Cloud Mode not working:

# Check authentication
catops auth info

# Re-login
catops auth logout
catops auth login YOUR_NEW_TOKEN

Too many alerts (alert spam):

# Reduce sensitivity to avoid notifications for minor fluctuations
catops set spike=30 gradual=15 anomaly=4.0 renotify=120
catops restart

# For even less noise (dev/staging servers)
catops set spike=40 gradual=20 anomaly=5.0 renotify=240
catops restart

# If getting too many anomaly alerts for small changes:
catops set anomaly=4.5  # Increase anomaly threshold
catops restart

Not receiving threshold alerts:

# Threshold alerts only send when there are NO spikes/anomalies
# Increase spike detection thresholds to allow threshold alerts
catops set spike=30 gradual=15 anomaly=4.0
catops restart

Kubernetes Issues

Pods in CrashLoopBackOff:

# Check logs for errors
kubectl logs -n catops-system -l app.kubernetes.io/name=catops --tail=100

# Common fix: Install/fix metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

High resource usage:

# Check current resource usage
kubectl top pods -n catops-system

# Disable Prometheus to reduce usage
helm upgrade catops oci://ghcr.io/mfhonley/catops/helm-charts/catops \
  --namespace catops-system \
  --reuse-values \
  --set prometheus.enabled=false

📖 More troubleshooting: See docs/TROUBLESHOOTING.md

🤝 Contributing

We welcome contributions!

Quick Start:

# Fork and clone
git clone https://github.com/YOUR_USERNAME/catops.git
cd catops

# Build
go build -o catops ./cmd/catops

# Test
./catops --version

Areas to contribute:

Bug fixes and improvements
Documentation enhancements
Platform support (Windows, FreeBSD)
Feature requests

📖 Development guide: See docs/DEVELOPMENT.md

📞 Support

Get Help:

💬 Telegram: @mfhonley - Fastest response
📧 Email: me@thehonley.org
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

📄 License

MIT License - see LICENSE for details.

Components:

CatOps CLI: Open source (MIT)
Web Dashboard: catops.app
Backend API: Cloud infrastructure for metrics storage

🔗 Links

Website: catops.app
Documentation: GitHub
Helm Chart: ghcr.io/mfhonley/catops/helm-charts/catops

Built with ❤️ by the open source community

Name		Name	Last commit message	Last commit date
Latest commit History 123 Commits
.github/workflows		.github/workflows
charts/catops		charts/catops
cmd		cmd
config		config
docs		docs
internal		internal
pkg/utils		pkg/utils
.gitignore		.gitignore
Dockerfile.k8s		Dockerfile.k8s
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum

License

mfhonley/catops

Folders and files

Latest commit

History

Repository files navigation

CatOps

Table of Contents

🚀 Features

⚡ Quick Start

Standalone Server

Kubernetes Cluster

🛠️ Installation

Standalone Servers

Kubernetes Clusters

📊 Usage

Basic Commands

Telegram Alerts (Cloud Mode)

Cloud Mode (Web Dashboard)

☸️ Kubernetes Guide

Installation

What Gets Monitored

Managing CatOps

Resource Consumption

Common Kubernetes Issues

🔧 Configuration

Alert Sensitivity Settings

Understanding Alert Sensitivity Parameters

1. spike - Sudden Spike Threshold

2. gradual - Gradual Rise Threshold

3. anomaly - Statistical Anomaly Threshold

4. renotify - Re-notification Interval

Quick Reference Table

How Alert Types Interact

🔍 Troubleshooting

Standalone Issues

Kubernetes Issues

🤝 Contributing

📞 Support

📄 License

🔗 Links

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 39

Packages 0

Uh oh!

Languages

1. `spike` - Sudden Spike Threshold

2. `gradual` - Gradual Rise Threshold

3. `anomaly` - Statistical Anomaly Threshold

4. `renotify` - Re-notification Interval

Packages