A powerful alert correlation system that helps reduce alert fatigue by intelligently grouping and analyzing related alerts from your monitoring stack. Built with Prometheus, AlertManager, Grafana, and Loki.
- Intelligent alert correlation and grouping
- Real-time alert processing
- Beautiful Grafana dashboards for visualization
- Integrated logging with Loki
- Comprehensive runbooks for common scenarios
- Docker-based deployment for easy setup
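The core idea behind correlation can be sketched as grouping alerts that share identifying labels and arrive within a short time window. The `Alert` structure, label names, and window size below are assumptions for illustration, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    labels: dict
    timestamp: float  # seconds since epoch

def correlate(alerts, key_labels=("cluster", "service"), window=300):
    """Group alerts that share the same key labels and fall within
    `window` seconds of the first alert in the group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = tuple(alert.labels.get(k) for k in key_labels)
        for group in groups:
            if group["key"] == key and alert.timestamp - group["start"] <= window:
                group["alerts"].append(alert)
                break
        else:
            # No open group matched: start a new correlation group
            groups.append({"key": key, "start": alert.timestamp, "alerts": [alert]})
    return groups

alerts = [
    Alert("HighCPU", {"cluster": "prod", "service": "api"}, 100.0),
    Alert("HighMemory", {"cluster": "prod", "service": "api"}, 150.0),
    Alert("DiskFull", {"cluster": "prod", "service": "db"}, 160.0),
]
groups = correlate(alerts)
print(len(groups))  # → 2: the two api alerts correlate, the db alert stands alone
```

The real engine adds LLM-based analysis on top, but time-and-label bucketing of this kind is what turns many raw alerts into a few actionable groups.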
The project is organized into the following main directories:

- `backend/`: Contains the core application logic
  - `alert_correlator.py`: Main correlation engine
  - `server.js`: API server
- `config/`: Configuration files for monitoring and logging
  - `logging/`: Loki and Promtail configurations
  - `monitoring/`: Prometheus and AlertManager configurations
- `docker/`: Docker-related files
  - `Dockerfile`: Main application container
  - `docker-compose.yml`: Service orchestration
  - `.env`: Environment variables
- `frontend/`: Web interface files
- `grafana/`: Grafana provisioning and dashboards
- `runbooks/`: Documentation for various scenarios
- `logs/`: Application logs
- Docker and Docker Compose
- Git
- Python 3.8 or higher (if running locally)
- ~2GB of free RAM for all containers
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/alert_correlator.git
  cd alert_correlator
  ```

- Set up environment configuration. Create or modify the `.env` file in the `docker` directory with your OpenAI API key:

  ```bash
  echo "OPENAI_API_KEY=your_openai_api_key_here" > docker/.env
  ```

- Create the logs directory and log file:

  ```bash
  mkdir -p logs
  touch logs/alert_correlator.log
  ```

- Start the services:

  ```bash
  docker compose -f docker/docker-compose.yml up -d
  ```
After starting the containers, you can access the following services:
- Grafana: http://localhost:3000
- Default credentials: admin/admin
- Prometheus: http://localhost:9090
- AlertManager: http://localhost:9093
- Alert Correlator API: http://localhost:5000
- Loki: http://localhost:3100
- Log in to Grafana at http://localhost:3000
- Navigate to Dashboards -> Browse
- Look for "Alert Correlator Dashboard"
- The dashboard includes:
- Alert correlation statistics
- Active alert groups
- Historical correlation data
- Alert patterns and trends
The main configuration is done through environment variables and the following files:
- `backend/alert_correlator.py`: Main correlation logic
- `config/monitoring/alertmanager.yml`: AlertManager configuration
- `config/monitoring/prometheus.yml`: Prometheus configuration
- `config/logging/loki-config.yaml`: Loki configuration
- `config/logging/promtail-config.yaml`: Log shipping configuration
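For reference, wiring AlertManager to the correlator is typically done with a webhook receiver in `config/monitoring/alertmanager.yml`. The service name and webhook path below are assumptions for illustration; check the shipped file for the actual values:

```yaml
route:
  receiver: alert-correlator
  group_by: ['alertname', 'cluster']
  group_wait: 30s
receivers:
  - name: alert-correlator
    webhook_configs:
      # Hypothetical endpoint on the correlator API container
      - url: http://alert-correlator:5000/alerts
```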
The Alert Correlator now supports multiple LLM providers:
- Local Models with Ollama: Run AI models locally on your machine
  - Supports various open-source models
  - No internet connection required for inference
  - Lower latency for quick responses
- OpenAI: Use OpenAI's powerful GPT models
  - Requires API key
  - Best-in-class performance
You can switch between these providers through the LLM service interface at runtime, allowing for flexible model selection based on your needs.
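One way to picture runtime switching is a small provider interface behind the LLM service. The class and method names here are illustrative assumptions, not the project's actual code, and the providers return canned strings instead of making real API calls:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

class OllamaProvider(LLMProvider):
    """Stand-in for a client that would call a local Ollama server."""
    def complete(self, prompt: str) -> str:
        return f"[ollama] analysis of: {prompt}"

class OpenAIProvider(LLMProvider):
    """Stand-in for a client that would call the OpenAI API."""
    def complete(self, prompt: str) -> str:
        return f"[openai] analysis of: {prompt}"

class LLMService:
    def __init__(self, provider: LLMProvider):
        self.provider = provider

    def switch(self, provider: LLMProvider):
        # Swap the backing model at runtime; callers are unaffected
        self.provider = provider

    def summarize_alerts(self, alert_names) -> str:
        return self.provider.complete(", ".join(alert_names))

service = LLMService(OllamaProvider())
local = service.summarize_alerts(["HighCPU", "HighMemory"])
service.switch(OpenAIProvider())
remote = service.summarize_alerts(["HighCPU", "HighMemory"])
```

Because both providers satisfy the same interface, the correlation logic never needs to know which model is answering.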
Predefined runbooks are available in the `runbooks/` directory, organized by component:

- `alertmanager/`: AlertManager related issues
- `prometheus/`: Prometheus troubleshooting
- `kubernetes/`: Kubernetes cluster issues
- `grafana/`: Grafana-related problems
- `azure-devops/`: Azure DevOps pipeline issues
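A correlated group can be pointed at the matching runbook with a simple lookup from an alert label to a directory. The `component` label name is an assumption for this sketch; only the directory names come from the layout above:

```python
# Map a component label value to its runbook directory
RUNBOOK_DIRS = {
    "alertmanager": "runbooks/alertmanager",
    "prometheus": "runbooks/prometheus",
    "kubernetes": "runbooks/kubernetes",
    "grafana": "runbooks/grafana",
    "azure-devops": "runbooks/azure-devops",
}

def runbook_for(alert_labels: dict):
    """Return the runbook directory for an alert, or None if the
    component is unknown (hypothetical `component` label)."""
    return RUNBOOK_DIRS.get(alert_labels.get("component"))

print(runbook_for({"component": "prometheus"}))  # → runbooks/prometheus
```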
You can test the alert correlation system using the Makefile:
```bash
make help
```
Logs are available in:
- Docker logs: `docker compose -f docker/docker-compose.yml logs -f [service_name]`
- Application logs: `logs/alert_correlator.log`
- Through Loki in Grafana
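When triaging from the application log file, a quick level filter is often enough. The timestamp-plus-level line format in this sketch is an assumption; adjust it to the actual lines in `logs/alert_correlator.log`:

```python
def filter_log_lines(lines, level="ERROR"):
    """Return only the lines containing the given level token
    (assumed log format: '<timestamp> <LEVEL> <message>')."""
    return [line for line in lines if level in line]

sample = [
    "2024-01-01 12:00:00 INFO correlated 3 alerts into 1 group",
    "2024-01-01 12:00:05 ERROR failed to reach AlertManager",
]
errors = filter_log_lines(sample)
print(errors)  # → the single ERROR line
```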
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under a Custom Attribution-NonCommercial License - see the LICENSE file for details.
Vasile Bogdan Bujor
- This is a monitoring tool - please ensure you have proper security measures in place
- Always backup your configuration before making changes
- Test thoroughly in a non-production environment first
- Commercial use requires explicit permission from the author