geekxflood/Hermes

Hermes

Go License: MIT

Hermes is a next-generation alert management system, written in Go, designed for high-scale, low-latency environments. Its modular architecture, rule-based processing engine, and CUE-based configuration system enable precise, automated alert routing and processing from multiple sources. Hermes ensures that the right alerts reach the right people, with the right context, at the right time.


🚀 Goals & Vision

Hermes is built to help modern SRE and operations teams manage alert noise.

Key principles:

  • Modular: Each component is independently deployable and scalable
  • Rule-Based: Flexible, declarative rule engine for alert processing
  • Configurable: CUE-based configuration with live reloading and validation
  • Reliable: Built-in graceful shutdown and component lifecycle management
  • Observable: Structured logging with slog and comprehensive error handling

🏃‍♂️ Quick Start

1. Clone and Build

git clone git@github.com:geekxflood/hermes.git
cd hermes
go build -o bin/hermes ./main.go

2. Run with Default Configuration

./bin/hermes --config ./configs/config.yaml

3. Verify Operation

The server will start and display:

time=2024-01-15T10:30:45.123Z level=INFO msg="Server starting..." host=127.0.0.1 port=8080
time=2024-01-15T10:30:45.456Z level=INFO msg="Component initialized" component=database
time=2024-01-15T10:30:45.789Z level=INFO msg="Component initialized" component=cache
time=2024-01-15T10:30:46.012Z level=INFO msg="Server started successfully"
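
Lines of this `time=... level=... msg=...` shape are what the text handler in Go's `log/slog` package emits. The sketch below shows one way a logger could be selected from the `logfmt` and `json` format names used in the configuration. The `newLogger` helper and its parameters are assumptions made for illustration, not Hermes's actual logging package.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
	"strings"
)

// newLogger is a hypothetical helper: it picks a slog handler from the
// configured format name and raises verbosity when debug is set.
func newLogger(w io.Writer, format string, debug bool) *slog.Logger {
	level := slog.LevelInfo
	if debug {
		level = slog.LevelDebug
	}
	opts := &slog.HandlerOptions{Level: level}
	if format == "json" {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	// Anything else falls back to the key=value text handler.
	return slog.New(slog.NewTextHandler(w, opts))
}

// demo logs one debug and one info record at info level and reports
// the captured output and whether the debug record leaked through.
func demo() (out string, debugLogged bool) {
	var buf bytes.Buffer
	logger := newLogger(&buf, "logfmt", false)
	logger.Debug("hidden at info level")
	logger.Info("Component initialized", "component", "alertStore")
	out = buf.String()
	return out, strings.Contains(out, "hidden at info level")
}

func main() {
	out, _ := demo()
	fmt.Print(out)
}
```

Passing `true` for `debug` corresponds to what the `--debug` flag is described as doing: debug-level records start to appear.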

4. Test Configuration Changes

Edit configs/config.yaml and save it; the running server reloads the configuration automatically.


🏗️ Current Architecture

Hermes currently implements the foundational components for a robust alert management system:

graph TD
    CLI[Hermes CLI] --> CM[Config Manager]
    CM --> S[Server]
    S --> C[Components]

    C --> HS[HTTP Server]
    C --> IM[Input Manager]
    C --> AP[Alert Processor]
    C --> R[Ruler Component]
    C --> AS[Alert Store]
    C --> OM[Output Manager]

    HS --> IM
    IM --> WH[Webhook Handler]
    IM --> AP
    AP --> R
    AP --> AS
    AP --> OM

    R --> RE[Rule Engine]
    R --> RT[Rule Types]

    OM --> LO[Logger Output]

    CM --> CUE[CUE Schema Validation]
    CM --> FW[File Watcher]
    S --> L[Structured Logging]

The system provides a complete alert management pipeline with rule-based processing, configuration management, server lifecycle, and component orchestration.
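
To make the component orchestration idea more concrete, here is a minimal sketch of a lifecycle that starts components in order and stops them in reverse. The `Component` interface, the `Lifecycle` type, and the `recorder` toy component are all hypothetical names chosen for this example; the real interfaces in internal/server may look different.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Component is a hypothetical lifecycle contract, not Hermes's real interface.
type Component interface {
	Name() string
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
}

// Lifecycle tracks the components that were started successfully.
type Lifecycle struct {
	started []Component
}

// StartAll starts components in order; on failure it stops the ones
// that were already running and returns the combined error.
func (l *Lifecycle) StartAll(ctx context.Context, comps ...Component) error {
	for _, c := range comps {
		if err := c.Start(ctx); err != nil {
			stopErr := l.StopAll(ctx)
			return errors.Join(fmt.Errorf("start %s: %w", c.Name(), err), stopErr)
		}
		l.started = append(l.started, c)
	}
	return nil
}

// StopAll stops started components in reverse order and collects all errors.
func (l *Lifecycle) StopAll(ctx context.Context) error {
	var errs []error
	for i := len(l.started) - 1; i >= 0; i-- {
		if err := l.started[i].Stop(ctx); err != nil {
			errs = append(errs, fmt.Errorf("stop %s: %w", l.started[i].Name(), err))
		}
	}
	l.started = nil
	return errors.Join(errs...)
}

// recorder is a toy component that records its lifecycle events.
type recorder struct {
	name string
	log  *[]string
}

func (r recorder) Name() string { return r.name }

func (r recorder) Start(context.Context) error {
	*r.log = append(*r.log, "start:"+r.name)
	return nil
}

func (r recorder) Stop(context.Context) error {
	*r.log = append(*r.log, "stop:"+r.name)
	return nil
}

// demo starts two components, stops them, and returns the event order.
func demo() []string {
	var events []string
	lc := &Lifecycle{}
	ctx := context.Background()
	if err := lc.StartAll(ctx, recorder{"store", &events}, recorder{"processor", &events}); err != nil {
		return append(events, "error:"+err.Error())
	}
	if err := lc.StopAll(ctx); err != nil {
		return append(events, "error:"+err.Error())
	}
	return events
}

func main() {
	fmt.Println(demo())
}
```

Stopping in reverse order means a component is shut down before anything it depends on, which is a common way to get a graceful shutdown.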


📦 Current Project Structure

hermes/
├── cmd/                    # Cobra CLI entrypoint
│   ├── root.go            # Main CLI command with server startup
│   └── validate.go        # Configuration validation command
├── internal/              # Core application logic
│   ├── alert/            # Alert data structures and utilities
│   │   ├── alert.go      # Alert struct and store implementation
│   │   ├── alert_test.go # Alert tests
│   │   └── templates/    # Alert CUE schema templates
│   ├── common/           # Shared functionality and utilities
│   │   ├── constants.go  # System constants
│   │   ├── inputs.go     # Input processing utilities
│   │   ├── outputs.go    # Output processing utilities
│   │   ├── ulid.go       # ULID generation utilities
│   │   └── utils.go      # General utilities
│   ├── config/           # Configuration management with CUE
│   │   ├── config.go     # Config loading and validation
│   │   ├── manager.go    # Live reloading and change notifications
│   │   └── templates/    # CUE schema definitions
│   ├── httpserver/       # HTTP server for webhook endpoints
│   │   └── httpserver.go # HTTP server implementation
│   ├── inputmgr/         # Input management system
│   │   ├── inputmgr.go   # Input manager component
│   │   └── inputmgr_test.go # Input manager tests
│   ├── inputs/           # Input processors and handlers
│   │   └── inputwebhook/ # Webhook input handler implementation
│   │       ├── webhook_handler.go # Webhook input handler
│   │       └── webhook_handler_test.go # Webhook handler tests
│   ├── logging/          # Structured logging with slog
│   │   └── logging.go    # Logger initialization and utilities
│   ├── outputmgr/        # Output management system
│   │   └── outputmgr.go  # Output manager component
│   ├── outputs/          # Output handlers
│   │   └── outputlogger/ # Console/file logging output
│   ├── processor/        # Alert processing pipeline
│   │   └── processor.go  # Alert processor component
│   ├── ruler/            # Rule-based alert processing
│   │   ├── ruler.go      # Main ruler implementation
│   │   ├── engine.go     # Rule evaluation engine
│   │   ├── rules.go      # Rule type implementations
│   │   └── benchmark_test.go # Performance benchmarks
│   ├── server/           # Server and component lifecycle
│   │   └── server.go     # Main server with graceful shutdown
│   └── testutil/         # Testing utilities and mocks
│       └── testutil.go   # Test helper functions
├── configs/              # Configuration files
│   └── config.yaml       # Default configuration
├── docs/                 # Documentation
│   ├── architecture.md   # System architecture overview
│   ├── cmd.md            # CLI documentation
│   ├── config.md         # Configuration management
│   ├── logging.md        # Logging system
│   ├── ruler.md          # Rule system documentation
│   ├── server.md         # Server architecture
│   ├── testing.md        # Testing strategy
│   ├── testing-implementation-summary.md  # Testing implementation details
│   └── webhook.md        # Webhook input system
├── scripts/              # Build and utility scripts
│   └── test.sh           # Test execution script
├── tests/                # Comprehensive test suite
│   ├── testutil/         # Test utilities and factories
│   │   ├── factories/    # Test data factories
│   │   └── mocks/        # External service mocks
│   ├── integration/      # BDD-style integration tests
│   ├── benchmark/        # Performance benchmark tests
│   ├── chaos/            # Chaos engineering tests
│   ├── contract/         # API contract validation tests
│   └── testdata/         # Static test data
└── main.go              # Application entry point

⚙️ Configuration System

Hermes uses CUE for configuration schema validation and management:

CUE Schema Features

  • Type Safety: Strict validation of configuration values
  • Constraints: Port ranges, log levels, and format validation
  • Defaults: Sensible default values for all settings
  • Documentation: Self-documenting schema with constraints

Live Reloading

  • File Watching: Automatic detection of configuration changes
  • Section Notifications: Granular change notifications for specific config sections
  • Graceful Updates: Non-disruptive configuration updates
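
The file-watching idea can be illustrated with a small polling detector that compares content hashes between checks. This is only a sketch built on the standard library: the `changeDetector` type is hypothetical, and Hermes's own watcher may rely on a different mechanism, such as OS file notifications.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// changeDetector remembers the content hash seen on the previous check.
type changeDetector struct {
	path string
	seen bool
	last [sha256.Size]byte
}

// Changed reports whether the file content differs from the previous call.
// The first call always reports true so the initial configuration is loaded.
func (d *changeDetector) Changed() (bool, error) {
	data, err := os.ReadFile(d.path)
	if err != nil {
		return false, err
	}
	sum := sha256.Sum256(data)
	if d.seen && sum == d.last {
		return false, nil
	}
	d.seen, d.last = true, sum
	return true, nil
}

// demo writes three versions of a temporary config file and records
// whether each write is detected as a change.
func demo() ([]bool, error) {
	dir, err := os.MkdirTemp("", "hermes-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "config.yaml")
	d := &changeDetector{path: path}
	var results []bool
	for _, content := range []string{"port: 8080\n", "port: 8080\n", "port: 9090\n"} {
		if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
			return nil, err
		}
		changed, err := d.Changed()
		if err != nil {
			return nil, err
		}
		results = append(results, changed)
	}
	return results, nil
}

func main() {
	fmt.Println(demo())
}
```

In a long-running process, `Changed` would typically be called from a `time.Ticker` loop, and a reload would be applied only after the new file passes validation, so a broken edit never replaces a working configuration.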

Supported Formats

  • YAML: Primary configuration format (.yaml, .yml)
  • JSON: Alternative configuration format (.json)

📋 Rule-Based Alert Processing

Hermes features a powerful rule engine that processes alerts through configurable rules:

Rule Types

  • Drop: Remove unwanted alerts early in the pipeline
  • Replace: Modify alert fields (severity, labels, annotations)
  • Enrich: Add contextual information (team, escalation, metadata)
  • Forward: Route alerts to specific outputs
  • Suppress: Temporarily suppress alerts based on conditions
  • Throttle: Rate-limit alerts to prevent flooding
  • Template: Apply templated transformations to alert content
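
As an illustration of what a Throttle rule might do internally, the sketch below rate-limits alerts per key with a fixed-window counter. The `throttle` type, its fields, and the choice of fixed-window counting are assumptions for this example; the source does not say which algorithm Hermes uses.

```go
package main

import (
	"fmt"
	"time"
)

// throttle allows at most limit alerts per key within each window.
// Fixed-window counting is an assumption of this sketch.
type throttle struct {
	limit  int
	window time.Duration
	start  map[string]time.Time
	count  map[string]int
}

func newThrottle(limit int, window time.Duration) *throttle {
	return &throttle{
		limit:  limit,
		window: window,
		start:  map[string]time.Time{},
		count:  map[string]int{},
	}
}

// Allow reports whether an alert with the given key may pass at time now.
func (t *throttle) Allow(key string, now time.Time) bool {
	// Open a new window on first use or once the current one has elapsed.
	if s, ok := t.start[key]; !ok || now.Sub(s) >= t.window {
		t.start[key] = now
		t.count[key] = 0
	}
	if t.count[key] >= t.limit {
		return false
	}
	t.count[key]++
	return true
}

// demo sends four alerts for one service through a 2-per-minute throttle.
func demo() []bool {
	th := newThrottle(2, time.Minute)
	base := time.Date(2024, 1, 15, 10, 30, 0, 0, time.UTC)
	return []bool{
		th.Allow("payment-api", base),
		th.Allow("payment-api", base.Add(10*time.Second)),
		th.Allow("payment-api", base.Add(20*time.Second)), // over the limit
		th.Allow("payment-api", base.Add(61*time.Second)), // new window
	}
}

func main() {
	fmt.Println(demo()) // prints [true true false true]
}
```

Passing the clock in as a parameter keeps the rule deterministic in tests; this version is not safe for concurrent use without a mutex.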

Rule Processing Pipeline

Input → Input Rules → Alert Store → Output Rules → Output Manager
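
The pipeline above can be sketched as a chain of stages, where any stage may drop an alert. The `Alert`, `Stage`, `apply`, and `run` names are hypothetical and simplified for this example; they are not taken from the Hermes codebase.

```go
package main

import "fmt"

// Alert is a simplified stand-in for Hermes's alert type.
type Alert struct {
	Name   string
	Labels map[string]string
}

// Stage transforms an alert; returning false drops it from the pipeline.
type Stage func(a Alert) (Alert, bool)

// apply runs an alert through stages until one of them drops it.
func apply(a Alert, stages []Stage) (Alert, bool) {
	for _, s := range stages {
		var ok bool
		if a, ok = s(a); !ok {
			return Alert{}, false
		}
	}
	return a, true
}

// run applies input rules, stores the survivors, applies output rules,
// and hands whatever remains to emit (the output manager's role).
func run(alerts []Alert, inputRules, outputRules []Stage, store *[]Alert, emit func(Alert)) {
	for _, a := range alerts {
		in, ok := apply(a, inputRules)
		if !ok {
			continue
		}
		*store = append(*store, in)
		out, ok := apply(in, outputRules)
		if !ok {
			continue
		}
		emit(out)
	}
}

// demo drops test alerts on input and enriches the rest on output.
func demo() (stored int, emitted []string) {
	dropTest := func(a Alert) (Alert, bool) {
		return a, a.Labels["environment"] != "test"
	}
	addTeam := func(a Alert) (Alert, bool) {
		// Copy the labels so the stored alert is not modified.
		labels := map[string]string{"team": "SRE"}
		for k, v := range a.Labels {
			labels[k] = v
		}
		a.Labels = labels
		return a, true
	}
	var store []Alert
	alerts := []Alert{
		{Name: "disk-full", Labels: map[string]string{"environment": "prod"}},
		{Name: "smoke-test", Labels: map[string]string{"environment": "test"}},
	}
	run(alerts, []Stage{dropTest}, []Stage{addTeam}, &store, func(a Alert) {
		emitted = append(emitted, a.Name+" team="+a.Labels["team"])
	})
	return len(store), emitted
}

func main() {
	fmt.Println(demo())
}
```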

Example Rule Configuration

rules:
  - name: drop-test-alerts
    type: drop
    scope: input
    enabled: true
    priority: 10
    match:
      labels.environment: test
      severity: info|debug
    operator: and
    actions:
      drop: true

  - name: enrich-payment-alerts
    type: enrich
    enabled: true
    match:
      labels.service: payment-api
    actions:
      add:
        team: SRE
        escalation: tier2
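
To show how the match block of `drop-test-alerts` might be evaluated, the sketch below resolves dotted keys such as `labels.environment` and treats each value as an anchored regular expression, so `info|debug` matches either severity. Both choices are assumptions about the matching semantics, and the `Alert`, `field`, and `matches` names are hypothetical; docs/ruler.md is the authoritative description.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Alert is a simplified stand-in for Hermes's alert type.
type Alert struct {
	Severity string
	Labels   map[string]string
}

// field resolves a match key such as "severity" or "labels.environment".
func field(a Alert, key string) string {
	if name, ok := strings.CutPrefix(key, "labels."); ok {
		return a.Labels[name]
	}
	if key == "severity" {
		return a.Severity
	}
	return ""
}

// matches evaluates every condition as an anchored regular expression and
// combines the results with the operator: "or" needs one hit, anything
// else is treated as "and" and needs all of them.
func matches(a Alert, conds map[string]string, operator string) (bool, error) {
	hits := 0
	for key, pattern := range conds {
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return false, fmt.Errorf("condition %q: %w", key, err)
		}
		if re.MatchString(field(a, key)) {
			hits++
		}
	}
	if operator == "or" {
		return hits > 0, nil
	}
	return hits == len(conds), nil
}

// demo checks the drop-test-alerts conditions against two alerts.
func demo() (testInfo, prodInfo bool) {
	conds := map[string]string{
		"labels.environment": "test",
		"severity":           "info|debug",
	}
	testInfo, _ = matches(Alert{Severity: "info", Labels: map[string]string{"environment": "test"}}, conds, "and")
	prodInfo, _ = matches(Alert{Severity: "info", Labels: map[string]string{"environment": "prod"}}, conds, "and")
	return testInfo, prodInfo
}

func main() {
	fmt.Println(demo()) // prints true false
}
```

Under these assumptions the rule drops an `info` alert from the test environment but leaves the same alert from production untouched, because the `and` operator requires every condition to hold.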

Rule Features

  • Priority-Based Execution: Rules execute in configurable priority order
  • Flexible Matching: Support for exact, regex, contains, and complex patterns
  • Live Reloading: Rules can be updated without service restart
  • Performance Monitoring: Comprehensive statistics and execution metrics
  • Dry-Run Mode: Test rules without affecting alert processing
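
Priority-based execution can be pictured as a stable sort over the enabled rules. In this sketch lower priority numbers run first and ties keep their configuration order; both are assumptions, since the source does not state the ordering direction. The `Rule` struct is a hypothetical, trimmed-down type.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule holds only the fields needed to order rules in this sketch.
type Rule struct {
	Name     string
	Priority int
	Enabled  bool
}

// ordered returns the enabled rules sorted by ascending priority.
// Assumption: lower numbers run first; ties keep configuration order.
func ordered(rules []Rule) []Rule {
	out := make([]Rule, 0, len(rules))
	for _, r := range rules {
		if r.Enabled {
			out = append(out, r)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority < out[j].Priority
	})
	return out
}

func main() {
	rules := []Rule{
		{Name: "enrich-payment-alerts", Priority: 50, Enabled: true},
		{Name: "drop-test-alerts", Priority: 10, Enabled: true},
		{Name: "disabled-rule", Priority: 1, Enabled: false},
	}
	for _, r := range ordered(rules) {
		fmt.Println(r.Priority, r.Name)
	}
}
```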

For detailed rule documentation, see docs/ruler.md.


🔧 Installation & Setup

Prerequisites

  • Go 1.24.5+
  • CUE CLI

Build from Source

# Clone the repository
git clone git@github.com:geekxflood/hermes.git
cd hermes

# Build the application (creates bin/ directory)
go build -o bin/hermes ./main.go

# Build with version information
VERSION=$(git describe --tags --always --dirty)
COMMIT=$(git rev-parse HEAD)
BUILD_TIME=$(date -u '+%Y-%m-%d_%H:%M:%S')
go build -ldflags "-X main.Version=$VERSION -X main.Commit=$COMMIT \
  -X main.BuildTime=$BUILD_TIME" -o bin/hermes ./main.go

# Or use go run for development
go run ./main.go --help
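
The `-X main.Version=...` linker flags above only take effect if the `main` package declares string variables with those names. A minimal sketch of such declarations follows; the default values and the `versionString` helper are assumptions, and the real main.go may differ.

```go
package main

import "fmt"

// These package-level string variables are overwritten at link time by
// -ldflags "-X main.Version=... -X main.Commit=... -X main.BuildTime=...".
// The defaults below are assumed values for builds made without the flags.
var (
	Version   = "dev"
	Commit    = "unknown"
	BuildTime = "unknown"
)

// versionString formats the build metadata for display.
func versionString() string {
	return fmt.Sprintf("hermes %s (commit %s, built %s)", Version, Commit, BuildTime)
}

func main() {
	fmt.Println(versionString())
}
```

The `-X` flag only sets variables of type string, which is why the build time is passed as preformatted text rather than parsed into a `time.Time`.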

Development Setup

# Install Air for live reloading (optional)
go install github.com/air-verse/air@latest

# Run with live reloading
air

# Or run directly
go run ./main.go --config ./configs/config.yaml

🚀 Usage

Basic Commands

# Show help and available options
hermes --help

# Start Hermes with default configuration
hermes --config ./configs/config.yaml

# Start with debug logging
hermes --config ./configs/config.yaml --debug

# Validate configuration without starting server
hermes validate --config ./configs/config.yaml

# Start with custom configuration file
hermes --config /path/to/your/config.yaml

Available Commands

Command      Description
hermes       Start the Hermes server (default command)
validate     Validate configuration file against schema
completion   Generate autocompletion script for shell
help         Show help information for any command

Command Line Options

Flag       Short   Description               Default
--config   -c      Configuration file path   Required
--debug    -d      Enable debug logging      false
--help     -h      Show help information     -
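
The flag semantics in the table can be sketched with Go's standard `flag` package. Hermes itself uses Cobra, so this is a stand-in to show the required `--config` flag and the short aliases, not the project's actual command definition; the `options` struct and `parse` function are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options mirrors the two user-facing flags from the table above.
type options struct {
	Config string
	Debug  bool
}

// parse reads the flags from args and enforces that --config is set.
func parse(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("hermes", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	// Long and short names are registered against the same variable.
	fs.StringVar(&o.Config, "config", "", "configuration file path")
	fs.StringVar(&o.Config, "c", "", "configuration file path (shorthand)")
	fs.BoolVar(&o.Debug, "debug", false, "enable debug logging")
	fs.BoolVar(&o.Debug, "d", false, "enable debug logging (shorthand)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.Config == "" {
		return options{}, errors.New("--config is required")
	}
	return o, nil
}

func main() {
	o, err := parse([]string{"-c", "./configs/config.yaml", "--debug"})
	fmt.Println(o, err)

	_, err = parse(nil)
	fmt.Println(err)
}
```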

⚙️ Configuration

Configuration File Structure

# Server configuration
server:
  port: 8080              # Server port (1024-65535)
  host: "127.0.0.1"       # Server host/IP address

# Logging configuration
logging:
  level: "info"           # Log level: debug, info, warn, error
  format: "logfmt"        # Log format: logfmt, json
  output: "stdout"        # Output: stdout, stderr, or file path

# Component configuration (array-based with CUE validation)
# Note: Input and output components are now implicitly managed through ruler configuration
components:

  - type: "alertProcessor"
    name: "main-processor"
    enabled: true
    config:
      batchSize: 100
      processingInterval: "30s"
      maxRetries: 3
      priority: "normal"

  - type: "alertStore"
    name: "memory-store"
    enabled: true
    config:
      backend: "memory"
      maxAlerts: 10000
      retentionDays: 30
      cleanupInterval: "1h"

  - type: "outputManager"
    name: "output-router"
    enabled: true
    config:
      defaultHandlers: ["logger"]
      retryAttempts: 3
      retryDelay: "30s"

# Rule-based alert processing
rules:
  - name: "drop-test-alerts"
    type: "drop"
    scope: "input"
    enabled: true
    priority: 10
    match:
      labels:
        environment: "test"
      severity: ["info", "debug"]
    operator: "and"
    actions:
      drop: true

  - name: "enrich-payment-alerts"
    type: "enrich"
    enabled: true
    match:
      labels:
        service: "payment-api"
    actions:
      add:
        labels:
          team: "SRE"
          escalation: "tier2"

Configuration Validation

The configuration is validated against a CUE schema that enforces:

  • Port Range: 1024-65535 for server ports with automatic collision detection
  • Log Levels: Only valid log levels (debug, info, warn, error)
  • Log Formats: Only supported formats (logfmt, json)
  • Output Targets: stdout, stderr, or valid file paths
  • Component Types: Validates component-specific configuration schemas
  • Component Names: Ensures unique component names and valid identifiers
  • Rule Syntax: Validates rule definitions, match conditions, and actions
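
For readers less familiar with CUE, the checks listed above can be mirrored in plain Go. The sketch below covers the port range, log level, log format, and unique component names; it illustrates the constraints only and is not the CUE schema Hermes ships. The `Config` struct and `validate` function are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// Config is a flattened, hypothetical view of the settings being checked.
type Config struct {
	Port           int
	LogLevel       string
	LogFormat      string
	ComponentNames []string
}

// validate returns every violated constraint joined into one error,
// or nil when the configuration is acceptable.
func validate(c Config) error {
	var errs []error
	if c.Port < 1024 || c.Port > 65535 {
		errs = append(errs, fmt.Errorf("server.port %d is outside 1024-65535", c.Port))
	}
	if !slices.Contains([]string{"debug", "info", "warn", "error"}, c.LogLevel) {
		errs = append(errs, fmt.Errorf("logging.level %q is not valid", c.LogLevel))
	}
	if !slices.Contains([]string{"logfmt", "json"}, c.LogFormat) {
		errs = append(errs, fmt.Errorf("logging.format %q is not supported", c.LogFormat))
	}
	seen := map[string]bool{}
	for _, name := range c.ComponentNames {
		if seen[name] {
			errs = append(errs, fmt.Errorf("duplicate component name %q", name))
		}
		seen[name] = true
	}
	return errors.Join(errs...)
}

func main() {
	good := Config{Port: 8080, LogLevel: "info", LogFormat: "logfmt",
		ComponentNames: []string{"main-processor", "memory-store", "output-router"}}
	fmt.Println(validate(good))

	bad := Config{Port: 80, LogLevel: "verbose", LogFormat: "logfmt",
		ComponentNames: []string{"memory-store", "memory-store"}}
	fmt.Println(validate(bad))
}
```

Collecting all violations instead of stopping at the first one lets a command like `hermes validate` report everything wrong with a file in a single run.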

📚 Documentation

Comprehensive documentation is available in the docs/ directory.


🔍 Current Status

Hermes is currently in foundational development with these implemented features:

✅ Completed Features

  • CLI Interface: Cobra-based command-line interface with validation command
  • Configuration System: CUE-based schema validation and live reloading
  • Logging System: Structured logging with slog
  • Server Framework: Component lifecycle and graceful shutdown
  • Rule Engine: Complete rule-based alert processing system
  • Alert Store: In-memory alert storage with CUE validation
  • Input Processors: Webhook input for flexible alert ingestion
  • Output Handlers: Logger output for console and file logging
  • Implicit I/O Management: Input and output components managed through ruler configuration
  • Component System: Modular component architecture with array-based configuration
  • Code Quality: Google Go Style Guide compliance with optimized linting

🚧 In Development

  • HTTP API: REST endpoints for alert management and status
  • Persistent Storage: Database backend for alert store
  • Web UI: Real-time alert management interface
  • Metrics & Monitoring: Comprehensive observability and statistics

🎯 Planned Features

  • Plugin System: Dynamic plugin loading and management
  • Advanced Outputs: Email, Slack, PagerDuty, and other integrations
  • Alert Correlation: Intelligent alert grouping and correlation
  • Multi-tenancy: Support for multiple teams and organizations

🧪 Code Quality & Linting

Hermes follows the Google Go Style Guide and enforces it with a tuned golangci-lint configuration:

Run Linting

# Verify configuration
golangci-lint config verify

# Run all linters
golangci-lint run

# Run with timeout for large codebases
golangci-lint run --timeout=5m

# Auto-fix issues where possible
golangci-lint run --fix

Enabled Linters

  • Core Quality: govet, staticcheck, revive, errcheck, unused
  • Maintainability: gocyclo, gocognit, nestif, unconvert, unparam
  • Security: gosec, copyloopvar
  • Performance: perfsprint, prealloc
  • Project-Specific: sloglint, errorlint, contextcheck, depguard

Google Go Style Compliance

  • ✅ Error Handling: All errors must be checked
  • ✅ Documentation: Package comments required
  • ✅ Naming: MixedCaps with proper initialisms
  • ✅ Complexity: Cognitive complexity limits
  • ✅ Security: Vulnerability detection enabled

🤝 Contributing

We welcome contributions! Please follow these guidelines:

Code Quality Standards

  1. Google Go Style Guide: All code must follow Google's Go style guide
  2. Linting: Ensure golangci-lint run passes without errors
  3. Testing: Write comprehensive tests for new functionality
  4. Documentation: Update relevant documentation for changes

Development Workflow

# 1. Fork and clone the repository
git clone <your-fork-url>
cd hermes

# 2. Create a feature branch
git checkout -b feature/your-feature-name

# 3. Make changes and test
go test ./...
golangci-lint run

# 4. Update documentation
# Edit relevant files in docs/ directory

# 5. Commit and push
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name

# 6. Create a pull request

Code Review Process

  • All changes require review and approval
  • Automated checks must pass (linting, tests)
  • Documentation must be updated for user-facing changes
  • Breaking changes require discussion and migration plan

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Extensible alert processing and routing platform
