Hermes

Hermes is a next-generation alert management system, written in Go, designed for high-scale, low-latency environments. Its modular architecture, rule-based processing engine, and CUE-based configuration system enable precise, automated alert routing and processing from multiple sources. Hermes ensures that the right alerts reach the right people, with the right context, at the right time.

🚀 Goals & Vision

Hermes is built to help modern SRE and operations teams manage alert noise.

Key principles:

Modular: Each component is independently deployable and scalable
Rule-Based: Flexible, declarative rule engine for alert processing
Configurable: CUE-based configuration with live reloading and validation
Reliable: Built-in graceful shutdown and component lifecycle management
Observable: Structured logging with slog and comprehensive error handling

🏃‍♂️ Quick Start

1. Clone and Build

git clone git@github.com:geekxflood/hermes.git
cd hermes
go build -o bin/hermes ./main.go

2. Run with Default Configuration

./bin/hermes --config ./configs/config.yaml

3. Verify Operation

The server will start and display:

time=2024-01-15T10:30:45.123Z level=INFO msg="Server starting..." host=127.0.0.1 port=8080
time=2024-01-15T10:30:45.456Z level=INFO msg="Component initialized" component=database
time=2024-01-15T10:30:45.789Z level=INFO msg="Component initialized" component=cache
time=2024-01-15T10:30:46.012Z level=INFO msg="Server started successfully"

4. Test Configuration Changes

Edit configs/config.yaml and save. Hermes detects the change and reloads the configuration automatically.

🏗️ Current Architecture

Hermes currently implements the foundational components for a robust alert management system:

graph TD
    CLI[Hermes CLI] --> CM[Config Manager]
    CM --> S[Server]
    S --> C[Components]

    C --> HS[HTTP Server]
    C --> IM[Input Manager]
    C --> AP[Alert Processor]
    C --> R[Ruler Component]
    C --> AS[Alert Store]
    C --> OM[Output Manager]

    HS --> IM
    IM --> WH[Webhook Handler]
    IM --> AP
    AP --> R
    AP --> AS
    AP --> OM

    R --> RE[Rule Engine]
    R --> RT[Rule Types]

    OM --> LO[Logger Output]

    CM --> CUE[CUE Schema Validation]
    CM --> FW[File Watcher]
    S --> L[Structured Logging]

The system provides a complete alert management pipeline with rule-based processing, configuration management, server lifecycle, and component orchestration.
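
The component orchestration described above can be sketched as follows. This is a minimal illustration, not Hermes' actual code: the `Component` interface, the `recorder` stand-in, and the `startAll`/`stopAll` helpers are hypothetical names. The sketch assumes components start in declaration order and stop in reverse order, with a rollback if one fails to start.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Component is a hypothetical lifecycle interface; the real one may differ.
type Component interface {
	Name() string
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
}

// recorder is a stand-in component that appends lifecycle events to a shared log.
type recorder struct {
	name    string
	failing bool
	events  *[]string
}

func (r *recorder) Name() string { return r.name }

func (r *recorder) Start(context.Context) error {
	if r.failing {
		return errors.New("start failed")
	}
	*r.events = append(*r.events, "start:"+r.name)
	return nil
}

func (r *recorder) Stop(context.Context) error {
	*r.events = append(*r.events, "stop:"+r.name)
	return nil
}

// startAll starts components in order. If one fails, the components that
// already started are stopped in reverse order before the error is returned.
func startAll(ctx context.Context, comps []Component) error {
	for i, c := range comps {
		if err := c.Start(ctx); err != nil {
			_ = stopAll(ctx, comps[:i])
			return fmt.Errorf("start %s: %w", c.Name(), err)
		}
	}
	return nil
}

// stopAll stops components in reverse start order and joins any errors.
func stopAll(ctx context.Context, comps []Component) error {
	var errs []error
	for i := len(comps) - 1; i >= 0; i-- {
		if err := comps[i].Stop(ctx); err != nil {
			errs = append(errs, fmt.Errorf("stop %s: %w", comps[i].Name(), err))
		}
	}
	return errors.Join(errs...)
}

// lifecycleDemo runs one start/stop cycle and returns the event log.
func lifecycleDemo() []string {
	var events []string
	comps := []Component{
		&recorder{name: "alertStore", events: &events},
		&recorder{name: "alertProcessor", events: &events},
		&recorder{name: "outputManager", events: &events},
	}
	ctx := context.Background()
	if err := startAll(ctx, comps); err != nil {
		return events
	}
	_ = stopAll(ctx, comps)
	return events
}

func main() {
	for _, e := range lifecycleDemo() {
		fmt.Println(e)
	}
}
```

Stopping in reverse order lets downstream components (here the output manager) drain before the components they depend on go away.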

📦 Current Project Structure

hermes/
├── cmd/                    # Cobra CLI entrypoint
│   ├── root.go            # Main CLI command with server startup
│   └── validate.go        # Configuration validation command
├── internal/              # Core application logic
│   ├── alert/            # Alert data structures and utilities
│   │   ├── alert.go      # Alert struct and store implementation
│   │   ├── alert_test.go # Alert tests
│   │   └── templates/    # Alert CUE schema templates
│   ├── common/           # Shared functionality and utilities
│   │   ├── constants.go  # System constants
│   │   ├── inputs.go     # Input processing utilities
│   │   ├── outputs.go    # Output processing utilities
│   │   ├── ulid.go       # ULID generation utilities
│   │   └── utils.go      # General utilities
│   ├── config/           # Configuration management with CUE
│   │   ├── config.go     # Config loading and validation
│   │   ├── manager.go    # Live reloading and change notifications
│   │   └── templates/    # CUE schema definitions
│   ├── httpserver/       # HTTP server for webhook endpoints
│   │   └── httpserver.go # HTTP server implementation
│   ├── inputmgr/         # Input management system
│   │   ├── inputmgr.go   # Input manager component
│   │   └── inputmgr_test.go # Input manager tests
│   ├── inputs/           # Input processors and handlers
│   │   └── inputwebhook/ # Webhook input handler implementation
│   │       ├── webhook_handler.go # Webhook input handler
│   │       └── webhook_handler_test.go # Webhook handler tests
│   ├── logging/          # Structured logging with slog
│   │   └── logging.go    # Logger initialization and utilities
│   ├── outputmgr/        # Output management system
│   │   └── outputmgr.go  # Output manager component
│   ├── outputs/          # Output handlers
│   │   └── outputlogger/ # Console/file logging output
│   ├── processor/        # Alert processing pipeline
│   │   └── processor.go  # Alert processor component
│   ├── ruler/            # Rule-based alert processing
│   │   ├── ruler.go      # Main ruler implementation
│   │   ├── engine.go     # Rule evaluation engine
│   │   ├── rules.go      # Rule type implementations
│   │   └── benchmark_test.go # Performance benchmarks
│   ├── server/           # Server and component lifecycle
│   │   └── server.go     # Main server with graceful shutdown
│   └── testutil/         # Testing utilities and mocks
│       └── testutil.go   # Test helper functions
├── configs/              # Configuration files
│   └── config.yaml       # Default configuration
├── docs/                 # Documentation
│   ├── architecture.md   # System architecture overview
│   ├── cmd.md            # CLI documentation
│   ├── config.md         # Configuration management
│   ├── logging.md        # Logging system
│   ├── ruler.md          # Rule system documentation
│   ├── server.md         # Server architecture
│   ├── testing.md        # Testing strategy
│   ├── testing-implementation-summary.md  # Testing implementation details
│   └── webhook.md        # Webhook input system
├── scripts/              # Build and utility scripts
│   └── test.sh           # Test execution script
├── tests/                # Comprehensive test suite
│   ├── testutil/         # Test utilities and factories
│   │   ├── factories/    # Test data factories
│   │   └── mocks/        # External service mocks
│   ├── integration/      # BDD-style integration tests
│   ├── benchmark/        # Performance benchmark tests
│   ├── chaos/            # Chaos engineering tests
│   ├── contract/         # API contract validation tests
│   └── testdata/         # Static test data
└── main.go              # Application entry point

⚙️ Configuration System

Hermes uses CUE for configuration schema validation and management:

CUE Schema Features

Type Safety: Strict validation of configuration values
Constraints: Port ranges, log levels, and format validation
Defaults: Sensible default values for all settings
Documentation: Self-documenting schema with constraints

Live Reloading

File Watching: Automatic detection of configuration changes
Section Notifications: Granular change notifications for specific config sections
Graceful Updates: Non-disruptive configuration updates
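
One way to picture section notifications is a diff over the top-level sections of the old and new configuration, so that only subscribers of a changed section are woken up. The sketch below is an assumption about the idea, not the config manager's real API; `changedSections` is a hypothetical helper operating on already-decoded maps.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// changedSections returns the sorted names of top-level sections whose
// values differ between two decoded configurations, including sections
// that were added or removed.
func changedSections(oldCfg, newCfg map[string]any) []string {
	seen := map[string]bool{}
	var changed []string
	for name, oldVal := range oldCfg {
		seen[name] = true
		newVal, ok := newCfg[name]
		if !ok || !reflect.DeepEqual(oldVal, newVal) {
			changed = append(changed, name)
		}
	}
	for name := range newCfg {
		if !seen[name] {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	oldCfg := map[string]any{
		"server":  map[string]any{"host": "127.0.0.1", "port": 8080},
		"logging": map[string]any{"level": "info", "format": "logfmt"},
	}
	newCfg := map[string]any{
		"server":  map[string]any{"host": "127.0.0.1", "port": 8080},
		"logging": map[string]any{"level": "debug", "format": "logfmt"},
	}
	// Only subscribers of the "logging" section would be notified here.
	fmt.Println(changedSections(oldCfg, newCfg))
}
```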

Supported Formats

YAML: Primary configuration format (.yaml, .yml)
JSON: Alternative configuration format (.json)
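
Format selection can be driven by the file extension alone. The helper below is a hypothetical sketch of that dispatch; `configFormat` is not a Hermes function, and treating extensions case-insensitively is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// configFormat maps a file path to a format name based on its extension.
// Extensions are compared case-insensitively (an assumption of this sketch).
func configFormat(path string) (string, error) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".yaml", ".yml":
		return "yaml", nil
	case ".json":
		return "json", nil
	default:
		return "", fmt.Errorf("unsupported configuration format: %q", path)
	}
}

func main() {
	for _, p := range []string{"configs/config.yaml", "config.json", "config.toml"} {
		format, err := configFormat(p)
		fmt.Println(p, format, err)
	}
}
```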

📋 Rule-Based Alert Processing

Hermes features a powerful rule engine that processes alerts through configurable rules:

Rule Types

Drop: Remove unwanted alerts early in the pipeline
Replace: Modify alert fields (severity, labels, annotations)
Enrich: Add contextual information (team, escalation, metadata)
Forward: Route alerts to specific outputs
Suppress: Temporarily suppress alerts based on conditions
Throttle: Rate-limit alerts to prevent flooding
Template: Apply templated transformations to alert content
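
To make the Throttle rule type concrete, here is a minimal fixed-window rate limiter keyed by an alert attribute. It is a sketch under stated assumptions: the `throttle` type, its fields, and the fixed-window strategy are illustrative choices, not the algorithm Hermes implements, and the type is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// throttle allows at most limit alerts per key within a fixed window.
// It is not safe for concurrent use.
type throttle struct {
	limit  int
	window time.Duration
	start  map[string]time.Time // start of the current window per key
	count  map[string]int       // alerts passed in the current window per key
}

func newThrottle(limit int, window time.Duration) *throttle {
	return &throttle{
		limit:  limit,
		window: window,
		start:  map[string]time.Time{},
		count:  map[string]int{},
	}
}

// allow reports whether an alert with the given key may pass at time now.
func (t *throttle) allow(key string, now time.Time) bool {
	s, ok := t.start[key]
	if !ok || now.Sub(s) >= t.window {
		// First alert for this key, or the previous window has elapsed.
		t.start[key] = now
		t.count[key] = 0
	}
	if t.count[key] >= t.limit {
		return false
	}
	t.count[key]++
	return true
}

// throttleDemo feeds four alerts for one key: three inside the same
// minute and one after the window has elapsed.
func throttleDemo() []bool {
	t := newThrottle(2, time.Minute)
	base := time.Date(2024, 1, 15, 10, 30, 0, 0, time.UTC)
	return []bool{
		t.allow("payment-api", base),
		t.allow("payment-api", base.Add(10*time.Second)),
		t.allow("payment-api", base.Add(20*time.Second)), // over the limit
		t.allow("payment-api", base.Add(61*time.Second)), // new window
	}
}

func main() {
	fmt.Println(throttleDemo()) // [true true false true]
}
```

Passing the current time as an argument keeps the limiter deterministic in tests.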

Rule Processing Pipeline

Input → Input Rules → Alert Store → Output Rules → Output Manager

Example Rule Configuration

rules:
  - name: drop-test-alerts
    type: drop
    scope: input
    enabled: true
    priority: 10
    match:
      labels.environment: test
      severity: info|debug
    operator: and
    actions:
      drop: true

  - name: enrich-payment-alerts
    type: enrich
    enabled: true
    match:
      labels.service: payment-api
    actions:
      add:
        team: SRE
        escalation: tier2
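
The match block of the first rule can be read as a set of field conditions combined by an operator. The following sketch shows one possible evaluation. It assumes that each condition value is a regular expression anchored to the whole field, that `labels.<name>` addresses a label, and that `and`/`or` are the only operators; the `Alert` type and the `matches` function are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Alert is a pared-down, hypothetical alert shape used only for this sketch.
type Alert struct {
	Severity string
	Labels   map[string]string
}

// field resolves "severity" or "labels.<name>" to a value on the alert.
func field(a Alert, path string) (string, bool) {
	if name, ok := strings.CutPrefix(path, "labels."); ok {
		v, found := a.Labels[name]
		return v, found
	}
	if path == "severity" {
		return a.Severity, true
	}
	return "", false
}

// matches evaluates every condition as a regular expression anchored to the
// whole field value. With operator "or" one matching condition is enough;
// any other operator is treated as "and", where all conditions must hold.
func matches(a Alert, conds map[string]string, operator string) (bool, error) {
	hits := 0
	for path, pattern := range conds {
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return false, fmt.Errorf("condition %q: %w", path, err)
		}
		if v, ok := field(a, path); ok && re.MatchString(v) {
			hits++
		}
	}
	if operator == "or" {
		return hits > 0, nil
	}
	return hits == len(conds), nil
}

func main() {
	conds := map[string]string{
		"labels.environment": "test",
		"severity":           "info|debug",
	}
	alert := Alert{Severity: "debug", Labels: map[string]string{"environment": "test"}}
	ok, err := matches(alert, conds, "and")
	fmt.Println(ok, err)
}
```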

Rule Features

Priority-Based Execution: Rules execute in configurable priority order
Flexible Matching: Support for exact, regex, contains, and complex patterns
Live Reloading: Rules can be updated without service restart
Performance Monitoring: Comprehensive statistics and execution metrics
Dry-Run Mode: Test rules without affecting alert processing
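
Priority-based execution amounts to a stable sort over the enabled rules. The sketch below assumes that a lower priority number runs first and that rules with equal priority keep their configuration order; neither detail is confirmed here, and the `rule` type is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"sort"
)

// rule carries only the fields needed to decide execution order.
type rule struct {
	Name     string
	Priority int
	Enabled  bool
}

// ordered returns the enabled rules sorted by ascending priority. Rules
// with equal priority keep their configuration order.
func ordered(rules []rule) []rule {
	var out []rule
	for _, r := range rules {
		if r.Enabled {
			out = append(out, r)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Priority < out[j].Priority })
	return out
}

func main() {
	rules := []rule{
		{Name: "enrich-payment-alerts", Priority: 50, Enabled: true},
		{Name: "drop-test-alerts", Priority: 10, Enabled: true},
		{Name: "disabled-rule", Priority: 1, Enabled: false},
	}
	for _, r := range ordered(rules) {
		fmt.Println(r.Priority, r.Name)
	}
}
```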

For detailed rule documentation, see docs/ruler.md.

🔧 Installation & Setup

Prerequisites

Go 1.24.5+
CUE CLI

Build from Source

# Clone the repository
git clone git@github.com:geekxflood/hermes.git
cd hermes

# Build the application (creates bin/ directory)
go build -o bin/hermes ./main.go

# Build with version information
VERSION=$(git describe --tags --always --dirty)
COMMIT=$(git rev-parse HEAD)
BUILD_TIME=$(date -u '+%Y-%m-%d_%H:%M:%S')
go build -ldflags "-X main.Version=$VERSION -X main.Commit=$COMMIT \
  -X main.BuildTime=$BUILD_TIME" -o bin/hermes ./main.go

# Or use go run for development
go run ./main.go --help
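
The `-X` linker flags above only take effect if `main` declares matching package-level string variables. A minimal sketch of what that could look like follows. The variable names come from the build command; the default values and the `versionString` helper are assumptions.

```go
package main

import "fmt"

// These defaults are overridden at link time by -X main.Version=... and the
// other -X flags. The -X flag can only set package-level string variables.
var (
	Version   = "dev"
	Commit    = "none"
	BuildTime = "unknown"
)

// versionString formats the build metadata for display.
func versionString() string {
	return fmt.Sprintf("hermes %s (commit %s, built %s)", Version, Commit, BuildTime)
}

func main() {
	fmt.Println(versionString())
}
```

A plain `go build` without `-ldflags` leaves the defaults in place, which makes unversioned development builds easy to spot.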

Development Setup

# Install Air for live reloading (optional)
go install github.com/air-verse/air@latest

# Run with live reloading
air

# Or run directly
go run ./main.go --config ./configs/config.yaml

🚀 Usage

Basic Commands

# Show help and available options
hermes --help

# Start Hermes with default configuration
hermes --config ./configs/config.yaml

# Start with debug logging
hermes --config ./configs/config.yaml --debug

# Validate configuration without starting server
hermes validate --config ./configs/config.yaml

# Start with custom configuration file
hermes --config /path/to/your/config.yaml

Available Commands

Command	Description
`hermes`	Start the Hermes server (default command)
`validate`	Validate configuration file against schema
`completion`	Generate autocompletion script for shell
`help`	Show help information for any command

Command Line Options

Flag	Short	Description	Default
`--config`	`-c`	Configuration file path	Required
`--debug`	`-d`	Enable debug logging	`false`
`--help`	`-h`	Show help information	-

⚙️ Configuration

Configuration File Structure

# Server configuration
server:
  port: 8080              # Server port (1024-65535)
  host: "127.0.0.1"       # Server host/IP address

# Logging configuration
logging:
  level: "info"           # Log level: debug, info, warn, error
  format: "logfmt"        # Log format: logfmt, json
  output: "stdout"        # Output: stdout, stderr, or file path

# Component configuration (array-based with CUE validation)
# Note: Input and output components are now implicitly managed through ruler configuration
components:

  - type: "alertProcessor"
    name: "main-processor"
    enabled: true
    config:
      batchSize: 100
      processingInterval: "30s"
      maxRetries: 3
      priority: "normal"

  - type: "alertStore"
    name: "memory-store"
    enabled: true
    config:
      backend: "memory"
      maxAlerts: 10000
      retentionDays: 30
      cleanupInterval: "1h"

  - type: "outputManager"
    name: "output-router"
    enabled: true
    config:
      defaultHandlers: ["logger"]
      retryAttempts: 3
      retryDelay: "30s"

# Rule-based alert processing
rules:
  - name: "drop-test-alerts"
    type: "drop"
    scope: "input"
    enabled: true
    priority: 10
    match:
      labels:
        environment: "test"
      severity: ["info", "debug"]
    operator: "and"
    actions:
      drop: true

  - name: "enrich-payment-alerts"
    type: "enrich"
    enabled: true
    match:
      labels:
        service: "payment-api"
    actions:
      add:
        labels:
          team: "SRE"
          escalation: "tier2"
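
The logging section maps naturally onto the standard library's `log/slog` handlers. The sketch below shows one way to wire it up; `parseLevel` and `newLogger` are hypothetical helpers, and using `slog.TextHandler` for the `logfmt` format is an assumption based on its key=value output.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps the configured level name to a slog.Level.
func parseLevel(name string) (slog.Level, error) {
	switch strings.ToLower(name) {
	case "debug":
		return slog.LevelDebug, nil
	case "info":
		return slog.LevelInfo, nil
	case "warn":
		return slog.LevelWarn, nil
	case "error":
		return slog.LevelError, nil
	default:
		return 0, fmt.Errorf("unknown log level %q", name)
	}
}

// newLogger builds a logger that writes to w in the requested format.
// slog.TextHandler emits key=value pairs and is used here for "logfmt".
func newLogger(w io.Writer, level, format string) (*slog.Logger, error) {
	lvl, err := parseLevel(level)
	if err != nil {
		return nil, err
	}
	opts := &slog.HandlerOptions{Level: lvl}
	switch format {
	case "logfmt":
		return slog.New(slog.NewTextHandler(w, opts)), nil
	case "json":
		return slog.New(slog.NewJSONHandler(w, opts)), nil
	default:
		return nil, fmt.Errorf("unknown log format %q", format)
	}
}

func main() {
	logger, err := newLogger(os.Stdout, "info", "logfmt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	logger.Info("Server starting...", "host", "127.0.0.1", "port", 8080)
}
```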

Configuration Validation

The configuration is validated against a CUE schema that enforces:

Port Range: 1024-65535 for server ports with automatic collision detection
Log Levels: Only valid log levels (debug, info, warn, error)
Log Formats: Only supported formats (logfmt, json)
Output Targets: stdout, stderr, or valid file paths
Component Types: Validates component-specific configuration schemas
Component Names: Ensures unique component names and valid identifiers
Rule Syntax: Validates rule definitions, match conditions, and actions
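
Hermes enforces these constraints in its CUE schema. Purely to make a few of them concrete, the sketch below restates the port range, log level, log format, and unique-name checks in plain Go. The `config` and `component` types and the `validate` function are hypothetical and do not reflect the real schema or loader.

```go
package main

import (
	"errors"
	"fmt"
)

// component and config hold only the fields checked in this sketch.
type component struct{ Type, Name string }

type config struct {
	Port       int
	LogLevel   string
	LogFormat  string
	Components []component
}

// validate collects every violated constraint instead of stopping at the
// first one, so a single run reports all problems in the file.
func validate(c config) error {
	var errs []error
	if c.Port < 1024 || c.Port > 65535 {
		errs = append(errs, fmt.Errorf("server.port %d is outside 1024-65535", c.Port))
	}
	switch c.LogLevel {
	case "debug", "info", "warn", "error":
	default:
		errs = append(errs, fmt.Errorf("logging.level %q is not valid", c.LogLevel))
	}
	switch c.LogFormat {
	case "logfmt", "json":
	default:
		errs = append(errs, fmt.Errorf("logging.format %q is not valid", c.LogFormat))
	}
	seen := map[string]bool{}
	for _, comp := range c.Components {
		if seen[comp.Name] {
			errs = append(errs, fmt.Errorf("duplicate component name %q", comp.Name))
		}
		seen[comp.Name] = true
	}
	return errors.Join(errs...)
}

func main() {
	bad := config{
		Port:      80,
		LogLevel:  "verbose",
		LogFormat: "logfmt",
		Components: []component{
			{Type: "alertStore", Name: "memory-store"},
			{Type: "alertStore", Name: "memory-store"},
		},
	}
	fmt.Println(validate(bad))
}
```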

📚 Documentation

Comprehensive documentation is available in the docs/ directory:

System Architecture - Overall system architecture and design
API Documentation - HTTP API endpoints and webhook interfaces
Rule System - Rule-based alert processing documentation
CLI Documentation - Command-line interface and usage
Configuration Management - CUE-based configuration system
Logging System - Structured logging with slog
Server Architecture - Server lifecycle and component management
Testing Strategy - Comprehensive testing framework covering unit, integration, benchmark, chaos, and contract testing
Webhook System - Webhook input system documentation

🔍 Current Status

Hermes is in early development. The following features are implemented:

✅ Completed Features

CLI Interface: Cobra-based command-line interface with validation command
Configuration System: CUE-based schema validation and live reloading
Logging System: Structured logging with slog
Server Framework: Component lifecycle and graceful shutdown
Rule Engine: Complete rule-based alert processing system
Alert Store: In-memory alert storage with CUE validation
Input Processors: Webhook input for flexible alert ingestion
Output Handlers: Logger output for console and file logging
Implicit I/O Management: Input and output components managed through ruler configuration
Component System: Modular component architecture with array-based configuration
Code Quality: Google Go Style Guide compliance with optimized linting

🚧 In Development

HTTP API: REST endpoints for alert management and status
Persistent Storage: Database backend for alert store
Web UI: Real-time alert management interface
Metrics & Monitoring: Comprehensive observability and statistics

🎯 Planned Features

Plugin System: Dynamic plugin loading and management
Advanced Outputs: Email, Slack, PagerDuty, and other integrations
Alert Correlation: Intelligent alert grouping and correlation
Multi-tenancy: Support for multiple teams and organizations

🧪 Code Quality & Linting

Hermes follows the Google Go Style Guide and enforces it with an optimized golangci-lint configuration:

Run Linting

# Verify configuration
golangci-lint config verify

# Run all linters
golangci-lint run

# Run with timeout for large codebases
golangci-lint run --timeout=5m

# Auto-fix issues where possible
golangci-lint run --fix

Enabled Linters

Core Quality: govet, staticcheck, revive, errcheck, unused
Maintainability: gocyclo, gocognit, nestif, unconvert, unparam
Security: gosec, copyloopvar
Performance: perfsprint, prealloc
Project-Specific: sloglint, errorlint, contextcheck, depguard

Google Go Style Compliance

✅ Error Handling: All errors must be checked
✅ Documentation: Package comments required
✅ Naming: MixedCaps with proper initialisms
✅ Complexity: Cognitive complexity limits
✅ Security: Vulnerability detection enabled

🤝 Contributing

We welcome contributions! Please follow these guidelines:

Code Quality Standards

Google Go Style Guide: All code must follow Google's Go style guide
Linting: Ensure golangci-lint run passes without errors
Testing: Write comprehensive tests for new functionality
Documentation: Update relevant documentation for changes

Development Workflow

# 1. Fork and clone the repository
git clone <your-fork-url>
cd hermes

# 2. Create a feature branch
git checkout -b feature/your-feature-name

# 3. Make changes and test
go test ./...
golangci-lint run

# 4. Update documentation
# Edit relevant files in docs/ directory

# 5. Commit and push
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name

# 6. Create a pull request

Code Review Process

All changes require review and approval
Automated checks must pass (linting, tests)
Documentation must be updated for user-facing changes
Breaking changes require discussion and migration plan

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
