Hermes is a next-generation alert management system, written in Go, designed for high-scale, low-latency environments. Its modular architecture, rule-based processing engine, and CUE-based configuration system enable precise, automated alert routing and processing from multiple sources. Hermes ensures that the right alerts reach the right people, with the right context, at the right time.
Hermes is built to help modern SRE and operations teams manage alert noise.
Key principles:
- Modular: Each component is independently deployable and scalable
- Rule-Based: Flexible, declarative rule engine for alert processing
- Configurable: CUE-based configuration with live reloading and validation
- Reliable: Built-in graceful shutdown and component lifecycle management
- Observable: Structured logging with slog and comprehensive error handling
git clone git@github.com:geekxflood/hermes.git
cd hermes
go build -o bin/hermes ./main.go
./bin/hermes --config ./configs/config.yaml
The server will start and display:
time=2024-01-15T10:30:45.123Z level=INFO msg="Server starting..." host=127.0.0.1 port=8080
time=2024-01-15T10:30:45.456Z level=INFO msg="Component initialized" component=database
time=2024-01-15T10:30:45.789Z level=INFO msg="Component initialized" component=cache
time=2024-01-15T10:30:46.012Z level=INFO msg="Server started successfully"
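The key=value lines above have the shape that Go's standard `log/slog` text handler emits. The following is a minimal sketch, assuming the `logfmt` and `json` format names from the configuration map onto slog's text and JSON handlers; `newLogger`, `render`, and `dropTime` are hypothetical helpers, not Hermes' actual logging package.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
	"os"
)

// dropTime removes the top-level time attribute so output is reproducible.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// newLogger maps a format name ("logfmt" or "json") to a slog handler.
// The mapping is an assumption made for this sketch.
func newLogger(w io.Writer, format string, level slog.Level, stable bool) *slog.Logger {
	opts := &slog.HandlerOptions{Level: level}
	if stable {
		opts.ReplaceAttr = dropTime
	}
	if format == "json" {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	return slog.New(slog.NewTextHandler(w, opts))
}

// render logs one startup line into a buffer, without a timestamp.
func render(format string) string {
	var buf bytes.Buffer
	logger := newLogger(&buf, format, slog.LevelInfo, true)
	logger.Debug("filtered out at info level")
	logger.Info("Server starting...", "host", "127.0.0.1", "port", 8080)
	return buf.String()
}

func main() {
	// With timestamps enabled, the text handler prints time=... first.
	logger := newLogger(os.Stdout, "logfmt", slog.LevelInfo, false)
	logger.Info("Server starting...", "host", "127.0.0.1", "port", 8080)
}
```

Passing `slog.LevelDebug` instead of `slog.LevelInfo` would let the debug line through, which is roughly what a debug flag would need to do.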
Edit configs/config.yaml and save; the configuration reloads automatically.
Hermes currently implements the foundational components for a robust alert management system:
graph TD
CLI[Hermes CLI] --> CM[Config Manager]
CM --> S[Server]
S --> C[Components]
C --> HS[HTTP Server]
C --> IM[Input Manager]
C --> AP[Alert Processor]
C --> R[Ruler Component]
C --> AS[Alert Store]
C --> OM[Output Manager]
HS --> IM
IM --> WH[Webhook Handler]
IM --> AP
AP --> R
AP --> AS
AP --> OM
R --> RE[Rule Engine]
R --> RT[Rule Types]
OM --> LO[Logger Output]
CM --> CUE[CUE Schema Validation]
CM --> FW[File Watcher]
S --> L[Structured Logging]
The system provides a complete alert management pipeline with rule-based processing, configuration management, server lifecycle, and component orchestration.
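The server lifecycle and component orchestration mentioned above could be sketched as follows. This is an illustration under assumptions: the `Component` interface, the `Server` type, and the `recorder` component are hypothetical names, and the reverse-order shutdown is one common choice rather than a confirmed detail of Hermes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Component is a hypothetical lifecycle interface for this sketch.
type Component interface {
	Name() string
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
}

// Server starts components in order and stops them in reverse order.
type Server struct {
	components []Component
	started    []Component
}

func (s *Server) Start(ctx context.Context) error {
	for _, c := range s.components {
		if err := c.Start(ctx); err != nil {
			// Roll back whatever already started before reporting the failure.
			return errors.Join(fmt.Errorf("start %s: %w", c.Name(), err), s.Stop(ctx))
		}
		s.started = append(s.started, c)
	}
	return nil
}

func (s *Server) Stop(ctx context.Context) error {
	var errs []error
	for i := len(s.started) - 1; i >= 0; i-- {
		if err := s.started[i].Stop(ctx); err != nil {
			errs = append(errs, fmt.Errorf("stop %s: %w", s.started[i].Name(), err))
		}
	}
	s.started = nil
	return errors.Join(errs...) // nil when nothing failed
}

// recorder is a stand-in component that records lifecycle events.
type recorder struct {
	name   string
	events *[]string
}

func (r recorder) Name() string { return r.name }

func (r recorder) Start(context.Context) error {
	*r.events = append(*r.events, "start:"+r.name)
	return nil
}

func (r recorder) Stop(context.Context) error {
	*r.events = append(*r.events, "stop:"+r.name)
	return nil
}

// lifecycleOrder runs a start/stop cycle and returns the recorded events.
func lifecycleOrder() []string {
	var events []string
	srv := &Server{components: []Component{
		recorder{"alertStore", &events},
		recorder{"httpServer", &events},
	}}
	// Bound the whole cycle so a stuck component cannot block shutdown forever.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Start(ctx); err != nil {
		events = append(events, "error:"+err.Error())
	}
	if err := srv.Stop(ctx); err != nil {
		events = append(events, "error:"+err.Error())
	}
	return events
}

func main() {
	fmt.Println(lifecycleOrder())
}
```

Stopping in reverse order means a component that depends on an earlier one (here the HTTP server on the store) shuts down before its dependency does.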
hermes/
├── cmd/  # Cobra CLI entrypoint
│   ├── root.go  # Main CLI command with server startup
│   └── validate.go  # Configuration validation command
├── internal/  # Core application logic
│   ├── alert/  # Alert data structures and utilities
│   │   ├── alert.go  # Alert struct and store implementation
│   │   ├── alert_test.go  # Alert tests
│   │   └── templates/  # Alert CUE schema templates
│   ├── common/  # Shared functionality and utilities
│   │   ├── constants.go  # System constants
│   │   ├── inputs.go  # Input processing utilities
│   │   ├── outputs.go  # Output processing utilities
│   │   ├── ulid.go  # ULID generation utilities
│   │   └── utils.go  # General utilities
│   ├── config/  # Configuration management with CUE
│   │   ├── config.go  # Config loading and validation
│   │   ├── manager.go  # Live reloading and change notifications
│   │   └── templates/  # CUE schema definitions
│   ├── httpserver/  # HTTP server for webhook endpoints
│   │   └── httpserver.go  # HTTP server implementation
│   ├── inputmgr/  # Input management system
│   │   ├── inputmgr.go  # Input manager component
│   │   └── inputmgr_test.go  # Input manager tests
│   ├── inputs/  # Input processors and handlers
│   │   └── inputwebhook/  # Webhook input handler implementation
│   │       ├── webhook_handler.go  # Webhook input handler
│   │       └── webhook_handler_test.go  # Webhook handler tests
│   ├── logging/  # Structured logging with slog
│   │   └── logging.go  # Logger initialization and utilities
│   ├── outputmgr/  # Output management system
│   │   └── outputmgr.go  # Output manager component
│   ├── outputs/  # Output handlers
│   │   └── outputlogger/  # Console/file logging output
│   ├── processor/  # Alert processing pipeline
│   │   └── processor.go  # Alert processor component
│   ├── ruler/  # Rule-based alert processing
│   │   ├── ruler.go  # Main ruler implementation
│   │   ├── engine.go  # Rule evaluation engine
│   │   ├── rules.go  # Rule type implementations
│   │   └── benchmark_test.go  # Performance benchmarks
│   ├── server/  # Server and component lifecycle
│   │   └── server.go  # Main server with graceful shutdown
│   ├── testutil/  # Testing utilities and mocks
│   │   └── testutil.go  # Test helper functions
├── configs/  # Configuration files
│   └── config.yaml  # Default configuration
├── docs/  # Documentation
│   ├── architecture.md  # System architecture overview
│   ├── cmd.md  # CLI documentation
│   ├── config.md  # Configuration management
│   ├── logging.md  # Logging system
│   ├── ruler.md  # Rule system documentation
│   ├── server.md  # Server architecture
│   ├── testing.md  # Testing strategy
│   ├── testing-implementation-summary.md  # Testing implementation details
│   └── webhook.md  # Webhook input system
├── scripts/  # Build and utility scripts
│   └── test.sh  # Test execution script
├── tests/  # Comprehensive test suite
│   ├── testutil/  # Test utilities and factories
│   │   ├── factories/  # Test data factories
│   │   └── mocks/  # External service mocks
│   ├── integration/  # BDD-style integration tests
│   ├── benchmark/  # Performance benchmark tests
│   ├── chaos/  # Chaos engineering tests
│   ├── contract/  # API contract validation tests
│   └── testdata/  # Static test data
└── main.go  # Application entry point
Hermes uses CUE for configuration schema validation and management:
- Type Safety: Strict validation of configuration values
- Constraints: Port ranges, log levels, and format validation
- Defaults: Sensible default values for all settings
- Documentation: Self-documenting schema with constraints
- File Watching: Automatic detection of configuration changes
- Section Notifications: Granular change notifications for specific config sections
- Graceful Updates: Non-disruptive configuration updates
- YAML: Primary configuration format (.yaml, .yml)
- JSON: Alternative configuration format (.json)
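The section-level change notifications listed above can be illustrated with a small sketch. It assumes the configuration is available as a map of top-level sections after parsing; `Config`, `changedSections`, and `Notifier` are hypothetical names, and the file watching itself is left out.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Config holds parsed configuration keyed by top-level section name.
type Config map[string]any

// changedSections returns the sorted names of sections that were added,
// removed, or modified between two configuration snapshots.
func changedSections(prev, next Config) []string {
	var changed []string
	for name, old := range prev {
		if cur, ok := next[name]; !ok || !reflect.DeepEqual(old, cur) {
			changed = append(changed, name)
		}
	}
	for name := range next {
		if _, ok := prev[name]; !ok {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

// Notifier calls subscribers only for the sections that actually changed.
type Notifier struct {
	subscribers map[string][]func(section any)
}

func (n *Notifier) Subscribe(section string, fn func(section any)) {
	if n.subscribers == nil {
		n.subscribers = make(map[string][]func(section any))
	}
	n.subscribers[section] = append(n.subscribers[section], fn)
}

// Reload diffs the snapshots and notifies subscribers of changed sections.
func (n *Notifier) Reload(prev, next Config) []string {
	changed := changedSections(prev, next)
	for _, name := range changed {
		for _, fn := range n.subscribers[name] {
			fn(next[name])
		}
	}
	return changed
}

func main() {
	prev := Config{
		"server":  map[string]any{"port": 8080},
		"logging": map[string]any{"level": "info"},
	}
	next := Config{
		"server":  map[string]any{"port": 8080},
		"logging": map[string]any{"level": "debug"},
	}
	var n Notifier
	n.Subscribe("logging", func(section any) { fmt.Println("logging changed:", section) })
	n.Subscribe("server", func(section any) { fmt.Println("server changed:", section) })
	fmt.Println("changed sections:", n.Reload(prev, next))
}
```

Only the logging subscriber fires in this run, so a component that reads the server section does not have to restart when the log level changes.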
Hermes features a powerful rule engine that processes alerts through configurable rules:
- Drop: Remove unwanted alerts early in the pipeline
- Replace: Modify alert fields (severity, labels, annotations)
- Enrich: Add contextual information (team, escalation, metadata)
- Forward: Route alerts to specific outputs
- Suppress: Temporarily suppress alerts based on conditions
- Throttle: Rate-limit alerts to prevent flooding
- Template: Apply templated transformations to alert content
Input → Input Rules → Alert Store → Output Rules → Output Manager
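To make the stage order concrete, here is a minimal sketch of that flow as a chain of functions. The `Stage` type, the `Pipeline` struct, and the `silenced` label are hypothetical; the point is only that an alert dropped by input rules never reaches the store, while one stopped by output rules is stored but not delivered.

```go
package main

import "fmt"

// Alert is a reduced stand-in for the real alert structure.
type Alert struct {
	Name   string
	Labels map[string]string
}

// Stage processes an alert; returning false stops the pipeline for it.
type Stage func(a Alert) (Alert, bool)

// run passes the alert through each stage in order.
func run(a Alert, stages ...Stage) bool {
	for _, stage := range stages {
		var ok bool
		if a, ok = stage(a); !ok {
			return false
		}
	}
	return true
}

// Pipeline records what reached the store and what was delivered.
type Pipeline struct {
	Stored    []Alert
	Delivered []Alert
}

// Process wires the four stages in the documented order.
func (p *Pipeline) Process(a Alert) bool {
	inputRules := func(a Alert) (Alert, bool) {
		return a, a.Labels["environment"] != "test" // drop test alerts early
	}
	store := func(a Alert) (Alert, bool) {
		p.Stored = append(p.Stored, a)
		return a, true
	}
	outputRules := func(a Alert) (Alert, bool) {
		return a, a.Labels["silenced"] != "true" // hypothetical label
	}
	output := func(a Alert) (Alert, bool) {
		p.Delivered = append(p.Delivered, a)
		return a, true
	}
	return run(a, inputRules, store, outputRules, output)
}

func main() {
	var p Pipeline
	p.Process(Alert{Name: "noise", Labels: map[string]string{"environment": "test"}})
	p.Process(Alert{Name: "muted", Labels: map[string]string{"silenced": "true"}})
	p.Process(Alert{Name: "disk-full", Labels: map[string]string{"environment": "prod"}})
	fmt.Println("stored:", len(p.Stored), "delivered:", len(p.Delivered))
}
```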
rules:
  - name: drop-test-alerts
    type: drop
    scope: input
    enabled: true
    priority: 10
    match:
      labels.environment: test
      severity: info|debug
      operator: and
    actions:
      drop: true
  - name: enrich-payment-alerts
    type: enrich
    enabled: true
    match:
      labels.service: payment-api
    actions:
      add:
        team: SRE
        escalation: tier2
- Priority-Based Execution: Rules execute in configurable priority order
- Flexible Matching: Support for exact, regex, contains, and complex patterns
- Live Reloading: Rules can be updated without service restart
- Performance Monitoring: Comprehensive statistics and execution metrics
- Dry-Run Mode: Test rules without affecting alert processing
For detailed rule documentation, see docs/ruler.md.
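As a rough illustration of priority-ordered evaluation, the sketch below applies a drop rule and an enrich rule modeled on the example above. Several details are assumptions rather than documented behavior: lower priority values run first, all match conditions must hold (the `and` operator), and the enrich rule gets an explicit priority of 20 because the example omits one. The `Rule` and `Alert` types are hypothetical and much smaller than a real engine would need.

```go
package main

import (
	"fmt"
	"sort"
)

type Alert struct {
	Severity string
	Labels   map[string]string
}

// Rule covers only the "drop" and "enrich" types in this sketch.
type Rule struct {
	Name          string
	Type          string
	Enabled       bool
	Priority      int
	MatchLabels   map[string]string
	MatchSeverity []string
	AddLabels     map[string]string
}

// matches requires every label to match and, if severities are listed,
// the alert severity to be one of them ("and" semantics).
func (r Rule) matches(a Alert) bool {
	for k, want := range r.MatchLabels {
		if a.Labels[k] != want {
			return false
		}
	}
	if len(r.MatchSeverity) == 0 {
		return true
	}
	for _, s := range r.MatchSeverity {
		if a.Severity == s {
			return true
		}
	}
	return false
}

// apply runs enabled rules in ascending priority order and reports
// whether the alert survives. The input alert's labels are not mutated.
func apply(rules []Rule, a Alert) (Alert, bool) {
	ordered := append([]Rule(nil), rules...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})
	labels := make(map[string]string, len(a.Labels))
	for k, v := range a.Labels {
		labels[k] = v
	}
	a.Labels = labels
	for _, r := range ordered {
		if !r.Enabled || !r.matches(a) {
			continue
		}
		switch r.Type {
		case "drop":
			return a, false
		case "enrich":
			for k, v := range r.AddLabels {
				a.Labels[k] = v
			}
		}
	}
	return a, true
}

// exampleRules mirrors the two rules from the configuration example.
func exampleRules() []Rule {
	return []Rule{
		{Name: "enrich-payment-alerts", Type: "enrich", Enabled: true, Priority: 20,
			MatchLabels: map[string]string{"service": "payment-api"},
			AddLabels:   map[string]string{"team": "SRE", "escalation": "tier2"}},
		{Name: "drop-test-alerts", Type: "drop", Enabled: true, Priority: 10,
			MatchLabels:   map[string]string{"environment": "test"},
			MatchSeverity: []string{"info", "debug"}},
	}
}

func main() {
	a := Alert{Severity: "critical", Labels: map[string]string{"service": "payment-api"}}
	out, keep := apply(exampleRules(), a)
	fmt.Println(keep, out.Labels)
}
```

Running drop rules before enrichment avoids spending work on alerts that are about to be discarded, which is one reason to give them the lower priority value here.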
- Go 1.24.5+
- CUE CLI
# Clone the repository
git clone git@github.com:geekxflood/hermes.git
cd hermes
# Build the application (creates bin/ directory)
go build -o bin/hermes ./main.go
# Build with version information
VERSION=$(git describe --tags --always --dirty)
COMMIT=$(git rev-parse HEAD)
BUILD_TIME=$(date -u '+%Y-%m-%d_%H:%M:%S')
go build -ldflags "-X main.Version=$VERSION -X main.Commit=$COMMIT \
-X main.BuildTime=$BUILD_TIME" -o bin/hermes ./main.go
# Or use go run for development
go run ./main.go --help
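The `-X main.Version=...` linker flags above only take effect if the `main` package declares matching package-level string variables. A minimal sketch of that side follows; the variable names come from the build command, while the default values and the `versionString` helper are assumptions for illustration.

```go
package main

import "fmt"

// These are overwritten at link time by
//   -ldflags "-X main.Version=... -X main.Commit=... -X main.BuildTime=..."
// The linker's -X flag only works on string variables. The defaults
// below apply to plain `go build` or `go run` and are assumed values.
var (
	Version   = "dev"
	Commit    = "unknown"
	BuildTime = "unknown"
)

// versionString is a hypothetical helper that formats the build metadata.
func versionString() string {
	return fmt.Sprintf("hermes %s (commit %s, built %s)", Version, Commit, BuildTime)
}

func main() {
	fmt.Println(versionString())
}
```

Built without the linker flags, this prints `hermes dev (commit unknown, built unknown)`.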
# Install Air for live reloading (optional)
go install github.com/air-verse/air@latest
# Run with live reloading
air
# Or run directly
go run ./main.go --config ./configs/config.yaml
# Show help and available options
hermes --help
# Start Hermes with default configuration
hermes --config ./configs/config.yaml
# Start with debug logging
hermes --config ./configs/config.yaml --debug
# Validate configuration without starting server
hermes validate --config ./configs/config.yaml
# Start with custom configuration file
hermes --config /path/to/your/config.yaml
Command | Description |
---|---|
hermes | Start the Hermes server (default command) |
validate | Validate configuration file against schema |
completion | Generate autocompletion script for shell |
help | Show help information for any command |
Flag | Short | Description | Default |
---|---|---|---|
--config | -c | Configuration file path | Required |
--debug | -d | Enable debug logging | false |
--help | -h | Show help information | - |
# Server configuration
server:
  port: 8080 # Server port (1024-65535)
  host: "127.0.0.1" # Server host/IP address

# Logging configuration
logging:
  level: "info" # Log level: debug, info, warn, error
  format: "logfmt" # Log format: logfmt, json
  output: "stdout" # Output: stdout, stderr, or file path

# Component configuration (array-based with CUE validation)
# Note: Input and output components are now implicitly managed through ruler configuration
components:
  - type: "alertProcessor"
    name: "main-processor"
    enabled: true
    config:
      batchSize: 100
      processingInterval: "30s"
      maxRetries: 3
      priority: "normal"
  - type: "alertStore"
    name: "memory-store"
    enabled: true
    config:
      backend: "memory"
      maxAlerts: 10000
      retentionDays: 30
      cleanupInterval: "1h"
  - type: "outputManager"
    name: "output-router"
    enabled: true
    config:
      defaultHandlers: ["logger"]
      retryAttempts: 3
      retryDelay: "30s"

# Rule-based alert processing
rules:
  - name: "drop-test-alerts"
    type: "drop"
    scope: "input"
    enabled: true
    priority: 10
    match:
      labels:
        environment: "test"
      severity: ["info", "debug"]
      operator: "and"
    actions:
      drop: true
  - name: "enrich-payment-alerts"
    type: "enrich"
    enabled: true
    match:
      labels:
        service: "payment-api"
    actions:
      add:
        labels:
          team: "SRE"
          escalation: "tier2"
The configuration is validated against a CUE schema that enforces:
- Port Range: 1024-65535 for server ports with automatic collision detection
- Log Levels: Only valid log levels (debug, info, warn, error)
- Log Formats: Only supported formats (logfmt, json)
- Output Targets: stdout, stderr, or valid file paths
- Component Types: Validates component-specific configuration schemas
- Component Names: Ensures unique component names and valid identifiers
- Rule Syntax: Validates rule definitions, match conditions, and actions
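For readers unfamiliar with CUE, the sketch below expresses a few of these constraints as plain Go checks. It is not how Hermes validates (that is done by the CUE schema); it only shows what the rules amount to. The `Config` struct and `validate` function are hypothetical.

```go
package main

import "fmt"

// Config is a reduced, hypothetical view of the settings being checked.
type Config struct {
	Port           int
	LogLevel       string
	LogFormat      string
	ComponentNames []string
}

var (
	validLevels  = map[string]bool{"debug": true, "info": true, "warn": true, "error": true}
	validFormats = map[string]bool{"logfmt": true, "json": true}
)

// validate returns one message per violated constraint.
func validate(c Config) []string {
	var problems []string
	if c.Port < 1024 || c.Port > 65535 {
		problems = append(problems, fmt.Sprintf("server.port %d is outside 1024-65535", c.Port))
	}
	if !validLevels[c.LogLevel] {
		problems = append(problems, fmt.Sprintf("logging.level %q is not one of debug, info, warn, error", c.LogLevel))
	}
	if !validFormats[c.LogFormat] {
		problems = append(problems, fmt.Sprintf("logging.format %q is not one of logfmt, json", c.LogFormat))
	}
	seen := make(map[string]bool)
	for _, name := range c.ComponentNames {
		if seen[name] {
			problems = append(problems, fmt.Sprintf("component name %q is not unique", name))
		}
		seen[name] = true
	}
	return problems
}

func main() {
	bad := Config{
		Port:           80,
		LogLevel:       "verbose",
		LogFormat:      "logfmt",
		ComponentNames: []string{"memory-store", "memory-store"},
	}
	for _, p := range validate(bad) {
		fmt.Println("invalid:", p)
	}
}
```

A schema language keeps these rules next to the configuration shape and defaults, which is harder to maintain when the checks are scattered through Go code like this.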
Comprehensive documentation is available in the docs/ directory:
- System Architecture - Overall system architecture and design
- API Documentation - HTTP API endpoints and webhook interfaces
- Rule System - Rule-based alert processing documentation
- CLI Documentation - Command-line interface and usage
- Configuration Management - CUE-based configuration system
- Logging System - Structured logging with slog
- Server Architecture - Server lifecycle and component management
- Testing Strategy - Comprehensive testing framework covering unit, integration, benchmark, chaos, and contract testing
- Webhook System - Webhook input system documentation
Hermes is currently in foundational development with these implemented features:
- CLI Interface: Cobra-based command-line interface with validation command
- Configuration System: CUE-based schema validation and live reloading
- Logging System: Structured logging with slog
- Server Framework: Component lifecycle and graceful shutdown
- Rule Engine: Complete rule-based alert processing system
- Alert Store: In-memory alert storage with CUE validation
- Input Processors: Webhook input for flexible alert ingestion
- Output Handlers: Logger output for console and file logging
- Implicit I/O Management: Input and output components managed through ruler configuration
- Component System: Modular component architecture with array-based configuration
- Code Quality: Google Go Style Guide compliance with optimized linting
Planned features:
- HTTP API: REST endpoints for alert management and status
- Persistent Storage: Database backend for alert store
- Web UI: Real-time alert management interface
- Metrics & Monitoring: Comprehensive observability and statistics
- Plugin System: Dynamic plugin loading and management
- Advanced Outputs: Email, Slack, PagerDuty, and other integrations
- Alert Correlation: Intelligent alert grouping and correlation
- Multi-tenancy: Support for multiple teams and organizations
Hermes follows the Google Go Style Guide with optimized golangci-lint configuration:
# Verify configuration
golangci-lint config verify
# Run all linters
golangci-lint run
# Run with timeout for large codebases
golangci-lint run --timeout=5m
# Auto-fix issues where possible
golangci-lint run --fix
- Core Quality: govet, staticcheck, revive, errcheck, unused
- Maintainability: gocyclo, gocognit, nestif, unconvert, unparam
- Security: gosec, copyloopvar
- Performance: perfsprint, prealloc
- Project-Specific: sloglint, errorlint, contextcheck, depguard
- ✅ Error Handling: All errors must be checked
- ✅ Documentation: Package comments required
- ✅ Naming: MixedCaps with proper initialisms
- ✅ Complexity: Cognitive complexity limits
- ✅ Security: Vulnerability detection enabled
We welcome contributions! Please follow these guidelines:
- Google Go Style Guide: All code must follow Google's Go style guide
- Linting: Ensure golangci-lint run passes without errors
- Testing: Write comprehensive tests for new functionality
- Documentation: Update relevant documentation for changes
# 1. Fork and clone the repository
git clone <your-fork-url>
cd hermes
# 2. Create a feature branch
git checkout -b feature/your-feature-name
# 3. Make changes and test
go test ./...
golangci-lint run
# 4. Update documentation
# Edit relevant files in docs/ directory
# 5. Commit and push
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
# 6. Create a pull request
- All changes require review and approval
- Automated checks must pass (linting, tests)
- Documentation must be updated for user-facing changes
- Breaking changes require discussion and migration plan
This project is licensed under the MIT License - see the LICENSE file for details.