This repository contains configurations and scripts to implement a full-stack observability layer for CI/CD pipelines, as described in the CI/CD observability handbook. The setup includes Grafana Loki, a lightweight ELK stack, Vector, and OpenTelemetry for log aggregation, metrics, and tracing.
- `loki/`: Configuration files for Grafana Loki and Promtail to aggregate and scrape logs.
  - `loki-config.yaml` includes core setup, retention policies (7-day log retention), and storage settings.
  - `promtail-config.yaml` includes setups for both general system logs (`/var/log/*.log`) and GitHub Actions logs (`/var/log/gha/*.log`).
- `elk/`: Docker Compose and configuration files for a lightweight ELK stack (Elasticsearch, Logstash, Kibana) and Vector.
- `scripts/`: Shell scripts for log collection, analysis, notifications, and self-healing pipelines.
- `examples/`: Example CI/CD pipeline configurations (GitHub Actions, GitLab CI, Jenkins) and code snippets for logging and tracing. The `github-actions-workflow.yml` example covers both log forwarding to Loki and a correlation ID for tracing.
- Navigate to `loki/`.
- Run `docker-compose up -d` to start Loki and Promtail.
- Update `loki-config.yaml` for storage paths or retention periods if needed (e.g., `/tmp/loki/chunks` for log storage, 7-day retention).
- Update `promtail-config.yaml` to point to your CI/CD log directories (e.g., `/var/log/*.log` for system logs, `/var/log/gha/*.log` for GitHub Actions).
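
Once Loki is running, a quick way to confirm it accepts logs is to push a single line through its HTTP push API (`POST /loki/api/v1/push`). The following is a minimal sketch, not part of this repository: the function names are hypothetical, and `http://your-loki-endpoint` is the placeholder you would replace with your own Loki base URL.

```python
import json
import time
import urllib.request

# Placeholder; replace with your own Loki base URL.
LOKI_URL = "http://your-loki-endpoint"


def build_push_payload(line, labels, ts_ns=None):
    """Build the JSON body for Loki's push API (hypothetical helper)."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each entry is [<Unix epoch in nanoseconds, as a string>, <log line>].
    return {"streams": [{"stream": dict(labels), "values": [[str(ts_ns), line]]}]}


def push_log(line, labels, base_url=LOKI_URL, timeout=5):
    """Send one log line to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(line, labels)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `push_log("smoke test", {"job": "ci"})` against a reachable instance should return a 2xx status; the `job` label here is only an illustration.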
- Navigate to `elk/`.
- Run `docker-compose up -d` to start Elasticsearch, Logstash, and Kibana.
- Configure Filebeat (`filebeat.yml`) or Vector (`vector.toml`) to forward logs to Logstash or Elasticsearch.
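
Before pointing Filebeat or Vector at the stack, it can help to confirm that Elasticsearch is actually up. The sketch below queries the `_cluster/health` endpoint. It assumes Elasticsearch is reachable at `http://localhost:9200` without authentication, which may not match the Compose file in `elk/`; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Elasticsearch is published on its default HTTP port on localhost,
# with security disabled. Adjust the URL (and add credentials) otherwise.
ES_URL = "http://localhost:9200"


def fetch_cluster_health(base_url=ES_URL, timeout=5):
    """Return the parsed JSON response of GET /_cluster/health."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_cluster_usable(health):
    """Treat green and yellow as usable; single-node clusters often report yellow."""
    return health.get("status") in ("green", "yellow")
```

A typical check would be `is_cluster_usable(fetch_cluster_health())`, retried for a short while after `docker-compose up -d` since Elasticsearch can take some time to start.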
- Use scripts in `scripts/` for automated log collection (`collect_logs.sh`), error analysis (`log-analysis.sh`), notifications (`slack-notification.sh`), and self-healing (`self-healing.sh`).
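
To give a sense of what an error-analysis step can look like, here is a minimal Python sketch that counts error lines across log files. It is an illustration only and does not reproduce `log-analysis.sh`; the severity keywords and function names are assumptions to adapt to your log format.

```python
import glob
import re
from collections import Counter

# Assumption: illustrative severity keywords; adjust to your log format.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def count_errors(lines):
    """Count lines per severity keyword; each line counts once, by its first match."""
    counts = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def analyze_logs(pattern):
    """Aggregate error counts over every file matching a glob pattern."""
    totals = Counter()
    for path in glob.glob(pattern):
        # errors="replace" keeps the scan going if a log contains invalid UTF-8.
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(count_errors(handle))
    return totals
```

For example, `analyze_logs("/var/log/gha/*.log")` would summarize the GitHub Actions logs that Promtail is configured to scrape.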
- Use example configurations in `examples/` for GitHub Actions, GitLab CI, or Jenkins to forward logs to Loki or ELK. The `github-actions-workflow.yml` demonstrates running tests with log forwarding to Loki and logging with a correlation ID for traceability.
- Implement structured logging with correlation IDs using `winston-logger.js` or tracing with `opentelemetry-python.py`.
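
The idea behind correlation IDs is that every log line emitted during one pipeline run carries the same identifier, so the lines can be joined later in Loki or Kibana. The sketch below shows this with Python's standard `logging` module. It is not a port of `winston-logger.js`; the `correlation_id` field name and the helper names are assumptions.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Field name is an assumption; use the key your log queries expect.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def make_logger(name, correlation_id=None, stream=None):
    """Return a LoggerAdapter that stamps every record with one correlation ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # A handler is attached only once per logger name; `stream` is ignored afterwards.
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    # Generate a random ID when the caller (e.g. the CI run) does not supply one.
    extra = {"correlation_id": correlation_id or str(uuid.uuid4())}
    return logging.LoggerAdapter(logger, extra)
```

In a CI job, the ID would usually come from the pipeline itself (for instance a run identifier passed in through an environment variable) rather than being generated at random.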
- Configure log rotation with `logrotate.conf` and Docker logging with `daemon.json`.
- Set retention policies in `loki-config.yaml` (Loki, 7-day retention) or `elasticsearch-ilm.json` (ELK).
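
To make the 7-day window concrete, the following sketch lists files in a directory that would fall outside it. This is a dry-run illustration only: it is not how Loki, Elasticsearch ILM, or logrotate enforce retention, and the function names are hypothetical.

```python
import os
import time

RETENTION_DAYS = 7  # mirrors the 7-day retention mentioned above
SECONDS_PER_DAY = 86400


def is_expired(mtime, now, retention_days=RETENTION_DAYS):
    """True when a modification time is older than the retention window."""
    return now - mtime > retention_days * SECONDS_PER_DAY


def expired_files(directory, now=None, retention_days=RETENTION_DAYS):
    """List regular files in `directory` older than the retention window (no deletion)."""
    now = time.time() if now is None else now
    expired = []
    for entry in os.scandir(directory):
        if entry.is_file() and is_expired(entry.stat().st_mtime, now, retention_days):
            expired.append(entry.path)
    return sorted(expired)
```

Running `expired_files` against a log directory before changing a retention setting gives a rough idea of how much data the new window would drop.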
- Docker and Docker Compose for running Loki and ELK.
- Node.js for running the JavaScript examples (`winston-logger.js`, `prometheus-exemplar.js`).
- Python for the OpenTelemetry example (`opentelemetry-python.py`).
- At least 4 GB RAM and 20 GB disk space for Loki; 8 GB RAM and 30 GB disk space for ELK.
- Replace placeholders like `http://your-loki-endpoint` or `YOUR_SLACK_WEBHOOK_URL` with your actual endpoints.
- Ensure logs are written to the paths specified in configuration files (e.g., `/var/log/ci/*.