kyndryl-open-source/observability-sim-lab

SRE@Kyndryl

SRE Public Labs - OBSSIM

  • Version: 0.1.2
  • License: MIT

Architecture

[Architecture diagram (arch)]

Learning objectives

  • Learn the basics of monitoring and alerting tools

  • Learn how to deploy a monitoring platform to a Kubernetes cluster

  • Learn how to use Prometheus, Grafana, Alertmanager, and exporters together

Pre-requisite knowledge

  • Familiarity with Kubernetes

  • Basic knowledge of the Node.js (JavaScript) programming language

  • Good understanding of YAML (YAML Ain't Markup Language)

Kubernetes cluster

This lab runs on any K8s cluster with a few adjustments to the ingresses; we recommend using a free trial account on either Google Cloud Platform or Microsoft Azure.

You can create a GCP account through this documentation, and you can download and install the Google CLI gcloud by checking this document.

Alternatively, you can create an Azure account through this documentation, and you can download and install the Azure CLI az by checking this document.

Google Kubernetes Engine

If you use a GCP account for this lab, the recommended Google Kubernetes Engine (GKE) configuration is below:

  • GKE Configuration

| Parameter | Value |
|---|---|
| GKE mode | Standard with static K8s version |
| Location type | Zonal |
| Release channel | None |
| Kubernetes version | 1.27.x |
| Number of nodes | 3 |
| Machine type | e2-standard-2 |
| Image type | cos_containerd |
  • GKE cluster creation

You can create a K8s cluster for this lab with the following commands:

gcloud auth login
gcloud container clusters create cluster-1 --no-enable-autoupgrade --enable-service-externalips --enable-kubernetes-alpha --zone=<your_closest_zone> --cluster-version=<k8s_version> --machine-type=e2-standard-2 --monitoring=NONE
  • kubectl configuration

You can configure your local kubectl environment and credentials with the following command:

gcloud container clusters get-credentials cluster-1 --zone <your_closest_zone> --project <your_project_id>

Azure Kubernetes Services

If you use an Azure account for this lab, the recommended Azure Kubernetes Services (AKS) configuration is below:

  • AKS Configuration

| Parameter | Value |
|---|---|
| AKS SKU | Basic, Free |
| Type | Microsoft.ContainerService/ManagedClusters |
| Location | East US (or the location closest to you) |
| Auto Upgrade Type | Disabled |
| Kubernetes version | 1.27.x |
| Node pools | 1 node pool |
| Node size | Standard_B4ms |
| Network type | Kubenet |
| Network policy | Calico |
  • AKS cluster creation

You can create a K8s cluster for this lab with the following commands:

az login
az account set --subscription <your_subscription_id>
az aks create -g <your_resource_group_name> -n cluster-1 --auto-upgrade-channel none --network-plugin kubenet --network-policy calico --location <your_closest_location> --node-vm-size Standard_B4ms --kubernetes-version <k8s_version>
  • kubectl configuration

You can configure your local kubectl environment and credentials with the following command:

az aks get-credentials --resource-group <your_resource_group_name> --name cluster-1

Contents

  • Application

Folder: microservices

| File / folder | Description |
|---|---|
| k8s | Kubernetes manifest files |
| deploy-app.sh | Shell script to deploy the app into a K8s cluster |
| docker-build.sh | Shell script used to build and push the Docker image |
| Dockerfile | Docker commands to build the image |
| server.js | Node.js application |
  • Monitoring platform

Folder: monitoring

| File / folder | Description |
|---|---|
| grafana | Kubernetes manifest files to deploy Grafana |
| kube-state | Kubernetes manifest files to deploy kube-state-metrics (KSM) |
| prom-alert | Kubernetes manifest files to deploy Alertmanager |
| prom-blackbox | Kubernetes manifest files to deploy Blackbox exporter |
| prom-node | Kubernetes manifest files to deploy Node exporter |
| prom-server | Kubernetes manifest files to deploy Prometheus Server |
| deploy-monitoring.sh | Shell script to deploy the monitoring platform into a K8s cluster |
| promql-samples.md | Examples of PromQL queries that can be used in Prometheus and Grafana |

Installation

  • Application deployment

The dummy Node.js application image is available in this Docker Hub repo. To deploy the app to your K8s cluster, use the following commands:

cd microservices
./deploy-app.sh

The deploy-app.sh script will create a Deployment in the default namespace that has 3 pods. Also, it will create a NodePort service and a GKE LoadBalancer service in the same namespace.
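For orientation, the Deployment the script creates follows the standard K8s shape: 3 replicas selected by a pod label. The sketch below is illustrative only; the real manifest is microservices/k8s/node-api-deployment.yaml, and the image reference and label names here are assumptions, not copied from the repo.

```yaml
# Hypothetical sketch of the Deployment created by deploy-app.sh.
# Real manifest: microservices/k8s/node-api-deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-api
  namespace: default
spec:
  replicas: 3                  # the 3 pods mentioned above
  selector:
    matchLabels:
      app: node-api            # assumed label; must match the pod template
  template:
    metadata:
      labels:
        app: node-api
    spec:
      containers:
        - name: node-api
          image: <dockerhub_user>/node-api:latest   # image from the Docker Hub repo
          ports:
            - containerPort: 8081
```

The NodePort and LoadBalancer services select the same `app` label to route traffic to these pods.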

  • Monitoring platform deployment

To deploy Prometheus Server, Alertmanager, Grafana, kube-state-metrics, Node exporter, and Blackbox exporter, use the following commands:

cd monitoring
./deploy-monitoring.sh

The deploy-monitoring.sh script will create the following objects inside the K8s cluster:

| Component | Namespace | Objects |
|---|---|---|
| Prometheus | monitoring | ClusterRole, ClusterRoleBinding, Deployment, StorageClass, PersistentVolumeClaim, ConfigMap, Service, and GKE Ingress |
| Alertmanager | monitoring | Deployment, ConfigMap, and Service |
| Node exporter | monitoring | DaemonSet and Service |
| Blackbox exporter | monitoring | Deployment, ConfigMap, and Service |
| Grafana | monitoring | Deployment, ConfigMap, Service, and GKE Ingress |
| kube-state-metrics | kube-system | ClusterRole, ClusterRoleBinding, Deployment, ServiceAccount, and Service |

Configuration

  • App runtime config

This app follows most of the Twelve-Factor App methodology, so you can pass environment variables to change its behavior; for instance, the listener port and the memory threshold.

File: microservices/k8s/node-api-deployment.yaml

env:
  - name: PORT
    value: "8081"
  - name: MEMORY_THRESHOLD
    value: "50000000"

If you change the web server port, you will also need to update the services (NodePort and LoadBalancer) to match.
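To keep a port change consistent, the `port`/`targetPort` fields of each service must track the container's PORT environment variable. A rough sketch for a hypothetical change to port 9090 (the actual service manifests live in microservices/k8s; field values here are illustrative):

```yaml
# Fragment of a service spec after changing PORT to 9090 (illustrative).
spec:
  ports:
    - port: 9090          # service port
      targetPort: 9090    # must equal the container's PORT env var
```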

  • Prometheus rules

File: monitoring/prom-server/prometheus-configmap.yaml

There are two alerts configured for the lab.

groups:
- name: OBSSIM Alerts
  rules:
  - alert: HighPodMemory
    expr: (container_memory_usage_bytes{namespace="default",image!="k8s.gcr.io/pause:3.5",name!=""} / (1024*1024) > 14)
    for: 5m
    labels:
      severity: critical
    annotations:
      title: Pods memory usage
      description: "Pods with high memory utilization\n VALUE = {{ printf \"%.2f\" $value}} MB\n LABELS = {{ $labels }}"
      message: "Pods have consumed over 14 Mbytes - (instance(s): {{  $labels.pod }})"
      summary: "Pods High Memory Usage - (instance(s): {{ $labels.pod }})"
      runbook_url: "https://ansibletower.example.com"
      dashboard_url: "https://grafana.example.com"
  - alert: AppHTTPResolveTimePercentile
    expr: (quantile_over_time(0.90,probe_http_duration_seconds{instance="http://<load-balancer-vip>:60000/fortune",phase="resolve"}[28d]) > 0.5)
    for: 5m
    labels:
      severity: critical
    annotations:
      title: App response time
      description: "App with high resolve time\n VALUE = {{ printf \"%.2f\" $value}} s\n LABELS = {{ $labels }}"
      message: "App resolve time is over the SLO - (instance(s): {{ $labels.pod }})"
      summary: "App with high resolve time - (instance(s): {{ $labels.pod }})"
      runbook_url: "https://ansibletower.example.com"
      dashboard_url: "https://grafana.example.com"
  • Prometheus config

File: monitoring/prom-server/prometheus-configmap.yaml

The Prometheus Server configuration is in the prometheus.yml section in this manifest file.

Global parameters are under the global sub-section:

global:
  scrape_interval: 10s
  evaluation_interval: 10s

Alerting parameters are under the alerting sub-section:

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "alertmanager-service.monitoring.svc:9093"

Monitoring targets and their configurations are under the scrape_configs sub-section:

scrape_configs:
  # Blackbox exporter section (Static)
  - job_name: 'blackbox-exporter'
    scrape_interval: 10m
    metrics_path: /probe
    params:
      module: [http_2xx]
  ...
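The elided part of this job usually follows the standard Blackbox exporter scrape pattern: probe targets are listed under static_configs, and relabel_configs rewrites the scrape address so Prometheus queries the exporter while passing the real target as a query parameter. A generic sketch is below; the service name and port are assumptions, so check the actual ConfigMap for the values used in this lab:

```yaml
    static_configs:
      - targets:
          - http://<load-balancer-vip>:60000/fortune   # endpoint to probe
    relabel_configs:
      # pass the original target as the ?target= query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # scrape the Blackbox exporter service itself (name/port assumed)
      - target_label: __address__
        replacement: blackbox-exporter-service.monitoring.svc:9115
```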
  • Alertmanager templates

File: monitoring/prom-alert/alertmanager-templateconfig.yaml

The Alertmanager templates for the notification systems are in this manifest file. default.tmpl has the general definitions of an alert for all notification systems. pagerduty.tmpl has specific template fields for a PagerDuty alert. slack.tmpl has the specific template fields for a Slack message.
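These templates use Go templating: each file defines named blocks that the Alertmanager config references by name when building a notification. A minimal illustrative sketch (the block name below is an assumption, not the one used in the repo):

```
{{ define "slack.obssim.title" }}
[{{ .Status | toUpper }}] {{ .CommonAnnotations.summary }}
{{ end }}
```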

  • Alertmanager config

File: monitoring/prom-alert/alertmanager-configmap.yaml

The Alertmanager global configuration is in this manifest file. It's pre-configured for PagerDuty and Slack integrations. You need to have a Slack app and PagerDuty instance available.

To configure the PagerDuty integration, you need to provide the integration key:

pagerduty_configs:
  - service_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

To configure the Slack integration, you need to provide the Slack incoming webhook URL and channel name:

slack_configs:
- api_url: https://hooks.slack.com/services/XXXXXXXXXXX/YYYYYYYYYYY/000000000000000000000000
  channel: '#obssim-demo'

You can find more information on setting a PagerDuty development instance at this link.

And you can consult instructions on configuring a Slack incoming webhook at this link.
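The PagerDuty and Slack receivers shown above are wired to alerts through the route tree of the same configuration. A hedged sketch of how such a route section typically looks (receiver names and timing values here are illustrative assumptions, not copied from the repo):

```yaml
route:
  receiver: slack-notifications         # assumed default receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # escalate critical alerts to PagerDuty (assumed receiver name)
    - match:
        severity: critical
      receiver: pagerduty-notifications
```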

  • Grafana config

File: monitoring/grafana/grafana-config.yaml

The Grafana global configuration is described in this manifest file.

prometheus.yaml: |-
  {
      "apiVersion": 1,
      "datasources": [
          {
              "access": "proxy",
              "editable": true,
              "name": "prometheus",
              "orgId": 1,
              "type": "prometheus",
              "url": "http://prometheus-service.monitoring.svc:9090",
              "version": 1
          }
      ]
  }

It defines a Grafana datasource to be the Prometheus server inside the same namespace.
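Grafana can provision dashboards the same way it provisions datasources, by adding a dashboard provider entry to the config. A hedged sketch of what such an entry looks like (the provider name and path below are illustrative assumptions; this lab's repo may not include one):

```yaml
dashboards.yaml: |-
  apiVersion: 1
  providers:
    - name: 'obssim-dashboards'            # illustrative provider name
      orgId: 1
      type: file
      options:
        path: /var/lib/grafana/dashboards  # directory mounted from a ConfigMap
```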


About

An SRE observability simulation lab based on Prometheus and Grafana
