This repository was archived by the owner on Oct 21, 2020. It is now read-only.

Commit c753e2e

author
pilillo
committed
added README for monitoring stack
1 parent 3a94e33 commit c753e2e

File tree

1 file changed

+53
-0
lines changed
  • infrastructure/components/monitoring-stack

@@ -0,0 +1,53 @@
# Monitoring Stack

This component installs the CoreOS Prometheus operator, along with an optional [Grafana](https://prometheus.io/docs/visualization/grafana/) dashboard to visualize the collected metrics.

## Prometheus

Prometheus is an open-source platform that collects and stores metrics from monitored components, either by scraping the HTTP endpoints they expose or through dedicated exporters.
Prometheus consists of a number of components:
* a [server](https://github.com/prometheus/prometheus), which scrapes (pulls) and stores time series;
* a set of [exporters](https://prometheus.io/docs/instrumenting/exporters/) for the most common systems, which on K8s are typically deployed as sidecar containers that monitor the main application of the pod and expose Prometheus metrics;
* an [alert manager](https://github.com/prometheus/alertmanager) that takes care of handling alert messages;
* a [push gateway](https://github.com/prometheus/pushgateway) where short-lived jobs can export their metrics, i.e. they push the metrics to this endpoint instead of waiting for Prometheus to pull them;
* a [query engine (PromQL)](https://prometheus.io/docs/prometheus/latest/querying/basics/) that exposes the metrics and offers basic processing capabilities, and an expression browser, available at `/graph`, that evaluates an expression and shows the result as either a table or a graph.

![Prometheus architecture](https://prometheus.io/assets/architecture.png)
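
As an illustration of the push model used by short-lived jobs, the sketch below builds an HTTP request for the push gateway using only the Python standard library. The gateway address, job name and metric name are hypothetical; the `/metrics/job/<job>` path and the default port 9091 follow the push gateway documentation. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway_url: str, job: str, metrics: dict) -> urllib.request.Request:
    """Build (but do not send) a request that pushes one group of metrics."""
    # One "<name> <value>" sample per line; the body must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))
    # The job name is part of the URL path, so it has to be percent-encoded.
    url = f"{gateway_url.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces all metrics previously pushed for this job
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

A batch job would typically call `urllib.request.urlopen(build_push_request(...))` just before exiting, so that Prometheus can later scrape the values from the gateway.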

### Data Model
Prometheus stores all data as time series, i.e. timestamped values grouped under a specific metric name.
In practice, a metric has multiple dimensions, or labels, so in Prometheus a metric is a multi-dimensional time series. Metrics can be [exposed](https://prometheus.io/docs/instrumenting/exposition_formats/) using a simple text format.
A sample has the format `<metric name>{<label name>=<label value>, ...} <value> <timestamp>`, where the value is a float and the timestamp is an int64 (milliseconds since epoch).
Labels enrich the time series by providing context. Label names and values are strings, whereas the sample value is a float that can also be `NaN`, `+Inf` or `-Inf`. The naming convention is described [here](https://prometheus.io/docs/practices/naming/).
Note that the Prometheus server does not track data types and flattens all data into untyped time series.
Client libraries provide higher-level [metric types](https://prometheus.io/docs/concepts/metric_types/): counters, gauges, histograms and summaries. See [this](https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries) example.
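
To make the sample format concrete, here is a minimal parsing sketch in Python. It is not the parser used by Prometheus: the `Sample` type and `parse_sample` function are hypothetical names, the regular expressions are a simplification, and escape sequences inside label values are matched but not unescaped.

```python
import re
from typing import Dict, NamedTuple, Optional


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: Optional[int]


# <metric name>{<label name>="<label value>", ...} <value> [<timestamp>]
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Sample:
    """Parse one sample line of the text exposition format (simplified)."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value, raw_ts = match.groups()
    # Labels are plain string pairs; a metric without labels yields an empty dict.
    labels = dict(_LABEL.findall(raw_labels or ""))
    # float() also accepts the special values NaN, +Inf and -Inf.
    value = float(raw_value)
    # The timestamp (milliseconds since epoch) is optional.
    timestamp_ms = int(raw_ts) if raw_ts is not None else None
    return Sample(name, labels, value, timestamp_ms)
```

For instance, `parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000')` yields the metric name, a two-entry label dictionary, the float value and the timestamp as separate fields.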

By default, metrics are stored in a local folder for a period of 15 days. On K8s this folder can be kept on a persistent volume (PV); alternatively, the Prometheus server can be deployed as a [StatefulSet](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/), so that each of its pods is uniquely bound to a specific volume that is unambiguously reused after a failure and restart.
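
When the server is managed by the Prometheus operator described later in this README, retention and persistent storage can be declared on the `Prometheus` custom resource. The sketch below builds such a manifest as a Python dictionary and serializes it to JSON, which `kubectl apply -f -` accepts as well as YAML. The resource name, the 10Gi volume size and the helper name `prometheus_resource` are assumptions made for illustration, not values taken from this component.

```python
import json


def prometheus_resource(name: str, retention: str = "15d",
                        storage: str = "10Gi", replicas: int = 1) -> dict:
    """Build a Prometheus custom resource with explicit retention and storage."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # How long samples are kept before being deleted.
            "retention": retention,
            # Each replica gets its own PersistentVolumeClaim from this template.
            "storage": {
                "volumeClaimTemplate": {
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    }
                }
            },
        },
    }


def to_manifest(resource: dict) -> str:
    """Serialize a resource to JSON for use with kubectl."""
    return json.dumps(resource, indent=2)
```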

### Exporters

[Exporters](https://prometheus.io/docs/instrumenting/exporters/) are used when a service does not expose Prometheus metrics directly.
Typical examples are:
* the [JMX exporter](https://github.com/prometheus/jmx_exporter) for JVM-based applications;
* [cAdvisor](https://prometheus.io/docs/guides/cadvisor/) to monitor Docker containers;
* the [node exporter](https://github.com/prometheus/node_exporter) for the physical metrics of a node (e.g. CPU, RAM);
* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics), which queries the K8s API server and exports cluster metrics (e.g. deployments, pods).
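
To show what an exporter does in essence, the following sketch serves a `/metrics` endpoint in the text exposition format using only the Python standard library. It is a toy, not one of the exporters listed above: the metric name `demo_uptime_seconds`, the `collect` function and the port are hypothetical, and a real exporter would normally rely on an official client library instead.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()


def collect() -> dict:
    """Hypothetical collector: stands in for reading a service's own statistics."""
    return {"demo_uptime_seconds": time.time() - START}


def render(metrics: dict) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the server can then be pointed at it like at any other scrape target.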

## The Prometheus K8s Operator

As mentioned, this component installs the Prometheus K8s operator, along with the alert manager and Grafana.
As such, the component eases not only the setup but also the maintenance of Prometheus and its monitored targets.
The operator uses a [custom controller](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-controllers) to encode basic operational knowledge for cluster management, such as cluster setup (Prometheus servers, alert manager, Grafana, kube-state-metrics and host node_exporter) and cluster scale up/down.
Moreover, the operator uses Custom Resource Definitions (CRDs) and ConfigMaps to make the Prometheus configuration accessible like any other K8s resource, specifically:
* `Prometheus`, which defines the Prometheus server deployment;
* `Alertmanager`, which defines the alert manager deployment;
* `ServiceMonitor`, which defines the targets to be monitored by the server; the operator automatically generates the corresponding scrape configuration;
* `PrometheusRule`, which defines Prometheus rules, i.e. i) [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) to perform certain calculations and produce new time series and ii) [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) to define event-condition-action rules and send alerts to external services.

A more complete guide to setting up rules on the Prometheus operator is provided [here](https://sysdig.com/blog/kubernetes-monitoring-with-prometheus-alertmanager-grafana-pushgateway-part-2/) and [here](https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/).
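
As a rough sketch of the custom resources listed above, the Python helpers below build a `ServiceMonitor` and a `PrometheusRule` as dictionaries that can be serialized to JSON and applied with `kubectl`. The resource names, the `app` label, the `metrics` port name and the example rules are assumptions chosen for illustration; they are not taken from this component's configuration.

```python
import json


def service_monitor(name: str, app_label: str,
                    port: str = "metrics", interval: str = "30s") -> dict:
    """ServiceMonitor selecting Services labelled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": app_label}},
            # "port" refers to a named port of the selected Service.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


def prometheus_rule(name: str) -> dict:
    """PrometheusRule with one recording rule and one alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name},
        "spec": {
            "groups": [{
                "name": f"{name}.rules",
                "rules": [
                    # Recording rule: precompute a new time series.
                    {"record": "job:up:sum", "expr": "sum by (job) (up)"},
                    # Alerting rule: fire when a target stays down for 5 minutes.
                    {
                        "alert": "TargetDown",
                        "expr": "up == 0",
                        "for": "5m",
                        "labels": {"severity": "warning"},
                        "annotations": {"summary": "Target {{ $labels.instance }} is down"},
                    },
                ],
            }],
        },
    }


def to_json(resource: dict) -> str:
    """Serialize a resource so it can be piped to kubectl apply -f -."""
    return json.dumps(resource, indent=2)
```

Whether a given `Prometheus` resource picks up these objects depends on its `serviceMonitorSelector` and `ruleSelector` settings, so the metadata labels usually have to match the selectors configured in the cluster.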
