docs(metrics): updates for kafka exporter #11520

Open: wants to merge 2 commits into base: main

9 changes: 5 additions & 4 deletions documentation/assemblies/metrics/assembly-metrics.adoc
@@ -33,10 +33,6 @@ For more information on the metrics and monitoring tools, refer to the supportin
* {GrafanaHome}
* link:http://kafka.apache.org/documentation/#monitoring[Apache Kafka Monitoring^] describes JMX metrics exposed by Apache Kafka

//what is Consumer lag?
include::../../modules/metrics/con_kafka-exporter-lag.adoc[leveloffset=+1]
//Understanding metrics for Cruise Control
include::../../modules/metrics/con-metrics-cruise-control.adoc[leveloffset=+1]
//Example metrics files
include::assembly-metrics-config-files.adoc[leveloffset=+1]
//How to set up Prometheus
@@ -45,3 +41,8 @@ include::assembly_metrics-prometheus-setup.adoc[leveloffset=+1]
include::../../modules/metrics/proc_metrics-grafana-dashboard.adoc[leveloffset=+1]
//How to monitor custom resources managed by Strimzi
include::../../modules/metrics/proc_metrics-custom-resource-monitoring.adoc[leveloffset=+1]
//How to monitor Consumer lag
include::../../modules/metrics/con_kafka-exporter-lag.adoc[leveloffset=+1]
include::../../modules/metrics/proc-kafka-exporter-deploy.adoc[leveloffset=+2]
//Understanding metrics for Cruise Control
include::../../modules/metrics/con-metrics-cruise-control.adoc[leveloffset=+1]
@@ -6,7 +6,7 @@
= Using Prometheus with Strimzi

[role="_abstract"]
You can use Prometheus to provide monitoring data for the example Grafana dashboards provided with Strimzi.
Use Prometheus to provide monitoring data for the example Grafana dashboards provided with Strimzi.

To expose metrics in Prometheus format, you add configuration to a custom resource.
You must also make sure that the metrics are scraped by your monitoring stack.
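
For example, the following fragment sketches how broker metrics might be enabled in the `Kafka` resource, modeled on the `kafka-metrics.yaml` example file.
It assumes a `ConfigMap` named `kafka-metrics` with a key `kafka-metrics-config.yml` that holds the JMX Prometheus Exporter rules; both names are assumptions, so adjust them to match your own configuration.

[source,yaml,subs="attributes+"]
----
apiVersion: {KafkaApiVersion}
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics # assumed ConfigMap name
          key: kafka-metrics-config.yml # assumed key holding the exporter rules
  # ...
----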
@@ -3,7 +3,7 @@
// metrics/assembly-metrics.adoc

[id='con-metrics-cruise-control-{context}']
= Monitoring Cruise Control operations
= Cruise Control operations monitoring

[role="_abstract"]
Cruise Control monitors Kafka brokers in order to track the utilization of brokers, topics, and partitions.
26 changes: 9 additions & 17 deletions documentation/modules/metrics/con_kafka-exporter-lag.adoc
@@ -4,22 +4,16 @@

[id='con-metrics-kafka-exporter-lag-{context}']

= Monitoring consumer lag with Kafka Exporter
= Consumer lag monitoring

[role="_abstract"]
{kafka-exporter-project} is an open source project to enhance monitoring of Apache Kafka brokers and clients.
You can configure the `Kafka` resource to xref:assembly-metrics-setup-{context}[deploy Kafka Exporter with your Kafka cluster].
Kafka Exporter extracts additional metrics data from Kafka brokers related to offsets, consumer groups, consumer lag, and topics.
The metrics data is used, for example, to help identify slow consumers.
Lag data is exposed as Prometheus metrics, which can then be presented in Grafana for analysis.
{kafka-exporter-project} is an open source project that enhances the monitoring of Apache Kafka brokers and clients.
Kafka Exporter extracts additional metrics data from Kafka brokers related to consumer groups, consumer lag, topic offsets, and partitions.
The metrics are exposed in Prometheus format and can be collected by Prometheus, then visualized in Grafana.

Kafka Exporter reads from the `__consumer_offsets` topic, which stores information on committed offsets for consumer groups.
For Kafka Exporter to work properly, consumer groups need to be in use.

A Grafana dashboard for Kafka Exporter is one of a number of xref:ref-metrics-dashboards-{context}[example Grafana dashboards] provided by Strimzi.

IMPORTANT: Kafka Exporter provides only additional metrics related to consumer lag and consumer offsets.
For regular Kafka metrics, you have to configure the Prometheus metrics in xref:assembly-metrics-setup-{context}[Kafka brokers].
Kafka Exporter relies on data from the `__consumer_offsets` topic to report lag metrics.
This topic only contains information if consumer groups are actively committing offsets.
Consumer groups must therefore be in use for Kafka Exporter to function correctly.

Consumer lag reflects the difference between the rate at which messages are produced and the rate at which they are consumed.
Specifically, consumer lag for a given consumer group indicates the delay between the last message in the partition and the message currently being picked up by that consumer.
@@ -34,16 +28,14 @@ This difference is sometimes referred to as the _delta_ between the producer off

Suppose a topic streams 100 messages a second. A lag of 1000 messages between the producer offset (the topic partition head) and the last offset the consumer has read means a 10-second delay.
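
Written out, the example relates lag to delay as follows, where the producer offset is the topic partition head, the consumer offset is the last offset the consumer has read, and stem:[r] is the production rate.
The delay is only approximate, because it assumes that the rate stays constant.

[latexmath]
++++
\text{lag} = \text{offset}_{\text{producer}} - \text{offset}_{\text{consumer}}, \qquad
t_{\text{delay}} \approx \frac{\text{lag}}{r} = \frac{1000\ \text{messages}}{100\ \text{messages/s}} = 10\ \text{s}
++++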

[discrete]
== The importance of monitoring consumer lag
.Why monitor consumer lag?

For applications that rely on the processing of (near) real-time data, it is critical to monitor consumer lag to check that it does not become too big.
The greater the lag becomes, the further the process moves from the real-time processing objective.

Consumer lag might result, for example, from consuming too much old data that has not been purged, or from unplanned shutdowns.
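
One way to check that lag does not grow too large is a Prometheus alerting rule on the lag metric that Kafka Exporter exposes.
The following is a sketch only, under several assumptions: that the Prometheus Operator `PrometheusRule` resource is available, that your Prometheus instance selects rules by the `role: alert-rules` label, and that Kafka Exporter reports per-partition lag as `kafka_consumergroup_lag` with `consumergroup` and `topic` labels.
The rule name, the alert name, and the threshold of 1000 messages (taken from the 10-second delay example) are hypothetical and should be tuned to your workload.

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-consumer-lag-alerts # hypothetical name
  labels:
    role: alert-rules # assumption: must match the ruleSelector of your Prometheus instance
spec:
  groups:
    - name: kafka-exporter
      rules:
        - alert: ConsumerGroupLagHigh # hypothetical alert name
          # Total lag per consumer group and topic, summed over partitions
          expr: sum by (consumergroup, topic) (kafka_consumergroup_lag) > 1000
          for: 5m # fire only if the lag stays above the threshold for 5 minutes
          labels:
            severity: warning
          annotations:
            summary: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}"
----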

[discrete]
== Reducing consumer lag
.Reducing consumer lag

Use the Grafana charts to analyze lag and to check if actions to reduce lag are having an impact on an affected consumer group.
If, for example, Kafka brokers are adjusted to reduce lag, the dashboard will show the _Lag by consumer group_ chart going down and the _Messages consumed per minute_ chart going up.
@@ -1,6 +1,6 @@
// This assembly is included in the following assemblies:
//
// metrics/assembly_metrics-kafka.adoc
// metrics/assembly_metrics-prometheus-setup.adoc

[id='proc-jmx-exporter-metrics-kafka-deploy-options-{context}']
= Enabling Prometheus JMX Exporter
@@ -25,11 +25,6 @@ Set `enableMetrics` to `true` to expose metrics for the following:
** Configure in the `Kafka` resource for `oauth` or `keycloak` cluster authorization, or `oauth` listener authentication.
** Configure in the `KafkaBridge`, `KafkaConnect`, or `KafkaMirrorMaker2` resources for `oauth` authentication.

To include xref:con-metrics-kafka-exporter-lag-str[Kafka Exporter] metrics, add `kafkaExporter` configuration to the `Kafka` resource.

IMPORTANT: Kafka Exporter provides additional metrics for consumer lag and offsets only.
You still need to configure Prometheus metrics in the `Kafka` resource to collect standard Kafka metrics.
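
For example, a minimal `kafkaExporter` configuration that collects lag metrics for all topics and consumer groups might look like the following.
It follows the pattern used in the `kafka-metrics.yaml` example file, but treat it as a sketch and narrow the regular expressions to suit your own topic and group naming.

[source,yaml,subs="attributes+"]
----
apiVersion: {KafkaApiVersion}
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ...
  kafkaExporter:
    topicRegex: ".*" # collect metrics for all topics
    groupRegex: ".*" # collect metrics for all consumer groups
----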

You can create your own Prometheus configuration or use the xref:assembly-metrics-config-files-{context}[example custom resource files] provided with Strimzi:

* `kafka-metrics.yaml`
@@ -96,69 +91,6 @@ data:
<1> Copy the `metricsConfig` property that references the `ConfigMap` containing metrics configuration.
<2> Copy the whole `ConfigMap` specifying the metrics configuration.

. To deploy Kafka Exporter, add `kafkaExporter` configuration.
+
`kafkaExporter` configuration is specified only in the `Kafka` resource.
+
.Example configuration for deploying Kafka Exporter
[source,yaml,subs="attributes+"]
----
apiVersion: {KafkaApiVersion}
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ...
  kafkaExporter:
    image: my-registry.io/my-org/my-exporter-cluster:latest # <1>
    groupRegex: ".*" # <2>
    topicRegex: ".*" # <3>
    groupExcludeRegex: "^excluded-.*" # <4>
    topicExcludeRegex: "^excluded-.*" # <5>
    showAllOffsets: false # <6>
    resources: # <7>
      requests:
        cpu: 200m
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 128Mi
    logging: debug # <8>
    enableSaramaLogging: true # <9>
    template: # <10>
      pod:
        metadata:
          labels:
            label1: value1
        imagePullSecrets:
          - name: my-docker-credentials
        securityContext:
          runAsUser: 1000001
          fsGroup: 0
        terminationGracePeriodSeconds: 120
    readinessProbe: # <11>
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe: # <12>
      initialDelaySeconds: 15
      timeoutSeconds: 5
  # ...
----
<1> ADVANCED OPTION: Container image configuration, which is recommended only in special situations.
<2> A regular expression to specify the consumer groups to include in the metrics.
<3> A regular expression to specify the topics to include in the metrics.
<4> A regular expression to specify the consumer groups to exclude from the metrics.
<5> A regular expression to specify the topics to exclude from the metrics.
<6> By default, metrics are collected for all consumers regardless of their connection status. Setting `showAllOffsets` to `false` stops collecting metrics on disconnected consumers.
<7> CPU and memory resources to reserve.
<8> Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above.
<9> Boolean to enable Sarama logging, a Go client library used by Kafka Exporter.
<10> Customization of deployment templates and pods.
<11> Healthcheck readiness probes.
<12> Healthcheck liveness probes.

NOTE: For Kafka Exporter to be able to work properly, consumer groups need to be in use.

.Enabling metrics for Kafka Bridge

To expose metrics for Kafka Bridge, set the `enableMetrics` property to `true` in the `KafkaBridge` resource.
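
A minimal sketch of this setting follows.
The resource name is hypothetical, and the `apiVersion` value is an assumption; use the API version that matches your Strimzi release.

[source,yaml]
----
apiVersion: kafka.strimzi.io/v1beta2 # assumption: use the API version of your Strimzi release
kind: KafkaBridge
metadata:
  name: my-bridge # hypothetical name
spec:
  # ...
  enableMetrics: true # expose Kafka Bridge metrics in Prometheus format
----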
108 changes: 108 additions & 0 deletions documentation/modules/metrics/proc-kafka-exporter-deploy.adoc
@@ -0,0 +1,108 @@
// This assembly is included in the following assemblies:
//
// metrics/assembly-metrics.adoc

[id='proc-kafka-exporter-deploy-{context}']
= Deploying Kafka Exporter

[role="_abstract"]
To monitor consumer lag in your Kafka cluster, configure Kafka Exporter in the `Kafka` custom resource.
Kafka Exporter exposes lag data as Prometheus metrics, which can be visualized in Grafana.
A Grafana dashboard for Kafka Exporter is included in the xref:ref-metrics-dashboards-{context}[example Grafana dashboards] provided by Strimzi.

IMPORTANT: Kafka Exporter provides only metrics related to consumer groups and lag.
To collect general Kafka metrics, configure metrics on the Kafka brokers.
For more information, see xref:assembly-metrics-setup-{context}[].

.Prerequisites

* Consumer groups must be in use.
+
Kafka Exporter relies on data from the `__consumer_offsets` topic to report lag metrics.
This topic only contains information if consumer groups are actively committing offsets.

.Procedure

. Add `kafkaExporter` configuration to the `spec` section of the `Kafka` resource.
+
.Example configuration for deploying Kafka Exporter
[source,yaml,subs="attributes+"]
----
apiVersion: {KafkaApiVersion}
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ...
  kafkaExporter:
    # Collection filters (recommended)
    groupRegex: ".*" # <1>
    topicRegex: ".*" # <2>
    groupExcludeRegex: "^excluded-.*" # <3>
    topicExcludeRegex: "^excluded-.*" # <4>
    # Resource requests and limits (recommended)
    resources: # <5>
      requests:
        cpu: 200m
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 128Mi
    # Metrics for all consumers (optional)
    showAllOffsets: false # <6>
    # Logging configuration (optional)
    logging: debug # <7>
    # Sarama logging (optional)
    enableSaramaLogging: true # <8>
    # Readiness probe (optional)
    readinessProbe: # <9>
      initialDelaySeconds: 15
      timeoutSeconds: 5
    # Liveness probe (optional)
    livenessProbe: # <10>
      initialDelaySeconds: 15
      timeoutSeconds: 5
    # Pod template (optional)
    template: # <11>
      pod:
        metadata:
          labels:
            label1: value1
        imagePullSecrets:
          - name: my-docker-credentials
        securityContext:
          runAsUser: 1000001
          fsGroup: 0
        terminationGracePeriodSeconds: 120
    # Custom image (optional)
    image: my-registry.io/my-org/my-exporter-cluster:latest # <12>
  # ...
----
<1> Regular expression to specify consumer groups to include in metrics.
<2> Regular expression to specify topics to include in metrics.
<3> Regular expression to specify consumer groups to exclude from metrics.
<4> Regular expression to specify topics to exclude from metrics.
<5> CPU and memory resources to reserve.
<6> By default, metrics are collected for all consumers regardless of connection status. Setting `showAllOffsets` to `false` stops collecting metrics for disconnected consumers.
<7> Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above.
<8> Boolean to enable Sarama logging, which provides detailed logs from the Go client library used by Kafka Exporter. Useful for debugging Kafka client interactions.
<9> Readiness probe to check when Kafka Exporter is ready to serve metrics.
<10> Liveness probe to detect and restart Kafka Exporter if it becomes unresponsive.
<11> Template customization. Here a pod is scheduled with additional security attributes.
<12> **ADVANCED OPTION:** Container image configuration, which is recommended only in special situations.

. Apply the changes to the `Kafka` configuration.
+
Resources, including a `Service` and `Pod`, are created for the Kafka Exporter with the naming convention `<kafka_cluster_name>-kafka-exporter`.

. Configure Prometheus to scrape metrics from the Kafka Exporter endpoint.
+
If you are using the example Prometheus deployment, it is already set up to discover and scrape Kafka Exporter metrics.
The `PodMonitor` resource named `kafka-resources-metrics` matches the `strimzi.io/kind: Kafka` label used to identify the Kafka Exporter.
For more information, see xref:proc-metrics-deploying-prometheus-{context}[].

. Import the Kafka Exporter dashboard into Grafana to visualize consumer lag.
+
For more information, see xref:proc-metrics-grafana-dashboard-{context}[].
+
TIP: Use the _Lag by consumer group_ and _Messages consumed per second_ panels to evaluate lag and the impact of tuning actions.
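
For reference, the part of a `PodMonitor` that selects Kafka Exporter pods might look like the following.
This is a trimmed sketch rather than a copy of the example `kafka-resources-metrics` resource: the namespace `my-namespace` and the metrics port name `tcp-prometheus` are assumptions, so compare the sketch with the example Prometheus files before using it.

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
spec:
  selector:
    matchExpressions:
      # Matches pods labeled strimzi.io/kind: Kafka, which includes the Kafka Exporter pod
      - key: "strimzi.io/kind"
        operator: In
        values: ["Kafka"]
  namespaceSelector:
    matchNames:
      - my-namespace # assumption: namespace where the Kafka cluster runs
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus # assumption: name of the container port that serves metrics
----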
@@ -1,6 +1,6 @@
// This assembly is included in the following assemblies:
//
// metrics/assembly_metrics-kafka.adoc
// metrics/assembly_metrics-prometheus-setup.adoc

[id='proc-metrics-reporter-kafka-deploy-options-{context}']
= Enabling Strimzi Metrics Reporter