Commit f75ecae

Merge pull request #96166 from openshift-cherrypick-robot/cherry-pick-94828-to-standalone-logging-docs-6.3
[standalone-logging-docs-6.3] OBSDOCS-1801: Loki - Adjust Queries for both OTEL and ViaQ
2 parents f24d229 + 8c61577 commit f75ecae

11 files changed, with 168 additions and 69 deletions

_topic_maps/_topic_map.yml

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,6 @@ Topics:
5252
# File: logging-affinity-and-anti-afinity
5353
#---
5454

55-
#---
5655
#Name: Configuring your Logging deployment
5756
#Dir: config
5857
#Distros: openshift-logging
@@ -103,14 +102,15 @@ Topics:
103102
# File: cluster-logging-dashboards
104103
#- Name: Log visualization with Kibana
105104
# File: logging-kibana
106-
#---
107-
#Name: Logging alerts
108-
#Dir: logging_alerts
109-
#Topics:
110-
#- Name: Default logging alerts
111-
# File: default-logging-alerts
112-
#- Name: Custom logging alerts
113-
# File: custom-logging-alerts
105+
---
106+
Name: Logging alerts
107+
Dir: logging_alerts
108+
Distros: openshift-logging
109+
Topics:
110+
- Name: Default logging alerts
111+
File: default-logging-alerts
112+
- Name: Custom logging alerts
113+
File: custom-logging-alerts
114114
#---
115115
#Name: Performance and reliability tuning
116116
#Dir: performance_reliability

logging_alerts/custom-logging-alerts.adoc

Lines changed: 3 additions & 3 deletions
@@ -6,11 +6,11 @@ include::_attributes/common-attributes.adoc[]
66

77
toc::[]
88

9-
In logging 5.7 and later versions, users can configure the LokiStack deployment to produce customized alerts and recorded metrics. If you want to use customized link:https://grafana.com/docs/loki/latest/alert/[alerting and recording rules], you must enable the LokiStack ruler component.
9+
You can configure the LokiStack deployment to produce customized alerts and recorded metrics. If you want to use customized link:https://grafana.com/docs/loki/latest/alert/[alerting and recording rules], you must enable the LokiStack ruler component.
1010

11-
LokiStack log-based alerts and recorded metrics are triggered by providing link:https://grafana.com/docs/loki/latest/query/[LogQL] expressions to the ruler component. The {loki-op} manages a ruler that is optimized for the selected LokiStack size, which can be `1x.extra-small`, `1x.small`, or `1x.medium`.
11+
LokiStack log-based alerts and recorded metrics are triggered by providing link:https://grafana.com/docs/loki/latest/query/[LogQL] (Grafana documentation) expressions to the ruler component.
1212

13-
To provide these expressions, you must create an `AlertingRule` custom resource (CR) containing Prometheus-compatible link:https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/[alerting rules], or a `RecordingRule` CR containing Prometheus-compatible link:https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/[recording rules].
13+
To provide these expressions, you must create an `AlertingRule` custom resource (CR) containing link:https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/[alerting rules], or a `RecordingRule` CR containing Prometheus-compatible link:https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/[recording rules] (Prometheus documentation).
1414

1515
Administrators can configure log-based alerts or recorded metrics for `application`, `audit`, or `infrastructure` tenants. Users without administrator permissions can configure log-based alerts or recorded metrics for `application` tenants of the applications that they have access to.
1616

logging_alerts/default-logging-alerts.adoc

Lines changed: 6 additions & 2 deletions
@@ -11,11 +11,15 @@ Logging alerts are installed as part of the {clo} installation. Alerts depend on
1111
Default logging alerts are sent to the {ocp-product-title} monitoring stack Alertmanager in the `openshift-monitoring` namespace, unless you have disabled the local Alertmanager instance.
1212

1313
// TODO MONITORING REMOVE DEPENDENCY
14-
include::modules/monitoring-accessing-the-alerting-ui.adoc[leveloffset=+1]
15-
include::modules/logging-collector-alerts.adoc[leveloffset=+1]
14+
include::modules/monitoring-accessing-the-alerting-ui.adoc[leveloffset=+1,tag=ADM]
15+
//include::modules/logging-collector-alerts.adoc[leveloffset=+1]
1616
include::modules/logging-vector-collector-alerts.adoc[leveloffset=+1]
17+
include::modules/loki-alerts.adoc[leveloffset=+1]
18+
19+
////
1720
include::modules/logging-fluentd-collector-alerts.adoc[leveloffset=+1]
1821
include::modules/cluster-logging-elasticsearch-rules.adoc[leveloffset=+1]
22+
////
1923

2024
[role="_additional-resources"]
2125
[id="additional-resources_default-logging-alerts"]

logging_alerts/docinfo.xml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
<title>Logging alerts</title>
2+
<productname>{product-title}</productname>
3+
<productnumber>{product-version}</productnumber>
4+
<subtitle>Configuring logging alerts.</subtitle>
5+
<abstract>
6+
<para>This document provides information about configuring logging alerts.
7+
</para>
8+
</abstract>
9+
<authorgroup>
10+
<orgname>Red Hat OpenShift Documentation Team</orgname>
11+
</authorgroup>
12+
<xi:include href="Common_Content/Legal_Notice.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
Lines changed: 11 additions & 9 deletions
@@ -1,12 +1,12 @@
11
// Module included in the following assemblies:
22
//
3-
// * observability/logging/logging_alerts/custom-logging-alerts.adoc
3+
// * logging_alerts/custom-logging-alerts.adoc
44

55
:_mod-docs-content-type: PROCEDURE
66
[id="configuring-logging-loki-ruler_{context}"]
77
= Configuring the ruler
88

9-
When the LokiStack ruler component is enabled, users can define a group of link:https://grafana.com/docs/loki/latest/query/[LogQL] expressions that trigger logging alerts or recorded metrics.
9+
When the `LokiStack` ruler component is enabled, users can define a group of link:https://grafana.com/docs/loki/latest/query/[LogQL] (Grafana documentation) expressions that trigger logging alerts or recorded metrics.
1010

1111
Administrators can enable the ruler by modifying the `LokiStack` custom resource (CR).
1212

@@ -18,7 +18,7 @@ Administrators can enable the ruler by modifying the `LokiStack` custom resource
1818
1919
.Procedure
2020

21-
* Enable the ruler by ensuring that the `LokiStack` CR contains the following spec configuration:
21+
* Enable the ruler by ensuring that the `LokiStack` CR has the following spec configuration:
2222
+
2323
[source,yaml]
2424
----
@@ -30,14 +30,16 @@ metadata:
3030
spec:
3131
# ...
3232
rules:
33-
enabled: true <1>
34-
selector:
33+
enabled: true #<1>
34+
selector: #<2>
3535
matchLabels:
36-
openshift.io/<label_name>: "true" <2>
37-
namespaceSelector:
36+
<label_name>: "true" #<3>
37+
namespaceSelector: #<4>
3838
matchLabels:
39-
openshift.io/<label_name>: "true" <3>
39+
<label_name>: "true" #<5>
4040
----
4141
<1> Enable Loki alerting and recording rules in your cluster.
42-
<2> Add a custom label that can be added to namespaces where you want to enable the use of logging alerts and metrics.
42+
<2> Specify the selector for the alerting and recording resources.
4343
<3> Add a custom label that can be added to namespaces where you want to enable the use of logging alerts and metrics.
44+
<4> Specify the namespaces in which the alerting and recording rules are defined for the {loki-op}. If undefined, only the rules defined in the same namespace as the `LokiStack` are used.
45+
<5> Add a custom label that can be added to namespaces where you want to enable the use of logging alerts and metrics.
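
As a minimal sketch of how the `namespaceSelector` in the `LokiStack` CR might be satisfied, the following manifest labels a namespace so that rules defined in it can be picked up by the ruler. The label key `example.com/logging-alerts` and the namespace name `app-ns` are placeholders chosen for illustration and are not values from this commit; substitute whatever `<label_name>` your `LokiStack` CR uses.

```yaml
# Hypothetical namespace carrying the label referenced by
# spec.rules.namespaceSelector.matchLabels in the LokiStack CR.
apiVersion: v1
kind: Namespace
metadata:
  name: app-ns                           # assumed namespace name
  labels:
    example.com/logging-alerts: "true"   # assumed stand-in for <label_name>
```

Rule objects created in that namespace would still need their own label matching `spec.rules.selector`, which is a separate check from the namespace label.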

modules/logging-collector-alerts.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
11
// Module included in the following assemblies:
22
//
3-
// * logging/logging_alerts/default-logging-alerts.adoc
3+
// * logging_alerts/default-logging-alerts.adoc
44

55
:_content-type: REFERENCE
66
[id="logging-collector-alerts_{context}"]
77
= Logging collector alerts
88

9-
In logging 5.8 and later versions, the following alerts are generated by the {clo}. You can view these alerts in the {ocp-product-title} web console.
9+
The following alerts are generated by the {clo}. You can view these alerts in the {ocp-product-title} web console.
1010

1111
[cols="4", options="header"]
1212
|===

modules/logging-enabling-loki-alerts.adoc

Lines changed: 22 additions & 22 deletions
@@ -1,6 +1,6 @@
11
// Module included in the following assemblies:
22
//
3-
// * observability/logging/logging_alerts/custom-logging-alerts.adoc
3+
// * logging_alerts/custom-logging-alerts.adoc
44

55
:_mod-docs-content-type: PROCEDURE
66
[id="logging-enabling-loki-alerts_{context}"]
@@ -12,14 +12,14 @@ The `AlertingRule` CR contains a set of specifications and webhook validation de
1212
* If an `AlertingRule` CR includes an invalid `for` period, it is an invalid alerting rule.
1313
* If an `AlertingRule` CR includes an invalid LogQL `expr`, it is an invalid alerting rule.
1414
* If an `AlertingRule` CR includes two groups with the same name, it is an invalid alerting rule.
15-
* If none of above applies, an alerting rule is considered valid.
15+
* If none of the above applies, an alerting rule is considered valid.
1616
1717
[options="header"]
1818
|================================================
1919
| Tenant type | Valid namespaces for `AlertingRule` CRs
20-
| application |
2120
| audit | `openshift-logging`
22-
| infrastructure | `openshift-/\*`, `kube-/\*`, `default`
21+
| infrastructure | `openshift-\*`, `kube-*`, `default`
22+
| application | All other namespaces.
2323
|================================================
2424

2525
.Prerequisites
@@ -38,30 +38,30 @@ The `AlertingRule` CR contains a set of specifications and webhook validation de
3838
kind: AlertingRule
3939
metadata:
4040
name: loki-operator-alerts
41-
namespace: openshift-operators-redhat <1>
42-
labels: <2>
43-
openshift.io/<label_name>: "true"
41+
namespace: openshift-operators-redhat #<1>
42+
labels: #<2>
43+
openshift.io/cluster-monitoring: "true"
4444
spec:
45-
tenantID: "infrastructure" <3>
45+
tenantID: infrastructure #<3>
4646
groups:
4747
- name: LokiOperatorHighReconciliationError
4848
rules:
4949
- alert: HighPercentageError
50-
expr: | <4>
50+
expr: | #<4>
5151
sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
5252
/
5353
sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
5454
> 0.01
5555
for: 10s
5656
labels:
57-
severity: critical <5>
57+
severity: critical #<5>
5858
annotations:
59-
summary: High Loki Operator Reconciliation Errors <6>
60-
description: High Loki Operator Reconciliation Errors <7>
59+
summary: High Loki Operator Reconciliation Errors #<6>
60+
description: High Loki Operator Reconciliation Errors #<7>
6161
----
6262
<1> The namespace where this `AlertingRule` CR is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
6363
<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
64-
<3> `AlertingRule` CRs for `infrastructure` tenants are only supported in the `openshift-\*`, `kube-\*`, or `default` namespaces.
64+
<3> `AlertingRule` CRs for `infrastructure` tenants are only supported in the `openshift-\*`, `kube-*`, or `default` namespaces.
6565
<4> The value for `kubernetes_namespace_name:` must match the value for `metadata.namespace`.
6666
<5> The value of this mandatory field must be `critical`, `warning`, or `info`.
6767
<6> This field is mandatory.
@@ -74,23 +74,23 @@ The `AlertingRule` CR contains a set of specifications and webhook validation de
7474
kind: AlertingRule
7575
metadata:
7676
name: app-user-workload
77-
namespace: app-ns <1>
78-
labels: <2>
79-
openshift.io/<label_name>: "true"
77+
namespace: app-ns #<1>
78+
labels: #<2>
79+
openshift.io/cluster-monitoring: "true"
8080
spec:
81-
tenantID: "application"
81+
tenantID: application
8282
groups:
8383
- name: AppUserWorkloadHighError
8484
rules:
8585
- alert:
86-
expr: | <3>
87-
sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
86+
expr: | #<3>
87+
sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
8888
for: 10s
8989
labels:
90-
severity: critical <4>
90+
severity: critical #<4>
9191
annotations:
92-
summary: <5>
93-
description: <6>
92+
summary: This is an example summary. #<5>
93+
description: This is an example description. #<6>
9494
----
9595
<1> The namespace where this `AlertingRule` CR is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
9696
<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
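
The `AlertingRule` examples in this module have a recording-rule counterpart. The following is a rough sketch of what a `RecordingRule` CR for the `application` tenant could look like, assuming the CR uses the `loki.grafana.com/v1` API group and Prometheus-style `record` and `expr` fields. The object name, label key, and recorded metric name are hypothetical; verify the field names against the CRD installed in your cluster before use.

```yaml
# Sketch only: names and the label key are illustrative assumptions.
apiVersion: loki.grafana.com/v1          # assumed API version
kind: RecordingRule
metadata:
  name: app-user-workload-recording      # hypothetical name
  namespace: app-ns                      # must match the LokiStack namespaceSelector
  labels:
    example.com/logging-alerts: "true"   # must match the LokiStack selector
spec:
  tenantID: application
  groups:
  - name: AppUserWorkloadErrorRate
    interval: 1m
    rules:
    - record: app_ns:error_lines:rate1m  # hypothetical metric name
      expr: |
        sum(rate({kubernetes_namespace_name="app-ns"} |= "error" [1m])) by (job)
```

The LogQL expression reuses the `kubernetes_namespace_name` label from the `AlertingRule` examples, so the same requirement applies: its value must match `metadata.namespace`.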
Lines changed: 10 additions & 16 deletions
@@ -1,36 +1,30 @@
11
// Module included in the following assemblies:
22
//
3-
// * observability/logging/logging_alerts/default-logging-alerts.adoc
3+
// * logging_alerts/default-logging-alerts.adoc
44

55
:_mod-docs-content-type: REFERENCE
66
[id="logging-vector-collector-alerts_{context}"]
7-
= Vector collector alerts
7+
= {clo} alerts
88

9-
In logging 5.7 and later versions, the following alerts are generated by the Vector collector. You can view these alerts in the {ocp-product-title} web console.
9+
The following alerts are generated by the Vector collector. You can view these alerts in the {ocp-product-title} web console.
1010

1111
.Vector collector alerts
1212
[cols="2,2,2,1",options="header"]
1313
|===
1414
|Alert |Message |Description |Severity
1515

16-
|`CollectorHighErrorRate`
17-
|`<value> of records have resulted in an error by vector <instance>.`
18-
|The number of vector output errors is high, by default more than 10 in the previous 15 minutes.
19-
|Warning
20-
2116
|`CollectorNodeDown`
2217
|`Prometheus could not scrape vector <instance> for more than 10m.`
2318
|Vector is reporting that Prometheus could not scrape a specific Vector instance.
2419
|Critical
2520

26-
|`CollectorVeryHighErrorRate`
27-
|`<value> of records have resulted in an error by vector <instance>.`
28-
|The number of Vector component errors are very high, by default more than 25 in the previous 15 minutes.
29-
|Critical
30-
31-
|`FluentdQueueLengthIncreasing`
32-
|`In the last 1h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.`
33-
|Fluentd is reporting that the queue size is increasing.
21+
|`DiskBufferUsage`
22+
|`Collectors potentially consuming too much node disk, <value>`
23+
|Collectors are consuming too much node disk on the host.
3424
|Warning
3525

26+
|`CollectorHigh403ForbiddenResponseRate`
27+
|`High rate of "HTTP 403 Forbidden" responses detected for collector <instance> in namespace <namespace> for output <label>. The rate of 403 responses is <rate> over the last 2 minutes, persisting for more than 5 minutes. This could indicate an authorization issue.`
28+
|At least 10% of sent requests responded with "HTTP 403 Forbidden" for collector "<instance>" in namespace <namespace> for the output "<output>".
29+
|Critical
3630
|===

modules/loki-alerts.adoc

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * logging_alerts/default-logging-alerts.adoc
4+
5+
:_mod-docs-content-type: REFERENCE
6+
[id="loki-alerts_{context}"]
7+
= {loki-op} alerts
8+
9+
The following alerts are generated by the {loki-op}. You can view these alerts in the {ocp-product-title} web console.
10+
11+
.{loki-op} alerts
12+
[cols="2,2,2,1",options="header"]
13+
|===
14+
|Alert |Message |Description |Severity
15+
16+
|`LokiRequestErrors`
17+
|`{{ $labels.job }} {{ $labels.route }} is experiencing <value>% errors.`
18+
|At least 10% of requests result in `5xx` server errors.
19+
|critical
20+
21+
|`LokiStackWriteRequestErrors`
22+
|`<value>% of write requests from {{ $labels.job }} in <namespace> are returned with server errors.`
23+
|At least 10% of write requests to the lokistack-gateway result in `5xx` server errors.
24+
|critical
25+
26+
|`LokiStackReadRequestErrors`
27+
|`<value>% of query requests from {{ $labels.job }} in <namespace> are returned with server errors.`
28+
|At least 10% of query requests to the lokistack-gateway result in `5xx` server errors.
29+
|critical
30+
31+
|`LokiRequestPanics`
32+
|`{{ $labels.job }} is experiencing an increase of <value> panics.`
33+
|A panic was triggered.
34+
|critical
35+
36+
|`LokiRequestLatency`
37+
|`{{ $labels.job }} {{ $labels.route }} is experiencing <value>s 99th percentile latency.`
38+
|The 99th percentile is experiencing latency higher than 1 second.
39+
|critical
40+
41+
|`LokiTenantRateLimit`
42+
|`{{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.`
43+
|At least 10% of requests received the rate limit error code.
44+
|warning
45+
46+
|`LokiStorageSlowWrite`
47+
|`The storage path is experiencing slow write response rates.`
48+
|The storage path is experiencing slow write response rates.
49+
|warning
50+
51+
|`LokiWritePathHighLoad`
52+
|`The write path is experiencing high load.`
53+
|The write path is experiencing high load, causing backpressure on storage flushing.
54+
|warning
55+
56+
|`LokiReadPathHighLoad`
57+
|`The read path is experiencing high load.`
58+
|The read path has a high volume of queries, causing longer response times.
59+
|warning
60+
61+
|`LokiDiscardedSamplesWarning`
62+
|`Loki in namespace "<namespace>" is discarding samples in the "<tenant>" tenant during ingestion. Samples are discarded because of "<reason>" at a rate of <value> samples per second.`
63+
|Loki is discarding samples during ingestion because they fail validation.
64+
|warning
65+
66+
|`LokistackSchemaUpgradesRequired`
67+
|`The LokiStack "{{ $labels.stack_name }}" in namespace "<namespace>" is using a storage schema
68+
configuration that does not contain the latest schema version. It is recommended to update the schema configuration to update the schema version to the latest`
69+
|One or more of the deployed LokiStacks contain an outdated storage schema configuration.
70+
|warning
71+
|===

modules/loki-rbac-rules-permissions.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
1-
// Module is included in the following assemblies:
1+
// Module included in the following assemblies:
22
//
3-
// * configuring/configuring-the-log-store.adoc
3+
// * logging_alerts/custom-logging-alerts.adoc
44

55
:_mod-docs-content-type: REFERENCE
66
[id="loki-rbac-rules-permissions_{context}"]
