Commit 256cf5b

Merge pull request #73686 from eromanova97/OBSDOCS-920
OBSDOCS-920: Add troubleshooting steps for KubePersistentVolumeFillin…
2 parents cdd0e66 + 4ea6284 commit 256cf5b

3 files changed, +104 -1 lines changed

modules/monitoring-resolving-the-kubepersistentvolumefillingup-alert-firing-for-prometheus.adoc

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * monitoring/troubleshooting-monitoring-issues.adoc
4+
// * support/troubleshooting/investigating-monitoring-issues.adoc
5+
6+
:_mod-docs-content-type: PROCEDURE
7+
[id="resolving-the-kubepersistentvolumefillingup-alert-firing-for-prometheus_{context}"]
8+
= Resolving the KubePersistentVolumeFillingUp alert firing for Prometheus
9+
10+
As a cluster administrator, you can resolve the `KubePersistentVolumeFillingUp` alert being triggered for Prometheus.
11+
12+
The critical alert fires when a persistent volume (PV) claimed by a `prometheus-k8s-*` pod in the `openshift-monitoring` project has less than 3% total space remaining. This can cause Prometheus to function abnormally.
13+
14+
[NOTE]
15+
====
16+
There are two `KubePersistentVolumeFillingUp` alerts:
17+
18+
* *Critical alert*: The alert with the `severity="critical"` label is triggered when the mounted PV has less than 3% total space remaining.
19+
* *Warning alert*: The alert with the `severity="warning"` label is triggered when the mounted PV has less than 15% total space remaining and is expected to fill up within four days.
20+
====
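The two thresholds in the note can be sketched as a small shell function. This is an illustration only, not the actual alerting rule: the function name `pv_alert_severity` and its two integer inputs (percentage of space remaining, projected days until full) are hypothetical, and treating "within four days" as four days or fewer is an assumption.

```shell
# Hypothetical helper that mirrors the thresholds described in the note.
# $1 = percentage of PV space remaining (integer)
# $2 = projected number of days until the PV is full (integer)
pv_alert_severity() {
  remaining_pct=$1
  days_to_full=$2
  if [ "$remaining_pct" -lt 3 ]; then
    # Less than 3% remaining: the critical alert applies.
    echo critical
  elif [ "$remaining_pct" -lt 15 ] && [ "$days_to_full" -le 4 ]; then
    # Less than 15% remaining and expected to fill within four days
    # (inclusive boundary is an assumption): the warning alert applies.
    echo warning
  else
    echo none
  fi
}
```

For example, `pv_alert_severity 10 2` prints `warning`, while `pv_alert_severity 10 30` prints `none` because the volume is not expected to fill soon.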
21+
22+
To address this issue, you can remove Prometheus time-series database (TSDB) blocks to create more space for the PV.
23+
24+
.Prerequisites
25+
26+
ifndef::openshift-dedicated,openshift-rosa[]
27+
* You have access to the cluster as a user with the `cluster-admin` cluster role.
28+
endif::openshift-dedicated,openshift-rosa[]
29+
ifdef::openshift-dedicated,openshift-rosa[]
30+
* You have access to the cluster as a user with the `dedicated-admin` role.
31+
endif::openshift-dedicated,openshift-rosa[]
32+
* You have installed the OpenShift CLI (`oc`).
33+
34+
.Procedure
35+
36+
. List the size of all TSDB blocks, sorted from newest to oldest, by running the following command:
37+
+
38+
[source,terminal]
39+
----
40+
$ oc debug <prometheus_k8s_pod_name> -n openshift-monitoring \// <1>
41+
-c prometheus --image=$(oc get po -n openshift-monitoring <prometheus_k8s_pod_name> \// <1>
42+
-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') \
43+
-- sh -c 'cd /prometheus/; du -hs $(ls -dt */ | grep -Eo "[0-9A-Z]{26}")'
44+
----
45+
<1> Replace `<prometheus_k8s_pod_name>` with the pod mentioned in the `KubePersistentVolumeFillingUp` alert description.
46+
+
47+
.Example output
48+
[source,terminal]
49+
----
50+
308M 01HVKMPKQWZYWS8WVDAYQHNMW6
51+
52M 01HVK64DTDA81799TBR9QDECEZ
52+
102M 01HVK64DS7TRZRWF2756KHST5X
53+
140M 01HVJS59K11FBVAPVY57K88Z11
54+
90M 01HVH2A5Z58SKT810EM6B9AT50
55+
152M 01HV8ZDVQMX41MKCN84S32RRZ1
56+
354M 01HV6Q2N26BK63G4RYTST71FBF
57+
156M 01HV664H9J9Z1FTZD73RD1563E
58+
216M 01HTHXB60A7F239HN7S2TENPNS
59+
104M 01HTHMGRXGS0WXA3WATRXHR36B
60+
----
61+
62+
. Identify which blocks can be removed and how many, then remove them. The following example command removes the three oldest Prometheus TSDB blocks from the `prometheus-k8s-0` pod:
63+
+
64+
[source,terminal]
65+
----
66+
$ oc debug prometheus-k8s-0 -n openshift-monitoring \
67+
-c prometheus --image=$(oc get po -n openshift-monitoring prometheus-k8s-0 \
68+
-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') \
69+
-- sh -c 'ls -latr /prometheus/ | grep -Eo "[0-9A-Z]{26}" | head -3 | \
70+
while read BLOCK; do rm -r /prometheus/$BLOCK; done'
71+
----
72+
73+
. Verify the usage of the mounted PV and ensure there is enough space available by running the following command:
74+
+
75+
[source,terminal]
76+
----
77+
$ oc debug <prometheus_k8s_pod_name> -n openshift-monitoring \// <1>
78+
--image=$(oc get po -n openshift-monitoring <prometheus_k8s_pod_name> \// <1>
79+
-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') -- df -h /prometheus/
80+
----
81+
<1> Replace `<prometheus_k8s_pod_name>` with the pod mentioned in the `KubePersistentVolumeFillingUp` alert description.
82+
+
83+
The following example output shows that the mounted PV claimed by the `prometheus-k8s-0` pod has 63% of its space remaining:
84+
+
85+
.Example output
86+
[source,terminal]
87+
----
88+
Starting pod/prometheus-k8s-0-debug-j82w4 ...
89+
Filesystem Size Used Avail Use% Mounted on
90+
/dev/nvme0n1p4 40G 15G 25G 37% /prometheus
91+
92+
Removing debug pod ...
93+
----
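Before removing blocks, it can help to estimate how much space the oldest blocks would free, and afterwards to read the remaining percentage from the `df` output. The following is a minimal sketch that works on saved command output rather than on the cluster. The function names `oldest_blocks_total_mib` and `pv_remaining_pct` are hypothetical. The sketch assumes that the block listing is ordered newest first, as in the example output of the first step, and that sizes carry a `K`, `M`, or `G` suffix.

```shell
# Hypothetical helper: sum the sizes, in MiB, of the N oldest blocks.
# Reads a "<size> <block_id>" listing on stdin, ordered newest first,
# so the oldest blocks are the last N lines.
# $1 = number of oldest blocks to total
oldest_blocks_total_mib() {
  tail -n "$1" | awk '
    {
      size  = $1
      unit  = substr(size, length(size))            # trailing K, M, or G
      value = substr(size, 1, length(size) - 1) + 0 # numeric part
      if (unit == "K") value /= 1024
      else if (unit == "G") value *= 1024
      total += value
    }
    END { printf "%d\n", total }'
}

# Hypothetical helper: print the percentage of space remaining.
# Reads "df -h" output on stdin and uses the Use% column (field 5).
# Lines without a numeric Use% value, such as the header or the
# "Starting pod" and "Removing debug pod" messages, are ignored.
pv_remaining_pct() {
  awk '$5 ~ /^[0-9]+%$/ { sub(/%/, "", $5); print 100 - $5 }'
}
```

With the example block listing from the first step, `oldest_blocks_total_mib 3` prints `476`, because the three oldest blocks are 156M, 216M, and 104M. With a `df` line that reports `37%` in the `Use%` column, `pv_remaining_pct` prints `63`.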

observability/monitoring/troubleshooting-monitoring-issues.adoc

Lines changed: 3 additions & 0 deletions
@@ -39,3 +39,6 @@ include::modules/monitoring-determining-why-prometheus-is-consuming-disk-space.a
3939
* xref:../../observability/monitoring/accessing-third-party-monitoring-apis.adoc#about-accessing-monitoring-web-service-apis_accessing-monitoring-apis-by-using-the-cli[Accessing monitoring APIs by using the CLI]
4040
* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#setting-scrape-sample-and-label-limits-for-user-defined-projects_configuring-the-monitoring-stack[Setting a scrape sample limit for user-defined projects]
4141
* xref:../../support/getting-support.adoc#support-submitting-a-case_getting-support[Submitting a support case]
42+
43+
// Resolving the KubePersistentVolumeFillingUp alert firing for Prometheus
44+
include::modules/monitoring-resolving-the-kubepersistentvolumefillingup-alert-firing-for-prometheus.adoc[leveloffset=+1]

support/troubleshooting/investigating-monitoring-issues.adoc

Lines changed: 8 additions & 1 deletion
@@ -9,7 +9,11 @@ toc::[]
99
{product-title} includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. In {product-title} {product-version}, cluster administrators can optionally enable monitoring for user-defined projects.
1010

1111
// Note - please update the following sentence if you add further modules to this assembly.
12-
You can follow these procedures if your own metrics are unavailable or if Prometheus is consuming a lot of disk space.
12+
Use these procedures if the following issues occur:
13+
14+
* Your own metrics are unavailable.
15+
* Prometheus is consuming a lot of disk space.
16+
* The `KubePersistentVolumeFillingUp` alert is firing for Prometheus.
1317
1418
// Investigating why user-defined metrics are unavailable
1519
include::modules/monitoring-investigating-why-user-defined-metrics-are-unavailable.adoc[leveloffset=+1]
@@ -28,3 +32,6 @@ include::modules/monitoring-determining-why-prometheus-is-consuming-disk-space.a
2832
.Additional resources
2933

3034
* See xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#setting-scrape-sample-and-label-limits-for-user-defined-projects_configuring-the-monitoring-stack[Setting a scrape sample limit for user-defined projects] for details on how to set a scrape sample limit and create related alerting rules
35+
36+
// Resolving the KubePersistentVolumeFillingUp alert firing for Prometheus
37+
include::modules/monitoring-resolving-the-kubepersistentvolumefillingup-alert-firing-for-prometheus.adoc[leveloffset=+1]
