|
| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * monitoring/troubleshooting-monitoring-issues.adoc |
| 4 | +// * support/troubleshooting/investigating-monitoring-issues.adoc |
| 5 | + |
| 6 | +:_mod-docs-content-type: PROCEDURE |
| 7 | +[id="resolving-the-kubepersistentvolumefillingup-alert-firing-for-prometheus_{context}"] |
| 8 | += Resolving the KubePersistentVolumeFillingUp alert firing for Prometheus |
| 9 | + |
| 10 | +As a cluster administrator, you can resolve the `KubePersistentVolumeFillingUp` alert being triggered for Prometheus. |
| 11 | + |
| 12 | +The critical alert fires when a persistent volume (PV) claimed by a `prometheus-k8s-*` pod in the `openshift-monitoring` project has less than 3% total space remaining. This can cause Prometheus to function abnormally. |
| 13 | + |
| 14 | +[NOTE] |
| 15 | +==== |
| 16 | +There are two `KubePersistentVolumeFillingUp` alerts: |
| 17 | + |
| 18 | +* *Critical alert*: The alert with the `severity="critical"` label is triggered when the mounted PV has less than 3% total space remaining. |
| 19 | +* *Warning alert*: The alert with the `severity="warning"` label is triggered when the mounted PV has less than 15% total space remaining and is expected to fill up within four days. |
| 20 | +==== |
| 21 | + |
| 22 | +To address this issue, you can remove Prometheus time-series database (TSDB) blocks to create more space for the PV. |
| 23 | + |
| 24 | +.Prerequisites |
| 25 | + |
| 26 | +ifndef::openshift-dedicated,openshift-rosa[] |
| 27 | +* You have access to the cluster as a user with the `cluster-admin` cluster role. |
| 28 | +endif::openshift-dedicated,openshift-rosa[] |
| 29 | +ifdef::openshift-dedicated,openshift-rosa[] |
| 30 | +* You have access to the cluster as a user with the `dedicated-admin` role. |
| 31 | +endif::openshift-dedicated,openshift-rosa[] |
| 32 | +* You have installed the OpenShift CLI (`oc`). |
| 33 | + |
| 34 | +.Procedure |
| 35 | + |
| 36 | +. List the size of all TSDB blocks, sorted from newest to oldest, by running the following command: |
| 37 | ++ |
| 38 | +[source,terminal] |
| 39 | +---- |
| 40 | +$ oc debug <prometheus_k8s_pod_name> -n openshift-monitoring \// <1> |
| 41 | +-c prometheus --image=$(oc get po -n openshift-monitoring <prometheus_k8s_pod_name> \// <1> |
| 42 | +-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') \ |
| 43 | +-- sh -c 'cd /prometheus/;du -hs $(ls -dt */ | grep -Eo "[0-9A-Z]{26}")' |
| 44 | +---- |
| 45 | +<1> Replace `<prometheus_k8s_pod_name>` with the pod mentioned in the `KubePersistentVolumeFillingUp` alert description. |
| 46 | ++ |
| 47 | +.Example output |
| 48 | +[source,terminal] |
| 49 | +---- |
| 50 | +308M 01HVKMPKQWZYWS8WVDAYQHNMW6 |
| 51 | +52M 01HVK64DTDA81799TBR9QDECEZ |
| 52 | +102M 01HVK64DS7TRZRWF2756KHST5X |
| 53 | +140M 01HVJS59K11FBVAPVY57K88Z11 |
| 54 | +90M 01HVH2A5Z58SKT810EM6B9AT50 |
| 55 | +152M 01HV8ZDVQMX41MKCN84S32RRZ1 |
| 56 | +354M 01HV6Q2N26BK63G4RYTST71FBF |
| 57 | +156M 01HV664H9J9Z1FTZD73RD1563E |
| 58 | +216M 01HTHXB60A7F239HN7S2TENPNS |
| 59 | +104M 01HTHMGRXGS0WXA3WATRXHR36B |
| 60 | +---- |
| 61 | + |
| 62 | +. Identify which blocks, and how many, can be removed, and then remove them. The following example command removes the three oldest Prometheus TSDB blocks from the `prometheus-k8s-0` pod: |
| 63 | ++ |
| 64 | +[source,terminal] |
| 65 | +---- |
| 66 | +$ oc debug prometheus-k8s-0 -n openshift-monitoring \ |
| 67 | +-c prometheus --image=$(oc get po -n openshift-monitoring prometheus-k8s-0 \ |
| 68 | +-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') \ |
| 69 | +-- sh -c 'ls -latr /prometheus/ | grep -Eo "[0-9A-Z]{26}" | head -3 | \ |
| 70 | +while read BLOCK; do rm -r /prometheus/$BLOCK; done' |
| 71 | +---- |
| 72 | + |
| 73 | +. Verify the usage of the mounted PV and ensure there is enough space available by running the following command: |
| 74 | ++ |
| 75 | +[source,terminal] |
| 76 | +---- |
| 77 | +$ oc debug <prometheus_k8s_pod_name> -n openshift-monitoring \// <1> |
| 78 | +--image=$(oc get po -n openshift-monitoring <prometheus_k8s_pod_name> \// <1> |
| 79 | +-o jsonpath='{.spec.containers[?(@.name=="prometheus")].image}') -- df -h /prometheus/ |
| 80 | +---- |
| 81 | +<1> Replace `<prometheus_k8s_pod_name>` with the pod mentioned in the `KubePersistentVolumeFillingUp` alert description. |
| 82 | ++ |
| 83 | +The following example output shows that the mounted PV claimed by the `prometheus-k8s-0` pod has 63% of its space remaining: |
| 84 | ++ |
| 85 | +.Example output |
| 86 | +[source,terminal] |
| 87 | +---- |
| 88 | +Starting pod/prometheus-k8s-0-debug-j82w4 ... |
| 89 | +Filesystem Size Used Avail Use% Mounted on |
| 90 | +/dev/nvme0n1p4 40G 15G 25G 37% /prometheus |
| 91 | + |
| 92 | +Removing debug pod ... |
| 93 | +---- |
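
Before running the removal command in step 2, it may be worth previewing which blocks would be selected. The following is a minimal sketch and not part of the procedure in this module: `oldest_blocks` is a hypothetical helper name, and it assumes that TSDB block directory names are 26-character ULIDs, which sort lexicographically in the order the blocks were created. It orders blocks by name rather than by modification time as step 2 does, so the two selections could differ if block directories were modified after creation.

```shell
#!/bin/sh
# Hypothetical helper: print the N oldest TSDB block names read from stdin.
# Assumption: block names are 26-character ULIDs (digits and uppercase
# letters), so a plain byte-wise sort lists the earliest-created blocks first.
oldest_blocks() {
  count="$1"
  # Extract only the block names, ignoring sizes or other columns.
  grep -Eo '[0-9A-Z]{26}' | LC_ALL=C sort | head -n "$count"
}

# Usage with a few lines taken from the example output of step 1.
# This only prints names; it does not remove anything.
printf '%s\n' \
  '308M 01HVKMPKQWZYWS8WVDAYQHNMW6' \
  '104M 01HTHMGRXGS0WXA3WATRXHR36B' \
  '216M 01HTHXB60A7F239HN7S2TENPNS' \
  '156M 01HV664H9J9Z1FTZD73RD1563E' | oldest_blocks 3
```

Piping the saved output of the step 1 command into the helper gives a list that can be reviewed before any `rm -r` is run. Because a ULID records when a block was written, a recently compacted block that covers older samples can still sort as newer.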