OADP-5967-prometheus support #95469

shdeshpa07 · 2025-07-01T08:38:11Z

Jira

OADP-5967

Add a note about support for Prometheus metrics and remove unsupported Velero metrics

Version

OCP 4.13 → OCP 4.19

Preview

QE Review

QE has approved this change.

shdeshpa07 · 2025-07-01T08:38:28Z

/label OADP

ocpdocs-previewbot · 2025-07-01T08:43:58Z

🤖 Fri Jul 04 09:46:50 - Prow CI generated the docs preview:

https://95469--ocpdocs-pr.netlify.app/openshift-enterprise/latest/backup_and_restore/application_backup_and_restore/troubleshooting/oadp-monitoring.html

shdeshpa07 · 2025-07-02T10:48:14Z

@mpryc @stillalearner - Can I please ask for your review for this PR? Thanks.

modules/oadp-list-of-metrics.adoc

mpryc · 2025-07-03T15:23:53Z

@shdeshpa07 Would it make sense to remove the part starting from:

I've reviewed the metrics again, and to be precise about what's available, I would like some modifications to be made:

The pod volume backup metrics: can we remove them from the current official doc? In testing I was unable to see them in the user monitoring, so I would not add them at this point.
The metric:

| `velero_backup_duration_seconds` | Time taken to complete backup, in seconds | Histogram

The above metric is of type histogram, which OpenShift Monitoring exposes as a collection of 3 separate metrics:

| `velero_backup_duration_seconds_bucket` | The total count of observations for a bucket in the histogram: Time taken to complete backup, in seconds | Counter
| `velero_backup_duration_seconds_count` | The total count of observations for: Time taken to complete backup, in seconds | Counter
| `velero_backup_duration_seconds_sum` | The total sum of observations for: Time taken to complete backup, in seconds | Counter
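
To illustrate why one histogram shows up as three series, here is a minimal Python sketch that derives `_bucket`, `_count`, and `_sum` values from a list of observations. The bucket bounds and the backup durations are made-up values for illustration only; they are not the bounds that Velero actually configures.

```python
import math


def histogram_series(observations, upper_bounds):
    """Build the three series that a Prometheus-style histogram exposes.

    Buckets are cumulative: each bucket counts every observation that is
    less than or equal to its upper bound ("le"), and a final +Inf bucket
    always equals the total count.
    """
    bounds = sorted(upper_bounds) + [math.inf]
    buckets = {le: sum(1 for value in observations if value <= le) for le in bounds}
    return {
        "bucket": buckets,              # like velero_backup_duration_seconds_bucket{le="..."}
        "count": len(observations),     # like velero_backup_duration_seconds_count
        "sum": sum(observations),       # like velero_backup_duration_seconds_sum
    }


# Hypothetical backup durations in seconds and hypothetical bucket bounds.
durations = [12.0, 45.0, 70.0, 400.0]
series = histogram_series(durations, [15, 60, 300])

# The average backup duration is recoverable from two of the three series.
average_seconds = series["sum"] / series["count"]
print(series["bucket"], series["count"], series["sum"], average_seconds)
```

Because the buckets are cumulative, the `+Inf` bucket always matches `_count`, and dividing `_sum` by `_count` gives the mean duration without needing the raw observations.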

Some comments around the current doc:

Procedure
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

and ending with

Apply the 2_configure_user_workload_monitoring.yaml file:


$ oc apply -f 2_configure_user_workload_monitoring.yaml
configmap/user-workload-monitoring-config created
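
For context on the first quoted step, the sketch below builds a `cluster-monitoring-config` ConfigMap that turns on monitoring for user-defined projects and prints it as JSON, which `oc apply -f` accepts as well as YAML. It assumes the standard `enableUserWorkload: true` setting from the OpenShift monitoring documentation; it is not a copy of the files referenced in the OADP procedure, and the function name is hypothetical.

```python
import json


def cluster_monitoring_config():
    """Return a ConfigMap manifest that enables user workload monitoring.

    Assumption: the only setting needed here is enableUserWorkload: true.
    A real cluster may already carry other keys in config.yaml that must
    be preserved when editing the object.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",
        },
        # config.yaml is stored as a string that holds YAML.
        "data": {"config.yaml": "enableUserWorkload: true\n"},
    }


manifest = cluster_monitoring_config()
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Writing the printed output to a file and applying it with `oc apply -f <file>` would be one way to use it; on an existing cluster, editing the current object is safer than overwriting it.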

Would it make sense to remove some parts in favor of simply pointing to a different chapter of the same OpenShift docs? Or is it better to keep them here so that the page gives a full end-to-end set of instructions?

https://95469--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/monitoring/configuring-user-workload-monitoring/preparing-to-configure-the-monitoring-stack-uwm#enabling-monitoring-for-user-defined-projects_preparing-to-configure-the-monitoring-stack-uwm

The following part happens only after some time (even a few minutes), so I'm wondering whether the doc should say that it's not immediate?

Verification
Confirm that the new service monitor is in an Up state by using the Administrator perspective of the OpenShift Container Platform web console:

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>

shdeshpa07 · 2025-07-04T09:35:31Z

@mpryc - Thank you for the review.

I have made the changes in the metrics list as you have mentioned.
I have also linked the OCP Preparing to configure the user workload monitoring stack document at the start of the procedure, just above the Prerequisites section. Hope that works well. Please let me know if you would still like any more changes done.
Added a line in the verification section that the service monitor takes a few minutes to be Up.

Could you please review again to see if the changes look good? Thanks.

openshift-ci · 2025-07-04T09:47:28Z

@shdeshpa07: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

mpryc · 2025-07-04T17:31:19Z

@shdeshpa07 everything looks perfect except one nit: the following link is rendered as plain text, not as a hyperlink, in the generated docs:

For more information about setting up the monitoring stack, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/Branch Build/html/monitoring/configuring-user-workload-monitoring[Configuring user workload monitoring].

shdeshpa07 · 2025-07-07T03:50:36Z

Thank you, @mpryc. The link has an {ocp_latest_version} attribute in it, so it resolves only in the production deployment. You will be able to see the resolved link in the production docs. Thanks.

shdeshpa07 · 2025-07-07T03:58:00Z

/label peer-review-needed

mpryc · 2025-07-07T07:37:37Z

/lgtm

dfitzmau · 2025-07-07T11:02:25Z

Hi @shdeshpa07. Please squash the commits from 4 down to 1.

dfitzmau · 2025-07-07T11:04:07Z

modules/oadp-creating-service-monitor.adoc

@@ -64,7 +64,7 @@ servicemonitor.monitoring.coreos.com/oadp-service-monitor created

 .Verification

-* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
+* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console. It takes a few minutes for the service monitor to be in the *Up* state.


Best to define "it" as it's a vague referent. How about:

Wait a few minutes for the service monitor to reach the Up state.
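
The "wait a few minutes" guidance amounts to polling with a timeout. A minimal Python sketch of that pattern follows; the `wait_until` helper, its default timings, and the simulated states are hypothetical and only stand in for however the target state is actually checked (for example, by looking at the web console).

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s seconds elapse.

    Returns True if the condition was met, False on timeout. The sleep and
    clock parameters are injectable so the helper can be exercised quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated target states: the service monitor is not Up on the first checks.
states = iter(["Down", "Down", "Up"])
reached_up = wait_until(lambda: next(states) == "Up", sleep=lambda _seconds: None)
print(reached_up)
```

A five-minute default timeout is an arbitrary choice here; the thread only says that reaching the Up state can take a few minutes.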

dfitzmau · 2025-07-07T11:05:03Z

modules/oadp-list-of-metrics.adoc

+| `velero_restore_success_total` | Total number of successful restores | Counter
+| `velero_restore_partial_failure_total` | Total number of partially failed restores | Counter
+| `velero_restore_failed_total` | Total number of failed restores | Counter
+| `velero_volume_snapshot_attempt_total``| Total number of attempted volume snapshots | Counter


`velero_volume_snapshot_attempt_total`` has an extra backtick.
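
A stray backtick like this one can be caught mechanically. The following Python sketch flags table cells that contain an odd number of backticks; it is a rough heuristic written for this comment, not part of any existing docs tooling, and it would miss an even number of extra backticks.

```python
def unbalanced_backtick_cells(row):
    """Return the cells of a pipe-delimited table row with an odd backtick count."""
    cells = [cell.strip() for cell in row.strip().lstrip("|").split("|")]
    return [cell for cell in cells if cell.count("`") % 2 == 1]


# Rows taken from the diff above, without the leading "+" marker.
bad_row = "| `velero_volume_snapshot_attempt_total``| Total number of attempted volume snapshots | Counter"
good_row = "| `velero_restore_failed_total` | Total number of failed restores | Counter"

flagged = unbalanced_backtick_cells(bad_row)
clean = unbalanced_backtick_cells(good_row)
print(flagged, clean)
```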

modules/oadp-monitoring-setup.adoc

openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. OADP Label for all OADP PRs labels Jul 1, 2025

stillalearner reviewed Jul 3, 2025

modules/oadp-list-of-metrics.adoc (outdated, resolved)

shdeshpa07 added 3 commits July 4, 2025 11:25

merge conflict

25bcb7a

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>

changes in metrics list

0f94ccf

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>

attribute error

55ce7bd

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>

shdeshpa07 force-pushed the OADP-5967-prometheus-support branch from ef70f28 to 55ce7bd Compare July 4, 2025 05:57

dev review

60e2539

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>

openshift-ci bot added the peer-review-needed Signifies that the peer review team needs to review this PR label Jul 7, 2025

mpryc approved these changes Jul 7, 2025

openshift-ci bot assigned mpryc Jul 7, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2025

dfitzmau added peer-review-in-progress Signifies that the peer review team is reviewing this PR and removed peer-review-needed Signifies that the peer review team needs to review this PR labels Jul 7, 2025

dfitzmau approved these changes Jul 7, 2025

dfitzmau added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-in-progress Signifies that the peer review team is reviewing this PR labels Jul 7, 2025
