Skip to content

OADP-5967-prometheus support #95469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

shdeshpa07
Copy link
Contributor

@shdeshpa07 shdeshpa07 commented Jul 1, 2025

Jira

Add note about support for Promethues metric and remove unsupported velero metrics

Version

  • OCP 4.13 → OCP 4.19

Preview

QE Review

  • QE has approved this change.

@shdeshpa07
Copy link
Contributor Author

/label OADP

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. OADP Label for all OADP PRs labels Jul 1, 2025
@ocpdocs-previewbot
Copy link

ocpdocs-previewbot commented Jul 1, 2025

@shdeshpa07
Copy link
Contributor Author

@mpryc @stillalearner - Can I please ask for your review for this PR? Thanks.

@mpryc
Copy link

mpryc commented Jul 3, 2025

@shdeshpa07 Would it make sense to remove the part starting from:

I've reviewed again the metrics and to be very precise on what's available I would like some modifications to be made:

  1. The pod volume backup metrics - can we remove them from the current official doc? In the testing I was unable to really see those in the user monitoring, so I would not add them at this point of time.

  2. The metric:

velero_backup_duration_seconds	Time taken to complete backup, in seconds	Histogram

The above metric is type of histogram, which in the OpenShift Monitoring is really giving us a collection of 3 separate metrics:

velero_backup_duration_seconds_bucket	The total count of observations for a bucket in the histogram: Time taken to complete backup, in seconds	Counter	
velero_backup_duration_seconds_count	The total count of observations for: Time taken to complete backup, in seconds	Counter	
velero_backup_duration_seconds_sum	The total sum of observations for: Time taken to complete backup, in seconds	Counter	
  1. Some comments around the current doc:
Procedure
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

End ending with

Apply the 2_configure_user_workload_monitoring.yaml file:


$ oc apply -f 2_configure_user_workload_monitoring.yaml
configmap/user-workload-monitoring-config created

Would it make sense to remove some parts in favor of simply pointing to different chapted of the same OpenShift docs? Possibly having them here gives a full end-to-end set of instructions ?

https://95469--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/monitoring/configuring-user-workload-monitoring/preparing-to-configure-the-monitoring-stack-uwm#enabling-monitoring-for-user-defined-projects_preparing-to-configure-the-monitoring-stack-uwm

  1. The following part is happening after some time (even few minutes), so wondering if it should be written that it's not immediate ?
Verification
Confirm that the new service monitor is in an Up state by using the Administrator perspective of the OpenShift Container Platform web console:

Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>
Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>
Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>
@shdeshpa07 shdeshpa07 force-pushed the OADP-5967-prometheus-support branch from ef70f28 to 55ce7bd Compare July 4, 2025 05:57
Signed-off-by: Shruti Deshpande <shdeshpa@redhat.com>
@shdeshpa07
Copy link
Contributor Author

@shdeshpa07 Would it make sense to remove the part starting from:

I've reviewed again the metrics and to be very precise on what's available I would like some modifications to be made:

  1. The pod volume backup metrics - can we remove them from the current official doc? In the testing I was unable to really see those in the user monitoring, so I would not add them at this point of time.
  2. The metric:
velero_backup_duration_seconds	Time taken to complete backup, in seconds	Histogram

The above metric is type of histogram, which in the OpenShift Monitoring is really giving us a collection of 3 separate metrics:

velero_backup_duration_seconds_bucket	The total count of observations for a bucket in the histogram: Time taken to complete backup, in seconds	Counter	
velero_backup_duration_seconds_count	The total count of observations for: Time taken to complete backup, in seconds	Counter	
velero_backup_duration_seconds_sum	The total sum of observations for: Time taken to complete backup, in seconds	Counter	
  1. Some comments around the current doc:
Procedure
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

End ending with

Apply the 2_configure_user_workload_monitoring.yaml file:


$ oc apply -f 2_configure_user_workload_monitoring.yaml
configmap/user-workload-monitoring-config created

Would it make sense to remove some parts in favor of simply pointing to different chapted of the same OpenShift docs? Possibly having them here gives a full end-to-end set of instructions ?

https://95469--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/monitoring/configuring-user-workload-monitoring/preparing-to-configure-the-monitoring-stack-uwm#enabling-monitoring-for-user-defined-projects_preparing-to-configure-the-monitoring-stack-uwm

  1. The following part is happening after some time (even few minutes), so wondering if it should be written that it's not immediate ?
Verification
Confirm that the new service monitor is in an Up state by using the Administrator perspective of the OpenShift Container Platform web console:

@mpryc - Thank you for the review.

  • I have made the changes in the metrics list as you have mentioned.
  • I have also linked the OCP Preparing to configure the user workload monitoring stack document at the start of the procedure, just above the Prerequisites section. Hope that works well. Please let me know if you would still like any more changes done.
  • Added a line in the verification section that the service monitor takes a few minutes to be Up.

Could you please review again to see if the changes look good? Thanks.

Copy link

openshift-ci bot commented Jul 4, 2025

@shdeshpa07: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mpryc
Copy link

mpryc commented Jul 4, 2025

@shdeshpa07 everything looks perfect except one nit, the following link is a plain text not a hyperlink in the generated docs:

For more information about setting up the monitoring stack, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/Branch Build/html/monitoring/configuring-user-workload-monitoring[Configuring user workload monitoring].

@shdeshpa07
Copy link
Contributor Author

@shdeshpa07 everything looks perfect except one nit, the following link is a plain text not a hyperlink in the generated docs:

For more information about setting up the monitoring stack, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/Branch Build/html/monitoring/configuring-user-workload-monitoring[Configuring user workload monitoring].

Thank you @mpryc . The link has a {ocp_latest_version} attribute in it and hence it will resolve only in the production depolyment. You will be able to see the resolved link in the production docs. Thanks.

@shdeshpa07
Copy link
Contributor Author

/label peer-review-needed

@openshift-ci openshift-ci bot added the peer-review-needed Signifies that the peer review team needs to review this PR label Jul 7, 2025
@mpryc
Copy link

mpryc commented Jul 7, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2025
@dfitzmau
Copy link
Contributor

dfitzmau commented Jul 7, 2025

Hi @shdeshpa07 . Please squash the commits from 4 down to 1.

@dfitzmau dfitzmau added peer-review-in-progress Signifies that the peer review team is reviewing this PR and removed peer-review-needed Signifies that the peer review team needs to review this PR labels Jul 7, 2025
@@ -64,7 +64,7 @@ servicemonitor.monitoring.coreos.com/oadp-service-monitor created

.Verification

* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console. It takes a few minutes for the service monitor to be in the *Up* state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to define "it" as it's a vague referent. How about:

Wait a few minutes for the service monitor to reach the Up state.

| `velero_restore_success_total` | Total number of successful restores | Counter
| `velero_restore_partial_failure_total` | Total number of partially failed restores | Counter
| `velero_restore_failed_total` | Total number of failed restores | Counter
| `velero_volume_snapshot_attempt_total``| Total number of attempted volume snapshots | Counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volume_snapshot_attempt_total` has an extra backtick.

@dfitzmau dfitzmau added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-in-progress Signifies that the peer review team is reviewing this PR labels Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. OADP Label for all OADP PRs peer-review-done Signifies that the peer review team has reviewed this PR size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants