SOLR-17628: Add query quantiles metrics to prometheus endpoint #3164

Merged
mlbiscoc merged 14 commits into apache:main on Jul 7, 2025

Contributor

@jkmuriithi jkmuriithi commented Feb 6, 2025

https://issues.apache.org/jira/browse/SOLR-17628

Description

Modify the implementation of SolrPrometheusFormatter.exportTimer to export a Prometheus summary containing quantile information instead of a single Prometheus gauge. Rename the Timer-based metrics solr_metrics_core_average_request_time and solr_metrics_core_average_searcher_warmup_time to reflect this change. Remove the solr_metrics_core_requests_time Counter metric.

Solution

Prior to this change, Dropwizard Timer metrics (used for core request handlers and searchers) were exported in Prometheus format as single gauges representing the mean of all observations. This PR replaces the existing mean gauge metrics with a summary that includes quantile metrics, the count (number) of observations, and the sum of all observations.
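To illustrate the shape of a Prometheus summary, here is a minimal Java sketch that renders one timer in the text exposition format. It is not the code in SolrPrometheusFormatter.exportTimer: the class name SummaryExpositionSketch, its method signatures, and the way values are passed in are assumptions made for illustration. In Solr, the quantiles, count, and sum would be derived from the Dropwizard Timer rather than supplied by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not Solr code): renders one timer as a Prometheus
// summary in the text exposition format.
final class SummaryExpositionSketch {

  // quantiles maps a quantile (e.g. 0.99) to its observed value.
  static String render(
      String name,
      Map<String, String> labels,
      Map<Double, Double> quantiles,
      long count,
      double sum) {
    StringBuilder sb = new StringBuilder();
    sb.append("# TYPE ").append(name).append(" summary\n");

    // One sample per quantile, distinguished by the extra "quantile" label.
    for (Map.Entry<Double, Double> q : quantiles.entrySet()) {
      Map<String, String> withQuantile = new LinkedHashMap<>(labels);
      withQuantile.put("quantile", Double.toString(q.getKey()));
      sb.append(name).append(formatLabels(withQuantile))
          .append(' ').append(q.getValue()).append('\n');
    }

    // The summary also carries the number of observations and their sum.
    sb.append(name).append("_count").append(formatLabels(labels))
        .append(' ').append(count).append('\n');
    sb.append(name).append("_sum").append(formatLabels(labels))
        .append(' ').append(sum).append('\n');
    return sb.toString();
  }

  // Formats labels as {k1="v1",k2="v2"}, or an empty string when there are none.
  static String formatLabels(Map<String, String> labels) {
    if (labels.isEmpty()) {
      return "";
    }
    StringJoiner joiner = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> e : labels.entrySet()) {
      joiner.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
    }
    return joiner.toString();
  }

  // Label values must escape backslash, double quote, and newline.
  static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }
}
```

Passing the labels and quantiles as insertion-ordered maps keeps the emitted lines in a stable order, which makes the output easier to compare in tests.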

Sample old output:

# TYPE solr_metrics_core_average_request_time gauge
solr_metrics_core_average_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_average_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0.0

Sample new output:

# TYPE solr_metrics_core_request_time summary
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.5"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.75"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.99"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.999"} 0.0
solr_metrics_core_request_time_count{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0
solr_metrics_core_request_time_sum{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.5"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.75"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.99"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.999"} 0.0
solr_metrics_core_request_time_count{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0
solr_metrics_core_request_time_sum{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0.0
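
Because the mean gauge is replaced rather than kept, a consumer that still wants an average can derive one from the _count and _sum series of the summary. The sketch below is an assumption about how a client might do this between two scrapes and is not part of this PR; SummaryMeanSketch and meanBetween are hypothetical names. A mean derived this way covers only the observations between the two scrapes, so it will not necessarily match the value the old gauge reported.

```java
import java.util.OptionalDouble;

// Hypothetical client-side helper (not Solr code): derives a mean request
// time from the _count and _sum series of a Prometheus summary.
final class SummaryMeanSketch {

  // Returns the mean of the observations recorded between two scrapes, or
  // empty when no new observations were recorded (for example when the
  // count is still 0, as in the sample output above).
  static OptionalDouble meanBetween(long count0, double sum0, long count1, double sum1) {
    long deltaCount = count1 - count0;
    if (deltaCount <= 0) {
      return OptionalDouble.empty();
    }
    return OptionalDouble.of((sum1 - sum0) / deltaCount);
  }
}
```

Returning an empty OptionalDouble instead of dividing by zero avoids reporting NaN for handlers that have not served any requests.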

Tests

I updated MetricsHandlerTest and SolrPrometheusFormatterTest to align with the changes to exportTimer. ./gradlew test passes on my local machine.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Contributor

@mlbiscoc mlbiscoc left a comment

Thanks @jkmuriithi for doing this!

@dsmiley, maybe you can help review this and hopefully agree that it is worth adding? I created this Jira because I think this is an important missing piece of the Prometheus metrics. This PR fixes my poor implementation of the timer export (it should be a summary, not a gauge of the average) and also adds support for creating a summary metric type.

The prometheus exporter doesn't seem to support histograms or summaries so this gets ahead of that curve.

jkmuriithi and others added 4 commits February 10, 2025 16:33
Co-authored-by: David Smiley <dsmiley@apache.org>
Co-authored-by: Matthew Biscocho <54160956+mlbiscoc@users.noreply.github.com>

This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Apr 13, 2025

This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.

@github-actions github-actions bot added the closed-stale Closed after being stale for 60 days label Jun 13, 2025
@github-actions github-actions bot closed this Jun 13, 2025
@mlbiscoc
Contributor

I was looking at this PR and just remembered that I had completely forgotten about it! I think there is still value in having this for at least the 9x branch, with it then being removed completely in 10 with OTEL. Not only does this add the missing quantiles, it also fixes the timer export implementation by using a summary, which gives better timer data.

@dsmiley If you have no other comments I'd like to merge this in. Is it too late for 9.9? Not sure where that process went.

@github-actions github-actions bot removed closed-stale Closed after being stale for 60 days stale PR not updated in 60 days labels Jun 28, 2025
@dsmiley
Contributor

dsmiley commented Jul 1, 2025

Until there is a release branch for release X, no PR is "too late" for release X. Once a release branch exists (e.g. branch_9_9), you can take up the question with whoever volunteered to be the release manager (Houston volunteered for 9.9).

Contributor

@mlbiscoc mlbiscoc left a comment

Thanks @jkmuriithi for fixing the precommit failure and merging in main. Could you add an entry to CHANGES.txt under 9.9? I think this belongs in the improvements section. Maybe something like "Export metrics timers to wt=prometheus as Prometheus Summaries (introduces count, sum, and quantiles)"

Contributor

@dsmiley dsmiley left a comment

+1 thanks for the before & after in JIRA. Really important for showing what changed.

@mlbiscoc mlbiscoc merged commit 1ddf718 into apache:main Jul 7, 2025
3 checks passed
mlbiscoc pushed a commit that referenced this pull request Jul 7, 2025
Export metric timers via `wt=prometheus` as Prometheus summaries. This introduces count, sum, and quantiles.
mlbiscoc pushed a commit that referenced this pull request Jul 8, 2025
Export metric timers via `wt=prometheus` as Prometheus summaries. This introduces count, sum, and quantiles.
3 participants