Skip to content

SOLR-17806: Migrate DirectUpdateHandler2 metrics to OTEL #3417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: feature/SOLR-17458
Choose a base branch
from

Conversation

mlbiscoc
Copy link
Contributor

@mlbiscoc mlbiscoc commented Jul 3, 2025

https://issues.apache.org/jira/browse/SOLR-17806

Migrate DirectUpdateHandler2 metrics and remove Dropwizard initializations.

# HELP solr_metrics_core_update_auto_commits_total Total number of auto commits
# TYPE solr_metrics_core_update_auto_commits_total counter
solr_metrics_core_update_auto_commits_total{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1",type="auto_commits"} 0.0
solr_metrics_core_update_auto_commits_total{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1",type="soft_auto_commits"} 0.0
# HELP solr_metrics_core_update_commit_stats Metrics around commits
# TYPE solr_metrics_core_update_commit_stats gauge
solr_metrics_core_update_commit_stats{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1",type="auto_commit_max_time"} 15000.0
solr_metrics_core_update_commit_stats{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1",type="soft_auto_commit_max_time"} 3000.0
# HELP solr_metrics_core_update_errors_total Total number of update errors
# TYPE solr_metrics_core_update_errors_total counter
solr_metrics_core_update_errors_total{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
# HELP solr_metrics_core_update_maintenance_operations_total Total number of maintenance operations
# TYPE solr_metrics_core_update_maintenance_operations_total counter
solr_metrics_core_update_maintenance_operations_total{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="rollback",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_maintenance_operations_total{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="split",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
# HELP solr_metrics_core_update_operations_cumulative Cumulative number of update commands processed. Metric can go down from rollback command
# TYPE solr_metrics_core_update_operations_cumulative gauge
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="adds",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="commits",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="deletes_by_id",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="deletes_by_query",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="expunge_deletes",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="merges",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_operations_cumulative{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="optimize",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
# HELP solr_metrics_core_update_pending_operations Operations pending commit. Values get reset after commit
# TYPE solr_metrics_core_update_pending_operations gauge
solr_metrics_core_update_pending_operations{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="adds",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_pending_operations{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="deletes_by_id",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_pending_operations{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="deletes_by_query",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_update_pending_operations{category="UPDATE",collection="demo",core="demo_shard1_replica_n1",operation="docs_pending",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 0.0

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dealing with rollbacks is annoying... it's tempting to ignore them but I suppose we should not.

I would like to explore here monotinically increasing numbers that can be subtracted from each other to determine "pending" if needed. Maybe the prometheus endpoint could have a convenience to compute pending as well.

Meter numErrorsCumulative;

// Cumulative commands
AttributedLongUpDownCounter deleteByQueryCommandsCumulative;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these "cumulative" things having a type that includes "Down"; wouldn't they never go down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They go down from a rollback command.

addCommandsCumulative.add(-addCommands.sumThenReset());
deleteByIdCommandsCumulative.add(-deleteByIdCommands.sumThenReset());
deleteByQueryCommandsCumulative.add(-deleteByQueryCommands.sumThenReset());

Although the maintenance operations below only go up, so we can move that to an normal counter actually.

updateStats =
solrMetricsContext.observableLongGauge(
"solr_metrics_core_update_pending_operations",
"Operations pending commit. Values get reset after commit",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to instead track "submitted", as separate from "committed" operations? Much like the started and completed merges -- the PR Kevin Liang is working on -- #3416

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm maybe that is better. Maybe I'll reduce this to only docs_pending for this gauge instead then we'll move to submitted and committed increasing counters. I'll refactor this around and see how it looks.

.getSolrMetricsContext()
.getMetricManager()
.getPrometheusMetricReader(solrCore.getCoreMetricManager().getRegistryName());
var actual =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this statement is lacking some elegance. If this pattern repeats itself; we should seek an improvement

public static Supplier<Labels.Builder> getCloudLabelsBase(SolrCore core) {
return () ->
Labels.builder()
.label("collection", "tlog_replica_test_only_leader_indexes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the CloudDescriptor

return () ->
Labels.builder()
.label("collection", "tlog_replica_test_only_leader_indexes")
.label("shard", "shard1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise. Look at the CloudDescriptor

assertEquals("new adds", 2, newAdds - adds);
assertEquals("new cumulative adds", 2, newCumulativeAdds - cumulativeAdds);
assertEquals(
"Did not have the expected number of pending adds",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you went through the effort of writng these messages -- fine. But I wouldn't waste your time; it's optional.

"Did not have the expected number of cumulative deletes_by_id",
1,
getGaugeOpDatapoint(
reader, "solr_metrics_core_update_operations_cumulative", "deletes_by_id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick observation -- the "metrics" part of these names seems redundant; no?
The "solr" part is I know needed to differentiate the JVM & Jetty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but mostly kept them as it was the way the prometheus exporter prefixed all it's metrics. Definitely no need to keep them if we don't want. I removed them in the latest commit but will remove it from the other metrics in a separate PR.

@@ -15,6 +15,8 @@
limitations under the License.
*/

//NOCOMMIT: This plugin seems tied to the Admin UIs plugin management but is tied to dropwizard metrics failing some tests.
// This needs to change how it gets these metrics or we need to make a shim to the /admin/plugins handler for this to support it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at PluginInfoHandler, I suppose we can continue to support that.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I wonder if "ops" would be sufficiently clear instead of "operations"? After all, we have "stats", not "statistics".

@@ -262,27 +262,29 @@ public void initializeMetrics(
baseCommandsMetric,
baseAttributes.get().put(OPERATION_ATTR, "deletes_by_query").build());

var baseCommitMetric =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a "base"; why do you refer to it in the metrics for commands & merges below other than "commits"?

new AttributedLongUpDownCounter(
baseCommandsMetric, baseAttributes.get().put(OPERATION_ATTR, "merges").build());
new AttributedLongCounter(
baseCommitMetric, baseAttributes.get().put(OPERATION_ATTR, "merges").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh this is a command and not merges generally. "mergeIndexes"? At least it's scoped by the OPERATION_ATTR so in-context, it ought to be clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants