-
Notifications
You must be signed in to change notification settings - Fork 739
SOLR-17806: Migrate DirectUpdateHandler2 metrics to OTEL #3417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/SOLR-17458
Are you sure you want to change the base?
SOLR-17806: Migrate DirectUpdateHandler2 metrics to OTEL #3417
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dealing with rollbacks is annoying... it's tempting to ignore them but I suppose we should not.
I would like to explore here monotinically increasing numbers that can be subtracted from each other to determine "pending" if needed. Maybe the prometheus endpoint could have a convenience to compute pending as well.
Meter numErrorsCumulative; | ||
|
||
// Cumulative commands | ||
AttributedLongUpDownCounter deleteByQueryCommandsCumulative; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these "cumulative" things having a type that includes "Down"; wouldn't they never go down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They go down from a rollback command.
addCommandsCumulative.add(-addCommands.sumThenReset());
deleteByIdCommandsCumulative.add(-deleteByIdCommands.sumThenReset());
deleteByQueryCommandsCumulative.add(-deleteByQueryCommands.sumThenReset());
Although the maintenance operations below only go up, so we can move that to an normal counter actually.
updateStats = | ||
solrMetricsContext.observableLongGauge( | ||
"solr_metrics_core_update_pending_operations", | ||
"Operations pending commit. Values get reset after commit", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to instead track "submitted", as separate from "committed" operations? Much like the started and completed merges -- the PR Kevin Liang is working on -- #3416
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm maybe that is better. Maybe I'll reduce this to only docs_pending for this gauge instead then we'll move to submitted and committed increasing counters. I'll refactor this around and see how it looks.
.getSolrMetricsContext() | ||
.getMetricManager() | ||
.getPrometheusMetricReader(solrCore.getCoreMetricManager().getRegistryName()); | ||
var actual = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this statement is lacking some elegance. If this pattern repeats itself; we should seek an improvement
public static Supplier<Labels.Builder> getCloudLabelsBase(SolrCore core) { | ||
return () -> | ||
Labels.builder() | ||
.label("collection", "tlog_replica_test_only_leader_indexes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the CloudDescriptor
return () -> | ||
Labels.builder() | ||
.label("collection", "tlog_replica_test_only_leader_indexes") | ||
.label("shard", "shard1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise. Look at the CloudDescriptor
assertEquals("new adds", 2, newAdds - adds); | ||
assertEquals("new cumulative adds", 2, newCumulativeAdds - cumulativeAdds); | ||
assertEquals( | ||
"Did not have the expected number of pending adds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you went through the effort of writng these messages -- fine. But I wouldn't waste your time; it's optional.
"Did not have the expected number of cumulative deletes_by_id", | ||
1, | ||
getGaugeOpDatapoint( | ||
reader, "solr_metrics_core_update_operations_cumulative", "deletes_by_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a quick observation -- the "metrics" part of these names seems redundant; no?
The "solr" part is I know needed to differentiate the JVM & Jetty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but mostly kept them as it was the way the prometheus exporter prefixed all it's metrics. Definitely no need to keep them if we don't want. I removed them in the latest commit but will remove it from the other metrics in a separate PR.
@@ -15,6 +15,8 @@ | |||
limitations under the License. | |||
*/ | |||
|
|||
//NOCOMMIT: This plugin seems tied to the Admin UIs plugin management but is tied to dropwizard metrics failing some tests. | |||
// This needs to change how it gets these metrics or we need to make a shim to the /admin/plugins handler for this to support it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at PluginInfoHandler, I suppose we can continue to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I wonder if "ops" would be sufficiently clear instead of "operations"? After all, we have "stats", not "statistics".
@@ -262,27 +262,29 @@ public void initializeMetrics( | |||
baseCommandsMetric, | |||
baseAttributes.get().put(OPERATION_ATTR, "deletes_by_query").build()); | |||
|
|||
var baseCommitMetric = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a "base"; why do you refer to it in the metrics for commands & merges below other than "commits"?
new AttributedLongUpDownCounter( | ||
baseCommandsMetric, baseAttributes.get().put(OPERATION_ATTR, "merges").build()); | ||
new AttributedLongCounter( | ||
baseCommitMetric, baseAttributes.get().put(OPERATION_ATTR, "merges").build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh this is a command and not merges generally. "mergeIndexes"? At least it's scoped by the OPERATION_ATTR so in-context, it ought to be clear.
https://issues.apache.org/jira/browse/SOLR-17806
Migrate DirectUpdateHandler2 metrics and remove Dropwizard initializations.