Skip to content

[SOLR-17799] Replace segment merge metrics using gauges with consistent counters #3416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

liangkaiwen
Copy link

@liangkaiwen liangkaiwen commented Jul 1, 2025

https://issues.apache.org/jira/browse/SOLR-17799

Description

Replace existing segment merge metrics with consistent counters

Solution

  • Enabled segment merge metrics by default (configurable threshold for major merges behavior is unchanged)
  • Removed the index merge running gauge metrics
  • Added the following index merge metrics:

INDEX.merge.major.started.docs
INDEX.merge.major.started.deletedDocs
INDEX.merge.major.started.segments
INDEX.merge.minor.started.docs
INDEX.merge.minor.started.deletedDocs
INDEX.merge.minor.started.segments

INDEX.merge.major.finished.docs
INDEX.merge.major.finished.deletedDocs
INDEX.merge.major.finished.segments
INDEX.merge.minor.finished.docs
INDEX.merge.minor.finished.deletedDocs
INDEX.merge.minor.finished.segments

Tests

  • Index 100 documents into 10 segments and force merge. Check metrics
  • Index 100 documents into 10 segments, delete 3, force merge. Check metrics
  • Reduce major merge threshold. Index 100 documents into 10 segments and force merge. Check major merge metrics
  • Run a flush. Check metrics

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@liangkaiwen
Copy link
Author

Will update documentation and add tests after we confirm this is the pattern of metrics/naming that we want to use

@dsmiley
Copy link
Contributor

dsmiley commented Jul 4, 2025

+1
Nice!
I slightly prefer "finished" instead of "completed" as I think it better balances with "started", plus it's slightly shorter.

@github-actions github-actions bot added the tests label Jul 22, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jul 22, 2025
Copy link
Contributor

@mlbiscoc mlbiscoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a bit hard to review this without seeing the output changes in parallel. Could you post the before and after?

I have some concern for backwards compatibility. I am a fan of this technically being on by default but I think it is too late for that. We should probably keep it optional as before, otherwise we run the risk of blowing up cardinality for clouds with many cores without them knowing the moment they call the /admin/metrics endpoint without filtering.

Also as this is probably going for 9.x, we would probably need to keep the old metrics there and its format for backwards compat reasons as well. Just add the news ones in parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit surprised this isn't implementer of SolrMetricProducer and all metrics are initialized in the constructor. Maybe the SolrMetricProducer was create later on and this never moved. I think we should move this to implement that interface which basically does your initMetrics().

Comment on lines +88 to +91
private final Map<String, Counter> majorMergeStartedMetrics = new HashMap<>();
private final Map<String, Counter> minorMergeStartedMetrics = new HashMap<>();
private final Map<String, Counter> majorMergeFinishedMetrics = new HashMap<>();
private final Map<String, Counter> minorMergeFinishedMetrics = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are updating these with updateMergeMetrics. But this isn't thread safe as these can be modified concurrently from multiple threads from a merge. This should probably be a ConcurrentHashMap?

Comment on lines +74 to +86
private static final String MERGE_METRIC_PATH_COMPONENT = "merge";
private static final String MERGE_METRIC_PATH_COMPONENT_MAJOR = "major";
private static final String MERGE_METRIC_PATH_COMPONENT_MINOR = "minor";
private static final String MERGE_METRIC_PATH_COMPONENT_FINISHED = "finished";
private static final String MERGE_METRIC_PATH_COMPONENT_STARTED = "started";

private static final String MERGES_METRIC_NAME = "merges";
private static final String MERGE_DOCS_METRIC_NAME = "docs";
private static final String MERGE_DELETED_DOCS_METRIC_NAME = "deletedDocs";
private static final String MERGE_SEGMENTS_METRIC_NAME = "segments";
private static final String MERGE_TIMES_METRIC_NAME = "mergeTimes";
private static final String MERGE_FLUSH_METRIC_NAME = "flushes";
private static final String MERGE_ERRORS_METRIC_NAME = "errors";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually a fan of constant static strings but in this case there are so many, I think I preferred it the other way of just in-lining them as I found it easier to read that it went something like

solrMetricsContext.gauge(
            () -> runningMajorMerges.get(),
            true,
            "running",
            SolrInfoBean.Category.INDEX.toString(),
            "merge",
            "major");

running.index.merge.major. I don't feel strongly about it though so if you prefer this way thats fine I guess.

runningMinorMergesDocs.addAndGet(-totalNumDocs);
runningMinorMergesSegments.addAndGet(-segmentsCount);
}
updateMergeMetrics(totalNumDocs, deletedDocs, segmentsCount, true, context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be negative totalNumDocs and -segmentsCount?

Comment on lines +123 to +128
iw.forceMerge(1, true);

assertEquals(
"should be mulitple merge operations",
initialMerges + 1,
minorMergeStartedMetrics.getCount());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this technically just 1 merge from line 123? Why would initial be higher than 0? Is it because minor merges on itself with less docs?

@dsmiley
Copy link
Contributor

dsmiley commented Jul 22, 2025

Just want to react to something:

otherwise we run the risk of blowing up cardinality for clouds with many cores without them knowing the moment they call the /admin/metrics endpoint without filtering.

This is already a risk. Any real-world use of fetching /admin/metrics is going to have a filtering parameter. It'd be reasonable to detect that no filter was given and then potentially return an error. If the user is just kicking the tires, they may want to ignore the filter requirement so the error message might contain a trivial suggestion like to see all Solr metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:index documentation Improvements or additions to documentation tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants