SOLR-17798: Integrate SDK OTLP metric exporter #3413

Open
wants to merge 22 commits into
base: feature/SOLR-17458

Conversation

mlbiscoc
Contributor

@mlbiscoc mlbiscoc commented Jun 30, 2025

https://issues.apache.org/jira/browse/SOLR-17798

This adds a MetricExporter to SolrMetricManager so that metrics can be pushed over OTLP via gRPC or HTTP.

By default the OTLP exporter is turned off, and Solr only exposes metrics for pull-based collection through the /admin/metrics endpoint.
The following configuration parameters can be passed as system properties or environment variables:

System property | Environment variable | Values
solr.otlpMetricExporterEnabled | SOLR_OTLP_METRIC_EXPORTER_ENABLED | true/false
solr.otlpMetricExporterProtocol | SOLR_OTLP_METRIC_EXPORTER_PROTOCOL | grpc/http/none
solr.otlpMetricExporterInterval | SOLR_OTLP_METRIC_EXPORTER_INTERVAL | interval in milliseconds
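
To illustrate how each system property pairs with its environment variable, here is a minimal sketch. `OtlpSettingLookup`, `toEnvName`, and `resolve` are hypothetical names made up for this example, and the precedence (system property, then environment variable, then default) is an assumption; Solr's actual lookup goes through EnvUtils and may behave differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper for illustration; Solr's real lookup is EnvUtils and may behave differently. */
final class OtlpSettingLookup {

  /** Derives the env-variable spelling, e.g. solr.otlpMetricExporterEnabled -> SOLR_OTLP_METRIC_EXPORTER_ENABLED. */
  static String toEnvName(String propertyKey) {
    return propertyKey
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2") // put "_" between camelCase words
        .replace('.', '_')
        .toUpperCase(Locale.ROOT);
  }

  /** Assumed precedence: system property, then environment variable, then the default. */
  static String resolve(
      String propertyKey, Properties props, Map<String, String> env, String defaultValue) {
    String fromProperty = props.getProperty(propertyKey);
    if (fromProperty != null) {
      return fromProperty;
    }
    String fromEnv = env.get(toEnvName(propertyKey));
    return fromEnv != null ? fromEnv : defaultValue;
  }

  public static void main(String[] args) {
    String enabled =
        resolve("solr.otlpMetricExporterEnabled", System.getProperties(), System.getenv(), "false");
    System.out.println("OTLP exporter enabled: " + Boolean.parseBoolean(enabled));
  }
}
```

Passing the property and environment maps in as arguments keeps the sketch easy to exercise without touching the real process environment.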

Contributor Author

@mlbiscoc mlbiscoc left a comment

I tested this with an OTEL collector using the following collector config:

otel-config.yml
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
  telemetry:
    metrics:
      address: 0.0.0.0:8888
      level: detailed

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: default

Solr pushes the metrics to the collector's OTLP receiver over gRPC. You can then curl the collector's Prometheus exporter endpoint to read them in Prometheus format:

curl 'localhost:8889/metrics'

I also think this feature branch is at a point where PRs for the migration to OTEL can start happening in parallel across handlers and other classes. I might start opening additional PRs for those migrations.

Comment on lines +737 to +741
.registerMetricReader(reader)
.registerMetricReader(
PeriodicMetricReader.builder(metricExporter)
.setInterval(OTLP_EXPORTER_INTERVAL, TimeUnit.MILLISECONDS)
.build());
Contributor Author

We register both readers here: the Prometheus reader that backs our metrics endpoint, and the periodic reader that pushes metrics to the OTLP exporter.

* @see io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter
* @see NoopMetricExporter
*/
public class OtlpExporterFactory {
Contributor Author

It's debatable whether this should live here in core. Should we move the OTLP exporter into the Open Telemetry module instead? The OTLP tracing exporter is only enabled through the module, so maybe metrics should be as well.

Contributor

I agree; metrics exporting belongs in a module. Rationale: to constrain dependencies.

Contributor Author

I moved it into the Open Telemetry module and now load it with SolrResourceLoader, with NoopMetricExporter as the default. We just need opentelemetry.exporter.sender.okhttp as a runtime dependency in core to use the OTLP exporter.

EnvUtils.getProperty("solr.otlpMetricExporterProtocol", "grpc");

public static final int OTLP_EXPORTER_INTERVAL =
Integer.parseInt(EnvUtils.getProperty("solr.otlpMetricExporterInterval", "60000"));
Contributor Author

60 seconds is the default according to the OTel spec, so I copied it here, but users can change the interval as they wish.
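
As a rough illustration of the interval handling, here is a sketch that parses the millisecond value and falls back to 60000 when it is unset. `ExportIntervalSketch` and `parseIntervalMs` are hypothetical names, and rejecting non-numeric or non-positive values is an assumption added for the example; the diff above simply calls `Integer.parseInt` on the property.

```java
import java.time.Duration;

/** Hypothetical sketch; the PR itself reads the value with EnvUtils and Integer.parseInt. */
final class ExportIntervalSketch {

  /** 60 seconds, the default used in the PR. */
  static final long DEFAULT_INTERVAL_MS = 60_000L;

  /** Parses a millisecond interval, using the default when the value is missing or blank. */
  static Duration parseIntervalMs(String raw) {
    if (raw == null || raw.isBlank()) {
      return Duration.ofMillis(DEFAULT_INTERVAL_MS);
    }
    long millis;
    try {
      millis = Long.parseLong(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Interval must be a whole number of milliseconds: " + raw, e);
    }
    if (millis <= 0) {
      // Assumption for this sketch: a zero or negative push interval is treated as a config error.
      throw new IllegalArgumentException("Interval must be positive: " + raw);
    }
    return Duration.ofMillis(millis);
  }

  public static void main(String[] args) {
    System.out.println(parseIntervalMs(System.getProperty("solr.otlpMetricExporterInterval")));
  }
}
```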

Contributor

Properties read through EnvUtils should follow the format "solr.module.blahBlah", so you are missing a period after "otlp". That said, the module segment should be something broader, like "metrics".

Comment on lines 54 to 55
case "grpc" -> OtlpGrpcMetricExporter.getDefault();
case "http" -> OtlpHttpMetricExporter.getDefault();
Contributor Author

While testing, I found that because we use getDefault() for both gRPC and HTTP, the exporter reads some default environment variables without us having to read them and configure the exporter ourselves. For example, OTEL_EXPORTER_OTLP_ENDPOINT can change the endpoint where the metrics are pushed. I still need to find documentation that lists all the parameters it picks up.
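
For orientation, the sketch below maps the three protocol values to the endpoints that, as far as I know, OTLP exporters target when nothing overrides them: port 4317 for gRPC and port 4318 with the /v1/metrics path for HTTP, matching the receiver ports in the collector config above. `OtlpProtocolSketch` is a hypothetical stand-in that returns strings rather than real exporter objects, and defaulting to gRPC on a blank value and throwing on an unknown one are assumptions, not behavior confirmed by this PR.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch; the PR's factory returns real OTLP exporter instances instead of strings. */
final class OtlpProtocolSketch {

  enum Protocol { GRPC, HTTP, NONE }

  /** Parses the protocol setting; "grpc" is the default passed to EnvUtils in the diff. */
  static Protocol parse(String raw) {
    if (raw == null || raw.isBlank()) {
      return Protocol.GRPC;
    }
    switch (raw.trim().toLowerCase(Locale.ROOT)) {
      case "grpc":
        return Protocol.GRPC;
      case "http":
        return Protocol.HTTP;
      case "none":
        return Protocol.NONE;
      default:
        // Assumption for this sketch: unknown values are a configuration error.
        throw new IllegalArgumentException("Unsupported OTLP protocol: " + raw);
    }
  }

  /** Standard OTLP default endpoints; empty when exporting is disabled. */
  static Optional<String> defaultEndpoint(Protocol protocol) {
    switch (protocol) {
      case GRPC:
        return Optional.of("http://localhost:4317");
      case HTTP:
        return Optional.of("http://localhost:4318/v1/metrics");
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    Protocol protocol = parse(System.getProperty("solr.otlpMetricExporterProtocol"));
    System.out.println(protocol + " -> " + defaultEndpoint(protocol).orElse("(no export)"));
  }
}
```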

PeriodicMetricReader.builder(metricExporter)
.setInterval(OTLP_EXPORTER_INTERVAL, TimeUnit.MILLISECONDS)
.build());
SdkMeterProviderUtil.setExemplarFilter(builder, ExemplarFilter.traceBased());
Contributor Author

This adds trace-based exemplars. If you have Solr distributed tracing turned on and push with OTLP to something like the OTEL collector, the exemplars on the metrics are honored across the pipeline, like so:

solr_metrics_core_requests_times_milliseconds_bucket{collection="demo",core="demo_shard1_replica_n1",handler="/update",job="unknown_service:java",replica="replica_n1",shard="shard1",le="50.0"} 26 1.751574499605e+09 # {trace_id="4438406367ea8ea60f53e3f6e26a409d",span_id="d797ab840ed14e6d"} 30.0 1.75157449662e+09

Note that this is tied directly to your trace sampling rate.
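
To make the exemplar suffix in the sample line above concrete, here is a small sketch that pulls the trace ID out of such a line. `ExemplarSketch` is a hypothetical name, and the pattern assumes the exact label order shown in that sample (trace_id first, then span_id); it is not a general exposition-format parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts the trace ID from the exemplar part of one sample line. */
final class ExemplarSketch {

  // Matches the '# {trace_id="...",span_id="..."} ' suffix, assuming the label order in the sample.
  private static final Pattern EXEMPLAR =
      Pattern.compile("# \\{trace_id=\"([0-9a-f]{32})\",span_id=\"([0-9a-f]{16})\"\\} ");

  /** Returns the 32-hex-digit trace ID, or empty when the line carries no exemplar. */
  static Optional<String> traceId(String sampleLine) {
    Matcher matcher = EXEMPLAR.matcher(sampleLine);
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }
}
```

A line without the `# {...}` suffix, which is what you get for requests that were not sampled, yields an empty result.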

Contributor Author

I removed exemplar support from this PR to keep its scope clear. Go to this PR instead.

Contributor Author

@mlbiscoc mlbiscoc left a comment

@dsmiley tagging you in this as well. This adds a native OTLP metric exporter to Solr as another configuration option. Given the industry support for this protocol, it opens Solr up to many different metric pipelines. I put it behind feature flags, but I'm not sure this is the correct implementation. I would like feedback on the right direction.
