-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[receiver/prometheusremotewrite] - Fix panic caused by mutating read-only ResourceMetrics
#43597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ResourceMetrics
| w.WriteHeader(http.StatusNoContent) | ||
|
|
||
| obsrecvCtx := prw.obsrecv.StartMetricsOp(req.Context()) | ||
| if m.MetricCount() == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to send empty telemetry IMO. Let me know if any of the codeowners feel otherwise.
|
Hey @VihasMakwana, thanks for the effort here! We have been discussing the better way to solve this issue on our by-weekly SIG meeting. I think that we finally found a solution that doesn't need to deal with mutexes and change the minimum amount of code that we can! This version can be found here. I just need to discuss it with Sens and finish some local tests. Then we can merge it! |
|
@perebaj Hi! Thank you for mentioning your PR. I'll review it, but it looks good to go. |
Background
The collector core marks telemetry data as read-only after it has been consumed by the fanout consumer.
Ref: https://github.com/open-telemetry/opentelemetry-collector/blob/58b71431b36f62a7797a02d661547584cf15222a/internal/fanoutconsumer/metrics.go#L67-L69
This ensures that exporters and other consumers cannot mutate the same data instance after it has been forwarded downstream.
However, in the
prometheusremotewritereceiver, we currently cache the entireResourceMetricsobject, which is directly tied to the originalpmetric.Metricsinstance. When a subsequent request reuses this cached object and appends new samples, it tries to mutate a read-only object, causing a panic.Fix
Instead of caching the full ResourceMetrics object, we should:
Hence, there should be two caches:
If the
target_infometrics and normal metrics are part of different requests, then there are two cases:target_infoarrives first:target_infodata to arrive and populate the resource attributes.How to Reproduce
Refer #41347 (comment)
Fixes #41347