Skip to content

Conversation

@perebaj
Copy link
Contributor

@perebaj perebaj commented Oct 9, 2025

Description

Fixed a concurrency bug in the Prometheus remote write receiver where concurrent requests with identical job/instance labels would return empty responses after the first successful request.

Link to tracking issue

Fixes #42159

Testing

Documentation

Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: perebaj <perebaj@gmail.com>
@perebaj perebaj changed the title [receiver/prometheusremotewritereceiver] concurrency bug [prometheusremotewritereceiver] concurrency bug Oct 23, 2025
Signed-off-by: perebaj <perebaj@gmail.com>
Comment on lines -1498 to -1530
{
name: "normal metric first, target_info second",
requests: []*writev2.Request{
{
Symbols: []string{
"",
"job", "production/service_a", // 1, 2
"instance", "host1", // 3, 4
"__name__", "normal_metric", // 5, 6
"foo", "bar", // 7, 8
},
Timeseries: []writev2.TimeSeries{
{
Metadata: writev2.Metadata{Type: writev2.Metadata_METRIC_TYPE_GAUGE},
LabelsRefs: []uint32{5, 6, 1, 2, 3, 4, 7, 8},
Samples: []writev2.Sample{{Value: 2, Timestamp: 2}},
},
},
},
{
Symbols: []string{
"",
"job", "production/service_a", // 1, 2
"instance", "host1", // 3, 4
"machine_type", "n1-standard-1", // 5, 6
"cloud_provider", "gcp", // 7, 8
"region", "us-central1", // 9, 10
"__name__", "target_info", // 11, 12
},
Timeseries: []writev2.TimeSeries{
{
Metadata: writev2.Metadata{Type: writev2.Metadata_METRIC_TYPE_GAUGE},
LabelsRefs: []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't make sense. Looking back at how we implemented and tested the code, it's clear there's a logical problem here. We are assuming that by sending the metrics in the following order:

  1. A normal metric, with attributes (machine_type, cloud_provider, region, etc.) and job/instance = service_a/host1
  2. Immediately after, a target_info. With job/instance = service_a/host1

mockConsumer.metrics[0] would represent the combination of these two separate inputs (sent in two separate HTTP requests). This assumption is illogical. How could the zero index contain the aggregation of two separate requests?

As recently discussed in this thread (link), each request should only return data related to the information it processed. In other words:

  • If we send normal metrics with samples and attributes, the response should reflect those metrics.
  • If we send a target_info, whose sole purpose is to enrich metrics, the response should reflect that as well—by returning nothing, since no actual metrics were processed.

@@ -1592,7 +1555,7 @@ func TestTargetInfoWithMultipleRequests(t *testing.T) {
assert.Equal(t, http.StatusNoContent, resp.StatusCode, string(body))
}

assert.NoError(t, pmetrictest.CompareMetrics(expectedMetrics, mockConsumer.metrics[0]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the above comment as reference, the current change now is in the same page as the expected behavior! This because the expected final metric, is the junction of the target_info and the normal_metric, following the name of the test: target_info first, normal metric second. As we are sending both using separate http requests, then, "logically speaking", the last http response(index 1) that represents the merge between target_info and the normal metric

Copy link
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
@perebaj
Copy link
Contributor Author

perebaj commented Oct 23, 2025

Even approved, let's wait for more opinions...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[receiver/prometheusremotewrite] Send data multiple times, keeping only one instance for the same dimension and discarding the rest.

3 participants