-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[prometheusremotewritereceiver] concurrency bug #43383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: perebaj <perebaj@gmail.com>
| { | ||
| name: "normal metric first, target_info second", | ||
| requests: []*writev2.Request{ | ||
| { | ||
| Symbols: []string{ | ||
| "", | ||
| "job", "production/service_a", // 1, 2 | ||
| "instance", "host1", // 3, 4 | ||
| "__name__", "normal_metric", // 5, 6 | ||
| "foo", "bar", // 7, 8 | ||
| }, | ||
| Timeseries: []writev2.TimeSeries{ | ||
| { | ||
| Metadata: writev2.Metadata{Type: writev2.Metadata_METRIC_TYPE_GAUGE}, | ||
| LabelsRefs: []uint32{5, 6, 1, 2, 3, 4, 7, 8}, | ||
| Samples: []writev2.Sample{{Value: 2, Timestamp: 2}}, | ||
| }, | ||
| }, | ||
| }, | ||
| { | ||
| Symbols: []string{ | ||
| "", | ||
| "job", "production/service_a", // 1, 2 | ||
| "instance", "host1", // 3, 4 | ||
| "machine_type", "n1-standard-1", // 5, 6 | ||
| "cloud_provider", "gcp", // 7, 8 | ||
| "region", "us-central1", // 9, 10 | ||
| "__name__", "target_info", // 11, 12 | ||
| }, | ||
| Timeseries: []writev2.TimeSeries{ | ||
| { | ||
| Metadata: writev2.Metadata{Type: writev2.Metadata_METRIC_TYPE_GAUGE}, | ||
| LabelsRefs: []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't make sense. Looking back at how we implemented and tested the code, it's clear there's a logical problem here. We are assuming that by sending the metrics in the following order:
- A normal metric, with attributes (machine_type, cloud_provider, region, etc.) and job/instance = service_a/host1
- Immediately after, a target_info. With job/instance = service_a/host1
mockConsumer.metrics[0] would represent the combination of these two separate inputs (sent in two separate HTTP requests). This assumption is illogical. How could the zero index contain the aggregation of two separate requests?
As recently discussed in this thread (link), each request should only return data related to the information it processed. In other words:
- If we send normal metrics with samples and attributes, the response should reflect those metrics.
- If we send a target_info, whose sole purpose is to enrich metrics, the response should reflect that as well—by returning nothing, since no actual metrics were processed.
| @@ -1592,7 +1555,7 @@ func TestTargetInfoWithMultipleRequests(t *testing.T) { | |||
| assert.Equal(t, http.StatusNoContent, resp.StatusCode, string(body)) | |||
| } | |||
|
|
|||
| assert.NoError(t, pmetrictest.CompareMetrics(expectedMetrics, mockConsumer.metrics[0])) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the above comment as reference, the current change now is in the same page as the expected behavior! This because the expected final metric, is the junction of the target_info and the normal_metric, following the name of the test: target_info first, normal metric second. As we are sending both using separate http requests, then, "logically speaking", the last http response(index 1) that represents the merge between target_info and the normal metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
|
Even approved, let's wait for more opinions... |
Description
Fixed a concurrency bug in the Prometheus remote write receiver where concurrent requests with identical job/instance labels would return empty responses after the first successful request.
Link to tracking issue
Fixes #42159
Testing
Documentation