Skip to content

[awsemfexporter] Fix grouping for container insights metrics for mixed metric types #320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 13, 2025
Merged
6 changes: 6 additions & 0 deletions exporter/awsemfexporter/grouped_metric.go
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,12 @@ func addToGroupedMetric(
// them together into one EMF log event, so don't set batchIndex when it's a summary metric
metadata.groupedMetricMetadata.batchIndex = i
}

if metadata.receiver == containerInsightsReceiver {
// For container insights, put all metrics in the same group regardless of type (ie gauge/counter)
metadata.groupedMetricMetadata.metricDataType = pmetric.MetricTypeEmpty
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to test if this affects any other part of container insights. I see that the only other reference to the data type is in the translateGroupedMetricToCWMetric func:

        if isPrometheusMetric {
		fields[fieldPrometheusMetricType] = fieldPrometheusTypes[groupedMetric.metadata.metricDataType]
	}

code:

fields[fieldPrometheusMetricType] = fieldPrometheusTypes[groupedMetric.metadata.metricDataType]

}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about mixing other metric types into the same log event, e.g. histograms/summary with gauge/counters.

This will only group them into the same log event if they also have the same dimensions, right? If so, then I think it'd be ok.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently emit non-counter metrics for container insights today. But if one of the container insights prom scrapers (neuron/dcgm/nvme/etc) starts ingesting non-counter metrics, we want that all to be in the same log. Otherwise customers will be charged extra

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histogram/summary metrics have extra dimensions like "quantile" which we wouldn't want applied to the counter/gauge metrics since that doesn't make any sense. So I think they will have to be in a separate log event.


groupKey := aws.NewKey(metadata.groupedMetricMetadata, labels)
if _, ok := groupedMetrics[groupKey]; ok {
// if MetricName already exists in metrics map, print warning log
Expand Down
3 changes: 2 additions & 1 deletion exporter/awsemfexporter/metric_translator.go
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,8 @@ func (mt metricTranslator) translateOTelToGroupedMetric(rm pmetric.ResourceMetri
if strings.HasPrefix(serviceName.Str(), "containerInsightsKubeAPIServerScraper") ||
strings.HasPrefix(serviceName.Str(), "containerInsightsDCGMExporterScraper") ||
strings.HasPrefix(serviceName.Str(), "containerInsightsNeuronMonitorScraper") ||
strings.HasPrefix(serviceName.Str(), "containerInsightsKueueMetricsScraper") {
strings.HasPrefix(serviceName.Str(), "containerInsightsKueueMetricsScraper") ||
strings.HasPrefix(serviceName.Str(), "containerInsightsNVMeExporterScraper") {
// the prometheus metrics that come from the container insight receiver need to be clearly tagged as coming from container insights
metricReceiver = containerInsightsReceiver
}
Expand Down
17 changes: 16 additions & 1 deletion exporter/awsemfexporter/metric_translator_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,8 @@ func TestTranslateOtToGroupedMetric(t *testing.T) {
neuronMetric.Resource().Attributes().PutStr(conventions.AttributeServiceName, "containerInsightsNeuronMonitorScraper")
kueueMetric := createTestResourceMetricsHelper(defaultNumberOfTestMetrics + 1)
kueueMetric.Resource().Attributes().PutStr(conventions.AttributeServiceName, "containerInsightsKueueMetricsScraper")
nvmeMetric := createTestResourceMetricsHelper(defaultNumberOfTestMetrics + 1)
nvmeMetric.Resource().Attributes().PutStr(conventions.AttributeServiceName, "containerInsightsNVMeExporterScraper")

counterSumMetrics := map[string]*metricInfo{
"spanCounter": {
Expand Down Expand Up @@ -403,6 +405,19 @@ func TestTranslateOtToGroupedMetric(t *testing.T) {
"myServiceNS/containerInsightsKueueMetricsScraper",
containerInsightsReceiver,
},
{
"nvme receiver",
nvmeMetric,
map[string]string{
"isItAnError": "false",
"spanName": "testSpan",
},
map[string]string{
"spanName": "testSpan",
},
"myServiceNS/containerInsightsNVMeExporterScraper",
containerInsightsReceiver,
},
}

for _, tc := range testCases {
Expand Down Expand Up @@ -438,7 +453,7 @@ func TestTranslateOtToGroupedMetric(t *testing.T) {
assert.Equal(t, tc.timerLabels, v.labels)
assert.Equal(t, timerMetrics, v.metrics)
default:
assert.Fail(t, fmt.Sprintf("Unhandled metric type %s not expected", v.metadata.metricDataType))
assert.Equal(t, tc.expectedReceiver, containerInsightsReceiver)
}
}
})
Expand Down
3 changes: 2 additions & 1 deletion internal/aws/k8s/k8sclient/endpoint_slices_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,8 @@ var endpointSlicesArray = []runtime.Object{
},
Labels: map[string]string{
"kubernetes.io/service-name": "kube-controller-manager",
}},
},
},
},
&discoveryv1.EndpointSlice{
ObjectMeta: metav1.ObjectMeta{
Expand Down
1 change: 0 additions & 1 deletion internal/kubelet/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,6 @@ func (p *kubeConfigClientProvider) BuildClient() (Client, error) {
}

joinPath, err := url.JoinPath(authConf.Host, "/api/v1/nodes/", p.endpoint, "/proxy/")

if err != nil {
return nil, err
}
Expand Down
Loading