-
Notifications
You must be signed in to change notification settings - Fork 27
Fixing EMF output for detailed summary metrics #313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -51,13 +51,13 @@ func addToGroupedMetric( | |||
} | |||
continue | |||
} | |||
dps, retained := dps.CalculateDeltaDatapoints(i, metadata.instrumentationScopeName, config.DetailedMetrics, calculators) | |||
ddps, retained := dps.CalculateDeltaDatapoints(i, metadata.instrumentationScopeName, config.DetailedMetrics, calculators) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name dps
here shadows the variable created previously on line 39. Changed to ddps
for "delta datapoints"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid changing variables names from upstream since it can cause merge conflicts -- https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/awsemfexporter/grouped_metric.go#L39
I can see why this is confusing though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah I forgot it'll cause conflicts. ugh. this was so confusing when I was trying to figure out this bug. i'll revert the name changes
if !(metadata.groupedMetricMetadata.metricDataType == pmetric.MetricTypeSummary && config.DetailedMetrics) { | ||
// Summary metrics can be split into separate datapoints when using DetailedMetrics, but we still want to group | ||
// them together into one EMF log event, so don't set batchIndex when it's a summary metric | ||
metadata.groupedMetricMetadata.batchIndex = j | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug fix
if !retained { | ||
continue | ||
} | ||
|
||
for i, dp := range dps { | ||
labels := dp.labels | ||
for j, ddp := range ddps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name i
here shadows the variable created previously on line 44.
dp
-> ddp
to mirror dps
-> ddps
change.
@@ -51,13 +51,13 @@ func addToGroupedMetric( | |||
} | |||
continue | |||
} | |||
dps, retained := dps.CalculateDeltaDatapoints(i, metadata.instrumentationScopeName, config.DetailedMetrics, calculators) | |||
ddps, retained := dps.CalculateDeltaDatapoints(i, metadata.instrumentationScopeName, config.DetailedMetrics, calculators) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid changing variables names from upstream since it can cause merge conflicts -- https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/awsemfexporter/grouped_metric.go#L39
I can see why this is confusing though
} | ||
|
||
// Extra params to use when grouping metrics | ||
metadata.groupedMetricMetadata.batchIndex = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this? Will we break appsignals since they are the ones who added this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't removed, just moved inside the if statement to skip this part for detailed summary metrics (for which the assumptions that were made when this when introduced does not hold).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree this code is confusing so what's happening is to skip index changing for summary and when detailedMetrics is enabled. Is there any chance that these metrics are sparse in the array meaning keeping the index would cause any unexepcted grouping behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any chance that these metrics are sparse in the array meaning keeping the index would cause any unexepcted grouping behavior
I don't think so. Here is how the array of datapoints is created: https://github.com/amazon-contributing/opentelemetry-collector-contrib/blob/bump-v0.124.1/exporter/awsemfexporter/datapoint.go#L483-L494. There' always two array entries for sum and count metric (that we want grouped) and then one datapoint for each quantile. We want the quantile entries to be separated, but they are already are because of the labels are different.
ced70e2
to
9580996
Compare
Description
Enabling the “DetailedMetrics” configuration for the awsemfexporter causes this OTeL Prometheus receiver → awsemfexporter pipeline to output n + 2 log events:
The original implementation for the DetailedMetrics functionality outputs count/sum metrics as one log event instead of two:
A recent change to the awsemfexporter caused the groupKey for count and summary metrics to be unique so they are no longer grouped together and therefore emitted as separate log events.
This PR restore the original functionality.
Testing
Followed something similar to the OTLP -> AMP integration test (https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/test/amp/resources/otlp_pusher.sh) but configured the agent to push to EMF instead of AMP.
Use following OTLP metrics file with summary metric only:
Start with this agent config:
Restart agent to run translator, then manually update agent otel config since json config doesn't support setting detailed metrics:
Start the agent without running translator by executing the binary directly:
Run the otlp_pusher.sh
Check CloudWatch logs
Before changes (5 log events)
After changes (4 logs events)
Documentation