-
Notifications
You must be signed in to change notification settings - Fork 35
Description
What is the module?
otel-collector_kubernetes-common
What is the detector?
pod_phase_status
https://github.com/claranet/terraform-signalfx-detectors/blob/c94f90c18ab1cfbeef1efb4736ca830be20399b0/modules/otel-collector_kubernetes-common/detectors-gen.tf#L99C31-L99C47
Describe the bug
when one or more pods linked to a job fail, theoretically, the detector should not trigger any alert.
But in the end we have many alerts triggered on pods linked to jobs.
To Reproduce
Steps to reproduce the behavior:
- start a job doomed to fail
- observe the monitoring in signalfx
- Moment / situation when detector falsely raise (or not) alert
Expected behavior
A clear and concise description of what you expected to happen and difference compared to previous section.
the detector should not raise any alert when a pod linked to a job fails
resolution
replace the base-filtering filter (not filter('k8s.job.name', '*')) and (not filter('cronk8s.job.name', '*'))
by (not filter('k8s.job.uid', '*')) and (not filter('cronk8s.job.uid', '*'))
terraform-signalfx-detectors/modules/otel-collector_kubernetes-common/detectors-gen.tf
Line 107 in c94f90c
base_filtering = (not filter('k8s.job.name', '*')) and (not filter('cronk8s.job.name', '*')) |