Skip to content

[BUG] Detector Pod phase status do not correctily not filter pod linked jobs #556

@antonin-rouault

Description

@antonin-rouault

What is the module?
otel-collector_kubernetes-common

What is the detector?
pod_phase_status
https://github.com/claranet/terraform-signalfx-detectors/blob/c94f90c18ab1cfbeef1efb4736ca830be20399b0/modules/otel-collector_kubernetes-common/detectors-gen.tf#L99C31-L99C47

Describe the bug
when one or more pods linked to a job fail, theoretically, the detector should not trigger any alert.
But in the end we have many alerts triggered on pods linked to jobs.

To Reproduce
Steps to reproduce the behavior:

  1. start a job doomed to fail
  2. observe the monitoring in signalfx
  3. Moment / situation when detector falsely raise (or not) alert

Expected behavior
A clear and concise description of what you expected to happen and difference compared to previous section.
the detector should not raise any alert when a pod linked to a job fails

resolution
replace the base-filtering filter (not filter('k8s.job.name', '*')) and (not filter('cronk8s.job.name', '*')) by (not filter('k8s.job.uid', '*')) and (not filter('cronk8s.job.uid', '*'))

base_filtering = (not filter('k8s.job.name', '*')) and (not filter('cronk8s.job.name', '*'))

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdetectorsAbout nex or existing detectors

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions