What happened:
I noticed that the policy failure counter metrics report inconsistent values.
If you inspect the value of a particular counter metric repeatedly, the reported value fluctuates in a way that does not match the actual policy evaluations.
Snippet from running curl (every 2 seconds), filtered for a single metric:
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 35.0
magtape_policy_total{count_type="fail",ns="test1",policy="policy-privileged-pod"} 1.0
It's almost as if there are multiple counters running in the background, and the metrics route handler sometimes returns values from one and sometimes from the other.
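If the MagTape pod serves its API from multiple worker processes (e.g. gunicorn workers), that would explain this exactly: the Python prometheus_client stores counter values in per-process memory by default, so each scrape returns the values of whichever worker happened to handle that request, even when the port forward pins a single pod. Below is a minimal sketch of prometheus_client's standard multiprocess-mode setup, assuming a shared, writable PROMETHEUS_MULTIPROC_DIR is set in the environment; the handler name and metric definition here are illustrative, not MagTape's actual code:

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    generate_latest,
    multiprocess,
)

# Defined as usual; with PROMETHEUS_MULTIPROC_DIR set, prometheus_client
# backs the counter with mmap'd files shared across workers instead of
# per-process memory.
POLICY_TOTAL = Counter(
    "magtape_policy_total",
    "Total policy evaluations",
    ["count_type", "ns", "policy"],
)

def metrics_response() -> bytes:
    # Aggregate samples from every worker's file on each scrape, rather
    # than serving only the current worker's in-memory registry.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```

With something like this in place, every scrape should return a single aggregated value instead of flapping between per-worker values like 35.0 and 1.0.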
What you expected to happen:
Metrics counter values should be consistent between successive scrapes of the same pod.
How to reproduce it (as minimally and precisely as possible):
- Deploy MagTape
- Run make test-functional and/or manually apply some resources to force policy failures and increment the counters
- Port forward to a specific MagTape pod on port 5000
- Run curl against the metrics endpoint in a loop and record the values for a specific metric; you should see the value change (see the Python sketch after the command below)
$ for i in {1..100}; do curl -ks https://localhost:5000/metrics | grep "magtape_policy_total" | grep "test1" | grep "fail" | grep "privileged" >> /tmp/magtape-pod1-metrics.out; done
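For a quicker way to spot the flapping, the same check can be scripted; a rough Python sketch, assuming the port forward above is active (the TLS settings mirror curl -k for the self-signed cert):

```python
import re
import ssl
import time
import urllib.request

# Skip certificate verification, like curl -k, since the pod serves a
# self-signed cert over the port forward.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

pattern = re.compile(
    r'magtape_policy_total\{count_type="fail",ns="test1",'
    r'policy="policy-privileged-pod"\}\s+(\S+)'
)

last = None
for _ in range(100):
    body = urllib.request.urlopen(
        "https://localhost:5000/metrics", context=ctx
    ).read().decode()
    match = pattern.search(body)
    if match:
        value = float(match.group(1))
        # A Prometheus counter must never decrease within one process, so
        # any drop here confirms the scrapes are hitting different counters.
        if last is not None and value < last:
            print(f"counter went backwards: {last} -> {value}")
        last = value
    time.sleep(2)
```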
Anything else we need to know?:
MagTape was running with 3 replicas
Environment:
- Kubernetes version (use kubectl version): v1.17
- Cloud provider or hardware configuration:
- Others: MagTape v2.3.2