Open
Description
Is your feature request related to a problem? Please describe.
Prometheus works from sampled data, so even with a 15-second scrape interval we can miss many short CPU spikes that reach 100% or more.
As a result, a Prometheus query may show 95th percentile utilization at 50% of the current CPU requests value, while in practice lowering the requests might cause CPU throttling, increased latency, or both.
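To make the concern concrete, here is a minimal sketch with made-up numbers. It assumes utilization is reported as one value per 15-second scrape window, averaged over that window, and that actual usage varies in 100 ms slices. The slice length, the usage values, and the helper names are all illustrative assumptions, not measurements from a real cluster.

```python
# Hypothetical data: one 15 s scrape window split into 150 slices of 100 ms.
# Each value is CPU usage as a fraction of the container's CPU request.
usage = [1.5] * 30 + [0.25] * 120  # 3 s of bursts at 150%, 12 s near idle


def window_average(samples):
    """Average utilization over the window (what a coarse query would see)."""
    return sum(samples) / len(samples)


def slices_at_or_over(samples, threshold=1.0):
    """Number of fine-grained slices at or above the threshold."""
    return sum(1 for s in samples if s >= threshold)


print("window average:", window_average(usage))
print("peak slice:", max(usage))
print("slices at or above request:", slices_at_or_over(usage))
```

In this made-up window the average is half the request even though 30 of the 150 slices exceed it, which is the kind of case where lowering requests based on the average alone could be risky.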
Describe the solution you'd like
I believe this needs a profiling tool, such as an eBPF exporter, that could expose a metric capturing CPU spikes.
Perhaps something like https://github.com/cloudflare/ebpf_exporter
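As a rough illustration of what such a spike metric could look like, the sketch below keeps two counters from fine-grained usage observations. It is plain Python, not ebpf_exporter configuration or API; `SpikeCounter`, its field names, and the threshold are hypothetical and only show the shape of the data an exporter might publish.

```python
from dataclasses import dataclass


@dataclass
class SpikeCounter:
    """Hypothetical counters built from fine-grained CPU usage observations."""

    threshold: float = 1.0        # usage as a fraction of the CPU request
    spike_slices_total: int = 0   # slices at or above the threshold
    spikes_total: int = 0         # distinct runs of consecutive such slices
    _in_spike: bool = False

    def observe(self, usage_fraction: float) -> None:
        over = usage_fraction >= self.threshold
        if over:
            self.spike_slices_total += 1
            if not self._in_spike:
                # First slice of a new spike.
                self.spikes_total += 1
        self._in_spike = over


counter = SpikeCounter()
for value in [0.2, 1.2, 1.4, 0.3, 1.0, 0.1]:
    counter.observe(value)

print(counter.spikes_total, counter.spike_slices_total)
```

Because both values only ever increase, they could be scraped at any interval without losing spikes that happened between scrapes.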