Description
While testing the Karpenter provider for Cluster API, I came across a scenario where registering the Prometheus metrics panics because of how the Kubernetes API group for my type (cluster.x-k8s.io) is transformed into a label name.
After some manual testing, it appears that the toPrometheusLabel function should probably add the dash (-) character to its list of unsupported characters. Ref: https://github.com/awslabs/operatorpkg/blob/main/status/controller.go#L194
This would conform with the Prometheus docs on the data model: https://prometheus.io/docs/concepts/data_model/
Label names SHOULD match the regex [a-zA-Z_][a-zA-Z0-9_]* for the best experience and compatibility (see the warning below). Label names outside of that regex will require quoting e.g. when used in PromQL
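To illustrate the suggestion, here is a minimal sketch of a sanitizer that maps every character outside the quoted regex to an underscore. The name sanitizeLabelName is hypothetical and this is not the actual toPrometheusLabel implementation in operatorpkg; it only shows one way the dash could be handled along with any other unsupported character. The input is the label name taken from the panic message in this issue.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChars matches any character that falls outside the
// [a-zA-Z0-9_] set allowed by the label-name regex in the Prometheus docs.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeLabelName is a hypothetical stand-in for toPrometheusLabel.
// It replaces every unsupported character (including '-') with '_' and
// prefixes '_' when the result is empty or starts with a digit, so the
// output always matches [a-zA-Z_][a-zA-Z0-9_]*.
func sanitizeLabelName(k string) string {
	s := invalidLabelChars.ReplaceAllString(k, "_")
	if s == "" || (s[0] >= '0' && s[0] <= '9') {
		s = "_" + s
	}
	return s
}

func main() {
	// Label name copied from the panic message in this issue.
	fmt.Println(sanitizeLabelName("karpenter_cluster_x-k8s_io_clusterapinodeclass"))
	// prints: karpenter_cluster_x_k8s_io_clusterapinodeclass
}
```

A deny-by-default regex like this avoids having to extend a list of unsupported characters each time a new one shows up in an API group, though adding the dash to the existing list would also fix the case reported here.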
This is the error log I see in my application:
{"level":"DEBUG","time":"2025-05-30T00:04:04.961Z","logger":"controller","caller":"operator/operator.go:132","message":"discovered karpenter version","version":"unspecified"}
panic: descriptor Desc{fqName: "operator_nodeclaim_status_condition_transition_seconds", help: "The amount of time a condition was in a given state before transitioning. e.g. Alarm := P99(Updated=False) > 5 minutes", constLabels: {}, variableLabels: {type,status,karpenter_cluster_x-k8s_io_clusterapinodeclass,karpenter_sh_nodepool}} is invalid: "karpenter_cluster_x-k8s_io_clusterapinodeclass" is not a valid label name for metric "operator_nodeclaim_status_condition_transition_seconds"
goroutine 1 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x347ba60, {0xc00028b8a0?, 0xc000113220?, 0x1a?})
/home/mike/go/pkg/mod/github.com/prometheus/client_golang@v1.20.5/prometheus/registry.go:406 +0x65
github.com/awslabs/operatorpkg/metrics.NewPrometheusHistogram({0x2310730, 0x347ba60}, {{0x20306e6, 0x8}, {0xc000113220, 0x1a}, {0x203e2dd, 0x12}, {0x20ba690, 0x76}, ...}, ...)
/home/mike/go/pkg/mod/github.com/awslabs/operatorpkg@v0.0.0-20241205163410-0fff9f28d115/metrics/prometheus.go:69 +0x19e
github.com/awslabs/operatorpkg/status.conditionDurationMetric({0xc00051c780?, 0x9}, {0xc0000cac80, 0x2, 0x0?})
/home/mike/go/pkg/mod/github.com/awslabs/operatorpkg@v0.0.0-20241205163410-0fff9f28d115/status/metrics.go:29 +0x297
github.com/awslabs/operatorpkg/status.NewController[...]({0x232cdc0, 0xc000352c60}, {0x2310240, 0xc00063d0c0}, {0xc00016de48, 0x22fb7e0?, 0xc0004bcc48?})
/home/mike/go/pkg/mod/github.com/awslabs/operatorpkg@v0.0.0-20241205163410-0fff9f28d115/status/controller.go:79 +0x34d
sigs.k8s.io/karpenter/pkg/controllers.NewControllers({0x231b430, 0xc0003b4c60}, {0x2334e48, 0xc0002c7380}, {0x23200a8, 0x349fb40}, {0x232cdc0, 0xc000352c60}, {0x22fb7e0, 0xc0004bcc48}, ...)
/home/mike/go/pkg/mod/sigs.k8s.io/karpenter@v1.1.3/pkg/controllers/controllers.go:100 +0x1148
main.main()
/home/mike/karpenter-provider-cluster-api/cmd/controller/main.go:32 +0x18b