Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions api/datadoghq/v2alpha1/datadogagent_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -833,6 +833,12 @@ type KubeStateMetricsCoreFeatureConfig struct {
// +optional
// +listType=atomic
CollectCrMetrics []Resource `json:"collectCrMetrics,omitempty"`

// CollectControllerRevisions enables collection of ControllerRevision metrics.
// This requires agent version 7.72.0 or later.
// Default: false
// +optional
CollectControllerRevisions *bool `json:"collectControllerRevisions,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: Do we want to expose a configuration option that enables the collection of ControllerRevision resources — or should we instead use these resources internally to derive higher-level insights?

Based on our previous discussion, this seems related to exposing rollout duration for DaemonSets and StatefulSets.

If that’s the case, I wonder whether the new setting should be framed around the benefit or outcome (e.g., enabling rollout duration metrics) rather than the low-level implementation detail (collecting ControllerRevisions).

What do you think? Open to discussion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clamoriniere So the collection of DaemonSets and StatefulSets will still occur by default, it's just that the "start time" of the deployment would be less accurate without controllerrevisions access. However, this flag was meant as more of a killswitch than anything else, because enabling this on agent < 7.72.0 breaks the KSM check, or if there was a custom version that was 7.72 but tagged as something non-parsable, we wanted to be able to turn it on manually. I'm open to removing this option and just strictly having it version gated in the feature.go code.

Also - and this is something I'd probably rather not do (hence being extra careful with the operator and helm-chart deployments) - there's the option of modifying the agent code to NOT fail the ksm check if it's configured to collect controllerrevisions but doesn't have access. We just remove it from the list and continue normal function.

Happy to hear your thoughts

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, that makes sense.

I think a good approach would be to always add the RBAC for controllerrevisions when the KSM check is enabled. Then, starting from the next Agent/Cluster-Agent version, we could detect at runtime whether the permission is available. If it isn’t, we simply skip generating the metrics that rely on controllerrevisions.

This way, the check can still run successfully without introducing a new configuration parameter, which should provide a smoother experience for users and reduce long-term config maintenance.

Additionally, we could emit a check configuration warning when the RBAC is missing — the check would continue running, but the user would see a warning in the Agent status suggesting they add the missing permission or upgrade their Operator/Helm chart version.

I realize this involves a bit more work, but it would likely result in a better UX and a cleaner setup overall.
let me know what is your opinion?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So based on this PR the rbac will always be enabled, regardless of agent version. As far as the agent code changes, I may have misspoke - newer agents that don't have controllerrevisions enabled will still function, it's just that the start times for statefulsets will be less accurate. The case where the ksm check would break is in older agents < 7.72.0, if we enable collection of controllerrevisions (adding the rbac is fine). Unfortunately we can't fix that in the code, so we just have to make sure it's not enabled for those agents.

I'm not against the runtime check for resources in the agent, but it won't help us with this release, so I think it can be something we implement in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason for having a config option in the CRD as opposed to automatically enabling controller revisions metric collection based on the agent version? I'd prefer to keep the feature config options simple and limited since a user can always provide a custom config if they want more fine grained control over what's collected

}

// Resource configures a custom resource for metric generation.
Expand Down
5 changes: 5 additions & 0 deletions api/datadoghq/v2alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions api/datadoghq/v2alpha1/zz_generated.openapi.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions config/crd/bases/v1/datadoghq.com_datadogagentinternals.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1527,6 +1527,12 @@ spec:
kubeStateMetricsCore:
description: KubeStateMetricsCore check configuration.
properties:
collectControllerRevisions:
description: |-
CollectControllerRevisions enables collection of ControllerRevision metrics.
This requires agent version 7.72.0 or later.
Default: false
type: boolean
collectCrMetrics:
description: |-
`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.
Expand Down Expand Up @@ -9311,6 +9317,12 @@ spec:
kubeStateMetricsCore:
description: KubeStateMetricsCore check configuration.
properties:
collectControllerRevisions:
description: |-
CollectControllerRevisions enables collection of ControllerRevision metrics.
This requires agent version 7.72.0 or later.
Default: false
type: boolean
collectCrMetrics:
description: |-
`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1552,6 +1552,10 @@
"additionalProperties": false,
"description": "KubeStateMetricsCore check configuration.",
"properties": {
"collectControllerRevisions": {
"description": "CollectControllerRevisions enables collection of ControllerRevision metrics.\nThis requires agent version 7.72.0 or later.\nDefault: false",
"type": "boolean"
},
"collectCrMetrics": {
"description": "`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.\n\nThe datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration.\nThe exact structure and existing fields of each item in this list can be found in:\nhttps://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md",
"items": {
Expand Down Expand Up @@ -9157,6 +9161,10 @@
"additionalProperties": false,
"description": "KubeStateMetricsCore check configuration.",
"properties": {
"collectControllerRevisions": {
"description": "CollectControllerRevisions enables collection of ControllerRevision metrics.\nThis requires agent version 7.72.0 or later.\nDefault: false",
"type": "boolean"
},
"collectCrMetrics": {
"description": "`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.\n\nThe datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration.\nThe exact structure and existing fields of each item in this list can be found in:\nhttps://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md",
"items": {
Expand Down
6 changes: 6 additions & 0 deletions config/crd/bases/v1/datadoghq.com_datadogagentprofiles.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1527,6 +1527,12 @@ spec:
kubeStateMetricsCore:
description: KubeStateMetricsCore check configuration.
properties:
collectControllerRevisions:
description: |-
CollectControllerRevisions enables collection of ControllerRevision metrics.
This requires agent version 7.72.0 or later.
Default: false
type: boolean
collectCrMetrics:
description: |-
`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1556,6 +1556,10 @@
"additionalProperties": false,
"description": "KubeStateMetricsCore check configuration.",
"properties": {
"collectControllerRevisions": {
"description": "CollectControllerRevisions enables collection of ControllerRevision metrics.\nThis requires agent version 7.72.0 or later.\nDefault: false",
"type": "boolean"
},
"collectCrMetrics": {
"description": "`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.\n\nThe datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration.\nThe exact structure and existing fields of each item in this list can be found in:\nhttps://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md",
"items": {
Expand Down
12 changes: 12 additions & 0 deletions config/crd/bases/v1/datadoghq.com_datadogagents.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1527,6 +1527,12 @@ spec:
kubeStateMetricsCore:
description: KubeStateMetricsCore check configuration.
properties:
collectControllerRevisions:
description: |-
CollectControllerRevisions enables collection of ControllerRevision metrics.
This requires agent version 7.72.0 or later.
Default: false
type: boolean
collectCrMetrics:
description: |-
`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.
Expand Down Expand Up @@ -9361,6 +9367,12 @@ spec:
kubeStateMetricsCore:
description: KubeStateMetricsCore check configuration.
properties:
collectControllerRevisions:
description: |-
CollectControllerRevisions enables collection of ControllerRevision metrics.
This requires agent version 7.72.0 or later.
Default: false
type: boolean
collectCrMetrics:
description: |-
`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.
Expand Down
8 changes: 8 additions & 0 deletions config/crd/bases/v1/datadoghq.com_datadogagents_v2alpha1.json
Original file line number Diff line number Diff line change
Expand Up @@ -1552,6 +1552,10 @@
"additionalProperties": false,
"description": "KubeStateMetricsCore check configuration.",
"properties": {
"collectControllerRevisions": {
"description": "CollectControllerRevisions enables collection of ControllerRevision metrics.\nThis requires agent version 7.72.0 or later.\nDefault: false",
"type": "boolean"
},
"collectCrMetrics": {
"description": "`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.\n\nThe datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration.\nThe exact structure and existing fields of each item in this list can be found in:\nhttps://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md",
"items": {
Expand Down Expand Up @@ -9222,6 +9226,10 @@
"additionalProperties": false,
"description": "KubeStateMetricsCore check configuration.",
"properties": {
"collectControllerRevisions": {
"description": "CollectControllerRevisions enables collection of ControllerRevision metrics.\nThis requires agent version 7.72.0 or later.\nDefault: false",
"type": "boolean"
},
"collectCrMetrics": {
"description": "`CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect.\n\nThe datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration.\nThe exact structure and existing fields of each item in this list can be found in:\nhttps://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md",
"items": {
Expand Down
7 changes: 7 additions & 0 deletions config/rbac/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,13 @@ rules:
- deletecollection
- list
- watch
- apiGroups:
- apps
resources:
- controllerrevisions
verbs:
- list
- watch
- apiGroups:
- apps
resources:
Expand Down
1 change: 1 addition & 0 deletions docs/configuration.v2alpha1.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ spec:
| features.helmCheck.collectEvents | CollectEvents set to `true` enables event collection in the Helm check (Requires Agent 7.36.0+ and Cluster Agent 1.20.0+) Default: false |
| features.helmCheck.enabled | Enables the Helm check. Default: false |
| features.helmCheck.valuesAsTags | ValuesAsTags collects Helm values from a release and uses them as tags (Requires Agent and Cluster Agent 7.40.0+). Default: {} |
| features.kubeStateMetricsCore.collectControllerRevisions | CollectControllerRevisions enables collection of ControllerRevision metrics. This requires agent version 7.72.0 or later. Default: false |
| features.kubeStateMetricsCore.collectCrMetrics | `CollectCrMetrics` defines custom resources for the kube-state-metrics core check to collect. The datadog agent uses the same logic as upstream `kube-state-metrics`. So is its configuration. The exact structure and existing fields of each item in this list can be found in: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md |
| features.kubeStateMetricsCore.conf.configData | ConfigData corresponds to the configuration file content. |
| features.kubeStateMetricsCore.conf.configMap.items | Maps a ConfigMap data `key` to a file `path` mount. |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,10 @@ instances:
config.WriteString(" - apiservices\n")
}

if collectorOpts.enableControllerRevisions {
config.WriteString(" - controllerrevisions\n")
}

if collectorOpts.enableCRD {
config.WriteString(" - customresourcedefinitions\n")
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ instances:
optionsWithVPA := collectorOptions{enableVPA: true}
optionsWithCRD := collectorOptions{enableCRD: true}
optionsWithAPIService := collectorOptions{enableAPIService: true}
optionsWithControllerRevisions := collectorOptions{enableControllerRevisions: true}

// Test custom resources
optionsWithCustomResources := collectorOptions{
Expand Down Expand Up @@ -159,6 +160,17 @@ instances:
},
want: buildDefaultConfigMap(owner.GetNamespace(), defaultKubeStateMetricsCoreConf, ksmCheckConfig(true, optionsWithAPIService)),
},
{
name: "with ControllerRevisions",
fields: fields{
owner: owner,
enable: true,
runInClusterChecksRunner: true,
configConfigMapName: defaultKubeStateMetricsCoreConf,
collectorOpts: optionsWithControllerRevisions,
},
want: buildDefaultConfigMap(owner.GetNamespace(), defaultKubeStateMetricsCoreConf, ksmCheckConfig(true, optionsWithControllerRevisions)),
},
{
name: "with custom resources",
fields: fields{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,10 +46,11 @@ func buildKSMFeature(options *feature.Options) feature.Feature {
}

type ksmFeature struct {
runInClusterChecksRunner bool
collectCRDMetrics bool
collectCrMetrics []v2alpha1.Resource
collectAPIServiceMetrics bool
runInClusterChecksRunner bool
collectCRDMetrics bool
collectCrMetrics []v2alpha1.Resource
collectAPIServiceMetrics bool
collectControllerRevisions bool

rbacSuffix string
serviceAccountName string
Expand All @@ -67,6 +68,9 @@ type ksmFeature struct {
// Add "-0" so that prerelase versions are considered sufficient. https://github.com/Masterminds/semver#working-with-prerelease-versions
const crdAPIServiceCollectionMinVersion = "7.46.0-0"

// Minimum agent version that supports collection of controllerrevisions
const controllerRevisionsCollectionMinVersion = "7.72.0-0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this and crdAPIServiceCollectionMinVersion are both about minimum agent versions, I think it makes sense to group them together as constants

const (
  first = "aaa"
  second = "bbb"
)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will group


// ID returns the ID of the Feature
func (f *ksmFeature) ID() feature.IDType {
return feature.KubernetesStateCoreIDType
Expand All @@ -78,6 +82,7 @@ func (f *ksmFeature) Configure(dda metav1.Object, ddaSpec *v2alpha1.DatadogAgent
var output feature.RequiredComponents

if ddaSpec.Features != nil && ddaSpec.Features.KubeStateMetricsCore != nil && apiutils.BoolValue(ddaSpec.Features.KubeStateMetricsCore.Enabled) {
f.logger.Info("KubeStateMetricsCore feature enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already log out enabled features elsewhere. Could you also remove the other log lines? We don't generally log whether a config is set or not

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing

output.ClusterAgent.IsRequired = apiutils.NewBoolPointer(true)
output.ClusterAgent.Containers = []apicommon.AgentContainerName{apicommon.ClusterAgentContainerName}
output.Agent.IsRequired = apiutils.NewBoolPointer(true)
Expand All @@ -88,6 +93,14 @@ func (f *ksmFeature) Configure(dda metav1.Object, ddaSpec *v2alpha1.DatadogAgent
f.collectCrMetrics = ddaSpec.Features.KubeStateMetricsCore.CollectCrMetrics
f.serviceAccountName = constants.GetClusterAgentServiceAccount(dda.GetName(), ddaSpec)

// Get the CollectControllerRevisions value from spec (default to false if not set)
collectControllerRevisionsExplicitlySet := ddaSpec.Features.KubeStateMetricsCore.CollectControllerRevisions != nil
if collectControllerRevisionsExplicitlySet {
f.collectControllerRevisions = apiutils.BoolValue(ddaSpec.Features.KubeStateMetricsCore.CollectControllerRevisions)
} else {
f.collectControllerRevisions = false
}

// This check will only run in the Cluster Checks Runners or Cluster Agent (not the Node Agent)
if ddaSpec.Features.ClusterChecks != nil && apiutils.BoolValue(ddaSpec.Features.ClusterChecks.Enabled) && apiutils.BoolValue(ddaSpec.Features.ClusterChecks.UseClusterChecksRunners) {
f.runInClusterChecksRunner = true
Expand All @@ -97,17 +110,43 @@ func (f *ksmFeature) Configure(dda metav1.Object, ddaSpec *v2alpha1.DatadogAgent
output.ClusterChecksRunner.Containers = []apicommon.AgentContainerName{apicommon.CoreAgentContainerName}

if ccrOverride, ok := ddaSpec.Override[v2alpha1.ClusterChecksRunnerComponentName]; ok {
if ccrOverride.Image != nil && !utils.IsAboveMinVersion(common.GetAgentVersionFromImage(*ccrOverride.Image), crdAPIServiceCollectionMinVersion) {
// Disable if image is overridden to an unsupported version
f.collectAPIServiceMetrics = false
f.collectCRDMetrics = false
if ccrOverride.Image != nil {
agentVersion := common.GetAgentVersionFromImage(*ccrOverride.Image)

if !utils.IsAboveMinVersion(agentVersion, crdAPIServiceCollectionMinVersion) {
f.collectAPIServiceMetrics = false
f.collectCRDMetrics = false
}

// Only apply version check if CollectControllerRevisions was not explicitly set
if !collectControllerRevisionsExplicitlySet {
if utils.IsAboveMinVersion(agentVersion, controllerRevisionsCollectionMinVersion) {
f.collectControllerRevisions = true
} else {
f.collectControllerRevisions = false
}
}
}
}
} else if clusterAgentOverride, ok := ddaSpec.Override[v2alpha1.ClusterAgentComponentName]; ok {
if clusterAgentOverride.Image != nil && !utils.IsAboveMinVersion(common.GetAgentVersionFromImage(*clusterAgentOverride.Image), crdAPIServiceCollectionMinVersion) {
// Disable if image is overridden to an unsupported version
f.collectAPIServiceMetrics = false
f.collectCRDMetrics = false
} else {
if clusterAgentOverride, ok := ddaSpec.Override[v2alpha1.ClusterAgentComponentName]; ok {
if clusterAgentOverride.Image != nil {
agentVersion := common.GetAgentVersionFromImage(*clusterAgentOverride.Image)

if !utils.IsAboveMinVersion(agentVersion, crdAPIServiceCollectionMinVersion) {
f.collectAPIServiceMetrics = false
f.collectCRDMetrics = false
}

// Only apply version check if CollectControllerRevisions was not explicitly set
if !collectControllerRevisionsExplicitlySet {
if utils.IsAboveMinVersion(agentVersion, controllerRevisionsCollectionMinVersion) {
f.collectControllerRevisions = true
} else {
f.collectControllerRevisions = false
}
}
}
}
}

Expand All @@ -124,16 +163,24 @@ func (f *ksmFeature) Configure(dda metav1.Object, ddaSpec *v2alpha1.DatadogAgent
}

f.configConfigMapName = constants.GetConfName(dda, f.customConfig, defaultKubeStateMetricsCoreConf)

// Log final configuration state
f.logger.Info("KubeStateMetricsCore configuration finalized",
"collectAPIServiceMetrics", f.collectAPIServiceMetrics,
"collectCRDMetrics", f.collectCRDMetrics,
"collectControllerRevisions", f.collectControllerRevisions,
"runInClusterChecksRunner", f.runInClusterChecksRunner)
}

return output
}

type collectorOptions struct {
enableVPA bool
enableAPIService bool
enableCRD bool
customResources []v2alpha1.Resource
enableVPA bool
enableAPIService bool
enableCRD bool
enableControllerRevisions bool
customResources []v2alpha1.Resource
}

// ManageDependencies allows a feature to manage its dependencies.
Expand All @@ -143,10 +190,11 @@ func (f *ksmFeature) ManageDependencies(managers feature.ResourceManagers, provi
// OR if the default configMap is needed.
pInfo := managers.Store().GetPlatformInfo()
collectorOpts := collectorOptions{
enableVPA: pInfo.IsResourceSupported("VerticalPodAutoscaler"),
enableAPIService: f.collectAPIServiceMetrics,
enableCRD: f.collectCRDMetrics,
customResources: f.collectCrMetrics,
enableVPA: pInfo.IsResourceSupported("VerticalPodAutoscaler"),
enableAPIService: f.collectAPIServiceMetrics,
enableCRD: f.collectCRDMetrics,
enableControllerRevisions: f.collectControllerRevisions,
customResources: f.collectCrMetrics,
}
configCM, err := f.buildKSMCoreConfigMap(collectorOpts)
if err != nil {
Expand Down
Loading
Loading