
fix: Ensure DaemonSets with '/initialized' and '/registered' are considered during nodeclaim calculations #2161


Open · wants to merge 2 commits into base: main

Conversation


@1mwataru commented Apr 23, 2025

Fixes #2116

Description

  • When a DaemonSet has nodeAffinity or a nodeSelector that references the karpenter.sh/registered or karpenter.sh/initialized labels, those terms are not considered during NodeClaim calculations, so nodes are launched with less capacity than expected.

    • This leads to unexpected behavior such as resource shortages on the node that prevent the DaemonSet Pods from running, or Deployment Pods remaining Pending and triggering creation of additional NodeClaims.
  • Modified the code to consider the karpenter.sh/registered and karpenter.sh/initialized labels during DaemonSet overhead calculations.

How was this change tested?

  • Prerequisites

    • Prepared the following DaemonSets and Deployment
      • DaemonSets
        • Pod Identity
          • No CPU request configured
        • VPC CNI (aws-node)
          • CPU request: 25m
        • Kube Proxy
          • CPU request: 100m
        • My DaemonSet (with required nodeAffinity whose nodeSelectorTerms include a matchExpressions entry for the karpenter.sh/registered label)
          • CPU request: 2 (2000m)
      • Deployment
        • My Deployment
          • CPU request: 1 (1000m)
  • Before the change

    • Confirmed from the Karpenter logs that My DaemonSet's request was not included in the NodeClaim's requests ("cpu":"1150m")

      {"level":"INFO","time":"2025-04-23T13:11:26.806Z","logger":"controller","caller":"provisioning/provisioner.go:413","message":"created nodeclaim","commit":"3f9eb00-dirty","controller":"provisioner","namespace":"","name":"","reconcileID":"e786ef2c-95ea-4607-b317-d684dd519b89","NodePool":{"name":"my-nodepool"},"NodeClaim":{"name":"my-nodepool-jtzvc"},"requests":{"cpu":"1150m","pods":"4"},"instance-types":"c5.large, c5.xlarge, c5a.large, c5a.xlarge, c5d.large and 55 other(s)"}
      
    • As a result, an EC2 instance of type c5d.large (allocatable CPU: 1930m) was launched, but the DaemonSet Pod remained in the Pending state

  • After the change

    • Confirmed from the Karpenter logs that My DaemonSet's request was now included in the NodeClaim's requests ("cpu":"3150m")

      {"level":"INFO","time":"2025-04-23T13:45:02.609Z","logger":"controller","caller":"provisioning/provisioner.go:413","message":"created nodeclaim","commit":"3f9eb00-dirty","controller":"provisioner","namespace":"","name":"","reconcileID":"061608ab-f9ad-41fa-ade5-fd19a2ccec58","NodePool":{"name":"my-nodepool"},"NodeClaim":{"name":"my-nodepool-wbh4m"},"requests":{"cpu":"3150m","pods":"5"},"instance-types":"c5.2xlarge, c5.xlarge, c5a.2xlarge, c5a.xlarge, c5d.xlarge and 55 other(s)"}
      
    • The DaemonSet Pod started successfully without issues

Manifests used for verification

  • NodePool

    nodepool.yaml

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: my-nodepool
    spec:
      template:
        metadata:
          labels:
            cluster-name: cluster-31-3923-alb
        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values:
                - m
                - c
                - r
            - key: karpenter.k8s.aws/instance-generation
              operator: Gt
              values:
                - "4"
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
            - key: kubernetes.io/os
              operator: In
              values:
                - linux

  • Deployment and DaemonSet

    dep_deamon.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-deploy
    spec:
      replicas: 1
      selector:
        matchLabels:
          jeremy: my-deploy
      template:
        metadata:
          labels:
            jeremy: my-deploy
        spec:
          nodeSelector:
            karpenter.sh/nodepool: my-nodepool
          containers:
            - name: my-dep-container
              image: nginx:latest
              ports:
                - containerPort: 80
              resources:
                requests:
                  cpu: "1"

    ---

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: my-daemonset
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: my-daemonset-app
      template:
        metadata:
          labels:
            app.kubernetes.io/name: my-daemonset-app
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: karpenter.sh/registered
                        operator: In
                        values:
                          - "true"
                      - key: karpenter.sh/nodepool
                        operator: In
                        values:
                          - my-nodepool
          containers:
            - image: nginx:latest
              imagePullPolicy: IfNotPresent
              name: my-ds-container
              resources:
                requests:
                  cpu: "2"
          tolerations:
            - operator: Exists

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 23, 2025
@k8s-ci-robot
Contributor

Welcome @1mwataru!

It looks like this is your first PR to kubernetes-sigs/karpenter 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/karpenter has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 23, 2025
@k8s-ci-robot
Contributor

Hi @1mwataru. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 23, 2025
@coveralls

coveralls commented Apr 23, 2025

Pull Request Test Coverage Report for Build 15683847793

Details

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.008%) to 82.002%

  • Files with coverage reduction:
    • pkg/controllers/provisioning/scheduling/preferences.go: 7 new missed lines, 88.76%
  • Totals:
    • Change from base Build 15642318551: -0.008%
    • Covered Lines: 10265
    • Relevant Lines: 12518

💛 - Coveralls

@jonathan-innis
Member

/assign @jonathan-innis

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 1mwataru
Once this PR has been reviewed and has the lgtm label, please ask for approval from jonathan-innis. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2025
@1mwataru 1mwataru requested a review from jonathan-innis May 15, 2025 02:46
@@ -70,6 +70,12 @@ func NewNodeClaimTemplate(nodePool *v1.NodePool) *NodeClaimTemplate {
})
nct.Requirements.Add(scheduling.NewNodeSelectorRequirementsWithMinValues(nct.Spec.Requirements...).Values()...)
nct.Requirements.Add(scheduling.NewLabelRequirements(nct.Labels).Values()...)

// Add requirements for DaemonSet scheduling calculations
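
For readers following the diff, here is a minimal self-contained sketch of the idea behind this hunk: treat the two lifecycle labels as satisfied on the template so that DaemonSet Pods selecting on them are counted toward overhead. The constant names (v1.NodeRegisteredLabelKey, v1.NodeInitializedLabelKey) and the scheduling.NewRequirement helper used below are assumptions based on the surrounding code, not the PR's literal diff.

    // Sketch only: mark the NodeClaimTemplate as satisfying the lifecycle labels
    // so DaemonSet Pods selecting on karpenter.sh/registered or
    // karpenter.sh/initialized are included in the overhead calculation.
    package sketch

    import (
        corev1 "k8s.io/api/core/v1"

        v1 "sigs.k8s.io/karpenter/pkg/apis/v1"
        "sigs.k8s.io/karpenter/pkg/scheduling"
    )

    // addLifecycleLabelRequirements adds "In true" requirements for the two
    // lifecycle labels to the template's scheduling requirements.
    func addLifecycleLabelRequirements(reqs scheduling.Requirements) {
        reqs.Add(
            scheduling.NewRequirement(v1.NodeRegisteredLabelKey, corev1.NodeSelectorOpIn, "true"),
            scheduling.NewRequirement(v1.NodeInitializedLabelKey, corev1.NodeSelectorOpIn, "true"),
        )
    }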
Member

This looks good! Can we add one test to make sure that this works properly on the two new labels? Maybe something with affinity?

Author

I have implemented a test that verifies nodes are launched with properly calculated daemonOverhead and scheduled correctly when NodeAffinity is configured for each respective label.

@@ -80,6 +86,14 @@ func (i *NodeClaimTemplate) ToNodeClaim() *v1.NodeClaim {
return i.Name
})...))

// Filter out DaemonSet scheduling-only requirements for the actual NodeClaim
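
And a matching sketch of the filtering step this hunk introduces: the scheduling-only lifecycle label requirements are dropped before the template is turned into an actual NodeClaim, so they never appear in the created object's spec. Same assumed identifiers and imports as the sketch above.

    // Sketch only: strip the lifecycle-label requirements that exist purely for
    // DaemonSet overhead calculations before building the actual NodeClaim.
    func filterLifecycleLabelRequirements(reqs scheduling.Requirements) scheduling.Requirements {
        filtered := scheduling.NewRequirements()
        for key, req := range reqs {
            // karpenter.sh/registered and karpenter.sh/initialized are applied by
            // Karpenter only after the node joins, so they should not be carried
            // into the NodeClaim's own requirements.
            if key == v1.NodeRegisteredLabelKey || key == v1.NodeInitializedLabelKey {
                continue
            }
            filtered.Add(req)
        }
        return filtered
    }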
Member

nit: These may not just be DaemonSet scheduling requirements -- I'm not sure why you would do this on things other than DaemonSets, but we should technically support this for any pod that selected against this label

Author

@jonathan-innis
Thanks for your comment.

Apologies for the delay.
We exclude other workloads like Deployments due to the following scenarios:

  • When node affinity for these labels is applied to Deployments, the current validation rejects them during NodeClaim creation, so no NodeClaim is created at all
  • Simply removing this validation leads to unintended behavior:
    • Deployments create Pending Pods, triggering new NodeClaim creation
    • Pods remain unscheduled until Nodes become registered/initialized, causing redundant NodeClaim creation

Addressing this would require logic to prevent excessive NodeClaim creation when Pending Pods exist with registered/initialized labels.
That is a non-trivial change beyond the scope of this issue, so we excluded it here.
(Please let me know if we should create a separate Issue for investigation, or if any comments need modification.)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 16, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Karpenter underprovisions CPU requests for DS with certain node affinity
4 participants