Skip to content

Add custom-builds-load-dra template #5650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 27, 2025

Conversation

nojnhuh
Copy link
Contributor

@nojnhuh nojnhuh commented May 23, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR adds a new test cluster template for custom-builds + load + DRA in order to A/B compare with and without DRA in scale tests.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

This shows the changes we've made to an existing template already to enable DRA:

Diff between custom-builds-machine-pool -> custom-builds-dra
git diff --no-index templates/test/dev/cluster-template-custom-builds-machine-pool.yaml templates/test/dev/cluster-template-custom-builds-dra.yaml
diff --git a/templates/test/dev/cluster-template-custom-builds-machine-pool.yaml b/templates/test/dev/cluster-template-custom-builds-dra.yaml
index d498dfe60..3fbfe639e 100644
--- a/templates/test/dev/cluster-template-custom-builds-machine-pool.yaml
+++ b/templates/test/dev/cluster-template-custom-builds-dra.yaml
@@ -61,13 +61,16 @@ spec:
   kubeadmConfigSpec:
     clusterConfiguration:
       apiServer:
-        extraArgs: {}
+        extraArgs:
+          feature-gates: ${K8S_FEATURE_GATES:-"DynamicResourceAllocation=true"}
+          runtime-config: resource.k8s.io/v1beta1=true
         timeoutForControlPlane: 20m
       controllerManager:
         extraArgs:
           allocate-node-cidrs: "false"
           cloud-provider: external
           cluster-name: ${CLUSTER_NAME}
+          feature-gates: DynamicResourceAllocation=true
           v: "4"
       etcd:
         local:
@@ -75,6 +78,9 @@ spec:
           extraArgs:
             quota-backend-bytes: "8589934592"
       kubernetesVersion: ci/${CI_VERSION}
+      scheduler:
+        extraArgs:
+          feature-gates: DynamicResourceAllocation=true
     diskSetup:
       filesystems:
       - device: /dev/disk/azure/scsi1/lun0
@@ -192,10 +198,20 @@ spec:
       owner: root:root
       path: /tmp/oot-cred-provider.sh
       permissions: "0744"
+    - content: |
+        #!/bin/bash
+
+        echo "enabling containerd CDI plugin"
+        sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+        systemctl restart containerd
+      owner: root:root
+      path: /tmp/containerd-config.sh
+      permissions: "0744"
     initConfiguration:
       nodeRegistration:
         kubeletExtraArgs:
           cloud-provider: external
+          feature-gates: DynamicResourceAllocation=true
           image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
           image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
         name: '{{ ds.meta_data["local_hostname"] }}'
@@ -203,6 +219,7 @@ spec:
       nodeRegistration:
         kubeletExtraArgs:
           cloud-provider: external
+          feature-gates: DynamicResourceAllocation=true
           image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
           image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
         name: '{{ ds.meta_data["local_hostname"] }}'
@@ -212,6 +229,7 @@ spec:
     postKubeadmCommands:
     - bash -c /tmp/replace-k8s-components.sh
     preKubeadmCommands:
+    - bash -c /tmp/containerd-config.sh
     - bash -c /tmp/replace-k8s-binaries.sh
     - bash -c /tmp/oot-cred-provider.sh
     verbosity: 5
@@ -372,14 +390,25 @@ spec:
     owner: root:root
     path: /etc/kubernetes/azure.json
     permissions: "0644"
+  - content: |
+      #!/bin/bash
+
+      echo "enabling containerd CDI plugin"
+      sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+      systemctl restart containerd
+    owner: root:root
+    path: /tmp/containerd-config.sh
+    permissions: "0744"
   joinConfiguration:
     nodeRegistration:
       kubeletExtraArgs:
         cloud-provider: external
+        feature-gates: ${NODE_FEATURE_GATES:-"DynamicResourceAllocation=true"}
         image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
         image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
       name: '{{ ds.meta_data["local_hostname"] }}'
   preKubeadmCommands:
+  - bash -c /tmp/containerd-config.sh
   - bash -c /tmp/oot-cred-provider.sh
   - bash -c /tmp/replace-k8s-binaries.sh
 ---
@@ -509,11 +538,21 @@ spec:
       }
     path: C:/oot-cred-provider.ps1
     permissions: "0744"
+  - content: |
+      #!/bin/bash
+
+      echo "enabling containerd CDI plugin"
+      sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+      systemctl restart containerd
+    owner: root:root
+    path: /tmp/containerd-config.sh
+    permissions: "0744"
   joinConfiguration:
     nodeRegistration:
       criSocket: npipe:////./pipe/containerd-containerd
       kubeletExtraArgs:
         cloud-provider: external
+        feature-gates: ${NODE_FEATURE_GATES:-"DynamicResourceAllocation=true"}
         image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
         image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
         pod-infra-container-image: mcr.microsoft.com/oss/kubernetes/pause:3.9
@@ -522,6 +561,7 @@ spec:
   - nssm set kubelet start SERVICE_AUTO_START
   - powershell C:/defender-exclude-calico.ps1
   preKubeadmCommands:
+  - bash -c /tmp/containerd-config.sh
   - powershell c:/create-external-network.ps1
   - powershell C:/replace-pr-binaries.ps1
   - powershell C:/oot-cred-provider.ps1

This shows the new changes to the custom-builds-load template which should roughly match the above diff:

Diff between custom-builds-load -> custom-builds-load-dra
git diff --no-index templates/test/dev/cluster-template-custom-builds-load.yaml templates/test/dev/cluster-template-custom-builds-load-dra.yaml
diff --git a/templates/test/dev/cluster-template-custom-builds-load.yaml b/templates/test/dev/cluster-template-custom-builds-load-dra.yaml
index c711d0104..a658a5230 100644
--- a/templates/test/dev/cluster-template-custom-builds-load.yaml
+++ b/templates/test/dev/cluster-template-custom-builds-load-dra.yaml
@@ -64,13 +64,15 @@ spec:
     clusterConfiguration:
       apiServer:
         extraArgs:
-          feature-gates: ${K8S_FEATURE_GATES:-""}
+          feature-gates: ${K8S_FEATURE_GATES:-"DynamicResourceAllocation=true"}
+          runtime-config: resource.k8s.io/v1beta1=true
         timeoutForControlPlane: 20m
       controllerManager:
         extraArgs:
           allocate-node-cidrs: "false"
           cloud-provider: external
           cluster-name: ${CLUSTER_NAME}
+          feature-gates: DynamicResourceAllocation=true
           v: "4"
       etcd:
         local:
@@ -82,6 +84,7 @@ spec:
         extraArgs:
           authorization-always-allow-paths: /healthz,/readyz,/livez,/metrics
           bind-address: 0.0.0.0
+          feature-gates: DynamicResourceAllocation=true
     diskSetup:
       filesystems:
       - device: /dev/disk/azure/scsi1/lun0
@@ -188,10 +191,20 @@ spec:
       owner: root:root
       path: /tmp/replace-k8s-components.sh
       permissions: "0744"
+    - content: |
+        #!/bin/bash
+
+        echo "enabling containerd CDI plugin"
+        sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+        systemctl restart containerd
+      owner: root:root
+      path: /tmp/containerd-config.sh
+      permissions: "0744"
     initConfiguration:
       nodeRegistration:
         kubeletExtraArgs:
           cloud-provider: external
+          feature-gates: DynamicResourceAllocation=true
           image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
           image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
         name: '{{ ds.meta_data["local_hostname"] }}'
@@ -199,6 +212,7 @@ spec:
       nodeRegistration:
         kubeletExtraArgs:
           cloud-provider: external
+          feature-gates: DynamicResourceAllocation=true
           image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
           image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
         name: '{{ ds.meta_data["local_hostname"] }}'
@@ -208,6 +222,7 @@ spec:
     postKubeadmCommands:
     - bash -c /tmp/replace-k8s-components.sh
     preKubeadmCommands:
+    - bash -c /tmp/containerd-config.sh
     - bash -c /tmp/oot-cred-provider.sh
     - bash -c /tmp/replace-k8s-binaries.sh
     verbosity: 5
@@ -374,14 +389,25 @@ spec:
         owner: root:root
         path: /tmp/replace-k8s-binaries.sh
         permissions: "0744"
+      - content: |
+          #!/bin/bash
+
+          echo "enabling containerd CDI plugin"
+          sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+          systemctl restart containerd
+        owner: root:root
+        path: /tmp/containerd-config.sh
+        permissions: "0744"
       joinConfiguration:
         nodeRegistration:
           kubeletExtraArgs:
             cloud-provider: external
+            feature-gates: ${NODE_FEATURE_GATES:-"DynamicResourceAllocation=true"}
             image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
             image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
           name: '{{ ds.meta_data["local_hostname"] }}'
       preKubeadmCommands:
+      - bash -c /tmp/containerd-config.sh
       - bash -c /tmp/oot-cred-provider.sh
       - bash -c /tmp/replace-k8s-binaries.sh
 ---
@@ -562,12 +588,21 @@ spec:
           kube-proxy.exe --version
         path: C:/replace-pr-binaries.ps1
         permissions: "0744"
+      - content: |
+          #!/bin/bash
+
+          echo "enabling containerd CDI plugin"
+          sed -i '/\[plugins."io.containerd.grpc.v1.cri"\]/a\    enable_cdi = true' /etc/containerd/config.toml
+          systemctl restart containerd
+        owner: root:root
+        path: /tmp/containerd-config.sh
+        permissions: "0744"
       joinConfiguration:
         nodeRegistration:
           criSocket: npipe:////./pipe/containerd-containerd
           kubeletExtraArgs:
             cloud-provider: external
-            feature-gates: ${NODE_FEATURE_GATES:-""}
+            feature-gates: ${NODE_FEATURE_GATES:-"DynamicResourceAllocation=true"}
             image-credential-provider-bin-dir: /var/lib/kubelet/credential-provider
             image-credential-provider-config: /var/lib/kubelet/credential-provider-config.yaml
             v: "2"
@@ -577,6 +612,7 @@ spec:
       - nssm set kubelet start SERVICE_AUTO_START
       - powershell C:/defender-exclude-calico.ps1
       preKubeadmCommands:
+      - bash -c /tmp/containerd-config.sh
       - powershell C:/create-temp-folder.ps1
       - powershell C:/replace-containerd.ps1
       - powershell C:/collect-hns-crashes.ps1

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label May 23, 2025
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 23, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2025
@k8s-ci-robot k8s-ci-robot requested review from Jont828 and mboersma May 23, 2025 16:21
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 23, 2025
@nojnhuh
Copy link
Contributor Author

nojnhuh commented May 23, 2025

/assign @willie-yao @Jont828

Copy link

codecov bot commented May 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.26%. Comparing base (1a9b38b) to head (cbd65b2).
Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5650   +/-   ##
=======================================
  Coverage   53.26%   53.26%           
=======================================
  Files         272      272           
  Lines       29529    29529           
=======================================
  Hits        15730    15730           
  Misses      12984    12984           
  Partials      815      815           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Are the addition of windows mp templates intentional? Just making sure since it wasn't mentioned in the issue description.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a25919e3a81bbfcd11467bcb2cf9e856319183e8

@nojnhuh
Copy link
Contributor Author

nojnhuh commented May 27, 2025

Are the addition of windows mp templates intentional? Just making sure since it wasn't mentioned in the issue description.

I forgot to mention that, but yes that was intentional. Reason being

  1. to reduce drift between the DRA and non-DRA templates
  2. Windows nodes are opt-in with env vars, otherwise the MachinePool's replica count defaults to zero, so the cluster still won't actually have any Windows nodes

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

I'll go ahead and approve this since it looks like Jonathan is oof and I want to unblock this test. Thanks Jon!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2025
@k8s-ci-robot k8s-ci-robot merged commit 390bf5b into kubernetes-sigs:main May 27, 2025
29 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone May 27, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in CAPZ Planning May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants