Skip to content

Upgrade to calico 3.29 and use windows support #5523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

jsturtevant
Copy link
Contributor

@jsturtevant jsturtevant commented Mar 27, 2025

What type of PR is this?

/kind cleanup
What this PR does / why we need it:
Upgrades Calico to 3.29 and updates windows to use the upstream Calico HPC support .

I tried to remove as much of the calico manifests but the Private Cluster test still uses those so they are still needed. We still need a bunch of the CSRs for windows since the calico still needs some additional support to fully remove it (tigera/operator#3113).

I did chat briefly with @Jont828 about expanding the helmProxy to take an additional field that would be Pre/Post yaml to deploy so we could potentially remove some of the extra yaml. But that is low priority and could be a follow up.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5498

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Update Calico to 3.29

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 27, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 27, 2025
@jsturtevant jsturtevant mentioned this pull request Mar 27, 2025
4 tasks
Copy link

codecov bot commented Mar 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.26%. Comparing base (09830c0) to head (87199fa).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5523      +/-   ##
==========================================
- Coverage   53.26%   53.26%   -0.01%     
==========================================
  Files         272      272              
  Lines       29529    29529              
==========================================
- Hits        15730    15728       -2     
- Misses      12984    12986       +2     
  Partials      815      815              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 27, 2025
@jsturtevant
Copy link
Contributor Author

leaving my self some notes after investigating the failures:

  • When kubernetes-services-endpoint is included the operator fails to install things, which can be solved with dns changes: If "KUBERNETES_SERVICE_HOST" is domain, initial installation is not possible. (tigera-operator for windows) projectcalico/calico#9536
  • the service cidrs are not being set properly in the helm template:
    • values: "installation:\n cni:\n type: Calico\n calicoNetwork:\n bgp: Disabled\n \ windowsDataplane: HNS\n mtu: 1350\n ipPools:\n ipPools:\n - cidr: 192.168.0.0/16\n encapsulation: VXLAN\n serviceCIDRs: \n registry: mcr.microsoft.com/oss\n# Image and registry configuration for the tigera/operator pod.\ntigeraOperator:\n \ image: tigera/operator\n registry: mcr.microsoft.com/oss\ncalicoctl:\n image: mcr.microsoft.com/oss/calico/ctl\n"

@jsturtevant
Copy link
Contributor Author

not sure what to do about this one as the chart doesn't let me override dnsConfig settings

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 5fbed9d to 78297a9 Compare March 31, 2025 22:04
@jsturtevant
Copy link
Contributor Author

with the latest changes the cluster is coming online:

Cluster capz-j9dp7i created and fully operational
NAME                              STATUS   ROLES           AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
capz-j9dp-55m7p                   Ready    <none>          4m7s    v1.29.10   10.1.0.4      <none>        Windows Server 2022 Datacenter   10.0.20348.2762    containerd://1.7.20
capz-j9dp-dljnd                   Ready    <none>          4m5s    v1.29.10   10.1.0.6      <none>        Windows Server 2022 Datacenter   10.0.20348.2762    containerd://1.7.20
capz-j9dp7i-control-plane-hgrj5   Ready    control-plane   6m40s   v1.29.10   10.0.0.4      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20
capz-j9dp7i-md-0-pb8j6-xnkct      Ready    <none>          5m1s    v1.29.10   10.1.0.5      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20
capz-j9dp7i-md-0-pb8j6-zc2cm      Ready    <none>          4m54s   v1.29.10   10.1.0.7      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20

@jsturtevant
Copy link
Contributor Author

not sure what to do about this one as the chart doesn't let me override dnsConfig settings

Opened projectcalico/calico#10117

@jsturtevant
Copy link
Contributor Author

The workload upgrade test is failing with cloud manager crashing:

E0401 00:45:09.324689       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 321 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x20b7500, 0x3b02f10})
	/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000783a40?})
	/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x20b7500?, 0x3b02f10?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).getRouteTable(0x485d95?, 0x483d17?)
	/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmsets_repo.go:183 +0x4f

Is this a known issue? I don't see anything that makes it look like this is related to changes here.

@jsturtevant
Copy link
Contributor Author

jsturtevant commented Apr 1, 2025

This is now mostly passing, It requires:

  • a release of the upstream Calico helm chart but the change as been checked in.
  • Images for the Windows Calico in the same location as the linux ones (working internally to solve this issue)

@nawazkh
Copy link
Member

nawazkh commented Apr 2, 2025

Is this a known issue? I don't see anything that makes it look like this is related to changes here.

Super strange that it is flaky with a panic..

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2025
@jsturtevant
Copy link
Contributor Author

All the major challenges have been addresses. Just waiting on a Calico release that contains the updated helm chart

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 4285b61 to 1f6f8b4 Compare April 30, 2025 20:36
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2025
@nawazkh
Copy link
Member

nawazkh commented May 8, 2025

PR needs rebase.

We probably need a rebase here ?

@jsturtevant
Copy link
Contributor Author

Still waiting on a helm chart release. I saw they released the next calico version but don't see the updates in the helm chart so I will ping them to find out if that will be released too

@jsturtevant
Copy link
Contributor Author

Still waiting on a helm chart release. I saw they released the next calico version but don't see the updates in the helm chart so I will ping them to find out if that will be released too

It wasn't scheduled till Calico 3.31 but I requested some backports so should have it in the next patch release! projectcalico/calico#10117 (comment)

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 1f6f8b4 to 06b1720 Compare May 23, 2025 20:28
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2025
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 47b9852 to 87199fa Compare May 23, 2025 22:37
Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly ready to me! Just some minor comments and questions from my end

@@ -8,5 +8,5 @@ spec:
spec:
containers:
# Change the value of image field below to your controller image URL
- image: gcr.io/k8s-staging-cluster-api-azure/cluster-api-azure-controller:main
- image: jjsacrpub.azurecr.io/cluster-api-azure/cluster-api-azure-controller-amd64:dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this meant to be committed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think is result of running tools locally. Not sure why its not a part of the .gitignore? will remove

@@ -8,4 +8,4 @@ spec:
spec:
containers:
- name: manager
imagePullPolicy: Always
imagePullPolicy: IfNotPresent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I think these get changed when you run make tilt-up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think so... I'll remove it and squash commits too

@@ -1,5 +1,1088 @@
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these diffs just generated templates from the new Calico source? And same with all the deletions in the other folders of addons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is automated via the make file command where it grabs the latest release then extracts it.

plural: ""
conditions: []
storedVersions: []
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: eof line

api server name properly\n# https://github.com/projectcalico/calico/issues/9536\ndnsConfig:
\n nameservers:\n - 127.0.0.53\n options:\n - name: edns0\n - name:
trust-ad\nkubernetesServiceEndpoint:\n host: \"{{ .Cluster.spec.controlPlaneEndpoint.host
}}\"\n port: \"{{ .Cluster.spec.controlPlaneEndpoint.port }}\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be formatted like the previous diff so it's a bit more readable rather than using newline characters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look, I think this is an automatically generated file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

Update the Calico version for testing
4 participants