
Node scale up triggers deployment changes before Nodes get ready #255


Open

kokes opened this issue Mar 24, 2025 · 2 comments

Comments


kokes commented Mar 24, 2025

When running the CPA, we've started facing an issue where adding a second Node to a cluster triggers a condition in our ladder config (say, two nodes require two coredns pods), but the new node is not yet Ready (according to node.Status.Conditions), so the new pod always gets scheduled onto the first node. Even though we use topology spread constraints to prefer placing pods on distinct nodes (we can't use strict anti-affinity rules), we're always left with two pods on the same node.
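For illustration, this is roughly the kind of soft spreading constraint we have on the coredns Deployment, sketched here with the Go client types (the topology key and label selector are placeholders, not our exact manifest):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Soft spreading only: ScheduleAnyway lets the scheduler violate the
	// constraint when it cannot be satisfied, which is exactly what happens
	// while the second node is still NotReady -- both replicas land on the
	// first node. A hard constraint would use DoNotSchedule instead.
	spread := corev1.TopologySpreadConstraint{
		MaxSkew:           1,
		TopologyKey:       "kubernetes.io/hostname",
		WhenUnsatisfiable: corev1.ScheduleAnyway,
		LabelSelector: &metav1.LabelSelector{
			// Placeholder selector for the coredns pods.
			MatchLabels: map[string]string{"k8s-app": "kube-dns"},
		},
	}
	fmt.Printf("%+v\n", spread)
}
```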

Sifting through the code, we found that even though includeUnschedulableNodes is in effect, it only considers spec.Unschedulable, not Taints or Conditions. When bootstrapping nodes via kubeadm, it's pretty difficult to set up a node with spec.unschedulable: true without a race condition (the racy way being kubeadm join && kubectl cordon; there's also a long-deprecated option to start nodes as unschedulable).

However, it would seem logical for includeUnschedulableNodes to cover unready nodes, too. For that reason I've implemented a check on node Conditions, requiring Ready=true when filtering for schedulable nodes.
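Roughly, the filtering I have in mind looks like this (a minimal Go sketch; the function name and placement are mine, not the CPA's actual code):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// isSchedulable reports whether a node should be counted when sizing targets:
// it must not be cordoned (spec.unschedulable) and its Ready condition must be
// True, so nodes that have joined but are not yet ready are excluded.
func isSchedulable(node *corev1.Node) bool {
	if node.Spec.Unschedulable {
		return false
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	// No Ready condition reported yet: treat the node as not ready.
	return false
}

func main() {
	// A freshly joined node that has not reported Ready=True yet.
	joining := &corev1.Node{
		Status: corev1.NodeStatus{
			Conditions: []corev1.NodeCondition{
				{Type: corev1.NodeReady, Status: corev1.ConditionFalse},
			},
		},
	}
	fmt.Println(isSchedulable(joining)) // false: not counted until Ready
}
```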


I've drafted a way to resolve this in #254. Let me know if this is something that could be included in the CPA (perhaps behind a flag?).

kokes commented Mar 25, 2025

PR merged, thanks. @MrHohn, are you planning a release at some point soon? I see the last one is from November, and a new release could include this change plus the multi-target support from d499f0d.

Thanks!

MrHohn commented Mar 25, 2025

@kokes Yes - I have cut https://github.com/kubernetes-sigs/cluster-proportional-autoscaler/releases/tag/v1.10.1 and raised kubernetes/k8s.io#7907 to promote the staging images to the production repo.
