When running the CPA, we've started facing an issue where adding a second Node to a cluster triggers a condition in our ladder config (say two nodes require two coredns pods), but the new node is not yet Ready (according to node.Status.Conditions), so the new pod always gets scheduled onto the first node. So even though we use topology spread constraints to prefer placing pods on distinct nodes (we can't use strict anti-affinity rules), we're always left with two pods on the same node.
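To make the scenario concrete, here is a minimal Go sketch of a ladder-style lookup with a hypothetical nodes-to-replicas table matching the example above (the table values and function names are illustrative, not our actual config or the CPA's internals):

```go
package main

import "fmt"

// nodesToReplicas mirrors the "two nodes require two coredns pods" example;
// the values are hypothetical, not our real ladder config.
var nodesToReplicas = [][2]int{{1, 1}, {2, 2}}

// replicasForNodes picks the replica count for the largest node-count
// threshold that the current node count meets (a step function).
func replicasForNodes(nodeCount int) int {
	replicas := 1
	for _, entry := range nodesToReplicas {
		if nodeCount >= entry[0] {
			replicas = entry[1]
		}
	}
	return replicas
}

func main() {
	fmt.Println(replicasForNodes(1)) // 1
	fmt.Println(replicasForNodes(2)) // 2 -- triggered as soon as a second node is counted, Ready or not
}
```

The point is that the moment the second node is counted at all, a second replica is requested, even though that node can't yet accept the pod.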
Sifting through the code, we found that even though includeUnschedulableNodes is in effect, it only considers spec.Unschedulable, not Taints or Conditions. When bootstrapping nodes via kubeadm, it's pretty difficult to set up a node with spec.Unschedulable: true without a race condition (the racy way being kubeadm join && kubectl cordon; there's also a long-deprecated option to start nodes as unschedulable).
However, it would seem logical that includeUnschedulableNodes covers unready nodes, too. For that reason I've implemented a check on node Conditions so that only nodes with Ready=true are counted when filtering for schedulable nodes.
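For illustration, a minimal Go sketch of the kind of filter described above (this is not the actual patch in #254; the function names are hypothetical):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nodeIsReady reports whether the node's Ready condition is True.
func nodeIsReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	// No Ready condition reported yet (e.g. a node that just joined): treat as not ready.
	return false
}

// countSchedulableNodes skips nodes that are cordoned (spec.Unschedulable)
// and, additionally, nodes whose Ready condition is not True.
func countSchedulableNodes(nodes []corev1.Node) int {
	n := 0
	for i := range nodes {
		if nodes[i].Spec.Unschedulable {
			continue
		}
		if !nodeIsReady(&nodes[i]) {
			continue
		}
		n++
	}
	return n
}

func main() {
	// Example: a cordoned node is skipped, a Ready node is counted,
	// and a node with no Ready=True condition is skipped.
	nodes := []corev1.Node{
		{Spec: corev1.NodeSpec{Unschedulable: true}},
		{Status: corev1.NodeStatus{Conditions: []corev1.NodeCondition{
			{Type: corev1.NodeReady, Status: corev1.ConditionTrue},
		}}},
		{},
	}
	fmt.Println(countSchedulableNodes(nodes)) // 1
}
```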
I've drafted a way to resolve this in #254. Let me know if this is something that could be included in the CPA (perhaps behind a flag?).
PR merged, thanks. @MrHohn - are you planning a release at some point soon? I see the last one is from November and a new release could contain this change + multi-target support from d499f0d.