Commit a2e183d

Update autoscaling from zero enhancement proposal with support for platform-aware autoscale from zero
This commit updates the contract between the cluster-autoscaler Cluster API provider and the infrastructure provider controllers that reconcile Infrastructure Machine Templates. The goal is to support platform-aware autoscaling from zero in clusters whose nodes are heterogeneous in CPU architecture and OS.

With this commit, infrastructure providers whose controllers reconcile the status of their Infrastructure Machine Templates to support autoscaling from zero can fill the status.nodeInfo stanza with additional information about the nodes. The status.nodeInfo stanza mirrors the architecture and operatingSystem fields of corev1.NodeSystemInfo, which the rendered Node objects report in their status. When that information is present, the cluster autoscaler can use it to build the node template labels `kubernetes.io/arch` and `kubernetes.io/os`.

If the pending pods that trigger the cluster autoscaler have a node selector or a requiredDuringSchedulingIgnoredDuringExecution node affinity on the architecture or operating system of the node they can run on, the autoscaler can filter the node group options according to the architecture or operating system the pod requests.

Users could already provide this information to the cluster autoscaler through the labels capacity annotation. However, there was no similar mechanism to derive labels (or, in the future, taints) from information set by the reconcilers of the status of Infrastructure Machine Templates.

Signed-off-by: aleskandro <aleskandro@redhat.com>
1 parent a21ffb0 commit a2e183d
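The commit message describes how, once template nodes carry `kubernetes.io/arch` and `kubernetes.io/os` labels, the autoscaler can discard node groups that cannot host a pending pod. The Go sketch below illustrates only the nodeSelector case (requiredDuringSchedulingIgnoredDuringExecution node affinity is not handled). The `nodeGroup` type and the `selectorMatches` and `candidateGroups` helpers are hypothetical and are not the cluster autoscaler's actual code.

```go
package main

import "fmt"

// nodeGroup is a hypothetical, simplified view of a scalable node group
// together with the labels its template node would carry.
type nodeGroup struct {
	Name   string
	Labels map[string]string
}

// selectorMatches reports whether every key/value pair of a pod's
// nodeSelector is present in the template node's labels.
func selectorMatches(nodeSelector, labels map[string]string) bool {
	for k, v := range nodeSelector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// candidateGroups returns the names of the node groups whose template
// labels satisfy the pod's nodeSelector.
func candidateGroups(nodeSelector map[string]string, groups []nodeGroup) []string {
	var names []string
	for _, g := range groups {
		if selectorMatches(nodeSelector, g.Labels) {
			names = append(names, g.Name)
		}
	}
	return names
}

func main() {
	groups := []nodeGroup{
		{Name: "md-amd64", Labels: map[string]string{"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}},
		{Name: "md-arm64", Labels: map[string]string{"kubernetes.io/arch": "arm64", "kubernetes.io/os": "linux"}},
	}
	podSelector := map[string]string{"kubernetes.io/arch": "arm64"}
	fmt.Println(candidateGroups(podSelector, groups)) // [md-arm64]
}
```

Without the architecture label on the template nodes, both groups would look equally suitable for the arm64 pod, which is the situation this change addresses.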

2 files changed, 115 additions and 4 deletions

docs/book/src/developer/providers/contracts/infra-machine.md

Lines changed: 42 additions & 2 deletions
@@ -501,7 +501,7 @@ If implementing the pause behavior, providers SHOULD surface the paused status o
 
 ### InfraMachineTemplate: support cluster autoscaling from zero
 
-As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement a capacity field in machine templates to inform the cluster autoscaler about the resources available on that machine type.
+As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement the `capacity` and `nodeInfo` fields in machine templates to inform the cluster autoscaler about the resources available on that machine type and about the CPU architecture and operating system it runs.
 
 Building on the `FooMachineTemplate` example from above, this shows the addition of a status and capacity field:
 
@@ -524,19 +524,59 @@ type FooMachineTemplateStatus struct {
 	// https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
 	// +optional
 	Capacity corev1.ResourceList `json:"capacity,omitempty"`
+	// +optional
+	NodeInfo NodeInfo `json:"nodeInfo,omitempty,omitzero"`
+}
+
+// Architecture represents the CPU architecture of the node.
+// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
+// +kubebuilder:validation:Enum=amd64;arm64;s390x;ppc64le
+// +enum
+type Architecture string
+
+// Example architecture constants defined for better readability and maintainability.
+const (
+	ArchitectureAmd64   Architecture = "amd64"
+	ArchitectureArm64   Architecture = "arm64"
+	ArchitectureS390x   Architecture = "s390x"
+	ArchitecturePpc64le Architecture = "ppc64le"
+)
+
+// NodeInfo contains information about the node's architecture and operating system.
+// +kubebuilder:validation:MinProperties=1
+type NodeInfo struct {
+	// architecture is the CPU architecture of the node.
+	// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
+	// +optional
+	Architecture Architecture `json:"architecture,omitempty"`
+	// operatingSystem is a string representing the operating system of the node.
+	// This may be a string like 'linux' or 'windows'.
+	// +optional
+	OperatingSystem string `json:"operatingSystem,omitempty"`
 }
+
 ```
 
-When rendered to a manifest, the machine template status capacity field representing an instance with 500 megabytes of RAM, 1 CPU core, and 1 NVidia GPU would look like this:
+When rendered to a manifest, the machine template status of an amd64 Linux instance with 500 megabytes of RAM, 1 CPU core, and 1 NVIDIA GPU should look like this:
 
 ```
 status:
   capacity:
     memory: 500mb
     cpu: "1"
     nvidia.com/gpu: "1"
+  nodeInfo:
+    architecture: amd64
+    operatingSystem: linux
 ```
 
+If the information in the `nodeInfo` field is not available, the result of the autoscaling from zero operation depends
+on the cluster autoscaler implementation. For example, the Cluster API provider of the Kubernetes Cluster Autoscaler
+assumes the architecture set in the `CAPI_SCALE_ZERO_DEFAULT_ARCH` environment variable of the cluster autoscaler pod,
+falling back to amd64 when that variable is unset, and assumes the Linux operating system.
+
+See [autoscaling](../../../tasks/automated-machine-management/autoscaling.md).
+
 ## Typical InfraMachine reconciliation workflow
 
 A machine infrastructure provider must respond to changes to its InfraMachine resources. This process is
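To illustrate the provider side of this contract, here is a minimal Go sketch of how a controller reconciling a machine template might fill `status.nodeInfo`. It assumes, hypothetically, that the platform's API reports uname-style architecture names such as `x86_64` or `aarch64`; the `platformArchitectures` table and the `nodeInfoFromPlatform` helper are illustrative and not part of Cluster API. Architectures outside the allowed enum are left empty rather than guessed, so the autoscaler's own defaults apply.

```go
package main

import (
	"fmt"
	"strings"
)

// Architecture and NodeInfo mirror the types shown in the contract above.
type Architecture string

const (
	ArchitectureAmd64   Architecture = "amd64"
	ArchitectureArm64   Architecture = "arm64"
	ArchitectureS390x   Architecture = "s390x"
	ArchitecturePpc64le Architecture = "ppc64le"
)

type NodeInfo struct {
	Architecture    Architecture `json:"architecture,omitempty"`
	OperatingSystem string       `json:"operatingSystem,omitempty"`
}

// platformArchitectures maps hypothetical, uname-style architecture names
// that a platform API might report to the values allowed by the enum.
var platformArchitectures = map[string]Architecture{
	"x86_64":  ArchitectureAmd64,
	"amd64":   ArchitectureAmd64,
	"aarch64": ArchitectureArm64,
	"arm64":   ArchitectureArm64,
	"s390x":   ArchitectureS390x,
	"ppc64le": ArchitecturePpc64le,
}

// nodeInfoFromPlatform builds the status.nodeInfo value for a template.
// A missing map key yields "", so unknown architectures stay unset.
func nodeInfoFromPlatform(platformArch, platformOS string) NodeInfo {
	return NodeInfo{
		Architecture:    platformArchitectures[strings.ToLower(platformArch)],
		OperatingSystem: strings.ToLower(platformOS),
	}
}

func main() {
	fmt.Printf("%+v\n", nodeInfoFromPlatform("aarch64", "Linux"))
	// {Architecture:arm64 OperatingSystem:linux}
}
```

Lowercasing the OS name is a choice made for this sketch so that values line up with the lowercase `kubernetes.io/os` label values such as `linux` and `windows`.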

docs/proposals/20210310-opt-in-autoscaling-from-zero.md

Lines changed: 73 additions & 2 deletions
@@ -107,8 +107,8 @@ node group. But, during a scale from zero situation (ie when a node group has ze
 autoscaler needs to acquire this information from the infrastructure provider.
 
 An optional status field is proposed on the Infrastructure Machine Template which will be populated
-by infrastructure providers to contain the CPU, memory, and GPU capacities for machines described by that
-template. The cluster autoscaler will then utilize this information by reading the appropriate
+by infrastructure providers to contain the CPU, memory, and GPU capacities, as well as the CPU architecture and
+operating system, of machines described by that template. The cluster autoscaler will then utilize this information by reading the appropriate
 infrastructure reference from the resource it is scaling (MachineSet or MachineDeployment).
 
 A user may override the field in the associated infrastructure template by applying annotations to the
@@ -160,6 +160,13 @@ the template. Internally, this field will be represented by a Go `map` type uti
 for the keys and `k8s.io/apimachinery/pkg/api/resource.Quantity` as the values (similar to how resource
 limits and requests are handled for pods).
 
+Additionally, the status field should contain information about the node, such as its CPU architecture and
+operating system. This information is not required for the autoscaler to function, but it is useful in
+scenarios where the autoscaler must choose among node groups that differ in architecture, operating system, or both.
+
+This information must be represented as a field named `nodeInfo`: a struct with two optional subfields,
+`architecture` and `operatingSystem`. The allowed values for `architecture` are `amd64`, `arm64`, `s390x`, and `ppc64le`.
+
 It is worth mentioning that the Infrastructure Machine Templates are not usually reconciled by themselves.
 Each infrastructure provider will be responsible for determining the best implementation for adding the
 status field based on the information available on their platform.
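With controller-gen, the kubebuilder markers on these types become an `enum` constraint on `architecture` and a `minProperties: 1` constraint on `nodeInfo` in the CRD's OpenAPI schema, which the API server validates. As a hedged illustration of what those constraints mean, the Go sketch below re-expresses them as a plain function that a provider could, for example, call in unit tests. `validateNodeInfo` is a hypothetical helper, and it treats empty fields as absent because both fields are tagged `omitempty`.

```go
package main

import (
	"errors"
	"fmt"
)

type Architecture string

type NodeInfo struct {
	Architecture    Architecture
	OperatingSystem string
}

// allowedArchitectures mirrors the Enum marker: amd64, arm64, s390x, ppc64le.
var allowedArchitectures = map[Architecture]bool{
	"amd64": true, "arm64": true, "s390x": true, "ppc64le": true,
}

// validateNodeInfo approximates the schema checks in Go. Empty fields are
// treated as absent, matching how omitempty serializes them.
func validateNodeInfo(ni NodeInfo) error {
	if ni.Architecture == "" && ni.OperatingSystem == "" {
		// MinProperties=1: a nodeInfo object with no fields is rejected.
		return errors.New("nodeInfo must set at least one of architecture or operatingSystem")
	}
	if ni.Architecture != "" && !allowedArchitectures[ni.Architecture] {
		return fmt.Errorf("unsupported architecture %q", ni.Architecture)
	}
	return nil
}

func main() {
	fmt.Println(validateNodeInfo(NodeInfo{Architecture: "arm64", OperatingSystem: "linux"})) // <nil>
	fmt.Println(validateNodeInfo(NodeInfo{Architecture: "riscv64"}))                         // unsupported architecture "riscv64"
}
```

Note that `operatingSystem` is a free-form string in the proposed type, so the sketch deliberately does not restrict its values.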
@@ -175,6 +182,9 @@ const (
 // DockerMachineTemplateStatus defines the observed state of a DockerMachineTemplate
 type DockerMachineTemplateStatus struct {
 	Capacity corev1.ResourceList `json:"capacity,omitempty"`
+
+	// +optional
+	NodeInfo NodeInfo `json:"nodeInfo,omitempty,omitzero"`
 }
 
 // DockerMachineTemplate is the Schema for the dockermachinetemplates API.
@@ -188,6 +198,39 @@ type DockerMachineTemplate struct {
 ```
 _Note: the `ResourceList` and `ResourceName` referenced are from `k8s.io/api/core/v1`_
 
+`NodeInfo` is a struct, implemented in each provider's integration code, that contains the architecture and
+operating system information of the node.
+Its definition should look like the following:
+
+```go
+// Architecture represents the CPU architecture of the node.
+// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
+// +kubebuilder:validation:Enum=amd64;arm64;s390x;ppc64le
+// +enum
+type Architecture string
+
+// Example architecture constants defined for better readability and maintainability.
+const (
+	ArchitectureAmd64   Architecture = "amd64"
+	ArchitectureArm64   Architecture = "arm64"
+	ArchitectureS390x   Architecture = "s390x"
+	ArchitecturePpc64le Architecture = "ppc64le"
+)
+
+// NodeInfo contains information about the node's architecture and operating system.
+// +kubebuilder:validation:MinProperties=1
+type NodeInfo struct {
+	// architecture is the CPU architecture of the node.
+	// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
+	// +optional
+	Architecture Architecture `json:"architecture,omitempty"`
+	// operatingSystem is a string representing the operating system of the node.
+	// This may be a string like 'linux' or 'windows'.
+	// +optional
+	OperatingSystem string `json:"operatingSystem,omitempty"`
+}
+```
+
 When used as a manifest, it would look like this:
 
 ```
@@ -204,8 +247,13 @@ status:
     memory: 500mb
     cpu: "1"
     nvidia.com/gpu: "1"
+  nodeInfo:
+    architecture: arm64
+    operatingSystem: linux
 ```
 
+The information stored in the `status.nodeInfo` field will be used by the cluster autoscaler's scheduler simulator to determine the `kubernetes.io/arch` and `kubernetes.io/os` labels of the simulated node. This logic will be implemented in the Cluster API cloud provider code of the cluster autoscaler.
+
 #### MachineSet and MachineDeployment Annotations
 
 In cases where a user needs to provide specific resource information for a
@@ -246,6 +294,28 @@ metadata:
     capacity.cluster-autoscaler.kubernetes.io/taints: "key1=value1:NoSchedule,key2=value2:NoExecute"
 ```
 
+If the `capacity.cluster-autoscaler.kubernetes.io/labels` annotation specifies a label that would otherwise be
+generated from the `status` field of the Machine Template, the autoscaler uses the value from the annotation.
+In other words, any label set by the annotation overrides the corresponding value provided by the
+infrastructure provider in the Machine Template status.
+
+For example, assume the following objects:
+
+```yaml
+kind: MachineDeployment
+metadata:
+  annotations:
+    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
+---
+kind: ExampleMachineTemplate
+status:
+  nodeInfo:
+    architecture: arm64
+```
+
+The cluster autoscaler gives precedence to the annotation on the MachineDeployment and predicts nodes that carry the
+`kubernetes.io/arch: amd64` label.
+
 ### Security Model
 
 This feature will require the service account associated with the cluster autoscaler to have
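The precedence rule in this hunk amounts to a map merge in which the annotation labels are applied last. The Go sketch below assumes the labels annotation holds comma-separated `key=value` pairs (the example shows a single pair; the comma-separated form is an assumption here, modeled on the taints annotation). `parseLabelsAnnotation` and `mergeLabels` are hypothetical helpers, not the autoscaler's API.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabelsAnnotation parses a value such as "k1=v1,k2=v2".
// Entries without "=" or with an empty key are skipped.
func parseLabelsAnnotation(value string) map[string]string {
	labels := map[string]string{}
	for _, pair := range strings.Split(value, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || k == "" {
			continue
		}
		labels[k] = v
	}
	return labels
}

// mergeLabels starts from the labels derived from status.nodeInfo and then
// applies the annotation labels, so the annotation wins on conflicts.
func mergeLabels(fromStatus, fromAnnotation map[string]string) map[string]string {
	merged := make(map[string]string, len(fromStatus)+len(fromAnnotation))
	for k, v := range fromStatus {
		merged[k] = v
	}
	for k, v := range fromAnnotation {
		merged[k] = v
	}
	return merged
}

func main() {
	fromStatus := map[string]string{"kubernetes.io/arch": "arm64", "kubernetes.io/os": "linux"}
	fromAnnotation := parseLabelsAnnotation("kubernetes.io/arch=amd64")
	fmt.Println(mergeLabels(fromStatus, fromAnnotation))
	// map[kubernetes.io/arch:amd64 kubernetes.io/os:linux]
}
```

Labels that the annotation does not mention, such as `kubernetes.io/os` here, keep the value derived from the Machine Template status.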
@@ -318,6 +388,7 @@ office hours meeting:
 
 ## Implementation History
 
+- [X] 05/08/2025: Updated proposal to enable architecture- and OS-aware autoscaling from zero
 - [X] 09/12/2024: Added section on Implementation Status
 - [X] 01/31/2023: Updated proposal to include annotation changes
 - [X] 06/10/2021: Proposed idea in an issue or [community meeting]
