📖 Update autoscaling from zero enhancement proposal with support for platform-aware autoscale from zero #11962

Open: wants to merge 1 commit into base: main
44 changes: 42 additions & 2 deletions docs/book/src/developer/providers/contracts/infra-machine.md
@@ -501,7 +501,7 @@ If implementing the pause behavior, providers SHOULD surface the paused status o

### InfraMachineTemplate: support cluster autoscaling from zero

-As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement a capacity field in machine templates to inform the cluster autoscaler about the resources available on that machine type.
+As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement the `capacity` and `nodeInfo` fields in machine templates to inform the cluster autoscaler about the resources available on that machine type, the architecture, and the operating system it runs.

Building on the `FooMachineTemplate` example from above, the following shows the addition of a status with `capacity` and `nodeInfo` fields:

@@ -524,19 +524,59 @@ type FooMachineTemplateStatus struct {
	// https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
	// +optional
	Capacity corev1.ResourceList `json:"capacity,omitempty"`

	// +optional
	NodeInfo NodeInfo `json:"nodeInfo,omitempty,omitzero"`
}

// Architecture represents the CPU architecture of the node.
// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
// +kubebuilder:validation:Enum=amd64;arm64;s390x;ppc64le
// +enum
type Architecture string

// Example architecture constants defined for better readability and maintainability.
const (
	ArchitectureAmd64   Architecture = "amd64"
	ArchitectureArm64   Architecture = "arm64"
	ArchitectureS390x   Architecture = "s390x"
	ArchitecturePpc64le Architecture = "ppc64le"
)

// NodeInfo contains information about the node's architecture and operating system.
// +kubebuilder:validation:MinProperties=1
type NodeInfo struct {
	// architecture is the CPU architecture of the node.
	// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
	// +optional
	Architecture Architecture `json:"architecture,omitempty"`

	// operatingSystem is a string representing the operating system of the node.
	// This may be a string like 'linux' or 'windows'.
	// +optional
	OperatingSystem string `json:"operatingSystem,omitempty"`
}
```

-When rendered to a manifest, the machine template status capacity field representing an instance with 500 megabytes of RAM, 1 CPU core, and 1 NVidia GPU would look like this:
+When rendered to a manifest, the machine template status `capacity` and `nodeInfo` fields representing an amd64 Linux instance with 500 megabytes of RAM, 1 CPU core, and 1 NVIDIA GPU should look like this:

```
status:
  capacity:
    memory: 500M
    cpu: "1"
    nvidia.com/gpu: "1"
  nodeInfo:
    architecture: amd64
    operatingSystem: linux
```

If the `nodeInfo` field is not populated, the outcome of scaling from zero depends on the cluster autoscaler
implementation. For example, the Cluster API provider of the Kubernetes Cluster Autoscaler assumes the architecture
set in the `CAPI_SCALE_ZERO_DEFAULT_ARCH` environment variable of the cluster autoscaler pod or, failing that, amd64,
and assumes Linux as the operating system.
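
The fallback order described above can be sketched as follows. This is a minimal illustration, not the cluster autoscaler's actual code: the `NodeInfo` type, `resolveNodeInfo`, and the injectable `lookupEnv` parameter are hypothetical names, and treating an empty environment variable as unset is an assumption.

```go
package main

// NodeInfo is a simplified, hypothetical stand-in for the nodeInfo status field.
type NodeInfo struct {
	Architecture    string
	OperatingSystem string
}

const (
	// defaultArchEnvVar is the environment variable named in the text above.
	defaultArchEnvVar = "CAPI_SCALE_ZERO_DEFAULT_ARCH"
	// Fallback values named in the text above.
	fallbackArch = "amd64"
	fallbackOS   = "linux"
)

// resolveNodeInfo fills in any field the provider left empty.
// lookupEnv is injected so the logic can be exercised without touching the
// process environment; a real caller would pass os.LookupEnv.
func resolveNodeInfo(info NodeInfo, lookupEnv func(string) (string, bool)) NodeInfo {
	if info.Architecture == "" {
		// Assumption: an empty variable is treated the same as an unset one.
		if v, ok := lookupEnv(defaultArchEnvVar); ok && v != "" {
			info.Architecture = v
		} else {
			info.Architecture = fallbackArch
		}
	}
	if info.OperatingSystem == "" {
		info.OperatingSystem = fallbackOS
	}
	return info
}
```

Values reported by the provider always win in this sketch; the environment variable and the built-in defaults only fill gaps.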

See [autoscaling](../../../tasks/automated-machine-management/autoscaling.md).

## Typical InfraMachine reconciliation workflow

A machine infrastructure provider must respond to changes to its InfraMachine resources. This process is
75 changes: 73 additions & 2 deletions docs/proposals/20210310-opt-in-autoscaling-from-zero.md
@@ -107,8 +107,8 @@ node group. But, during a scale from zero situation (ie when a node group has ze
autoscaler needs to acquire this information from the infrastructure provider.

An optional status field is proposed on the Infrastructure Machine Template which will be populated
-by infrastructure providers to contain the CPU, memory, and GPU capacities for machines described by that
-template. The cluster autoscaler will then utilize this information by reading the appropriate
+by infrastructure providers to contain the CPU, memory, and GPU capacities, along with the CPU architecture and
+operating system, for machines described by that template. The cluster autoscaler will then use this information by reading the appropriate
infrastructure reference from the resource it is scaling (MachineSet or MachineDeployment).

A user may override the field in the associated infrastructure template by applying annotations to the
@@ -160,6 +160,13 @@ the template. Internally, this field will be represented by a Go `map` type uti
for the keys and `k8s.io/apimachinery/pkg/api/resource.Quantity` as the values (similar to how resource
limits and requests are handled for pods).

Additionally, the status field should contain information about the node, such as its architecture and
operating system. The autoscaler does not require this information to function, but it is useful when the
autoscaler must make decisions for clusters whose node groups differ in architecture, operating system, or both.

This information must be exposed in a field named `nodeInfo`, a struct with two optional subfields,
`architecture` and `operatingSystem`. The allowed values for `architecture` are `amd64`, `arm64`, `s390x`, and `ppc64le`.

It is worth mentioning that the Infrastructure Machine Templates are not usually reconciled by themselves.
Each infrastructure provider will be responsible for determining the best implementation for adding the
status field based on the information available on their platform.
@@ -175,6 +182,9 @@ const (
// DockerMachineTemplateStatus defines the observed state of a DockerMachineTemplate
type DockerMachineTemplateStatus struct {
	Capacity corev1.ResourceList `json:"capacity,omitempty"`

	// +optional
	NodeInfo NodeInfo `json:"nodeInfo,omitempty,omitzero"`
}

// DockerMachineTemplate is the Schema for the dockermachinetemplates API.
@@ -188,6 +198,39 @@ type DockerMachineTemplate struct {
```
_Note: the `ResourceList` and `ResourceName` referenced are from `k8s.io/api/core/v1`_

`NodeInfo` is a struct that holds the architecture and operating system of the node. Providers implement it
in their integration code.
Its definition should look like the following:

```go
// Architecture represents the CPU architecture of the node.
// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
// +kubebuilder:validation:Enum=amd64;arm64;s390x;ppc64le
// +enum
type Architecture string

// Example architecture constants defined for better readability and maintainability.
const (
	ArchitectureAmd64   Architecture = "amd64"
	ArchitectureArm64   Architecture = "arm64"
	ArchitectureS390x   Architecture = "s390x"
	ArchitecturePpc64le Architecture = "ppc64le"
)

// NodeInfo contains information about the node's architecture and operating system.
// +kubebuilder:validation:MinProperties=1
type NodeInfo struct {
	// architecture is the CPU architecture of the node.
	// Its underlying type is a string and its value can be any of amd64, arm64, s390x, ppc64le.
	// +optional
	Architecture Architecture `json:"architecture,omitempty"`

	// operatingSystem is a string representing the operating system of the node.
	// This may be a string like 'linux' or 'windows'.
	// +optional
	OperatingSystem string `json:"operatingSystem,omitempty"`
}
```
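
For providers that want to check a `NodeInfo` value in Go before writing it to the status, the two validation markers above (`Enum` on the architecture, `MinProperties=1` on the struct) can be approximated with a small helper. This is only a sketch: `validateNodeInfo` is a hypothetical function, the types are redeclared locally without JSON tags so the example is self-contained, and the check applies to a `nodeInfo` that is present, since an absent one is simply omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Architecture and NodeInfo are local copies of the types defined above.
type Architecture string

const (
	ArchitectureAmd64   Architecture = "amd64"
	ArchitectureArm64   Architecture = "arm64"
	ArchitectureS390x   Architecture = "s390x"
	ArchitecturePpc64le Architecture = "ppc64le"
)

type NodeInfo struct {
	Architecture    Architecture
	OperatingSystem string
}

// validateNodeInfo approximates the API validation in plain Go:
// at least one field must be set, and a non-empty architecture must be
// one of the four allowed values.
func validateNodeInfo(info NodeInfo) error {
	if info.Architecture == "" && info.OperatingSystem == "" {
		return errors.New("nodeInfo must set at least one of architecture or operatingSystem")
	}
	switch info.Architecture {
	case "", ArchitectureAmd64, ArchitectureArm64, ArchitectureS390x, ArchitecturePpc64le:
		return nil
	default:
		return fmt.Errorf("unsupported architecture %q", info.Architecture)
	}
}
```

The API server still enforces the markers on admission; a helper like this only surfaces mistakes earlier, inside the provider.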

When used as a manifest, it would look like this:

```
@@ -204,8 +247,13 @@ status:
    memory: 500M
    cpu: "1"
    nvidia.com/gpu: "1"
  nodeInfo:
    architecture: arm64
    operatingSystem: linux
```

The cluster autoscaler's scheduler simulator will use the information in the `status.nodeInfo` field to set the `kubernetes.io/arch` and `kubernetes.io/os` labels on the simulated node. This logic will be implemented in the cluster autoscaler's Cluster API cloud provider code.
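
A minimal sketch of that mapping, assuming a simplified `NodeInfo` type and a hypothetical `labelsFromNodeInfo` helper (neither is taken from the cluster autoscaler code base):

```go
package main

// NodeInfo is a simplified, hypothetical stand-in for status.nodeInfo.
type NodeInfo struct {
	Architecture    string
	OperatingSystem string
}

// Well-known Kubernetes node labels named in the text above.
const (
	labelArch = "kubernetes.io/arch"
	labelOS   = "kubernetes.io/os"
)

// labelsFromNodeInfo returns the labels a simulated node would receive.
// Empty fields produce no label, leaving room for defaults or overrides
// to be applied elsewhere.
func labelsFromNodeInfo(info NodeInfo) map[string]string {
	labels := map[string]string{}
	if info.Architecture != "" {
		labels[labelArch] = info.Architecture
	}
	if info.OperatingSystem != "" {
		labels[labelOS] = info.OperatingSystem
	}
	return labels
}
```

With the manifest above, this helper would yield `kubernetes.io/arch=arm64` and `kubernetes.io/os=linux`.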

#### MachineSet and MachineDeployment Annotations

In cases where a user needs to provide specific resource information for a
@@ -246,6 +294,28 @@ metadata:
capacity.cluster-autoscaler.kubernetes.io/taints: "key1=value1:NoSchedule,key2=value2:NoExecute"
```

If the `capacity.cluster-autoscaler.kubernetes.io/labels` annotation specifies a label that would otherwise be
generated from the `status` of the Machine Template, the autoscaler will use the label defined in the annotation.
In other words, any label set by the annotation overrides the corresponding value provided by the infrastructure
provider in the Machine Template status.

For example, assume the following objects:

```yaml
kind: MachineDeployment
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
---
kind: ExampleMachineTemplate
status:
  nodeInfo:
    architecture: arm64
```
```
The cluster autoscaler will prefer the annotation on the MachineDeployment and will predict nodes that have a
`kubernetes.io/arch: amd64` label on them.
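
The precedence rule can be sketched in Go as a parse-then-merge step. Both helpers below are hypothetical, and the comma-separated `key=value` annotation format is an assumption based on the example above; the autoscaler's real parser may handle edge cases differently.

```go
package main

import "strings"

// parseLabelsAnnotation parses a value such as "k1=v1,k2=v2" into a map.
// Assumption: entries without "=" or with an empty key are skipped.
func parseLabelsAnnotation(value string) map[string]string {
	labels := map[string]string{}
	for _, pair := range strings.Split(value, ",") {
		key, val, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || key == "" {
			continue
		}
		labels[key] = val
	}
	return labels
}

// mergeLabels combines labels derived from the template status with labels
// from the annotation. Annotation entries are written last, so they win on
// conflict, which is the precedence described above.
func mergeLabels(fromStatus, fromAnnotation map[string]string) map[string]string {
	merged := make(map[string]string, len(fromStatus)+len(fromAnnotation))
	for k, v := range fromStatus {
		merged[k] = v
	}
	for k, v := range fromAnnotation {
		merged[k] = v
	}
	return merged
}
```

For the objects above, the status contributes `kubernetes.io/arch=arm64`, the annotation contributes `kubernetes.io/arch=amd64`, and the merged result keeps `amd64`.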

### Security Model

This feature will require the service account associated with the cluster autoscaler to have
@@ -318,6 +388,7 @@ office hours meeting:

## Implementation History

- [X] 05/08/2025: Updated proposal to enable architecture- and OS-aware autoscaling from zero
- [X] 09/12/2024: Added section on Implementation Status
- [X] 01/31/2023: Updated proposal to include annotation changes
- [X] 06/10/2021: Proposed idea in an issue or [community meeting]