Merge pull request #74917 from jeana-redhat/OSDOCS-9800-CAS-expanders

jeana-redhat · web-flow · commit 9d963e28dec4 · 2024-04-25T09:33:27.000-04:00
OSDOCS-9800: CAS expanders
diff --git a/machine_management/applying-autoscaling.adoc b/machine_management/applying-autoscaling.adoc
@@ -16,7 +16,7 @@ You can configure the cluster autoscaler only in clusters where the Machine API
 include::modules/cluster-autoscaler-about.adoc[leveloffset=+1]
 
 [id="configuring-clusterautoscaler_{context}"]
-== Configuring the cluster autoscaler
+=== Configuring the cluster autoscaler
 
 First, deploy the cluster autoscaler to manage automatic resource scaling in your {product-title} cluster.
 
@@ -25,7 +25,9 @@ First, deploy the cluster autoscaler to manage automatic resource scaling in you
 Because the cluster autoscaler is scoped to the entire cluster, you can make only one cluster autoscaler for the cluster.
 ====
 
-include::modules/cluster-autoscaler-cr.adoc[leveloffset=+2]
+include::modules/cluster-autoscaler-cr.adoc[leveloffset=+3]
+
+include::modules/cluster-autoscaler-config-priority-expander.adoc[leveloffset=+3]
 
 :FeatureName: cluster autoscaler
 :FeatureResourceName: ClusterAutoscaler
@@ -36,7 +38,7 @@ include::modules/deploying-resource.adoc[leveloffset=+2]
 include::modules/machine-autoscaler-about.adoc[leveloffset=+1]
 
 [id="configuring-machineautoscaler_{context}"]
-== Configuring machine autoscalers
+=== Configuring machine autoscalers
 
 After you deploy the cluster autoscaler, deploy `MachineAutoscaler` resources that reference the compute machine sets that are used to scale the cluster.
 
@@ -50,7 +52,7 @@ You must deploy at least one `MachineAutoscaler` resource after you deploy the `
 You must configure separate resources for each compute machine set. Remember that compute machine sets are different in each region, so consider whether you want to enable machine scaling in multiple regions. The compute machine set that you scale must have at least one machine in it.
 ====
 
-include::modules/machine-autoscaler-cr.adoc[leveloffset=+2]
+include::modules/machine-autoscaler-cr.adoc[leveloffset=+3]
 
 :FeatureName: machine autoscaler
 :FeatureResourceName: MachineAutoscaler
diff --git a/modules/cluster-autoscaler-config-priority-expander.adoc b/modules/cluster-autoscaler-config-priority-expander.adoc
@@ -0,0 +1,95 @@
+// Module included in the following assemblies:
+//
+// * machine_management/applying-autoscaling.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="cluster-autoscaler-config-priority-expander_{context}"]
+= Configuring a priority expander for the cluster autoscaler
+
+When the cluster autoscaler uses the priority expander, it scales up by using the machine set with the highest user-assigned priority.
+To use this expander, you must create a config map that defines the priority of your machine sets.
+
+For each specified priority level, you must create regular expressions that identify the machine sets that you want the cluster autoscaler to select at that level.
+The regular expressions must match the name of any compute machine set that you want the cluster autoscaler to consider for selection.
+
+.Prerequisites
+
+* You have deployed an {product-title} cluster that uses the Machine API.
+* You have access to the cluster using an account with `cluster-admin` permissions.
+* You have installed the {oc-first}.
+
+.Procedure
+
+. List the compute machine sets on your cluster by running the following command:
++
+[source,terminal]
+----
+$ oc get machinesets.machine.openshift.io
+----
++
+.Example output
+[source,terminal]
+----
+NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
+archive-agl030519-vplxk-worker-us-east-1c   1         1         1       1           25m
+fast-01-agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
+fast-02-agl030519-vplxk-worker-us-east-1a   1         1         1       1           55m
+fast-03-agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
+fast-04-agl030519-vplxk-worker-us-east-1b   1         1         1       1           55m
+prod-01-agl030519-vplxk-worker-us-east-1a   1         1         1       1           33m
+prod-02-agl030519-vplxk-worker-us-east-1c   1         1         1       1           33m
+----
+
+. Using regular expressions, construct one or more patterns that match the name of any compute machine set that you want to set a priority level for.
++
+For example, use the regular expression pattern `.\*fast.*` to match any compute machine set that includes the string `fast` in its name.
+
+. Create a `cluster-autoscaler-priority-expander.yml` YAML file that defines a config map similar to the following:
++
+--
+.Example priority expander config map
+[source,yaml]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: cluster-autoscaler-priority-expander # <1>
+  namespace: openshift-machine-api # <2>
+data:
+  priorities: |- # <3>
+    10:
+      - .*fast.*
+      - .*archive.*
+    40:
+      - .*prod.*
+----
+<1> You must name the config map `cluster-autoscaler-priority-expander`.
+<2> You must create the config map in the same namespace as the cluster autoscaler pod, which is the `openshift-machine-api` namespace.
+<3> Define the priority of your machine sets.
++
+The `priorities` values must be positive integers.
+The cluster autoscaler uses higher-value priorities before lower-value priorities.
++
+For each priority level, specify the regular expressions that correspond to the machine sets you want to use.
+--
+
+. Create the config map by running the following command:
++
+[source,terminal]
+----
+$ oc create -f <location_of_config_map_file>/cluster-autoscaler-priority-expander.yml
+----
+
+.Verification
+
+* Review the config map by running the following command:
++
+[source,terminal]
+----
+$ oc get configmaps cluster-autoscaler-priority-expander -n openshift-machine-api -o yaml
+----
+
+.Next steps
+
+* To use the priority expander, ensure that the `ClusterAutoscaler` resource definition is configured to use the `expanders: ["Priority"]` parameter.
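++
+The following fragment is a minimal sketch of that setting, assuming the `default` resource name and the API version shown in the `ClusterAutoscaler` resource definition. All other `spec` parameters are omitted for brevity.
++
+.Example `ClusterAutoscaler` fragment that uses the priority expander
+[source,yaml]
+----
+apiVersion: "autoscaling.openshift.io/v1"
+kind: "ClusterAutoscaler"
+metadata:
+  name: "default"
+spec:
+  # Illustrative fragment only: add your own resource limits and scale-down settings.
+  expanders: ["Priority"]
+----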
diff --git a/modules/cluster-autoscaler-cr.adoc b/modules/cluster-autoscaler-cr.adoc
@@ -9,38 +9,38 @@
 
 This `ClusterAutoscaler` resource definition shows the parameters and sample values for the cluster autoscaler.
 
-
 [source,yaml]
 ----
 apiVersion: "autoscaling.openshift.io/v1"
 kind: "ClusterAutoscaler"
 metadata:
   name: "default"
 spec:
-  podPriorityThreshold: -10 <1>
+  podPriorityThreshold: -10 # <1>
   resourceLimits:
-    maxNodesTotal: 24 <2>
+    maxNodesTotal: 24 # <2>
     cores:
-      min: 8 <3>
-      max: 128 <4>
+      min: 8 # <3>
+      max: 128 # <4>
     memory:
-      min: 4 <5>
-      max: 256 <6>
+      min: 4 # <5>
+      max: 256 # <6>
     gpus:
-      - type: nvidia.com/gpu <7>
-        min: 0 <8>
-        max: 16 <9>
+      - type: nvidia.com/gpu # <7>
+        min: 0 # <8>
+        max: 16 # <9>
       - type: amd.com/gpu
         min: 0
         max: 4
-  logVerbosity: 4 <10>
-  scaleDown: <11>
-    enabled: true <12>
-    delayAfterAdd: 10m <13>
-    delayAfterDelete: 5m <14>
-    delayAfterFailure: 30s <15>
-    unneededTime: 5m <16>
-    utilizationThreshold: "0.4" <17>
+  logVerbosity: 4 # <10>
+  scaleDown: # <11>
+    enabled: true # <12>
+    delayAfterAdd: 10m # <13>
+    delayAfterDelete: 5m # <14>
+    delayAfterFailure: 30s # <15>
+    unneededTime: 5m # <16>
+    utilizationThreshold: "0.4" # <17>
+  expanders: ["Random"] # <18>
 ----
 <1> Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The `podPriorityThreshold` value is compared to the value of the `PriorityClass` that you assign to each pod.
 <2> Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your `MachineAutoscaler` resources.
@@ -66,8 +66,29 @@ If you do not specify a value, the default value of `1` is used.
 <14> Optional: Specify the period to wait before deleting a node after a node has recently been _deleted_. If you do not specify a value, the default value of `0s` is used.
 <15> Optional: Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of `3m` is used.
 <16> Optional: Specify a period of time before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of `10m` is used.
-<17> Optional:  Specify the _node utilization level_. Nodes below this utilization level are eligible for deletion. If you do not specify a value, the default value of `10m` is used.. The node utilization level is the sum of the requested resources divided by the allocated resources for the node, and must be a value greater than `"0"` but less than `"1"`. If you do not specify a value, the cluster autoscaler uses a default value of `"0.5"`, which corresponds to 50% utilization. This value must be expressed as a string.
-// Might be able to add a formula to show this visually, but need to look into asciidoc math formatting and what our tooling supports.
+<17> Optional: Specify the _node utilization level_. Nodes below this utilization level are eligible for deletion.
++
+The node utilization level is the sum of the requested resources divided by the allocated resources for the node, and must be a value greater than `"0"` but less than `"1"`. If you do not specify a value, the cluster autoscaler uses a default value of `"0.5"`, which corresponds to 50% utilization. You must express this value as a string.
+<18> Optional: Specify any expanders that you want the cluster autoscaler to use.
+The following values are valid:
++
+--
+* `LeastWaste`: Selects the machine set that minimizes the idle CPU after scaling.
+If multiple machine sets would yield the same amount of idle CPU, the selection minimizes unused memory.
+* `Priority`: Selects the machine set with the highest user-assigned priority.
+To use this expander, you must create a config map that defines the priority of your machine sets.
+For more information, see "Configuring a priority expander for the cluster autoscaler."
+* `Random`: (Default) Selects the machine set randomly.
+--
++
+If you do not specify a value, the default value of `Random` is used.
++
+You can specify multiple expanders by using the `[LeastWaste, Priority]` format.
+The cluster autoscaler applies each expander according to the specified order.
++
+In the `[LeastWaste, Priority]` example, the cluster autoscaler first evaluates according to the `LeastWaste` criteria.
+If more than one machine set satisfies the `LeastWaste` criteria equally well, the cluster autoscaler then evaluates according to the `Priority` criteria.
+If more than one machine set satisfies all of the specified expanders equally well, the cluster autoscaler selects one to use at random.
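++
+As an illustrative sketch of the ordering described for this parameter, a `spec` fragment that evaluates `LeastWaste` first and `Priority` second might look like the following. The fragment assumes that a priority expander config map already exists and omits all other parameters.
++
+[source,yaml]
+----
+spec:
+  # Evaluated in order: LeastWaste first, then Priority to break ties.
+  expanders: ["LeastWaste", "Priority"]
+----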
 
 [NOTE]
 ====