Commit a78fed5

OSDOCS-13968: add rhel worker node removal info
1 parent 9872e65 commit a78fed5

2 files changed, +126 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 3 additions & 0 deletions
@@ -760,6 +760,9 @@ Topics:
     File: preparing-manual-creds-update
   - Name: Preflight validation for Kernel Module Management (KMM) Modules
     File: kmm-preflight-validation
+  - Name: Preparing to update from OpenShift Container Platform 4.18 to a newer version
+    File: updating-cluster-prepare-past-4.18
+    Distros: openshift-enterprise
   - Name: Performing a cluster update
     Dir: updating_a_cluster
     Topics:
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
:_mod-docs-content-type: ASSEMBLY
[id="updating-cluster-prepare-past-4.18"]
= Preparing to update from {product-title} 4.18 to a newer version
include::_attributes/common-attributes.adoc[]
:context: updating-cluster-prepare-past-4.18

toc::[]

Before you update from {product-title} 4.18 to a newer version, learn about some of the specific concerns around {op-system-base-full} compute machines.

[id="migrating-workloads-to-different-nodes_{context}"]
== Migrating workloads off of package-based {op-system-base} worker nodes

With the introduction of {product-title} 4.19, package-based {op-system-base} worker nodes are no longer supported. If you try to update your cluster while those nodes are running, the update fails.

You can reschedule pods running on {op-system-base} compute nodes to run on your {op-system} nodes instead by using node selectors.

For example, the following `Node` object has a label for its operating system information, in this case {op-system}:

.Sample `Node` object with {op-system} label
[source,yaml,subs="+attributes"]
----
kind: Node
apiVersion: v1
metadata:
  name: ip-10-0-131-14.ec2.internal
  selfLink: /api/v1/nodes/ip-10-0-131-14.ec2.internal
  uid: 7bc2580a-8b8e-11e9-8e01-021ab4174c74
  resourceVersion: '478704'
  creationTimestamp: '2019-06-10T14:46:08Z'
  labels:
    kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/zone: us-east-1a
    node.openshift.io/os_version: '{product-version}'
    node-role.kubernetes.io/worker: ''
    failure-domain.beta.kubernetes.io/region: us-east-1
    node.openshift.io/os_id: rhcos <1>
    beta.kubernetes.io/instance-type: m4.large
    kubernetes.io/hostname: ip-10-0-131-14
    beta.kubernetes.io/arch: amd64
#...
----
<1> The label that identifies the operating system running on the node. This value must match the pod's `nodeSelector` field.

Any pod that you want to schedule onto new {op-system} nodes must contain a matching label in its `nodeSelector` field. The following procedure describes how to add the label.

.Procedure

. Deschedule the {op-system-base} node that is currently running your existing pods by entering the following command:
+
[source,terminal]
----
$ oc adm cordon <rhel-node>
----

. Add an `rhcos` node selector to a pod:

** To add the node selector to existing and future pods, add the node selector to the controller object for the pods by entering the following command:
+
.Example command to patch a `Deployment` object with the `rhcos` node selector
[source,terminal]
----
$ oc patch deployment <my-app> -p '{"spec":{"template":{"spec":{"nodeSelector":{"node.openshift.io/os_id":"rhcos"}}}}}'
----
+
Any existing pods under your `Deployment` controlling object are re-created on your {op-system} nodes.

** To add the node selector to a specific, new pod, add the selector to the `Pod` object directly:
+
.Example `Pod` object with `rhcos` label
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: <my-app>
#...
spec:
  nodeSelector:
    node.openshift.io/os_id: rhcos
#...
----
+
The new pod is created on {op-system} nodes, assuming the pod also has a controlling object. You can confirm where your pods are scheduled, as shown in the example after this procedure.
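
To confirm that the rescheduled pods are running on {op-system} nodes, you can list the pods together with the nodes that they were scheduled to. The following is a minimal check, assuming that your pods carry an `app=<my-app>` label; adjust the selector to match your workload:

[source,terminal]
----
$ oc get pods -l app=<my-app> -o wide
----

The `NODE` column in the output shows where each pod is scheduled, so you can verify that no pods remain on the cordoned {op-system-base} nodes.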

[id="identifying-and-removing-rhel-worker-nodes_{context}"]
== Identifying and removing {op-system-base} worker nodes

With the introduction of {product-title} 4.19, package-based {op-system-base} worker nodes are no longer supported. The following procedure describes how to identify the {op-system-base} nodes that you must remove from a cluster on bare-metal installations. You must complete the following steps to update your cluster successfully.

.Procedure

. Identify nodes in your cluster that are running {op-system-base} by entering the following command:
+
[source,terminal]
----
$ oc get nodes -l node.openshift.io/os_id=rhel
----
+
.Example output
+
[source,text]
----
NAME                     STATUS   ROLES    AGE   VERSION
rhel-node1.example.com   Ready    worker   7h    v1.31.7
rhel-node2.example.com   Ready    worker   7h    v1.31.7
rhel-node3.example.com   Ready    worker   7h    v1.31.7
----

. xref:../../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working-deleting-bare-metal_nodes-nodes-working[Continue with the node removal process]. {op-system-base} nodes are not managed by the Machine API and have no compute machine sets associated with them. You must unschedule and drain each node before you manually delete it from the cluster, as sketched in the example after this procedure.
+
For more information on this process, see link:https://access.redhat.com/solutions/4976801[How to remove a worker node from Red{nbsp}Hat {product-title} 4 UPI].
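
The removal itself uses standard node commands. The following is a minimal sketch, assuming a {op-system-base} node named `<rhel-node>`; the `--force` flag is only required if the node runs pods that are not managed by a controller:

[source,terminal]
----
$ oc adm cordon <rhel-node>
$ oc adm drain <rhel-node> --ignore-daemonsets --delete-emptydir-data --force
$ oc delete node <rhel-node>
----

Repeat these steps for each {op-system-base} node that you identified in the first step.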

[id="provision-new-rhcos-nodes_{context}"]
== Provisioning new {op-system} worker nodes

If you need additional compute nodes for your workloads, you can provision new ones either before or after you update your cluster. For more information, see the following xref:../../machine_management/index.adoc#overview-of-machine-management[machine management] documentation:

* xref:../../machine_management/manually-scaling-machineset.adoc#manually-scaling-machineset[Manually scaling a compute machine set]
* xref:../../machine_management/applying-autoscaling.adoc#applying-autoscaling[Applying autoscaling to an {product-title} cluster]
* xref:../../machine_management/user_infra/adding-compute-user-infra-general.adoc#adding-compute-user-infra-general[Adding compute machines to clusters with user-provisioned infrastructure manually]

For installer-provisioned infrastructure installations, automatic scaling adds {op-system} nodes by default. For user-provisioned infrastructure installations on bare metal platforms, you can manually xref:../../post_installation_configuration/node-tasks.adoc#post-install-config-adding-fcos-compute[add {op-system} compute nodes to your cluster].
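
For example, on installer-provisioned infrastructure you can add {op-system} compute nodes by scaling an existing compute machine set. The following is a minimal sketch, assuming a machine set named `<machine-set>` in the `openshift-machine-api` namespace and a target of two replicas; adjust both values for your environment:

[source,terminal]
----
$ oc get machinesets.machine.openshift.io -n openshift-machine-api
$ oc scale --replicas=2 machinesets.machine.openshift.io <machine-set> -n openshift-machine-api
----

The new machines boot {op-system} and join the cluster as worker nodes.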
