// Module included in the following assemblies:
//
// * scalability_and_performance/ztp_far_edge/ztp-configuring-managed-clusters-policies.adoc

:_mod-docs-content-type: PROCEDURE
[id="ztp-coordinating-reboots-for-config-changes_{context}"]
= Coordinating reboots for configuration changes

You can use {cgu-operator-full} (TALM) to coordinate reboots across a fleet of spoke clusters when configuration changes require a reboot, such as deferred tuning changes. {cgu-operator} reboots all nodes in the targeted `MachineConfigPool` on the selected clusters when the reboot policy is applied.

Instead of rebooting nodes after each individual change, you can apply all configuration updates through policies and then trigger a single, coordinated reboot.
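
For example, a deferred tuning change can be expressed as a `Tuned` object that is annotated so that the Node Tuning Operator applies it only after the next reboot. The following is a minimal sketch; the profile name, sysctl value, and role label are illustrative only and must be adapted to your environment:

[source,yaml]
----
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch # hypothetical profile name
  namespace: openshift-cluster-node-tuning-operator
  annotations:
    tuned.openshift.io/deferred: "update" # defer applying the change until the next reboot
spec:
  profile:
  - name: performance-patch
    data: |
      [main]
      summary=Example deferred sysctl change
      include=openshift-node-performance-profile
      [sysctl]
      kernel.shmmni=8192
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "master" # must match the MachineConfigPool targeted by the reboot policy
    priority: 19
    profile: performance-patch
----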

.Prerequisites

* You have installed the {oc-first}.
* You have logged in to the hub cluster as a user with `cluster-admin` privileges.
* You have deployed and configured {cgu-operator}.

.Procedure

. Generate the configuration policies by creating a `PolicyGenerator` custom resource (CR). You can use one of the following sample manifests:

* `out/argocd/example/acmpolicygenerator/acm-example-sno-reboot`
* `out/argocd/example/acmpolicygenerator/acm-example-multinode-reboot`

. Update the `policyDefaults.placement.labelSelector` field in the `PolicyGenerator` CR to target the clusters that you want to reboot. Modify other fields as necessary for your use case.
+
If you are coordinating a reboot to apply a deferred tuning change, ensure that the `MachineConfigPool` in the reboot policy matches the value specified in the `spec.recommend` field in the `Tuned` object.
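+
A minimal sketch of the placement selector might look like the following. The resource name, namespace, and the `du-zone` label are hypothetical; match the label to labels that exist on your target `ManagedCluster` objects:
+
[source,yaml]
----
apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: example-sno-reboot # illustrative name
policyDefaults:
  namespace: ztp-group # illustrative namespace
  placement:
    labelSelector:
      matchLabels:
        du-zone: "zone-1" # hypothetical label on the clusters to reboot
# ...
----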

. Apply the `PolicyGenerator` CR to generate and apply the configuration policies. For detailed steps, see "Customizing a managed cluster with PolicyGenerator CRs".

. After ArgoCD completes syncing the policies, create and apply the `ClusterGroupUpgrade` (CGU) CR.
+
.Example CGU custom resource configuration
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: reboot
  namespace: default
spec:
  clusterLabelSelectors:
  - matchLabels: <1>
# ...
  enable: true
  managedPolicies: <2>
  - example-reboot
  remediationStrategy:
    timeout: 300 <3>
    maxConcurrency: 10
# ...
----
<1> Configure the labels that match the clusters you want to reboot.
<2> Add all required configuration policies before the reboot policy. {cgu-operator} applies the configuration changes as specified in the policies, in the order that they are listed.
<3> Specify the timeout in seconds for the entire upgrade across all selected clusters. Set this field by considering the worst-case scenario.

. After you apply the CGU custom resource, {cgu-operator} rolls out the configuration policies in order. When all policies are compliant, it applies the reboot policy and triggers a reboot of all nodes in the specified `MachineConfigPool`.

.Verification

. Monitor the rollout status of the CGU custom resource on the hub cluster. Verify the successful rollout of the reboot by running the following command:
+
[source,terminal]
----
$ oc get cgu -A
----
+
.Example output
[source,terminal]
----
NAMESPACE   NAME     AGE   STATE       DETAILS
default     reboot   1d    Completed   All clusters are compliant with all the managed policies
----

. Confirm that the reboot was successful on a specific node by checking the status of the `MachineConfigPool` (MCP) for the node. Run the following command:
+
[source,terminal]
----
$ oc get mcp master
----
+
.Example output
[source,terminal]
----
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-be5785c3b98eb7a1ec902fef2b81e865   True      False      False      3              3                   3                     0                      72d
----