Commit 1ff4b94

TELCODOCS#2230: Coordinating reboots for configuration changes
1 parent 78b4db9 commit 1ff4b94

4 files changed: 110 additions and 1 deletion
edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,13 @@ include::modules/ztp-customizing-a-managed-site-using-pgt.adoc[leveloffset=+1]
5555
5656
include::modules/ztp-monitoring-policy-deployment-progress.adoc[leveloffset=+1]
5757

58+
include::modules/ztp-coordinating-reboots-for-config-changes.adoc[leveloffset=+1]
59+
60+
[role="_additional-resources"]
61+
.Additional resources
62+
63+
* xref:../../edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc#ztp-customizing-a-managed-site-using-pgt_ztp-configuring-managed-clusters-policygenerator[Customizing a managed cluster with PolicyGenerator CRs]
64+
5865
include::modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc[leveloffset=+1]
5966

6067
include::modules/ztp-restarting-policies-reconciliation.adoc[leveloffset=+1]

modules/defer-applicaton-tuning-example.adoc

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,4 +55,9 @@ spec:
5555

5656
<1> The `include` directive is used to inherit the `openshift-node-performance-performance` profile. This is a best practice to ensure that the profile is not missing any required settings.
5757
<2> The `kernel.shmmni` sysctl parameter is being changed to `8192`.
58-
<3> The `machineConfigLabels` field is used to target the `worker-cnf` role. Configure a `MachineConfigPool` resource to ensure the profile is applied only to the correct nodes.
58+
<3> The `machineConfigLabels` field is used to target the `worker-cnf` role. Configure a `MachineConfigPool` resource to ensure the profile is applied only to the correct nodes.
59+
60+
[NOTE]
61+
====
62+
You can use {cgu-operator-full} to perform a controlled reboot across a fleet of spoke clusters to apply a deferred tuning change. For more information about coordinated reboots, see "Coordinating reboots for configuration changes".
63+
====
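The diff shows only the callouts for the deferred tuning example, not the manifest they annotate. As a rough sketch of a `Tuned` object that is consistent with those three callouts, assuming the name `performance-patch`, the `priority` value, and the namespace shown here (none of which appear in this diff), the relevant fields might look like this. The setting that defers the change is not shown in this diff and is omitted here.

```yaml
# Illustrative sketch only: metadata.name, namespace, and priority are assumptions.
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch                               # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: performance-patch
      data: |
        [main]
        summary=Patch profile that inherits from the performance profile
        # Callout 1: inherit the generated performance profile
        include=openshift-node-performance-performance
        [sysctl]
        # Callout 2: the sysctl parameter being changed
        kernel.shmmni=8192
  recommend:
    - machineConfigLabels:
        # Callout 3: target the worker-cnf role
        machineconfiguration.openshift.io/role: worker-cnf
      priority: 19                                      # assumed value
      profile: performance-patch
```

The `machineConfigLabels` value under `spec.recommend` is the one that the reboot policy's `MachineConfigPool` needs to match when you coordinate a reboot for this change.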

modules/ztp-coordinating-reboots-for-config-changes.adoc

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/ztp_far_edge/ztp-configuring-managed-clusters-policies.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="ztp-coordinating-reboots-for-config-changes_{context}"]
7+
= Coordinating reboots for configuration changes
8+
9+
You can use {cgu-operator-full} (TALM) to coordinate reboots across a fleet of spoke clusters when configuration changes require a reboot, such as deferred tuning changes. {cgu-operator} reboots all nodes in the targeted `MachineConfigPool` on the selected clusters when the reboot policy is applied.
10+
11+
Instead of rebooting nodes after each individual change, you can apply all configuration updates through policies and then trigger a single, coordinated reboot.
12+
13+
.Prerequisites
14+
15+
* You have installed the {oc-first}.
16+
* You have logged in to the hub cluster as a user with `cluster-admin` privileges.
17+
* You have deployed and configured {cgu-operator}.
18+
19+
.Procedure
20+
21+
. Generate the configuration policies by creating a `PolicyGenerator` custom resource (CR). You can use one of the following sample manifests:
22+
23+
* `out/argocd/example/acmpolicygenerator/acm-example-sno-reboot`
24+
* `out/argocd/example/acmpolicygenerator/acm-example-multinode-reboot`
25+
26+
. Update the `policyDefaults.placement.labelSelector` field in the `PolicyGenerator` CR to target the clusters that you want to reboot. Modify other fields as necessary for your use case.
27+
+
28+
If you are coordinating a reboot to apply a deferred tuning change, ensure the `MachineConfigPool` in the reboot policy matches the value specified in the `spec.recommend` field in the `Tuned` object.
29+
30+
. Apply the `PolicyGenerator` CR to generate and apply the configuration policies. For detailed steps, see "Customizing a managed cluster with PolicyGenerator CRs".
31+
32+
. After Argo CD completes syncing the policies, create and apply the `ClusterGroupUpgrade` (CGU) CR.
33+
+
34+
.Example CGU custom resource configuration
35+
[source,yaml]
36+
----
37+
apiVersion: ran.openshift.io/v1alpha1
38+
kind: ClusterGroupUpgrade
39+
metadata:
40+
name: reboot
41+
namespace: default
42+
spec:
43+
clusterLabelSelectors:
44+
- matchLabels: <1>
45+
# ...
46+
enable: true
47+
managedPolicies: <2>
48+
- example-reboot
49+
remediationStrategy:
50+
timeout: 300 <3>
51+
maxConcurrency: 10
52+
# ...
53+
----
54+
<1> Configure the labels that match the clusters you want to reboot.
55+
<2> Add all required configuration policies before the reboot policy. {cgu-operator} applies the configuration changes as specified in the policies, in the order they are listed.
56+
<3> Specify the timeout in minutes for the entire upgrade across all selected clusters. Set this field by considering the worst-case scenario.
57+
58+
. After you apply the CGU custom resource, {cgu-operator} rolls out the configuration policies in order. Once all policies are compliant, it applies the reboot policy and triggers a reboot of all nodes in the specified `MachineConfigPool`.
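The `policyDefaults.placement.labelSelector` update described in the procedure can be sketched as follows. This is a minimal illustration, not the content of the referenced sample manifests: the `metadata.name`, the `namespace`, and the `reboot-group` label key and value are hypothetical, and the manifest list is left elided. The policy name `example-reboot` is taken from the `managedPolicies` entry in the example CGU.

```yaml
# Illustrative sketch only: names, namespace, and the label key/value are assumptions.
apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: example-reboot-generator      # hypothetical name
policyDefaults:
  namespace: ztp-reboot               # hypothetical namespace for the generated policies
  remediationAction: inform           # assumed: leaves enforcement to the CGU rollout
  placement:
    labelSelector:
      matchExpressions:
        - key: reboot-group           # hypothetical label set on the target clusters
          operator: In
          values:
            - "wave-1"
policies:
  - name: example-reboot              # matches the managedPolicies entry in the CGU
    manifests:
      # ...
```

Under these assumptions, only managed clusters labeled `reboot-group=wave-1` receive the reboot policy, so the same label can be reused in the `clusterLabelSelectors` field of the CGU.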
59+
60+
.Verification
61+
62+
. Monitor the CGU rollout status.
63+
+
64+
Check the status of the CGU custom resource on the hub cluster to verify that the reboot rolled out successfully. Run the following command:
65+
+
66+
[source,terminal]
67+
----
68+
oc get cgu -A
69+
----
70+
+
71+
.Example output
72+
[source,terminal]
73+
----
74+
NAMESPACE NAME AGE STATE DETAILS
75+
default reboot 1d Completed All clusters are compliant with all the managed policies
76+
----
77+
78+
. Verify successful reboot on a specific node.
79+
+
80+
To confirm that the reboot was successful on a specific node, check the status of the `MachineConfigPool` (MCP) for the node by running the following command:
81+
+
82+
[source,terminal]
83+
----
84+
oc get mcp master
85+
86+
----
87+
+
88+
.Example output
89+
[source,terminal]
90+
----
91+
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
92+
master rendered-master-be5785c3b98eb7a1ec902fef2b81e865 True False False 3 3 3 0 72d
93+
----

scalability_and_performance/using-node-tuning-operator.adoc

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,10 @@ include::modules/custom-tuning-example.adoc[leveloffset=+1]
2323

2424
include::modules/defer-applicaton-tuning-example.adoc[leveloffset=+1]
2525

26+
[role="_additional-resources"]
27+
.Additional resources
28+
* xref:../edge_computing/policygenerator_for_ztp/ztp-configuring-managed-clusters-policygenerator.adoc#ztp-coordinating-reboots-for-config-changes_ztp-configuring-managed-clusters-policygenerator[Coordinating reboots for configuration changes]
29+
2630
include::modules/defer-application-tuning-proc.adoc[leveloffset=+2]
2731

2832
include::modules/node-tuning-operator-supported-tuned-daemon-plug-ins.adoc[leveloffset=+1]
