Commit c67a35b

Merge pull request #73787 from rohennes/TELCODOCS-1485
TELCODOCS-1485: Updates to reflect new backend remediation process by TALM
2 parents 3852299 + 9a7ac12 commit c67a35b

6 files changed

+59 -113 lines changed

modules/cnf-about-topology-aware-lifecycle-manager-blocking-crs.adoc

Lines changed: 1 addition & 25 deletions
@@ -1,6 +1,6 @@
 // Module included in the following assemblies:
 //
-// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+// * edge_computing/cnf-talm-for-cluster-upgrades.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="cnf-about-topology-aware-lifecycle-manager-blocking-crs_{context}"]
@@ -54,10 +54,6 @@ status:
 reason: UpgradeNotStarted
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-a-policy1-common-cluster-version-policy
-- cgu-a-policy2-common-pao-sub-policy
-- cgu-a-policy3-common-ptp-sub-policy
 managedPoliciesForUpgrade:
 - name: policy1-common-cluster-version-policy
 namespace: default
@@ -108,11 +104,6 @@ status:
 reason: UpgradeNotStarted
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-b-policy1-common-cluster-version-policy
-- cgu-b-policy2-common-pao-sub-policy
-- cgu-b-policy3-common-ptp-sub-policy
-- cgu-b-policy4-common-sriov-sub-policy
 managedPoliciesForUpgrade:
 - name: policy1-common-cluster-version-policy
 namespace: default
@@ -164,9 +155,6 @@ status:
 reason: UpgradeNotStarted
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-c-policy1-common-cluster-version-policy
-- cgu-c-policy4-common-sriov-sub-policy
 managedPoliciesCompliantBeforeUpgrade:
 - policy2-common-pao-sub-policy
 - policy3-common-ptp-sub-policy
@@ -238,10 +226,6 @@ status:
 reason: UpgradeCannotStart
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-a-policy1-common-cluster-version-policy
-- cgu-a-policy2-common-pao-sub-policy
-- cgu-a-policy3-common-ptp-sub-policy
 managedPoliciesForUpgrade:
 - name: policy1-common-cluster-version-policy
 namespace: default
@@ -296,11 +280,6 @@ status:
 reason: UpgradeCannotStart
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-b-policy1-common-cluster-version-policy
-- cgu-b-policy2-common-pao-sub-policy
-- cgu-b-policy3-common-ptp-sub-policy
-- cgu-b-policy4-common-sriov-sub-policy
 managedPoliciesForUpgrade:
 - name: policy1-common-cluster-version-policy
 namespace: default
@@ -354,9 +333,6 @@ status:
 reason: UpgradeNotCompleted
 status: "False"
 type: Ready
-copiedPolicies:
-- cgu-c-policy1-common-cluster-version-policy
-- cgu-c-policy4-common-sriov-sub-policy
 managedPoliciesCompliantBeforeUpgrade:
 - policy2-common-pao-sub-policy
 - policy3-common-ptp-sub-policy

modules/cnf-topology-aware-lifecycle-manager-apply-policies.adoc

Lines changed: 48 additions & 55 deletions
@@ -1,6 +1,6 @@
 // Module included in the following assemblies:
 // Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
-// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+// * edge_computing/cnf-talm-for-cluster-upgrades.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="talo-apply-policies_{context}"]
@@ -11,6 +11,7 @@ You can update your managed clusters by applying your policies.
 .Prerequisites
 
 * Install the {cgu-operator-first}.
+* {cgu-operator} 4.16 requires {rh-rhacm} 2.9 or later.
 * Provision one or more managed clusters.
 * Log in as a user with `cluster-admin` privileges.
 * Create {rh-rhacm} policies in the hub cluster.
@@ -64,7 +65,6 @@ $ oc get cgu --all-namespaces
 ----
 +
 .Example output
-+
 [source,terminal]
 ----
 NAMESPACE NAME AGE STATE DETAILS
@@ -79,7 +79,6 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 ----
 +
 .Example output
-+
 [source,json]
 ----
 {
@@ -93,12 +92,6 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 "type": "Progressing"
 }
 ],
-"copiedPolicies": [
-"cgu-policy1-common-cluster-version-policy",
-"cgu-policy2-common-nto-sub-policy",
-"cgu-policy3-common-ptp-sub-policy",
-"cgu-policy4-common-sriov-sub-policy"
-],
 "managedPoliciesContent": {
 "policy1-common-cluster-version-policy": "null",
 "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
@@ -141,9 +134,6 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 "cgu-policy3-common-ptp-sub-policy",
 "cgu-policy4-common-sriov-sub-policy"
 ],
-"precaching": {
-"spec": {}
-},
 "remediationPlan": [
 [
 "spoke1",
@@ -159,28 +149,6 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 ----
 <1> The `spec.enable` field in the `ClusterGroupUpgrade` CR is set to `false`.
 
-.. Check the status of the policies by running the following command:
-+
-[source,terminal]
------
-$ oc get policies -A
------
-+
-.Example output
-[source,terminal]
------
-NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
-default cgu-policy1-common-cluster-version-policy enforce 17m <1>
-default cgu-policy2-common-nto-sub-policy enforce 17m
-default cgu-policy3-common-ptp-sub-policy enforce 17m
-default cgu-policy4-common-sriov-sub-policy enforce 17m
-default policy1-common-cluster-version-policy inform NonCompliant 15h
-default policy2-common-nto-sub-policy inform NonCompliant 15h
-default policy3-common-ptp-sub-policy inform NonCompliant 18m
-default policy4-common-sriov-sub-policy inform NonCompliant 18m
------
-<1> The `spec.remediationAction` field of policies currently applied on the clusters is set to `enforce`. The managed policies in `inform` mode from the `ClusterGroupUpgrade` CR remain in `inform` mode during the update.
-
 . Change the value of the `spec.enable` field to `true` by running the following command:
 +
 [source,terminal]
@@ -191,15 +159,14 @@ $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
 
 .Verification
 
-. Check the status of the update again by running the following command:
+. Check the status of the update by running the following command:
 +
 [source,terminal]
 ----
 $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 ----
 +
 .Example output
-+
 [source,json]
 ----
 {
@@ -210,25 +177,23 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 "message": "All selected clusters are valid",
 "reason": "ClusterSelectionCompleted",
 "status": "True",
-"type": "ClustersSelected",
+"type": "ClustersSelected"
+},
+{
 "lastTransitionTime": "2022-02-25T15:33:07Z",
 "message": "Completed validation",
 "reason": "ValidationCompleted",
 "status": "True",
-"type": "Validated",
+"type": "Validated"
+},
+{
 "lastTransitionTime": "2022-02-25T15:34:07Z",
 "message": "Remediating non-compliant policies",
 "reason": "InProgress",
 "status": "True",
 "type": "Progressing"
 }
 ],
-"copiedPolicies": [
-"cgu-policy1-common-cluster-version-policy",
-"cgu-policy2-common-nto-sub-policy",
-"cgu-policy3-common-ptp-sub-policy",
-"cgu-policy4-common-sriov-sub-policy"
-],
 "managedPoliciesContent": {
 "policy1-common-cluster-version-policy": "null",
 "policy2-common-nto-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"node-tuning-operator\",\"namespace\":\"openshift-cluster-node-tuning-operator\"}]",
@@ -271,9 +236,6 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 "cgu-policy3-common-ptp-sub-policy",
 "cgu-policy4-common-sriov-sub-policy"
 ],
-"precaching": {
-"spec": {}
-},
 "remediationPlan": [
 [
 "spoke1",
@@ -286,17 +248,52 @@ $ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
 ],
 "status": {
 "currentBatch": 1,
-"currentBatchStartedAt": "2022-02-25T15:54:16Z",
-"remediationPlanForBatch": {
-"spoke1": 0,
-"spoke2": 1
+"currentBatchRemediationProgress": {
+"spoke1": {
+"policyIndex": 1,
+"state": "InProgress"
+},
+"spoke2": {
+"policyIndex": 1,
+"state": "InProgress"
+}
 },
+"currentBatchStartedAt": "2022-02-25T15:54:16Z",
 "startedAt": "2022-02-25T15:54:16Z"
 }
 }
 ----
 <1> Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
 
+. Check the status of the policies by running the following command:
++
+[source,terminal]
+----
+oc get policies -A
+----
++
+.Example output
+[source,terminal]
+----
+NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
+spoke1 default.policy1-common-cluster-version-policy enforce Compliant 18m
+spoke1 default.policy2-common-nto-sub-policy enforce NonCompliant 18m
+spoke2 default.policy1-common-cluster-version-policy enforce Compliant 18m
+spoke2 default.policy2-common-nto-sub-policy enforce NonCompliant 18m
+spoke5 default.policy3-common-ptp-sub-policy inform NonCompliant 18m
+spoke5 default.policy4-common-sriov-sub-policy inform NonCompliant 18m
+spoke6 default.policy3-common-ptp-sub-policy inform NonCompliant 18m
+spoke6 default.policy4-common-sriov-sub-policy inform NonCompliant 18m
+default policy1-common-ptp-sub-policy inform Compliant 18m
+default policy2-common-sriov-sub-policy inform NonCompliant 18m
+default policy3-common-ptp-sub-policy inform NonCompliant 18m
+default policy4-common-sriov-sub-policy inform NonCompliant 18m
+----
++
+* The `spec.remediationAction` value changes to `enforce` for the child policies applied to the clusters from the current batch.
+* The `spec.remedationAction` value remains `inform` for the child policies in the rest of the clusters.
+* After the batch is complete, the `spec.remediationAction` value changes back to `inform` for the enforced child policies.
+
 . If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
 
 .. Export the `KUBECONFIG` file of the single-node cluster you want to check the installation progress for by running the following command:
@@ -314,7 +311,6 @@ $ oc get subs -A | grep -i <subscription_name>
 ----
 +
 .Example output for `cluster-logging` policy
-+
 [source,terminal]
 ----
 NAMESPACE NAME PACKAGE SOURCE CHANNEL
@@ -329,7 +325,6 @@ $ oc get clusterversion
 ----
 +
 .Example output
-+
 [source,terminal,subs="attributes+"]
 ----
 NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
@@ -351,7 +346,6 @@ $ oc get installplan -n <subscription_namespace>
 ----
 +
 .Example output for `cluster-logging` Operator
-+
 [source,terminal]
 ----
 NAMESPACE NAME CSV APPROVAL APPROVED
@@ -373,7 +367,6 @@ $ oc get csv -n <operator_namespace>
 ----
 +
 .Example output for OpenShift Logging Operator
-+
 [source,terminal]
 ----
 NAME DISPLAY VERSION REPLACES PHASE
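
The apply-policies procedure starts from a `ClusterGroupUpgrade` CR whose `spec.enable` field is `false`, then patches it to `true`. The following is a minimal sketch of such a CR, for orientation only. It assumes the names that appear in the example output: `cgu-1` in the `default` namespace, clusters `spoke1`, `spoke2`, `spoke5`, and `spoke6`, and the four `policy*-common-*` policies. The `remediationStrategy` values and the `batchTimeoutAction` setting are illustrative assumptions, not values taken from this commit.

```yaml
# Hypothetical ClusterGroupUpgrade sketch. Names mirror the example output;
# remediationStrategy and batchTimeoutAction values are assumptions.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-1
  namespace: default
spec:
  # Created disabled; the procedure patches this to true to start remediation.
  enable: false
  clusters:
    - spoke1
    - spoke2
    - spoke5
    - spoke6
  managedPolicies:
    - policy1-common-cluster-version-policy
    - policy2-common-nto-sub-policy
    - policy3-common-ptp-sub-policy
    - policy4-common-sriov-sub-policy
  remediationStrategy:
    # Assumed: two clusters per batch.
    maxConcurrency: 2
    # Assumed timeout, in minutes.
    timeout: 240
  # continue skips a failing cluster; abort stops remediation for all clusters.
  batchTimeoutAction: continue
```

With `maxConcurrency: 2`, a batch would hold at most two clusters. This is consistent with `spoke1` and `spoke2` appearing together in `currentBatchRemediationProgress`. The actual batches depend on the remediation plan that TALM computes.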

modules/cnf-topology-aware-lifecycle-manager-installation-cli.adoc

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 // Module included in the following assemblies:
 // Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
-// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+// * edge_computing/cnf-talm-for-cluster-upgrades.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="installing-topology-aware-lifecycle-manager-using-cli_{context}"]
@@ -12,6 +12,7 @@ You can use the OpenShift CLI (`oc`) to install the {cgu-operator-first}.
 
 * Install the OpenShift CLI (`oc`).
 * Install the latest version of the {rh-rhacm} Operator.
+* {cgu-operator} 4.16 requires {rh-rhacm} 2.9 or later.
 * Set up a hub cluster with disconnected registry.
 * Log in as a user with `cluster-admin` privileges.
 

modules/cnf-topology-aware-lifecycle-manager-installation-web-console.adoc

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 // Module included in the following assemblies:
 // Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
-// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+// * edge_computing/cnf-talm-for-cluster-upgrades.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="installing-topology-aware-lifecycle-manager-using-web-console_{context}"]
@@ -13,7 +13,8 @@ You can use the {product-title} web console to install the {cgu-operator-full}.
 // Based on polarion test cases
 
 * Install the latest version of the {rh-rhacm} Operator.
-* Set up a hub cluster with disconnected regitry.
+* {cgu-operator} 4.16 requires {rh-rhacm} 2.9 or later.
+* Set up a hub cluster with a disconnected registry.
 * Log in as a user with `cluster-admin` privileges.
 
 .Procedure

modules/cnf-topology-aware-lifecycle-manager-policies-concept.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,22 +1,22 @@
 // Module included in the following assemblies:
 // Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
-// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+// * edge_computing/cnf-talm-for-cluster-upgrades.adoc
 
 :_mod-docs-content-type: CONCEPT
 [id="talo-policies-concept_{context}"]
 = Update policies on managed clusters
 
-The {cgu-operator-first} remediates a set of `inform` policies for the clusters specified in the `ClusterGroupUpgrade` CR. {cgu-operator} remediates `inform` policies by making `enforce` copies of the managed {rh-rhacm} policies. Each copied policy has its own corresponding {rh-rhacm} placement rule and {rh-rhacm} placement binding.
+The {cgu-operator-first} remediates a set of `inform` policies for the clusters specified in the `ClusterGroupUpgrade` custom resource (CR). {cgu-operator} remediates `inform` policies by controlling the `remediationAction` specification in a `Policy` CR through the `bindingOverrides.remediationAction` and `subFilter` specifications in the `PlacementBinding` CR. Each policy has its own corresponding {rh-rhacm} placement rule and {rh-rhacm} placement binding.
 
-One by one, {cgu-operator} adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, {cgu-operator} skips applying that policy on the compliant cluster. {cgu-operator} then moves on to applying the next policy to the non-compliant cluster. After {cgu-operator} completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.
+One by one, {cgu-operator} adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, {cgu-operator} skips applying that policy on the compliant cluster. {cgu-operator} then moves on to applying the next policy to the non-compliant cluster. After {cgu-operator} completes the updates in a batch, all clusters are removed from the placement rules associated with the policies. Then, the update of the next batch starts.
 
 If a spoke cluster does not report any compliant state to {rh-rhacm}, the managed policies on the hub cluster can be missing status information that {cgu-operator} needs. {cgu-operator} handles these cases in the following ways:
 
 * If a policy's `status.compliant` field is missing, {cgu-operator} ignores the policy and adds a log entry. Then, {cgu-operator} continues looking at the policy's `status.status` field.
 * If a policy's `status.status` is missing, {cgu-operator} produces an error.
 * If a cluster's compliance status is missing in the policy's `status.status` field, {cgu-operator} considers that cluster to be non-compliant with that policy.
 
-The `ClusterGroupUpgrade` CR's `batchTimeoutAction` determines what happens if an upgrade fails for a cluster. You can specify `continue` to skip the failing cluster and continue to upgrade other clusters, or specify `abort` to stop the policy remediation for all clusters. Once the timeout elapses, {cgu-operator} removes all enforce policies to ensure that no further updates are made to clusters.
+The `ClusterGroupUpgrade` CR's `batchTimeoutAction` determines what happens if an upgrade fails for a cluster. You can specify `continue` to skip the failing cluster and continue to upgrade other clusters, or specify `abort` to stop the policy remediation for all clusters. Once the timeout elapses, {cgu-operator} removes all the resources it created to ensure that no further updates are made to clusters.
 
 include::snippets/cnf-example-upgrade-policy.adoc[]

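The rewritten concept paragraph drops the copy-and-enforce model. TALM no longer creates `enforce` copies of each managed policy. Instead, it controls remediation through the `bindingOverrides.remediationAction` and `subFilter` fields of a `PlacementBinding` CR. The following is a hedged sketch of what such a binding could look like under RHACM's `policy.open-cluster-management.io/v1` API. The binding and placement rule names are hypothetical, and the resources that TALM actually creates may differ in naming and detail.

```yaml
# Hypothetical PlacementBinding illustrating the fields named in the concept
# module. TALM's generated resources may use different names and values.
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: cgu-1-policy1-binding          # hypothetical name
  namespace: default
placementRef:
  apiGroup: apps.open-cluster-management.io
  kind: PlacementRule
  name: cgu-1-policy1-placement        # hypothetical placement rule
subjects:
  - apiGroup: policy.open-cluster-management.io
    kind: Policy
    name: policy1-common-cluster-version-policy
# Child policies bound through this binding run as enforce, while the parent
# Policy keeps spec.remediationAction: inform.
bindingOverrides:
  remediationAction: enforce
# Assumed semantics: a restricted binding does not place the policy on clusters
# by itself; it applies the override only to clusters already bound elsewhere.
subFilter: restricted
```

Because the override lives on the binding rather than on a copied policy, the parent policy can stay in `inform` mode while the child policies for the current batch are enforced. This matches the `oc get policies -A` example output added in the apply-policies module.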