|
| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * post_installation_configuration/machine-configuration-tasks.adoc |
| 4 | + |
| 5 | +:_mod-docs-content-type: PROCEDURE |
| 6 | +[id="machine-config-node-disruption_{context}"] |
| 7 | += Understanding node restart behaviors after machine config changes |
| 8 | + |
| 9 | +By default, when you make certain changes to the fields in a `MachineConfig` object, the Machine Config Operator (MCO) drains and reboots the nodes associated with that machine config. However, you can create a _node disruption policy_ that defines a set of changes to some Ignition config objects that would require little or no disruption to your workloads. |
| 10 | + |
| 11 | +A node disruption policy allows you to define which configuration changes cause a disruption to your cluster and which do not. This allows you to reduce node downtime when making small machine configuration changes in your cluster. To configure the policy, you modify the `MachineConfiguration` object, which is in the `openshift-machine-config-operator` namespace. See the example node disruption policies in the `MachineConfiguration` objects that follow. |
| 12 | + |
| 13 | +[NOTE] |
| 14 | +==== |
| 15 | +There are machine configuration changes that always require a reboot, regardless of any node disruption policies. For more information, see _About the Machine Config Operator_. |
| 16 | +==== |
| 17 | + |
| 18 | +After you create the node disruption policy, the MCO validates the policy to search for potential issues in the file, such as problems with formatting. The MCO then merges the policy with the cluster defaults and populates the `status.nodeDisruptionPolicyStatus` fields in the `MachineConfiguration` object with the actions to be performed upon future changes to the machine config. The configurations in your policy always overwrite the cluster defaults. |
| 19 | + |
| 20 | +[IMPORTANT] |
| 21 | +==== |
| 22 | +The MCO does not validate whether a change can be successfully applied by your node disruption policy. Therefore, you are responsible for ensuring the accuracy of your node disruption policies. |
| 23 | +==== |
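| | + |
| | +As a rough illustration of the merge, and assuming a user-defined policy for `/etc/chrony.conf` like the one in the examples in this module, the `status` stanza might resemble the following abbreviated sketch. The entry order and the exact set of fields are assumptions, not captured output. |
| | + |
| | +.Sketch of a merged node disruption policy status (illustrative) |
| | +[source,yaml] |
| | +---- |
| | +status: |
| | +  nodeDisruptionPolicyStatus: |
| | +    clusterPolicies: |
| | +      files: |
| | +      # Assumed user-defined entry, merged in from spec.nodeDisruptionPolicy |
| | +      - actions: |
| | +        - reload: |
| | +            serviceName: chronyd.service |
| | +          type: Reload |
| | +        path: /etc/chrony.conf |
| | +      # Cluster default entry, kept because the user policy does not cover this path |
| | +      - actions: |
| | +        - type: None |
| | +        path: /var/lib/kubelet/config.json |
| | +# ... |
| | +---- |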
| 24 | + |
| 25 | +For example, you can configure a node disruption policy so that sudo configurations do not require a node drain and reboot. Or, you can configure your cluster so that updates to `sshd` are applied with only a reload of that one service. |
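| | + |
| | +A minimal sketch of such a policy might look like the following. The `/etc/sudoers.d/example-admins` path is a hypothetical file name used only for illustration, and `sshd.service` is assumed to be the unit name on your nodes. Adjust both to match your cluster. |
| | + |
| | +.Sketch of a node disruption policy for sudo and sshd changes (illustrative) |
| | +[source,yaml] |
| | +---- |
| | +apiVersion: operator.openshift.io/v1 |
| | +kind: MachineConfiguration |
| | +metadata: |
| | +  name: cluster |
| | +  namespace: openshift-machine-config-operator |
| | +# ... |
| | +spec: |
| | +  nodeDisruptionPolicy: |
| | +    files: |
| | +    # Hypothetical sudo drop-in file: apply changes with no drain or reboot |
| | +    - actions: |
| | +      - type: None |
| | +      path: /etc/sudoers.d/example-admins |
| | +    units: |
| | +    # Assumed unit name: reload only this service when the unit changes |
| | +    - name: sshd.service |
| | +      actions: |
| | +      - type: Reload |
| | +        reload: |
| | +          serviceName: sshd.service |
| | +---- |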
| 26 | + |
| 27 | +:FeatureName: The node disruption policy feature |
| 28 | +include::snippets/technology-preview.adoc[] |
| 29 | + |
| 30 | +You can control the behavior of the MCO when making changes to the following Ignition configuration objects: |
| 31 | + |
| 32 | +// I used this wording for the objects to match the previous section in the assembly: file:///home/mburke/openshift-docs/_preview/openshift-enterprise/mco-node-disruption-policy/post_installation_configuration/machine-configuration-tasks.html#what-can-you-change-with-machine-configs. |
| 33 | +* *configuration files*: You add to or update the files in the `/var` or `/etc` directory. |
| 34 | +* *systemd units*: You create and set the status of a systemd service or modify an existing systemd service. |
| 35 | +* *users and groups*: You change SSH keys in the `passwd` section post-installation. |
| 36 | +* *ICSP*, *ITMS*, *IDMS* objects: You can remove mirroring rules from an `ImageContentSourcePolicy` (ICSP), `ImageTagMirrorSet` (ITMS), or `ImageDigestMirrorSet` (IDMS) object. |
| 37 | + |
| 38 | +include::snippets/machine-config-node-disruption-actions.adoc[] |
| 39 | + |
| 40 | +// Examples taken from the test cases: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitems/testcase?query=trello%3AMCO%5C-507 |
| 41 | + |
| 42 | +[id="machine-config-node-disruption-example_{context}"] |
| 43 | +== Example node disruption policies |
| 44 | + |
| 45 | +The following example `MachineConfiguration` objects contain a node disruption policy. |
| 46 | + |
| 47 | +[TIP] |
| 48 | +==== |
| 49 | +A `MachineConfiguration` object and a `MachineConfig` object are different objects. A `MachineConfiguration` object is a singleton object in the MCO namespace that contains configuration parameters for the MCO. A `MachineConfig` object defines changes that are applied to a machine config pool. |
| 50 | +==== |
| 51 | + |
| 52 | +The following example `MachineConfiguration` object shows no user-defined policies. The default node disruption policy values are shown in the `status` stanza. |
| 53 | + |
| 54 | +.Default node disruption policy |
| 55 | +[source,yaml] |
| 56 | +---- |
| 57 | +apiVersion: operator.openshift.io/v1 |
| 58 | +kind: MachineConfiguration |
| | +metadata: |
| 59 | +  name: cluster |
| 60 | +spec: |
| 61 | +  logLevel: Normal |
| 62 | +  managementState: Managed |
| 63 | +  operatorLogLevel: Normal |
| 64 | +status: |
| 65 | +  nodeDisruptionPolicyStatus: |
| 66 | +    clusterPolicies: |
| 67 | +      files: |
| 68 | +      - actions: |
| 69 | +        - type: None |
| 70 | +        path: /etc/mco/internal-registry-pull-secret.json |
| 71 | +      - actions: |
| 72 | +        - type: None |
| 73 | +        path: /var/lib/kubelet/config.json |
| 74 | +      - actions: |
| 75 | +        - reload: |
| 76 | +            serviceName: crio.service |
| 77 | +          type: Reload |
| 78 | +        path: /etc/machine-config-daemon/no-reboot/containers-gpg.pub |
| 79 | +      - actions: |
| 80 | +        - reload: |
| 81 | +            serviceName: crio.service |
| 82 | +          type: Reload |
| 83 | +        path: /etc/containers/policy.json |
| 84 | +      - actions: |
| 85 | +        - type: Special |
| 86 | +        path: /etc/containers/registries.conf |
| 87 | +      sshkey: |
| 88 | +        actions: |
| 89 | +        - type: None |
| 90 | +  readyReplicas: 0 |
| 91 | +---- |
| 92 | + |
| 93 | +In the following example, when changes are made to the SSH keys, the MCO drains the cluster nodes, reloads the `crio.service`, reloads the systemd manager configuration, and restarts the `crio.service`. |
| 94 | + |
| 95 | +.Example node disruption policy for an SSH key change |
| 96 | +[source,yaml] |
| 97 | +---- |
| 98 | +apiVersion: operator.openshift.io/v1 |
| 99 | +kind: MachineConfiguration |
| 100 | +metadata: |
| 101 | +  name: cluster |
| 102 | +  namespace: openshift-machine-config-operator |
| 103 | +# ... |
| 104 | +spec: |
| 105 | +  nodeDisruptionPolicy: |
| 106 | +    sshkey: |
| 107 | +      actions: |
| 108 | +      - type: Drain |
| 109 | +      - reload: |
| 110 | +          serviceName: crio.service |
| 111 | +        type: Reload |
| 112 | +      - type: DaemonReload |
| 113 | +      - restart: |
| 114 | +          serviceName: crio.service |
| 115 | +        type: Restart |
| 116 | +# ... |
| 117 | +---- |
| 118 | + |
| 119 | +In the following example, when changes are made to the `/etc/chrony.conf` file, the MCO reloads the `chronyd.service` on the cluster nodes. |
| 120 | + |
| 121 | +.Example node disruption policy for a configuration file change |
| 122 | +[source,yaml] |
| 123 | +---- |
| 124 | +apiVersion: operator.openshift.io/v1 |
| 125 | +kind: MachineConfiguration |
| 126 | +metadata: |
| 127 | +  name: cluster |
| 128 | +  namespace: openshift-machine-config-operator |
| 129 | +# ... |
| 130 | +spec: |
| 131 | +  nodeDisruptionPolicy: |
| 132 | +    files: |
| 133 | +    - actions: |
| 134 | +      - reload: |
| 135 | +          serviceName: chronyd.service |
| 136 | +        type: Reload |
| 137 | +      path: /etc/chrony.conf |
| 138 | +---- |
| 139 | + |
| 140 | +In the following example, when changes are made to the `auditd.service` systemd unit, the MCO drains the cluster nodes, reloads the `crio.service`, reloads the systemd manager configuration, and restarts the `crio.service`. |
| 141 | + |
| 142 | +.Example node disruption policy for a systemd unit change |
| 143 | +[source,yaml] |
| 144 | +---- |
| 145 | +apiVersion: operator.openshift.io/v1 |
| 146 | +kind: MachineConfiguration |
| 147 | +metadata: |
| 148 | +  name: cluster |
| 149 | +  namespace: openshift-machine-config-operator |
| 150 | +# ... |
| 151 | +spec: |
| 152 | +  nodeDisruptionPolicy: |
| 153 | +    units: |
| 154 | +    - name: auditd.service |
| 155 | +      actions: |
| 156 | +      - type: Drain |
| 157 | +      - type: Reload |
| 158 | +        reload: |
| 159 | +          serviceName: crio.service |
| 160 | +      - type: DaemonReload |
| 161 | +      - type: Restart |
| 162 | +        restart: |
| 163 | +          serviceName: crio.service |
| 164 | +---- |
| 165 | + |
| 166 | +In the following example, when changes are made to the `/etc/containers/registries.conf` file, the MCO does not drain or reboot the nodes and applies the changes with no further action. |
| 167 | + |
| 168 | +.Example node disruption policy for a configuration file change |
| 169 | +[source,yaml] |
| 170 | +---- |
| 171 | +apiVersion: operator.openshift.io/v1 |
| 172 | +kind: MachineConfiguration |
| 173 | +metadata: |
| 174 | +  name: cluster |
| 175 | +  namespace: openshift-machine-config-operator |
| 176 | +# ... |
| 177 | +spec: |
| 178 | +  nodeDisruptionPolicy: |
| | +    files: |
| 179 | +    - actions: |
| 180 | +      - type: None |
| 181 | +      path: /etc/containers/registries.conf |
| 182 | +---- |