backup_and_restore/control_plane_backup_and_restore/disaster_recovery/about-disaster-recovery.adoc (14 additions, 6 deletions)
@@ -17,10 +17,17 @@ state.
 Disaster recovery requires you to have at least one healthy control plane host.
 ====
 
+xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/quorum-restoration.adoc#dr-quorum-restoration[Quorum restoration]:: This solution handles situations where you have lost the majority of your control plane hosts, leading to etcd quorum loss and the cluster going offline. This solution does not require an etcd backup.
+
+[NOTE]
+====
+If you have a majority of your control plane nodes still available and have an etcd quorum, then xref:../../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member].
+====
+
 xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]::
 This solution handles situations where you want to restore your cluster to
 a previous state, for example, if an administrator deletes something critical.
-This also includes situations where you have lost the majority of your control plane hosts, leading to etcd quorum loss and the cluster going offline. As long as you have taken an etcd backup, you can follow this procedure to restore your cluster to a previous state.
+If you have taken an etcd backup, you can restore your cluster to a previous state.
 
 If applicable, you might also need to xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates].
 
@@ -30,15 +37,16 @@ Restoring to a previous cluster state is a destructive and destablizing action t
30
37
31
38
Prior to performing a restore, see xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-scenario-2-restoring-cluster-state-about_dr-restoring-cluster-state[About restoring cluster state] for more information on the impact to the cluster.
32
39
====
33
-
+
34
-
[NOTE]
35
-
====
36
-
If you have a majority of your masters still available and have an etcd quorum, then follow the procedure to xref:../../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member].
37
-
====
38
40
39
41
xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[Recovering from expired control plane certificates]::
40
42
This solution handles situations where your control plane certificates have
41
43
expired. For example, if you shut down your cluster before the first certificate
42
44
rotation, which occurs 24 hours after installation, your certificates will not
43
45
be rotated and will expire. You can follow this procedure to recover from
* xref:../../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]
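The guidance above reduces to majority arithmetic: etcd stays available only while more than half of its members are healthy, which is why losing a single member calls for replacement while losing the majority calls for quorum restoration. A minimal sketch of that rule, assuming nothing beyond the member counts (the function names are illustrative, not part of any OpenShift tooling):

```shell
#!/usr/bin/env bash
# Illustrative only: etcd keeps quorum while a majority of members is healthy.
# quorum(n) = floor(n/2) + 1, so a 3-member control plane tolerates 1 failure.

quorum_needed() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

has_quorum() {
  local members=$1 healthy=$2
  if [ "$healthy" -ge "$(quorum_needed "$members")" ]; then
    echo yes   # quorum intact: replace the single unhealthy member
  else
    echo no    # majority lost: the quorum restoration scenario applies
  fi
}

has_quorum 3 2
has_quorum 3 1
```

With three control plane hosts, losing one member leaves quorum intact; losing two puts the cluster in the quorum-loss scenario described above.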
backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc (3 additions, 2 deletions)

@@ -11,6 +11,9 @@
+* xref:../../../installing/installing_bare_metal/ipi/ipi-install-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_ipi-install-expanding[Replacing a bare-metal control plane node]
modules/dr-restoring-cluster-state-about.adoc (1 addition, 1 deletion)

@@ -18,7 +18,7 @@
 If you are able to retrieve data using the Kubernetes API server, then etcd is available and you should not restore using an etcd backup.
 ====
 
-Restoring etcd effectively takes a cluster back in time and all clients will experience a conflicting, parallel history. This can impact the behavior of watching components like kubelets, Kubernetes controller managers, persistent volume controllers, and OpenShift operators, including the network operator.
+Restoring etcd effectively takes a cluster back in time and all clients will experience a conflicting, parallel history. This can impact the behavior of watching components like kubelets, Kubernetes controller managers, persistent volume controllers, and {product-title} Operators, including the network Operator.
 
 It can cause Operator churn when the content in etcd does not match the actual content on disk, causing Operators for the Kubernetes API server, Kubernetes controller manager, Kubernetes scheduler, and etcd to get stuck when files on disk conflict with content in etcd. This can require manual actions to resolve the issues.
(new module; file name not shown in the diff view)

+= Restoring to a previous cluster state for a single node
+
+You can use a saved etcd backup to restore a previous cluster state on a single node.
+
+[IMPORTANT]
+====
+When you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an {product-title} {product-version}.2 cluster must use an etcd backup that was taken from {product-version}.2.
+====
+
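The z-stream requirement above means the backup's `major.minor.patch` release must match the cluster's exactly. A small sketch of that comparison, assuming version strings shaped like `4.14.2` (the `same_z_stream` helper is hypothetical, not part of any product CLI):

```shell
#!/usr/bin/env bash
# Illustrative only: a backup is usable when its x.y.z release matches the
# cluster's x.y.z release exactly (same z-stream).

same_z_stream() {
  # Strip any pre-release/build suffix after the first "-", then compare.
  local cluster=${1%%-*} backup=${2%%-*}
  if [ "$cluster" = "$backup" ]; then
    echo match
  else
    echo mismatch
  fi
}

same_z_stream 4.14.2 4.14.2
same_z_stream 4.14.2 4.14.3
```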
+.Prerequisites
+
+* Access to the cluster as a user with the `cluster-admin` role through a certificate-based `kubeconfig` file, like the one that was used during installation.
+* You have SSH access to control plane hosts.
+* A backup directory containing both the etcd snapshot and the resources for the static pods, which were from the same backup. The file names in the directory must be in the following formats: `snapshot_<datetimestamp>.db` and `static_kuberesources_<datetimestamp>.tar.gz`.
+
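Before starting the procedure, it can help to confirm that the backup directory really holds a matching pair of files from the same backup run, as the prerequisite requires. A hedged sketch of such a check (the `check_backup_dir` helper is illustrative and not shipped with the product):

```shell
#!/usr/bin/env bash
# Illustrative only: verify that a backup directory contains both
# snapshot_<ts>.db and static_kuberesources_<ts>.tar.gz with the same <ts>.

check_backup_dir() {
  local dir=$1 snap ts
  snap=$(find "$dir" -maxdepth 1 -name 'snapshot_*.db' | head -n1)
  if [ -z "$snap" ]; then
    echo "no snapshot_<datetimestamp>.db found in $dir"
    return 1
  fi
  ts=$(basename "$snap" .db)
  ts=${ts#snapshot_}
  if [ -f "$dir/static_kuberesources_${ts}.tar.gz" ]; then
    echo "backup pair found for $ts"
  else
    echo "missing static_kuberesources_${ts}.tar.gz"
    return 1
  fi
}
```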
+.Procedure
+
+. Use SSH to connect to the single node and copy the etcd backup to the `/home/core` directory by running the following command:
++
+[source,terminal]
+----
+$ cp <etcd_backup_directory> /home/core
+----
+
+. Run the following command in the single node to restore the cluster from a previous backup: