
Commit ad7e95b

Merge pull request #93389 from lahinson/osdocs-12351-etcd-backup-assembly-division
[OSDOCS-12351]: Dividing etcd backup assembly
2 parents 8bb1375 + 5c55e00 commit ad7e95b

31 files changed, +225 -168 lines changed

_topic_maps/_topic_map.yml

Lines changed: 8 additions & 1 deletion
@@ -2526,7 +2526,14 @@ Topics:
 - Name: Performance considerations for etcd
 File: etcd-performance
 - Name: Backing up and restoring etcd data
-File: etcd-backup
+Dir: etcd-backup-restore
+Topics:
+- Name: Backing up etcd
+File: etcd-backup
+- Name: Replacing an unhealthy etcd member
+File: replace-unhealthy-etcd-member
+- Name: Disaster recovery
+File: etcd-disaster-recovery
 - Name: Encrypting etcd data
 File: etcd-encrypt
 - Name: Setting up fault-tolerant control planes that span data centers
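The restructured entry nests three assemblies under a new `Dir`. The Python sketch below flattens the post-change entry into the assembly paths it implies. It assumes, without confirmation from this hunk, that the docs build resolves each `File` relative to its enclosing `Dir` entries and appends `.adoc`, and that the enclosing top-level `Dir` is `etcd`, inferred from the `etcd/etcd-backup-restore/_attributes` path in this commit. The "Setting up fault-tolerant control planes" entry is left out because its `File` value falls outside the hunk.

```python
"""Sketch: flatten a _topic_map.yml-style entry into assembly paths.

Assumptions (hypothetical): `File` resolves to <enclosing Dirs>/<File>.adoc,
and the parent Dir of this section is `etcd`.
"""
from typing import Iterator

# Post-change entries from the hunk, transcribed as plain Python data.
ETCD_SECTION = {
    "Dir": "etcd",  # assumed parent directory; not shown in the hunk
    "Topics": [
        {"Name": "Performance considerations for etcd", "File": "etcd-performance"},
        {
            "Name": "Backing up and restoring etcd data",
            "Dir": "etcd-backup-restore",
            "Topics": [
                {"Name": "Backing up etcd", "File": "etcd-backup"},
                {"Name": "Replacing an unhealthy etcd member",
                 "File": "replace-unhealthy-etcd-member"},
                {"Name": "Disaster recovery", "File": "etcd-disaster-recovery"},
            ],
        },
        {"Name": "Encrypting etcd data", "File": "etcd-encrypt"},
    ],
}


def assembly_paths(entry: dict, prefix: str = "") -> Iterator[str]:
    """Yield the .adoc path for every `File` leaf under `entry`."""
    if "File" in entry:
        yield f"{prefix}{entry['File']}.adoc"
        return
    # A `Dir` entry adds one path segment and recurses into its `Topics`.
    child_prefix = f"{prefix}{entry['Dir']}/"
    for child in entry.get("Topics", []):
        yield from assembly_paths(child, child_prefix)


if __name__ == "__main__":
    for path in assembly_paths(ETCD_SECTION):
        print(path)
```

Under these assumptions, the three moved topics resolve to `etcd/etcd-backup-restore/etcd-backup.adoc`, `etcd/etcd-backup-restore/replace-unhealthy-etcd-member.adoc`, and `etcd/etcd-backup-restore/etcd-disaster-recovery.adoc`.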

backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/etcd-backup.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="backup-etcd"]
 = Backing up etcd

backup_and_restore/control_plane_backup_and_restore/disaster_recovery/about-disaster-recovery.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/etcd-disaster-recovery.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="about-dr"]
 = About disaster recovery

backup_and_restore/control_plane_backup_and_restore/disaster_recovery/quorum-restoration.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/etcd-disaster-recovery.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="dr-quorum-restoration"]
 = Quorum restoration

backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/etcd-disaster-recovery.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="dr-restoring-cluster-state"]
 = Restoring to a previous cluster state

backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/etcd-disaster-recovery.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="dr-recovering-expired-certs"]
 = Recovering from expired control plane certificates

backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: etcd/etcd-backup-restore/replace-unhealthy-etcd-member.adoc.
+
 :_mod-docs-content-type: ASSEMBLY
 [id="replacing-unhealthy-etcd-member"]
 = Replacing an unhealthy etcd member
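Each NOTE TO CONTRIBUTORS comment asks editors to keep an old assembly and its new counterpart in sync by hand. A minimal, hypothetical Python check for that rule is sketched below. The `MIRRORS` table is transcribed from the comments and assumes the new assemblies live under `etcd/etcd-backup-restore/`, matching the `Dir` entry in the topic map; the script is not part of any existing docs tooling. It takes a list of changed paths, such as the output of `git diff --name-only`.

```python
"""Sketch: flag mirrored assemblies that were edited on only one side.

Hypothetical helper; the mapping is transcribed from the NOTE TO
CONTRIBUTORS comments and the new-file location is an assumption.
"""
OLD = "backup_and_restore/control_plane_backup_and_restore/"
NEW = "etcd/etcd-backup-restore/"  # assumed location of the new assemblies

# Old assembly -> new assembly that must receive the same edits.
MIRRORS = {
    OLD + "backing-up-etcd.adoc": NEW + "etcd-backup.adoc",
    OLD + "replacing-unhealthy-etcd-member.adoc": NEW + "replace-unhealthy-etcd-member.adoc",
    OLD + "disaster_recovery/about-disaster-recovery.adoc": NEW + "etcd-disaster-recovery.adoc",
    OLD + "disaster_recovery/quorum-restoration.adoc": NEW + "etcd-disaster-recovery.adoc",
    OLD + "disaster_recovery/scenario-2-restoring-cluster-state.adoc": NEW + "etcd-disaster-recovery.adoc",
    OLD + "disaster_recovery/scenario-3-expired-certs.adoc": NEW + "etcd-disaster-recovery.adoc",
}


def unsynced(changed: list[str]) -> list[str]:
    """Describe every mirrored pair that was edited on only one side."""
    changed_set = set(changed)
    problems = []
    # Forward: an edited old assembly needs its new counterpart edited too.
    for old, new in MIRRORS.items():
        if old in changed_set and new not in changed_set:
            problems.append(f"{old} changed without {new}")
    # Reverse: an edited new assembly needs at least one of its old sources edited.
    for new in sorted(set(MIRRORS.values())):
        sources = [old for old, target in MIRRORS.items() if target == new]
        if new in changed_set and not any(src in changed_set for src in sources):
            problems.append(f"{new} changed without any of {sources}")
    return problems


if __name__ == "__main__":
    # Hypothetical change set: only the old backup assembly was edited.
    for problem in unsynced([OLD + "backing-up-etcd.adoc"]):
        print(problem)
```

The reverse check is deliberately looser than the forward one: the new disaster recovery assembly combines four old assemblies, so an edit to it only needs to be mirrored in the old assembly that covers the same section.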

etcd/etcd-backup-restore/_attributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../_attributes/
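This one-line `_attributes` file is most likely how GitHub displays a symbolic link whose target is `../_attributes/`. Such a link lets the new assemblies keep using `include::_attributes/common-attributes.adoc[]` from one directory deeper, because AsciiDoc resolves a relative include against the including file's directory. The Python sketch below rebuilds a miniature tree in a temporary directory to show the resolution chain. It assumes, since that link is not part of this diff, that `etcd/_attributes` is itself a link to the repository-root `_attributes/` directory, and it needs a filesystem that supports symlinks. The attribute file content is a placeholder.

```python
"""Sketch: chained `_attributes` symlinks resolving to the repo root.

Assumption: `etcd/_attributes -> ../_attributes/` exists (not in this diff).
"""
import os
import tempfile
from pathlib import Path


def build_tree(root: Path) -> Path:
    """Create a miniature repository layout and return the nested assembly dir."""
    (root / "_attributes").mkdir()
    (root / "_attributes" / "common-attributes.adoc").write_text("// placeholder\n")
    nested = root / "etcd" / "etcd-backup-restore"
    nested.mkdir(parents=True)
    # A relative symlink target is interpreted from the directory holding the link.
    os.symlink("../_attributes/", root / "etcd" / "_attributes")
    os.symlink("../_attributes/", nested / "_attributes")
    return nested


def include_resolves() -> bool:
    """True if the nested include path lands on the root attributes file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()  # normalize, e.g. /var -> /private/var on macOS
        assembly_dir = build_tree(root)
        include = assembly_dir / "_attributes" / "common-attributes.adoc"
        target = root / "_attributes" / "common-attributes.adoc"
        # Path.resolve() follows both links back to the root copy.
        return include.resolve() == target and include.read_text() == "// placeholder\n"


if __name__ == "__main__":
    print(include_resolves())
```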
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following file: backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc.
+
+:_mod-docs-content-type: ASSEMBLY
+[id="etcd-backup"]
+include::_attributes/common-attributes.adoc[]
+= Backing up and restoring etcd data
+:context: etcd-backup
+
+toc::[]
+
+As the key-value store for {product-title}, etcd persists the state of all resource objects.
+
+Back up the etcd data for your cluster regularly and store it in a secure location, ideally outside the {product-title} environment. Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise, the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours because the etcd snapshot has a high I/O cost.
+
+Be sure to take an etcd backup before you update your cluster. Taking a backup before you update is important because when you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an {product-title} 4.17.5 cluster must use an etcd backup that was taken from 4.17.5.
+
+[IMPORTANT]
+====
+Back up your cluster's etcd data by performing a single invocation of the backup script on a control plane host. Do not take a backup for each control plane host.
+====
+
+After you have an etcd backup, you can xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore to a previous cluster state].
+
+// Backing up etcd data
+include::modules/backup-etcd.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../hosted_control_planes/hcp_high_availability/hcp-recovering-etcd-cluster.adoc#hcp-recovering-etcd-cluster[Recovering an unhealthy etcd cluster]
+
+// Creating automated etcd backups
+include::modules/etcd-creating-automated-backups.adoc[leveloffset=+1]
+include::modules/creating-single-etcd-backup.adoc[leveloffset=+2]
+include::modules/creating-recurring-etcd-backups.adoc[leveloffset=+2]
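The timing and version rules in the new backup assembly can be expressed as two small predicates. The Python sketch below is illustrative only: the function names are hypothetical, nothing here invokes the OpenShift backup script, and release strings are assumed to be plain `X.Y.Z` values without pre-release suffixes.

```python
"""Sketch: pre-flight checks for the etcd backup guidance above.

Hypothetical helpers; not part of OpenShift or its backup tooling.
"""
from datetime import datetime, timedelta, timezone

# The first certificate rotation completes 24 hours after installation.
FIRST_CERT_ROTATION = timedelta(hours=24)


def safe_to_back_up(installed_at: datetime, now: datetime) -> bool:
    """False until the first certificate rotation has completed."""
    # An earlier backup would contain expired certificates.
    return now - installed_at >= FIRST_CERT_ROTATION


def parse_release(version: str) -> tuple[int, int, int]:
    """Parse a plain "X.Y.Z" release string (assumed format, no suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def usable_for_restore(backup_version: str, cluster_version: str) -> bool:
    """A restore needs a backup taken from the same z-stream release."""
    return parse_release(backup_version) == parse_release(cluster_version)


if __name__ == "__main__":
    installed = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(safe_to_back_up(installed, installed + timedelta(hours=23)))  # False
    print(usable_for_restore("4.17.5", "4.17.5"))  # True
    print(usable_for_restore("4.17.5", "4.17.4"))  # False
```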
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+//NOTE TO CONTRIBUTORS:
+//
+//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following directory: backup_and_restore/control_plane_backup_and_restore/disaster_recovery/.
+
+:_mod-docs-content-type: ASSEMBLY
+[id="etcd-disaster-recovery"]
+include::_attributes/common-attributes.adoc[]
+= Disaster recovery
+:context: etcd-disaster-recovery
+
+toc::[]
+
+The disaster recovery documentation provides information for administrators on how to recover from several disaster situations that might occur with their {product-title} cluster. As an administrator, you might need to follow one or more of the following procedures to return your cluster to a working state.
+
+[IMPORTANT]
+====
+Disaster recovery requires you to have at least one healthy control plane host.
+====
+
+[id="etcd-dr-quorum"]
+== Quorum restoration
+
+You can use the `quorum-restore.sh` script to restore etcd quorum on clusters that are offline due to quorum loss. When quorum is lost, the {product-title} API becomes read-only. After quorum is restored, the {product-title} API returns to read/write mode.
+
+// Restoring etcd quorum for high availability clusters
+include::modules/dr-restoring-etcd-quorum-ha.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal]
+* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node]
+
+[NOTE]
+====
+If you have a majority of your control plane nodes still available and have an etcd quorum, xref:../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member].
+====
+
+[id="etcd-dr-restore"]
+== Restoring to a previous cluster state
+
+To restore the cluster to a previous state, you must have previously backed up the `etcd` data by creating a snapshot. You will use this snapshot to restore the cluster state. For more information, see "Backing up etcd data".
+
+If applicable, you might also need to xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates].
+
+[WARNING]
+====
+Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. This procedure should only be used as a last resort.
+
+Before performing a restore, see "About restoring to a previous cluster state" for more information on the impact to the cluster.
+====
+
+// About restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state-about.adoc[leveloffset=+2]
+
+// Restoring to a previous cluster state for a single node
+include::modules/dr-restoring-cluster-state-sno.adoc[leveloffset=+2]
+
+// Restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state.adoc[leveloffset=+2]
+
+// Restoring a cluster from etcd backup manually
+include::modules/manually-restoring-cluster-etcd-backup.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[Backing up etcd data]
+* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal]
+* xref:../../networking/accessing-hosts.adoc#accessing-hosts-on-aws_accessing-hosts[Accessing hosts on Amazon Web Services in an installer-provisioned infrastructure cluster]
+* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node]
+
+include::modules/dr-scenario-cluster-state-issues.adoc[leveloffset=+2]
+
+// Recovering from expired control plane certificates
+include::modules/dr-recover-expired-control-plane-certs.adoc[leveloffset=+1]
+
+//Testing restore procedures
+include::modules/dr-testing-restore-procedures.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]
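The new disaster recovery assembly chooses between procedures based on whether etcd still has quorum. The Python sketch below encodes that decision under simplifying assumptions: one etcd member per control plane host, quorum as the usual Raft majority of `members // 2 + 1`, and no consideration of expired certificates or backup availability. The function names and returned strings are hypothetical.

```python
"""Sketch: map etcd member health to the procedures in this assembly.

Simplified and hypothetical; assumes one etcd member per control plane host.
"""


def quorum(members: int) -> int:
    """Smallest majority of `members` voting etcd members."""
    return members // 2 + 1


def suggested_procedure(members: int, healthy: int) -> str:
    """Pick a procedure from the count of healthy etcd members."""
    if not 0 <= healthy <= members:
        raise ValueError("healthy must be between 0 and members")
    if healthy == members:
        return "no recovery needed"
    if healthy >= quorum(members):
        # A majority is still available, so the API stays read/write.
        return "replace the unhealthy etcd member"
    if healthy >= 1:
        # Quorum is lost: the API is read-only until quorum is restored.
        return "restore quorum with quorum-restore.sh"
    # These procedures require at least one healthy control plane host.
    return "no healthy control plane host: outside these procedures"


if __name__ == "__main__":
    for healthy in range(3, -1, -1):
        print(healthy, suggested_procedure(3, healthy))
```

For a three-member control plane, quorum is two, so the cluster tolerates the loss of one member before the API becomes read-only.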
