Commit a6ec096

OSDOCS#10478: Add disaster recovery using the OADP Operator
1 parent c36919f commit a6ec096

8 files changed

+449
-0
lines changed

_topic_maps/_topic_map.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2404,6 +2404,8 @@ Topics:
24042404
- Name: High availability for hosted control planes
24052405
Dir: hcp_high_availability
24062406
Topics:
2407+
- Name: About high availability for hosted control planes
2408+
File: about-hcp-ha
24072409
- Name: Recovering a failing etcd cluster
24082410
File: hcp-recovering-etcd-cluster
24092411
- Name: Backing up and restoring etcd in an on-premise environment
@@ -2412,6 +2414,8 @@ Topics:
24122414
File: hcp-backup-restore-aws
24132415
- Name: Disaster recovery for a hosted cluster in AWS
24142416
File: hcp-disaster-recovery-aws
2417+
- Name: Disaster recovery for a hosted cluster by using OADP
2418+
File: hcp-disaster-recovery-oadp
24152419
- Name: Troubleshooting hosted control planes
24162420
File: hcp-troubleshooting
24172421
---
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="about-hcp-ha"]
3+
= About high availability for hosted control planes
4+
include::_attributes/common-attributes.adoc[]
5+
:context: about-hcp-ha
6+
7+
toc::[]
8+
9+
You can maintain high availability (HA) of hosted control planes by implementing the following actions:
10+
11+
* Recover etcd members for a hosted cluster.
12+
* Back up and restore etcd for a hosted cluster.
13+
* Perform a disaster recovery process for a hosted cluster.
14+
15+
include::modules/hcp-mgmt-component-loss-impact.adoc[leveloffset=+1]
Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="hcp-disaster-recovery-oadp"]
3+
= Disaster recovery for a hosted cluster by using {oadp-short}
4+
include::_attributes/common-attributes.adoc[]
5+
:context: hcp-disaster-recovery-oadp
6+
7+
toc::[]
8+
9+
You can use the {oadp-first} Operator to perform disaster recovery on {aws-first} and bare metal.
10+
11+
The disaster recovery process with {oadp-short} involves the following steps:
12+
13+
. Preparing your platform, such as {aws-full} or bare metal, to use {oadp-short}
14+
. Backing up the data plane workload
15+
. Backing up the control plane workload
16+
. Restoring a hosted cluster by using {oadp-short}
17+
18+
[id="prerequisites_{context}"]
19+
== Prerequisites
20+
21+
You must meet the following prerequisites on the management cluster:
22+
23+
* You xref:../../backup_and_restore/application_backup_and_restore/installing/oadp-installing-operator.adoc#oadp-installing-operator[installed the {oadp-short} Operator].
24+
* You created a storage class.
25+
* You have access to the cluster with `cluster-admin` privileges.
26+
* You have access to the {oadp-short} subscription through a catalog source.
27+
* You have access to a cloud storage provider that is compatible with {oadp-short}, such as S3, {azure-full}, {gcp-full}, or MinIO.
28+
* In a disconnected environment, you have access to a self-hosted storage provider, for example link:https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/[{odf-full}] or link:https://min.io/[MinIO], that is compatible with {oadp-short}.
29+
* Your hosted control plane pods are up and running.
30+
31+
[id="prepare-aws-oadp_{context}"]
32+
== Preparing {aws-short} to use {oadp-short}
33+
34+
To perform disaster recovery for a hosted cluster, you can use {oadp-first} on {aws-first} S3 compatible storage. After you create the `DataProtectionApplication` object, a new `velero` deployment and new `node-agent` pods are created in the `openshift-adp` namespace.
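As a minimal sketch, the following hypothetical helper checks that the `velero` deployment and at least one `node-agent` pod exist in the `openshift-adp` namespace after you create the `DataProtectionApplication` object. The function name `check_oadp_workloads` is an assumption for illustration and is not part of {oadp-short}. The sketch also assumes that the `node-agent` pod names contain the string `node-agent`.

```shell
# Hypothetical helper (not part of OADP): confirm that the workloads created
# by the DataProtectionApplication object are present.
check_oadp_workloads() {
  # Fails if the velero deployment does not exist.
  oc get deployment velero -n openshift-adp -o name &&
  # Assumption: node-agent pod names contain "node-agent".
  # grep returns a non-zero status if no such pod is listed.
  oc get pods -n openshift-adp -o name | grep node-agent
}
```

You can call `check_oadp_workloads` from a shell session that is logged in to the management cluster. A non-zero exit status suggests that one of the workloads is missing.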
35+
36+
To prepare {aws-short} to use {oadp-short}, see "Configuring the {oadp-full} with AWS S3 compatible storage".
37+
38+
[role="_additional-resources"]
39+
.Additional resources
40+
41+
* xref:../../backup_and_restore/application_backup_and_restore/installing/installing-oadp-aws.adoc#installing-oadp-aws[Configuring the {oadp-full} with AWS S3 compatible storage]
42+
43+
.Next steps
44+
45+
* Backing up the data plane workload
46+
* Backing up the control plane workload
47+
48+
[id="prepare-bm-dr-oadp_{context}"]
49+
== Preparing bare metal to use {oadp-short}
50+
51+
To perform disaster recovery for a hosted cluster, you can use {oadp-first} on bare metal. After you create the `DataProtectionApplication` object, a new `velero` deployment and new `node-agent` pods are created in the `openshift-adp` namespace.
52+
53+
To prepare bare metal to use {oadp-short}, see "Configuring the {oadp-full} with Multicloud Object Gateway".
54+
55+
[role="_additional-resources"]
56+
.Additional resources
57+
58+
* xref:../../backup_and_restore/application_backup_and_restore/installing/installing-oadp-mcg.adoc#installing-oadp-mcg[Configuring the {oadp-full} with Multicloud Object Gateway]
59+
60+
.Next steps
61+
62+
* Backing up the data plane workload
63+
* Backing up the control plane workload
64+
65+
[id="backing-up-data-plane-oadp_{context}"]
66+
== Backing up the data plane workload
67+
68+
If the data plane workload is not important, you can skip this procedure. To back up the data plane workload by using the {oadp-short} Operator, see "Backing up applications".
69+
70+
[role="_additional-resources"]
71+
.Additional resources
72+
73+
* xref:../../backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc#backing-up-applications[Backing up applications]
74+
75+
.Next steps
76+
77+
* Restoring a hosted cluster by using {oadp-short}
78+
79+
include::modules/hcp-dr-oadp-backup-cp-workload.adoc[leveloffset=+1]
80+
81+
include::modules/hcp-dr-oadp-restore.adoc[leveloffset=+1]
82+
83+
include::modules/hcp-dr-oadp-observe.adoc[leveloffset=+1]
84+
85+
include::modules/hcp-dr-oadp-observe-velero.adoc[leveloffset=+1]
Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * hosted_control_planes/hcp-disaster-recovery-oadp.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="hcp-dr-oadp-backup-cp-workload_{context}"]
7+
= Backing up the control plane workload
8+
9+
You can back up the control plane workload by creating the `Backup` custom resource (CR).
10+
11+
To monitor and observe the backup process, see "Observing the backup and restore process".
12+
13+
.Procedure
14+
15+
. Scale down the `NodePool` replicas to `0` by running the following command:
16+
+
17+
[source,terminal]
18+
----
19+
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
20+
scale nodepool -n <hosted_cluster_namespace> \
21+
<node_pool_name> --replicas 0
22+
----
23+
24+
. Pause the reconciliation of the `HostedCluster` resource by running the following command:
25+
+
26+
[source,terminal]
27+
----
28+
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
29+
patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
30+
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'
31+
----
32+
33+
. Pause the reconciliation of the `NodePool` resource by running the following command:
34+
+
35+
[source,terminal]
36+
----
37+
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
38+
patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
39+
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'
40+
----
41+
42+
. Create a YAML file that defines the `Backup` CR:
43+
+
44+
.Example `backup-control-plane.yaml` file
45+
[%collapsible]
46+
====
47+
[source,yaml]
48+
----
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_resource_name> <1>
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  hooks: {}
  includedNamespaces: <2>
  - <hosted_cluster_namespace> <3>
  - <hosted_control_plane_namespace> <4>
  includedResources:
  - sa
  - role
  - rolebinding
  - pod
  - pvc
  - pv
  - bmh
  - configmap
  - infraenv <5>
  - priorityclasses
  - pdb
  - agents
  - hostedcluster
  - nodepool
  - secrets
  - hostedcontrolplane
  - cluster
  - agentcluster
  - agentmachinetemplate
  - agentmachine
  - machinedeployment
  - machineset
  - machine
  excludedResources: []
  storageLocation: default
  ttl: 2h0m0s
  snapshotMoveData: true <6>
  datamover: "velero" <6>
  defaultVolumesToFsBackup: true <7>
91+
----
92+
====
93+
<1> Replace `<backup_resource_name>` with the name of your `Backup` resource.
94+
<2> Selects the namespaces from which to back up objects. You must include your hosted cluster namespace and the hosted control plane namespace.
95+
<3> Replace `<hosted_cluster_namespace>` with the name of the hosted cluster namespace, for example, `clusters`.
96+
<4> Replace `<hosted_control_plane_namespace>` with the name of the hosted control plane namespace, for example, `clusters-hosted`.
97+
<5> You must create the `infraenv` resource in a separate namespace. Do not delete the `infraenv` resource during the backup process.
98+
<6> Enables the CSI volume snapshots and uploads the control plane workload automatically to the cloud storage.
99+
<7> Sets `fs-backup` as the default backup method for persistent volumes (PVs). This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the `fs-backup` method.
100+
+
101+
[NOTE]
102+
====
103+
If you want to use CSI volume snapshots, you must add the `backup.velero.io/backup-volumes-excludes=<pv_name>` annotation to your PVs.
104+
====
105+
106+
. Apply the `Backup` CR by running the following command:
107+
+
108+
[source,terminal]
109+
----
110+
$ oc apply -f backup-control-plane.yaml
111+
----
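The pause steps in this procedure set `spec.pausedUntil` to `"true"` on the `HostedCluster` and `NodePool` resources. As a hedged sketch, the following hypothetical helper reads that field back so that you can confirm the patch took effect. The function name `check_paused` and the `MGMT_KUBECONFIG` environment variable are assumptions for illustration only.

```shell
# Hypothetical helper: report whether a resource has spec.pausedUntil set to "true".
# Usage: check_paused <kind> <namespace> <name>
# Assumption: MGMT_KUBECONFIG holds the path to the management cluster kubeconfig file.
check_paused() {
  kind="$1"
  namespace="$2"
  name="$3"
  value="$(oc --kubeconfig "$MGMT_KUBECONFIG" get "$kind" -n "$namespace" "$name" \
    -o jsonpath='{.spec.pausedUntil}')"
  # This procedure sets the literal string "true"; any other value is reported as not paused.
  if [ "$value" = "true" ]; then
    echo "$kind/$name is paused"
  else
    echo "$kind/$name is NOT paused (pausedUntil='$value')" >&2
    return 1
  fi
}
```

For example, `check_paused hostedcluster <hosted_cluster_namespace> <hosted_cluster_name>` followed by `check_paused nodepool <hosted_cluster_namespace> <node_pool_name>` covers both resources that this procedure pauses.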
112+
113+
.Verification
114+
115+
* Verify that the value of `status.phase` is `Completed` by running the following command:
116+
+
117+
[source,terminal]
118+
----
119+
$ oc get backup <backup_resource_name> -n openshift-adp -o jsonpath='{.status.phase}'
120+
----
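Because a backup can take some time, you might prefer to poll the phase instead of running the command repeatedly. The following is a minimal sketch under stated assumptions: the function name `wait_for_backup`, the default interval, and the default number of attempts are hypothetical. The sketch treats the `Failed`, `PartiallyFailed`, and `FailedValidation` phases as terminal failures.

```shell
# Hypothetical helper: poll a Backup CR until it reaches a terminal phase.
# Usage: wait_for_backup <backup_resource_name> [interval_seconds] [max_attempts]
# Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
wait_for_backup() {
  name="$1"
  interval="${2:-10}"   # assumed default: poll every 10 seconds
  attempts="${3:-60}"   # assumed default: give up after 60 polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(oc get backup "$name" -n openshift-adp -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed)
        echo "Backup $name completed"
        return 0
        ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "Backup $name ended in phase $phase" >&2
        return 1
        ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "Timed out waiting for backup $name (last phase: ${phase:-unknown})" >&2
  return 2
}
```

Running `wait_for_backup <backup_resource_name>` blocks until the backup finishes or the attempts run out, so the exit status can gate the next step of a script.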
121+
122+
.Next steps
123+
124+
* Restoring a hosted cluster by using {oadp-short}
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * hosted_control_planes/hcp-disaster-recovery-oadp.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="hcp-dr-oadp-observe-velero_{context}"]
7+
= Using the velero CLI to describe the Backup and Restore resources
8+
9+
When using {oadp-full}, you can get more details of the `Backup` and `Restore` resources by using the `velero` command-line interface (CLI).
10+
11+
.Procedure
12+
13+
. Create an alias to use the `velero` CLI from a container by running the following command:
14+
+
15+
[source,terminal]
16+
----
17+
$ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
18+
----
19+
20+
. Get details of your `Restore` custom resource (CR) by running the following command:
21+
+
22+
[source,terminal]
23+
----
24+
$ velero restore describe <restore_resource_name> --details <1>
25+
----
26+
<1> Replace `<restore_resource_name>` with the name of your `Restore` resource.
27+
28+
. Get details of your `Backup` CR by running the following command:
29+
+
30+
[source,terminal]
31+
----
32+
$ velero backup describe <backup_resource_name> --details <1>
33+
----
34+
<1> Replace `<backup_resource_name>` with the name of your `Backup` resource.
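Shell aliases are typically not expanded in non-interactive scripts, so a function can be a more portable way to wrap the same `oc exec` call. The following is a minimal sketch. The function name `velero_cli` is hypothetical, and the sketch omits the `-it` flags because it does not assume an interactive terminal.

```shell
# Hypothetical wrapper: run the velero binary inside the velero container.
# All arguments are passed through to the velero CLI unchanged.
velero_cli() {
  oc -n openshift-adp exec deployment/velero -c velero -- ./velero "$@"
}
```

For example, `velero_cli backup describe <backup_resource_name> --details` runs the same `describe` subcommand that this procedure uses through the alias.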

modules/hcp-dr-oadp-observe.adoc

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * hosted_control_planes/hcp-disaster-recovery-oadp.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="hcp-dr-oadp-observe_{context}"]
7+
= Observing the backup and restore process
8+
9+
When using {oadp-first} to back up and restore a hosted cluster, you can monitor and observe the process.
10+
11+
.Procedure
12+
13+
. Observe the backup process by running the following command:
14+
+
15+
[source,terminal]
16+
----
17+
$ watch "oc get backup -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"
18+
----
19+
20+
. Observe the restore process by running the following command:
21+
+
22+
[source,terminal]
23+
----
24+
$ watch "oc get restore -n openshift-adp <restore_resource_name> -o jsonpath='{.status}'"
25+
----
26+
27+
. Observe the Velero logs by running the following command:
28+
+
29+
[source,terminal]
30+
----
31+
$ oc logs -n openshift-adp -ldeploy=velero -f
32+
----
33+
34+
. Observe the progress of all of the {oadp-short} objects by running the following command:
35+
+
36+
[source,terminal]
37+
----
38+
$ watch "echo BackupRepositories:;echo;oc get backuprepositories.velero.io -A;echo; echo BackupStorageLocations: ;echo; oc get backupstoragelocations.velero.io -A;echo;echo DataUploads: ;echo;oc get datauploads.velero.io -A;echo;echo DataDownloads: ;echo;oc get datadownloads.velero.io -n openshift-adp; echo;echo VolumeSnapshotLocations: ;echo;oc get volumesnapshotlocations.velero.io -A;echo;echo Backups:;echo;oc get backup -A; echo;echo Restores:;echo;oc get restore -A"
39+
----
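The long one-line command in the last step can be hard to read and edit. As a sketch, the same listing can be expressed as a loop over the resource kinds. The function name `oadp_status` is hypothetical. Unlike the original command, this sketch lists every kind, including `datadownloads.velero.io`, across all namespaces with `-A`.

```shell
# Hypothetical helper: list the OADP-related resources, one section per kind.
oadp_status() {
  for kind in \
    backuprepositories.velero.io \
    backupstoragelocations.velero.io \
    datauploads.velero.io \
    datadownloads.velero.io \
    volumesnapshotlocations.velero.io \
    backups.velero.io \
    restores.velero.io
  do
    echo "== $kind =="
    oc get "$kind" -A
    echo
  done
}
```

Because `watch` starts a new shell that does not see functions defined in your session, one way to refresh the output is a plain loop such as `while true; do clear; oadp_status; sleep 5; done`.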
