
Commit 3adf96a

Merge pull request #89097 from apurvabhide17/OADP-4561-troubleshooting-OADP-toc
OADP-4561: Update parent ToC for Troubleshooting modules
2 parents bd8e48f + 53f8318 commit 3adf96a

17 files changed

+323
-215
lines changed

_topic_maps/_topic_map.yml

Lines changed: 22 additions & 2 deletions
@@ -3654,12 +3654,32 @@ Topics:
36543654
File: oadp-backup-restore-csi-snapshots
36553655
- Name: Overriding Kopia algorithms
36563656
File: overriding-kopia-algorithms
3657-
- Name: Troubleshooting
3658-
File: troubleshooting
36593657
- Name: OADP API
36603658
File: oadp-api
36613659
- Name: Advanced OADP features and functionalities
36623660
File: oadp-advanced-topics
3661+
- Name: Troubleshooting OADP
3662+
File: troubleshooting
3663+
- Name: Velero CLI tool
3664+
File: velero-cli-tool
3665+
- Name: Pods crash or restart due to lack of memory or CPU
3666+
File: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
3667+
- Name: Issues with Velero and admission webhooks
3668+
File: issues-with-velero-and-admission-webhooks
3669+
- Name: OADP installation issues
3670+
File: oadp-installation-issues
3671+
- Name: OADP Operator issues
3672+
File: oadp-operator-issues
3673+
- Name: OADP timeouts
3674+
File: oadp-timeouts
3675+
- Name: Backup and Restore CR issues
3676+
File: backup-and-restore-cr-issues
3677+
- Name: Restic issues
3678+
File: restic-issues
3679+
- Name: Using the must-gather tool
3680+
File: using-the-must-gather-tool
3681+
- Name: OADP monitoring
3682+
File: oadp-monitoring
36633683
- Name: Control plane backup and restore
36643684
Dir: control_plane_backup_and_restore
36653685
Topics:

backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ You can schedule backups by creating a `Schedule` CR instead of a `Backup` CR. S
6161
This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases, therefore it is recommended that users upgrade to these releases.
6262

6363
ifndef::openshift-rosa,openshift-rosa-hcp[]
64-
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-restore-failing-psa-policy_oadp-troubleshooting[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
64+
For more information, see xref:../../../backup_and_restore/application_backup_and_restore/restic-issues.adoc#oadp-restic-restore-failing-psa-policy_restic-issues[Restic restore partially failing on OCP 4.15 due to changed PSA policy].
6565
endif::openshift-rosa,openshift-rosa-hcp[]
6666

6767
// TODO: Add xrefs to ROSA HCP when Operators book is added.

modules/oadp-backup-restore-cr-issues.adoc renamed to backup_and_restore/application_backup_and_restore/backup-and-restore-cr-issues.adoc

Lines changed: 19 additions & 16 deletions
@@ -1,17 +1,20 @@
1-
// Module included in the following assemblies:
2-
//
3-
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
4-
5-
:_mod-docs-content-type: CONCEPT
6-
[id="oadp-backup-restore-cr-issues_{context}"]
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="backup-and-restore-cr-issues"]
73
= Backup and Restore CR issues
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: backup-and-restore-cr-issues
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
10+
toc::[]
811

912
You might encounter these common issues with `Backup` and `Restore` custom resources (CRs).
1013

1114
[id="backup-cannot-retrieve-volume_{context}"]
1215
== Backup CR cannot retrieve volume
1316

14-
The `Backup` CR displays the error message, `InvalidVolume.NotFound: The volume ‘vol-xxxx’ does not exist`.
17+
The `Backup` CR displays the following error message: `InvalidVolume.NotFound: The volume ‘vol-xxxx’ does not exist`.
1518

1619
.Cause
1720

@@ -33,26 +36,26 @@ If a backup is interrupted, it cannot be resumed.
3336

3437
.Solution
3538

36-
. Retrieve the details of the `Backup` CR:
39+
. Retrieve the details of the `Backup` CR by running the following command:
3740
+
3841
[source,terminal]
3942
----
4043
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
4144
backup describe <backup>
4245
----
4346

44-
. Delete the `Backup` CR:
47+
. Delete the `Backup` CR by running the following command:
4548
+
4649
[source,terminal]
4750
----
4851
$ oc delete backups.velero.io <backup> -n openshift-adp
4952
----
5053
+
51-
You do not need to clean up the backup location because a `Backup` CR in progress has not uploaded files to object storage.
54+
You do not need to clean up the backup location because an in-progress `Backup` CR has not uploaded files to object storage.
5255

5356
. Create a new `Backup` CR.
5457

55-
. View the Velero backup details
58+
. View the Velero backup details by running the following command:
5659
+
5760
[source,terminal, subs="+quotes"]
5861
----
@@ -62,11 +65,11 @@ $ velero backup describe _<backup-name>_ --details
6265
[id="backup-cr-remains-partiallyfailed_{context}"]
6366
== Backup CR status remains in PartiallyFailed
6467

65-
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.
68+
The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and is not completed. A snapshot of the affiliated PVC is not created.
6669

6770
.Cause
6871

69-
If the backup is created based on the CSI snapshot class, but the label is missing, CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following:
72+
If the backup created based on the CSI snapshot class is missing a label, the CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following message:
7073

7174
[source,text]
7275
----
@@ -75,7 +78,7 @@ time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=opens
7578

7679
.Solution
7780

78-
. Delete the `Backup` CR:
81+
. Delete the `Backup` CR by running the following command:
7982
+
8083
[source,terminal]
8184
----
@@ -84,11 +87,11 @@ $ oc delete backups.velero.io <backup> -n openshift-adp
8487

8588
. If required, clean up the stored data on the `BackupStorageLocation` to free up space.
8689

87-
. Apply label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object:
90+
. Apply the label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object by running the following command:
8891
+
8992
[source,terminal]
9093
----
9194
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
9295
----
9396

94-
. Create a new `Backup` CR.
97+
. Create a new `Backup` CR.
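
As a hedged illustration of the labeling step in the preceding procedure, the following manifest sketches a `VolumeSnapshotClass` object that already carries the `velero.io/csi-volumesnapshot-class` label. The `driver` and `deletionPolicy` values are placeholder assumptions that are not taken from this procedure. Replace them with the values for your CSI driver.

[source,yaml]
----
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: <snapclass_name>
  labels:
    # Label that the procedure applies with `oc label`
    velero.io/csi-volumesnapshot-class: "true"
# Assumption: placeholder, replace with the name of your CSI driver
driver: <csi_driver_name>
# Assumption: example value only
deletionPolicy: Retain
----

Declaring the label in the manifest has the same effect as running the `oc label` command against an existing object.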
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="issues-with-velero-and-admission-webhooks"]
3+
= Issues with Velero and admission webhooks
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: issues-with-velero-and-admission-webhooks
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
10+
toc::[]
11+
12+
Velero has limited abilities to resolve admission webhook issues during a restore. If you have workloads with admission webhooks, you might need to use an additional Velero plugin or make changes to how you restore the workload.
13+
14+
Typically, workloads with admission webhooks require you to create a resource of a specific kind first. This is especially true if your workload has child resources because admission webhooks typically block child resources.
15+
16+
For example, creating or restoring a top-level object such as `service.serving.knative.dev` typically creates child resources automatically. If you do this first, you will not need to use Velero to create and restore these resources. This avoids the problem of child resources being blocked by an admission webhook that Velero might use.
17+
18+
[id="velero-restore-workarounds-for-workloads-with-admission-webhooks_{context}"]
19+
== Restoring workarounds for Velero backups that use admission webhooks
20+
21+
You need additional steps to restore resources for several types of Velero backups that use admission webhooks.
22+
23+
include::modules/migration-debugging-velero-admission-webhooks-knative.adoc[leveloffset=+2]
24+
include::modules/migration-debugging-velero-admission-webhooks-ibm-appconnect.adoc[leveloffset=+2]
25+
include::modules/oadp-features-plugins-known-issues.adoc[leveloffset=+1]
26+
include::modules/oadp-plugins-receiving-eof-message.adoc[leveloffset=+1]
27+
28+
[role="_additional-resources"]
29+
.Additional resources
30+
31+
* xref:../../architecture/admission-plug-ins.adoc#admission-plug-ins[Admission plugins]
32+
* xref:../../architecture/admission-plug-ins.adoc#admission-webhooks-about_admission-plug-ins[Webhook admission plugins]
33+
* xref:../../architecture/admission-plug-ins.adoc#admission-webhook-types_admission-plug-ins[Types of webhook admission plugins]

modules/oadp-installation-issues.adoc renamed to backup_and_restore/application_backup_and_restore/oadp-installation-issues.adoc

Lines changed: 12 additions & 9 deletions
@@ -1,17 +1,20 @@
1-
// Module included in the following assemblies:
2-
//
3-
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="oadp-installation-issues"]
3+
= OADP installation issues
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: installation-issues
7+
:namespace: openshift-adp
8+
:local-product: OADP
49

5-
:_mod-docs-content-type: CONCEPT
6-
[id="oadp-installation-issues_{context}"]
7-
= Installation issues
10+
toc::[]
811

912
You might encounter issues caused by using invalid directories or incorrect credentials when you install the Data Protection Application.
1013

1114
[id="oadp-backup-location-contains-invalid-directories_{context}"]
1215
== Backup storage contains invalid directories
1316

14-
The `Velero` pod log displays the error message, `Backup storage contains invalid top-level directories`.
17+
The `Velero` pod log displays the following error message: `Backup storage contains invalid top-level directories`.
1518

1619
.Cause
1720

@@ -24,9 +27,9 @@ If the object storage is not dedicated to Velero, you must specify a prefix for
2427
[id="oadp-incorrect-aws-credentials_{context}"]
2528
== Incorrect AWS credentials
2629

27-
The `oadp-aws-registry` pod log displays the error message, `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
30+
The `oadp-aws-registry` pod log displays the following error message: `InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.`
2831

29-
The `Velero` pod log displays the error message, `NoCredentialProviders: no valid providers in chain`.
32+
The `Velero` pod log displays the following error message: `NoCredentialProviders: no valid providers in chain`.
3033

3134
.Cause
3235

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="oadp-monitoring"]
3+
= OADP monitoring
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: oadp-monitoring
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
10+
toc::[]
11+
12+
By using the {product-title} monitoring stack, users and administrators can effectively perform the following tasks:
13+
14+
* Monitor and manage clusters
15+
* Analyze the workload performance of user applications
16+
* Monitor services running on the clusters
17+
* Receive alerts if an event occurs
18+
19+
[role="_additional-resources"]
20+
.Additional resources
21+
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]
22+
23+
include::modules/oadp-monitoring-setup.adoc[leveloffset=+1]
24+
include::modules/oadp-creating-service-monitor.adoc[leveloffset=+1]
25+
include::modules/oadp-creating-alerting-rule.adoc[leveloffset=+1]
26+
27+
[role="_additional-resources"]
28+
.Additional resources
29+
* xref:../../observability/monitoring/managing-alerts/managing-alerts-as-an-administrator.adoc#managing-alerts-as-an-administrator[Managing alerts as an Administrator]
30+
31+
include::modules/oadp-list-of-metrics.adoc[leveloffset=+1]
32+
include::modules/oadp-viewing-metrics-ui.adoc[leveloffset=+1]

modules/oadp-operator-issues.adoc renamed to backup_and_restore/application_backup_and_restore/oadp-operator-issues.adoc

Lines changed: 17 additions & 17 deletions
@@ -1,17 +1,20 @@
1-
// Module included in the following assemblies:
2-
//
3-
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
4-
5-
:_mod-docs-content-type: PROCEDURE
6-
[id="oadp-operator-issues_{context}"]
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="oadp-operator-issues"]
73
= OADP Operator issues
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: oadp-operator-issues
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
10+
toc::[]
811

912
The {oadp-first} Operator might encounter issues caused by problems it is not able to resolve.
1013

1114
[id="oadp-operator-fails-silently_{context}"]
1215
== OADP Operator fails silently
1316

14-
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <OADP_Operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
17+
The S3 buckets of an OADP Operator might be empty, but when you run the command `oc get po -n <oadp_operator_namespace>`, you see that the Operator has a status of `Running`. In such a case, the Operator is said to have _failed silently_ because it incorrectly reports that it is running.
1518

1619
.Cause
1720

@@ -23,31 +26,28 @@ Retrieve a list of backup storage locations (BSLs) and check the manifest of eac
2326

2427
.Procedure
2528

26-
. Run one of the following commands to retrieve a list of BSLs:
27-
28-
.. Using the OpenShift CLI:
29+
. Retrieve a list of BSLs by using either the OpenShift or Velero command-line interface (CLI):
30+
.. Retrieve a list of BSLs by using the OpenShift CLI (`oc`):
2931
+
3032
[source,terminal]
3133
----
3234
$ oc get backupstoragelocations.velero.io -A
3335
----
34-
35-
.. Using the Velero CLI:
36+
.. Retrieve a list of BSLs by using the `velero` CLI:
3637
+
3738
[source,terminal]
3839
----
39-
$ velero backup-location get -n <OADP_Operator_namespace>
40+
$ velero backup-location get -n <oadp_operator_namespace>
4041
----
4142

42-
. Using the list of BSLs, run the following command to display the manifest of each BSL, and examine each manifest for an error.
43+
. Use the list of BSLs from the previous step and run the following command to examine the manifest of each BSL for an error:
4344
+
4445
[source,terminal]
4546
----
4647
$ oc get backupstoragelocations.velero.io -n <namespace> -o yaml
4748
----
48-
49+
+
4950
.Example result
50-
5151
[source, yaml]
5252
----
5353
apiVersion: v1
@@ -90,4 +90,4 @@ items:
9090
kind: List
9191
metadata:
9292
resourceVersion: ""
93-
----
93+
----
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="oadp-timeouts"]
3+
= OADP timeouts
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: oadp-timeouts
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
10+
toc::[]
11+
12+
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce errors, retries, or failures.
13+
14+
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
15+
16+
The following OADP timeout topics explain how and when to configure these parameters:
17+
18+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#restic-timeout_oadp-timeouts[Restic timeout]
19+
20+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#velero-timeout_oadp-timeouts[Velero resource timeout]
21+
22+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#datamover-timeout_oadp-timeouts[Data Mover timeout]
23+
24+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#csisnapshot-timeout_oadp-timeouts[CSI snapshot timeout]
25+
26+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-backup_oadp-timeouts[Item operation timeout - backup]
27+
28+
* xref:../../backup_and_restore/application_backup_and_restore/oadp-timeouts.adoc#item-operation-timeout-restore_oadp-timeouts[Item operation timeout - restore]
29+
30+
include::modules/oadp-restic-timeouts.adoc[leveloffset=+1]
31+
include::modules/oadp-velero-timeouts.adoc[leveloffset=+1]
32+
include::modules/oadp-velero-default-timeouts.adoc[leveloffset=+2]
33+
include::modules/oadp-datamover-timeouts.adoc[leveloffset=+1]
34+
include::modules/oadp-csi-snapshot-timeouts.adoc[leveloffset=+1]
35+
include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+1]
36+
include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+1]
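
The following is a minimal sketch of how two of the timeouts listed in this assembly might appear in a `Backup` CR. It assumes the upstream Velero `csiSnapshotTimeout` and `itemOperationTimeout` fields of the `Backup` specification. The CR name and the duration values are hypothetical, and the included modules remain the authoritative description of each parameter.

[source,yaml]
----
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
  namespace: openshift-adp
spec:
  # Hypothetical value: how long to wait for CSI snapshots to become ready
  csiSnapshotTimeout: 10m
  # Hypothetical value: how long to wait for asynchronous item operations
  itemOperationTimeout: 1h
----

Start with modest values and increase them only when backups of large volumes time out, so that a long timeout does not hide an underlying issue.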
@@ -0,0 +1,31 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="pods-crash-or-restart-due-to-lack-of-memory-or-cpu"]
3+
= Pods crash or restart due to lack of memory or CPU
4+
include::_attributes/common-attributes.adoc[]
5+
include::_attributes/attributes-openshift-dedicated.adoc[]
6+
:context: pods-crash-or-restart-due-to-lack-of-memory-or-cpu
7+
:namespace: openshift-adp
8+
:local-product: OADP
9+
:must-gather-v1-3: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.3
10+
:must-gather-v1-4: registry.redhat.io/oadp/oadp-mustgather-rhel9:v1.4
11+
12+
toc::[]
13+
14+
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
15+
16+
The values for the resource request fields must follow the same format as Kubernetes resource requirements.
17+
If you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, the Velero or Restic pod uses the following default `resources` specification:
18+
19+
[source,yaml]
20+
----
21+
requests:
22+
cpu: 500m
23+
memory: 128Mi
24+
----
25+
26+
[role="_additional-resources"]
27+
.Additional resources
28+
* xref:../../backup_and_restore/application_backup_and_restore/installing/about-installing-oadp.adoc#oadp-velero-cpu-memory-requirements_about-installing-oadp[Velero CPU and memory requirements based on collected data]
29+
30+
include::modules/oadp-pod-crash-set-resource-request-velero.adoc[leveloffset=+1]
31+
include::modules/oadp-pod-crash-set-resource-request-restic.adoc[leveloffset=+1]
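
As a hedged sketch of what an explicit resource request might look like, the following `DataProtectionApplication` fragment sets the `configuration.velero.podConfig.resourceAllocations` field named earlier in this assembly. The CR name and the CPU and memory values are hypothetical examples, not recommendations. Size them from the behavior that you observe in your cluster.

[source,yaml]
----
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: <dpa_name>
  namespace: openshift-adp
spec:
  configuration:
    velero:
      podConfig:
        resourceAllocations:
          requests:
            # Hypothetical values, in the Kubernetes resource quantity format
            cpu: 200m
            memory: 256Mi
----

The same `resourceAllocations` structure applies under `configuration.restic.podConfig` for the Restic pod.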
