Commit 6105379

TELCODOCS-2005: Add Telco Day 2 Troubleshooting and maintenance docs
1 parent 9919a8d commit 6105379

18 files changed

+516
-11
lines changed

_topic_maps/_topic_map.yml

Lines changed: 9 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -3435,17 +3435,15 @@ Topics:
34353435
File: telco-update-completing-the-y-stream-update
34363436
- Name: Completing the z-stream update
34373437
File: telco-update-completing-the-z-stream-update
3438-
# - Name: Troubleshooting and maintaining telco core CNF clusters
3439-
# Dir: troubleshooting
3440-
# Topics:
3441-
# - Name: Troubleshooting and maintaining telco core CNF clusters
3442-
# File: telco-troubleshooting-intro
3443-
# - Name: Getting Support
3444-
# File: telco-troubleshooting-getting-support
3445-
# - Name: General troubleshooting
3446-
# File: telco-troubleshooting-general-troubleshooting
3447-
# - Name: Cluster maintenance
3448-
# File: telco-troubleshooting-cluster-maintenance
3438+
- Name: Troubleshooting and maintaining telco core CNF clusters
3439+
Dir: troubleshooting
3440+
Topics:
3441+
- Name: Troubleshooting and maintaining telco core CNF clusters
3442+
File: telco-troubleshooting-intro
3443+
- Name: General troubleshooting
3444+
File: telco-troubleshooting-general-troubleshooting
3445+
- Name: Cluster maintenance
3446+
File: telco-troubleshooting-cluster-maintenance
34493447
# - Name: Security
34503448
# File: telco-troubleshooting-security
34513449
# - Name: Certificate maintenance
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="telco-troubleshooting-cluster-maintenance"]
3+
= Cluster maintenance
4+
include::_attributes/common-attributes.adoc[]
5+
:context: telco-troubleshooting-cluster-maintenance
6+
7+
toc::[]
8+
9+
In telco networks, you must pay more attention to certain configurations due to the nature of bare-metal deployments.
10+
You can troubleshoot more effectively by completing these tasks:
11+
12+
* Monitor for failed or failing hardware components
13+
* Periodically check the status of the cluster Operators
14+
15+
[NOTE]
16+
====
17+
For hardware monitoring, contact your hardware vendor to find the appropriate logging tool for your specific hardware.
18+
====
19+
20+
include::modules/telco-troubleshooting-clusters-check-cluster-operators.adoc[leveloffset=+1]
21+
include::modules/telco-troubleshooting-clusters-check-for-failed-pods.adoc[leveloffset=+1]
@@ -0,0 +1,71 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="telco-troubleshooting-general-troubleshooting"]
3+
= General troubleshooting
4+
include::_attributes/common-attributes.adoc[]
5+
:context: telco-troubleshooting-general-troubleshooting
6+
7+
toc::[]
8+
9+
When you encounter a problem, the first step is to find the specific area where the issue is happening.
10+
To narrow down the potential problem areas, complete one or more of the following tasks:
11+
12+
* Query your cluster
13+
* Check your pod logs
14+
* Debug a pod
15+
* Review events
16+
17+
include::modules/telco-troubleshooting-general-query-cluster.adoc[leveloffset=+1]
18+
19+
[role="_additional-resources"]
20+
.Additional resources
21+
22+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-get[oc get]
23+
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#reviewing-pod-status_investigating-pod-issues[Reviewing pod status]
24+
25+
include::modules/telco-troubleshooting-general-check-logs.adoc[leveloffset=+1]
26+
27+
[role="_additional-resources"]
28+
.Additional resources
29+
30+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-logs[oc logs]
31+
* xref:../../../security/container_security/security-monitoring.adoc#security-monitoring-cluster-logging_security-monitoring[Logging]
32+
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#inspecting-pod-and-container-logs_investigating-pod-issues[Inspecting pod and container logs]
33+
34+
35+
include::modules/telco-troubleshooting-general-describe-pod.adoc[leveloffset=+1]
36+
37+
[role="_additional-resources"]
38+
.Additional resources
39+
40+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-describe[oc describe]
41+
42+
include::modules/telco-troubleshooting-general-review-events.adoc[leveloffset=+1]
43+
44+
[role="_additional-resources"]
45+
.Additional resources
46+
47+
* xref:../../../security/container_security/security-monitoring.adoc#security-monitoring-events_security-monitoring[Watching cluster events]
48+
49+
include::modules/telco-troubleshooting-general-connect-to-pod.adoc[leveloffset=+1]
50+
51+
[role="_additional-resources"]
52+
.Additional resources
53+
54+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-rsh[oc rsh]
55+
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#accessing-running-pods_investigating-pod-issues[Accessing running pods]
56+
57+
include::modules/telco-troubleshooting-general-debug-pod.adoc[leveloffset=+1]
58+
59+
[role="_additional-resources"]
60+
.Additional resources
61+
62+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-debug[oc debug]
63+
* xref:../../../support/troubleshooting/investigating-pod-issues.adoc#starting-debug-pods-with-root-access_investigating-pod-issues[Starting debug pods with root access]
64+
65+
include::modules/telco-troubleshooting-general-run-command-on-pod.adoc[leveloffset=+1]
66+
67+
[role="_additional-resources"]
68+
.Additional resources
69+
70+
* xref:../../../cli_reference/openshift_cli/developer-cli-commands.adoc#oc-exec[oc exec]
71+
* xref:../../../nodes/containers/nodes-containers-remote-commands.adoc#nodes-containers-remote-commands-about_nodes-containers-remote-commands[Executing remote commands in containers]
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
[id="telco-troubleshooting-intro"]
3+
= Troubleshooting and maintaining telco core CNF clusters
4+
include::_attributes/common-attributes.adoc[]
5+
:context: telco-troubleshooting-intro
6+
7+
toc::[]
8+
9+
Troubleshooting and maintenance are weekly tasks that can be a challenge if you do not have the tools to reach your goal, whether you want to update a component or investigate an issue.
10+
Part of the challenge is knowing where and how to search for tools and answers.
11+
12+
To maintain and troubleshoot a bare-metal environment where high-bandwidth network throughput is required, see the following procedures.
13+
14+
[IMPORTANT]
15+
====
16+
This troubleshooting information is not a reference for configuring {product-title} or developing Cloud-native Network Function (CNF) applications.
17+
18+
For information about developing CNF applications for telco, see link:https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
19+
====
20+
21+
include::modules/telco-troubleshooting-cnfs.adoc[leveloffset=+1]
22+
include::modules/support-getting-support.adoc[leveloffset=+1]
23+
include::modules/support-knowledgebase-about.adoc[leveloffset=+2]
24+
include::modules/support-knowledgebase-search.adoc[leveloffset=+2]
25+
include::modules/support-submitting-a-case.adoc[leveloffset=+2]

modules/support-getting-support.adoc

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-intro.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="support-getting-support_{context}"]
7+
= Getting support
8+
9+
If you experience difficulty with a procedure, visit the link:https://access.redhat.com/[Red{nbsp}Hat Customer Portal].
10+
From the Customer Portal, you can find help in various ways:
11+
12+
* Search or browse through the Red{nbsp}Hat Knowledgebase of articles and solutions about Red{nbsp}Hat products.
13+
* Submit a support case to Red{nbsp}Hat Support.
14+
* Access other product documentation.
15+
16+
To identify issues with your deployment, you can use the debugging tool or check the health endpoint of your deployment.
17+
After you have debugged or obtained health information about your deployment, you can search the Red{nbsp}Hat Knowledgebase for a solution or file a support ticket.
18+
19+
//If you have a suggestion for improving this documentation or have found an error, submit a Jira issue to the ProjectQuay project. Provide specific details, such as the section name and Red Hat Quay version.

modules/support-knowledgebase-about.adoc

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
55
// * service_mesh/v2x/ossm-troubleshooting-istio.adoc
66
// * osd_architecture/osd-support.adoc
77
// * microshift_support/microshift-getting-support.adoc
8+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-intro.adoc
89

910
:_mod-docs-content-type: CONCEPT
1011
[id="support-knowledgebase-about_{context}"]

modules/support-knowledgebase-search.adoc

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
55
// * service_mesh/v2x/ossm-troubleshooting-istio.adoc
66
// * osd_architecture/osd-support.adoc
77
// * microshift_support/microshift-getting-support.adoc
8+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-intro.adoc
89

910
:_mod-docs-content-type: PROCEDURE
1011
[id="support-knowledgebase-search_{context}"]

modules/support-submitting-a-case.adoc

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
44
// * support/getting-support.adoc
55
// * service_mesh/v2x/ossm-troubleshooting-istio.adoc
66
// * osd_architecture/osd-support.adoc
7+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-intro.adoc
78

89
:_mod-docs-content-type: PROCEDURE
910
[id="support-submitting-a-case_{context}"]
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-cluster-maintenance.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="telco-troubleshooting-clusters-check-cluster-operators_{context}"]
7+
= Checking cluster Operators
8+
9+
Periodically check the status of your cluster Operators to find issues early.
10+
11+
.Procedure
12+
13+
* Check the status of the cluster Operators by running the following command:
14+
+
15+
[source,terminal]
16+
----
17+
$ oc get co
18+
----
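
Reviewer sketch (not part of the commit): the `oc get co` output is easier to scan if it is reduced to the Operators that need attention. The following is a minimal illustration, assuming the usual column order of NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED. The `sample_co` data, the version numbers, and the `unhealthy_co` helper are hypothetical and exist only for this example.

```shell
# Hypothetical sample of `oc get co` output; real output varies by cluster.
sample_co() {
cat <<'EOF'
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.16.0    True        False         False      3d
dns              4.16.0    True        False         True       2h      example message
etcd             4.16.0    True        True          False      5m      example message
EOF
}

# Hypothetical helper: print the name of every Operator that is not
# Available=True, Progressing=False, Degraded=False. Skips the header row.
unhealthy_co() {
  awk 'NR > 1 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

sample_co | unhealthy_co
```

On a live cluster, the same helper could presumably be fed from `oc get co | unhealthy_co`. An Operator that is progressing during an update is not necessarily a problem, so treat the result as a list of things to look at rather than a list of failures.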
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * edge_computing/day_2_core_cnf_clusters/troubleshooting/telco-troubleshooting-cluster-maintenance.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="telco-troubleshooting-clusters-check-for-failed-pods_{context}"]
7+
= Watching for failed pods
8+
9+
To reduce troubleshooting time, regularly monitor for failed pods in your cluster.
10+
11+
.Procedure
12+
13+
* To watch for failed pods, run the following command:
14+
+
15+
[source,terminal]
16+
----
17+
$ oc get po -A | grep -Eiv 'complete|running'
18+
----
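
Reviewer sketch (not part of the commit): the filter in this procedure is an inverted, case-insensitive `grep`, so it keeps every line that does not mention `complete` or `running`. The following illustration runs the same filter over canned input. The `sample_pods` data, the pod and namespace names, and the `filter_unhealthy` helper are hypothetical.

```shell
# Hypothetical sample of `oc get po -A` output; real output varies by cluster.
sample_pods() {
cat <<'EOF'
NAMESPACE        NAME              READY   STATUS             RESTARTS   AGE
openshift-dns    dns-default-abc   2/2     Running            0          3d
openshift-etcd   installer-5       0/1     Completed          0          3d
example-cnf      du-sim-0          0/1     CrashLoopBackOff   12         1h
EOF
}

# Same filter as the procedure: drop any line that contains "complete" or
# "running", ignoring case. The header row passes through unchanged.
filter_unhealthy() {
  grep -Eiv 'complete|running'
}

sample_pods | filter_unhealthy
```

Because the match applies to the whole line, a failed pod whose name or namespace happens to contain `running` or `complete` would also be hidden. A pod in the `Running` state with unready containers (for example `0/1`) is hidden as well, so the READY column is still worth checking separately.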
