
Commit 900e9ee

committed
TELCODOCS-2247-core updating modules from gitlab 419 core
1 parent 992a6bd commit 900e9ee


45 files changed: +439, -235 lines changed


modules/nodes-cluster-worker-latency-profiles-about.adoc

Lines changed: 0 additions & 2 deletions
@@ -15,12 +15,10 @@ Setting these parameters manually is not supported. Incorrect parameter settings
 
 All worker latency profiles configure the following parameters:
 
---
 node-status-update-frequency:: Specifies how often the kubelet posts node status to the API server.
 node-monitor-grace-period:: Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint to the node.
 default-not-ready-toleration-seconds:: Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
 default-unreachable-toleration-seconds:: Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.
---
 
 The following Operators monitor the changes to the worker latency profiles and respond accordingly:
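For reference, these parameters are applied as a group by selecting a worker latency profile in the cluster-scoped `Node` configuration resource rather than by setting them individually. A minimal sketch, assuming the `MediumUpdateAverageReaction` profile is the desired choice:

[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  # Selecting a profile sets node-status-update-frequency, node-monitor-grace-period,
  # and the default-*-toleration-seconds values as one tested, consistent set.
  workerLatencyProfile: MediumUpdateAverageReaction
----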

modules/telco-core-about-the-telco-core-cluster-use-model.adoc

Lines changed: 7 additions & 5 deletions
@@ -7,17 +7,19 @@
 = About the telco core cluster use model
 
 The telco core cluster use model is designed for clusters that run on commodity hardware.
-Telco core clusters support large scale telco applications including control plane functions such as signaling, aggregation, and session border controller (SBC); and centralized data plane functions such as 5G user plane functions (UPF).
+Telco core clusters support large scale telco applications including control plane functions like signaling, aggregation, session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF).
 Telco core cluster functions require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge RAN deployments.
 
-.Telco core RDS cluster service-based architecture and networking topology
-image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]
-
 Networking requirements for telco core functions vary widely across a range of networking features and performance points.
 IPv6 is a requirement and dual-stack is common.
 Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking.
 Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.
 
-Telco core clusters are configured as standard with three control plane and two or more worker nodes configured with the stock (non-RT) kernel.
+Telco core clusters are configured as standard with three control plane and one or more worker nodes configured with the stock (non-RT) kernel.
 In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CR), for example, for non-user data plane or high-throughput use cases.
 In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
+
+
+.Telco core RDS cluster service-based architecture and networking topology
+image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]
+
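The worker segmentation mentioned above is done with `MachineConfigPool` CRs. A minimal sketch of a custom pool for high-throughput data plane nodes, with the role name `worker-dpdk` used here only as an illustrative placeholder:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-dpdk
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-dpdk]
  nodeSelector:
    matchLabels:
      # Nodes carrying this label are rendered with the worker-dpdk pool configuration.
      node-role.kubernetes.io/worker-dpdk: ""
----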

modules/telco-core-additional-storage-solutions.adoc

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ You can use other storage solutions to provide persistent storage for telco core
 The configuration and integration of these solutions is outside the scope of the reference design specifications (RDS).
 
 Integration of the storage solution into the telco core cluster must include proper sizing and performance analysis to ensure the storage meets overall performance and resource usage requirements.
+

modules/telco-core-agent-based-installer.adoc

Lines changed: 11 additions & 11 deletions
@@ -7,27 +7,27 @@
 = Agent-based Installer
 
 New in this release::
-* No reference design updates in this release
+* No reference design updates in this release.
 
 Description::
 +
 --
-Telco core clusters can be installed by using the Agent-based Installer.
-This method allows you to install OpenShift on bare-metal servers without requiring additional servers or VMs for managing the installation.
+Telco core clusters can be installed using the Agent-based Installer.
+This method allows you to install {product-title} on bare-metal servers without requiring additional servers or VMs for managing the installation.
 The Agent-based Installer can be run on any system (for example, from a laptop) to generate an ISO installation image.
 The ISO is used as the installation media for the cluster supervisor nodes.
-Installation progress can be monitored using the ABI tool from any system with network connectivity to the supervisor node's API interfaces.
+Progress can be monitored using the Agent-based Installer from any system with network connectivity to the supervisor node's API interfaces.
 
-ABI supports the following:
+Agent-based Installer supports the following:
 
-* Installation from declarative CRs
-* Installation in disconnected environments
-* Installation with no additional supporting install or bastion servers required to complete the installation
+* Installation from declarative CRs.
+* Installation in disconnected environments.
+* Installation without the use of additional servers to support installation, for example, the bastion node.
 --
 
 Limits and requirements::
-* Disconnected installation requires a registry that is reachable from the installed host, with all required content mirrored in that registry.
+* Disconnected installation requires a registry with all required content mirrored and reachable from the installed host.
 
 Engineering considerations::
-* Networking configuration should be applied as NMState configuration during installation.
-Day 2 networking configuration using the NMState Operator is not supported.
+* Networking configuration should be applied as NMState configuration during installation. Day 2 networking configuration using the NMState Operator is not supported.
+
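As a sketch of the engineering consideration above, install-time networking is supplied as NMState configuration in `agent-config.yaml`; the hostname, interface names, and addresses below are placeholders:

[source,yaml]
----
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: telco-core
rendezvousIP: 192.0.2.10
hosts:
  - hostname: master-0
    interfaces:
      - name: eno1
        macAddress: "00:11:22:33:44:55"
    networkConfig:   # NMState document applied during installation
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
              - ip: 192.0.2.10
                prefix-length: 24
----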

modules/telco-core-application-workloads.adoc

Lines changed: 3 additions & 4 deletions
@@ -20,8 +20,7 @@ Engineering considerations::
 --
 Use the following information to plan telco core workloads and cluster resources:
 
-include::snippets/nodes-cgroup-vi-removed.adoc[]
-
+* As of {product-title} 4.19, cgroup v1 is no longer supported and has been removed. All workloads must now be compatible with cgroup v2. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads].
 * CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
 * Use a mix of best-effort and burstable QoS pods as required by your applications.
 ** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node.
@@ -34,6 +33,6 @@ Use other probe implementations, for example, `httpGet` or `tcpSocket`.
 ** When you need to use exec probes, limit the exec probe frequency and quantity.
 The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
 ** You can use startup probes, because they do not use significant resources at steady-state operation.
-This limitation on exec probes applies primarily to liveness and readiness probes.
+The limitation on exec probes applies primarily to liveness and readiness probes.
 Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking.
---
+--
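In line with the probe guidance in this module, a liveness check can avoid exec probes entirely. A minimal sketch using `httpGet`, assuming the container serves a `/healthz` endpoint on port 8080 (placeholder values):

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: cnf-app
spec:
  containers:
    - name: app
      image: registry.example.com/cnf/app:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        # Keep the probe interval at 10 seconds or more to limit management CPU usage.
        periodSeconds: 10
----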

modules/telco-core-cluster-common-use-model-engineering-considerations.adoc

Lines changed: 6 additions & 10 deletions
@@ -8,16 +8,11 @@
 
 * Cluster workloads are detailed in "Application workloads".
 * Worker nodes should run on either of the following CPUs:
-** Intel 3rd Generation Xeon (IceLake) CPUs or better when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
-Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
-** AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) or better when supported by {product-title}.
-+
-[NOTE]
-====
-Currently, per-pod power management is not available for AMD CPUs.
-====
+** Intel 3rd Generation Xeon (IceLake) CPUs or newer when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
+Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled. When Skylake and older CPUs change power states, this can cause latency.
+** AMD EPYC Zen 4 CPUs (Genoa, Bergamo).
 ** IRQ balancing is enabled on worker nodes.
-The `PerformanceProfile` CR sets `globallyDisableIrqLoadBalancing` to false.
+The `PerformanceProfile` CR sets the `globallyDisableIrqLoadBalancing` parameter to a value of `false`.
 Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".
 
 * All cluster nodes should have the following features:
@@ -37,7 +32,7 @@ See "CPU partitioning and performance tuning" for additional considerations.
 * CPU requirements for {product-title} depend on the configured feature set and application workload characteristics.
 For a cluster configured according to the reference configuration running a simulated workload of 3000 pods as created by the kube-burner node-density test, the following CPU requirements are validated:
 ** The minimum number of reserved CPUs for control plane and worker nodes is 2 CPUs (4 hyper-threads) per NUMA node.
-** The NICs used for non-DPDK network traffic should be configured to use at least 16 RX/TX queues.
+** The NICs used for non-DPDK network traffic should be configured to use at most 32 RX/TX queues.
 ** Nodes with large numbers of pods or other resources might require additional reserved CPUs.
 The remaining CPUs are available for user workloads.
 
@@ -46,3 +41,4 @@ The remaining CPUs are available for user workloads.
 ====
 Variations in {product-title} configuration, workload size, and workload characteristics require additional analysis to determine the effect on the number of required CPUs for the OpenShift platform.
 ====
+
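The note about annotating guaranteed QoS pods for isolation can be sketched as follows; the runtime class name, image, and CPU counts are placeholders, and the annotations assume the node is tuned by a `PerformanceProfile`:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-cnf
  annotations:
    # Keep device interrupts and CPU load balancing away from the pod's isolated CPUs.
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-openshift-node-performance-profile
  containers:
    - name: app
      image: registry.example.com/cnf/app:1.0
      resources:
        requests:
          cpu: "4"
          memory: 4Gi
        limits:
          cpu: "4"      # requests equal to limits gives the pod guaranteed QoS
          memory: 4Gi
----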

modules/telco-core-cluster-network-operator.adoc

Lines changed: 2 additions & 0 deletions
@@ -38,7 +38,9 @@ Review the source code for more details:
 * Clusters with single-stack IP configuration are not validated.
 * The `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR configures the `EgressIP` node reachability check total timeout in seconds.
 The recommended value is `1` second.
+* Pod-level SR-IOV bonding mode must be set to `active-backup` and a value in `miimon` must be set (`100` is recommended).
 
 Engineering considerations::
 * Pod egress traffic is handled by kernel routing table using the `routingViaHost` option.
 Appropriate static routes must be configured in the host.
+
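The new SR-IOV bonding requirement can be illustrated with a bond network attachment; the names, namespace, and the secondary SR-IOV interfaces (`net1`, `net2`) are placeholders:

[source,yaml]
----
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-bond
  namespace: core-cnf
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bond",
      "name": "sriov-bond",
      "mode": "active-backup",
      "miimon": "100",
      "links": [
        { "name": "net1" },
        { "name": "net2" }
      ],
      "ipam": { "type": "static" }
    }
----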

modules/telco-core-common-baseline-model.adoc

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,7 @@ Telco core clusters conform to the following requirements:
 * Multiple machine config pools
 
 Storage::
-Telco core use cases require persistent storage as provided by {rh-storage-first}.
+Telco core use cases require persistent storage as provided by {rh-storage}.
 
 Networking::
 Telco core cluster networking conforms to the following requirements:
@@ -45,3 +45,4 @@ Service Mesh::
 Telco CNFs can use Service Mesh.
 All telco core clusters require a Service Mesh implementation.
 The choice of implementation and configuration is outside the scope of this specification.
+
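Persistent storage from {rh-storage} is consumed through a storage class; a minimal claim sketch, assuming the commonly used default block storage class name `ocs-storagecluster-ceph-rbd` and placeholder namespace and size:

[source,yaml]
----
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cnf-data
  namespace: core-cnf
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ocs-storagecluster-ceph-rbd   # assumed default ODF block storage class
  resources:
    requests:
      storage: 100Gi
----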

modules/telco-core-cpu-partitioning-and-performance-tuning.adoc

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@
 = CPU partitioning and performance tuning
 
 New in this release::
-* No reference design updates in this release
+* No reference design updates in this release.
 
 Description::
 CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues.
@@ -24,10 +24,8 @@ Limits and requirements::
 For more information, see "Creating a performance profile".
 
 Engineering considerations::
-
-include::snippets/nodes-cgroup-vi-removed.adoc[]
-
-* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OpenShift 4 nodes?] Knowledgebase article.
+* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed. All workloads must now be compatible with `cgroup v2`. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads](Red Hat Knowledgebase).
+* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OCP 4 nodes?]
 * The actual required reserved CPU capacity depends on the cluster configuration and workload attributes.
 * The reserved CPU value must be rounded up to a full core (2 hyper-threads) alignment.
 * Changes to CPU partitioning cause the nodes contained in the relevant machine config pool to be drained and rebooted.
@@ -44,8 +42,10 @@ You do not need to reserve an additional CPU for handling high network throughput
 * If workloads running on the cluster use kernel level networking, the RX/TX queue count for the participating NICs should be set to 16 or 32 queues if the hardware permits it.
 Be aware of the default queue count.
 With no configuration, the default queue count is one RX/TX queue per online CPU; which can result in too many interrupts being allocated.
+* The irdma kernel module may result in the allocation of too many interrupt vectors on systems with high core counts. To prevent this condition the reference configuration excludes this kernel module from loading through a kernel commandline argument in the `PerformanceProfile`. Typically core workloads do not require this kernel module.
 +
 [NOTE]
 ====
 Some drivers do not deallocate the interrupts even after reducing the queue count.
 ====
+
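Several of these considerations come together in the `PerformanceProfile` CR; a minimal sketch with placeholder CPU ranges, showing IRQ load balancing left enabled and the irdma module excluded at the kernel command line:

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    reserved: "0-1,32-33"     # rounded up to full cores (2 hyper-threads each)
    isolated: "2-31,34-63"
  globallyDisableIrqLoadBalancing: false
  additionalKernelArgs:
    - "module_blacklist=irdma"   # keep irdma from allocating excess interrupt vectors
  nodeSelector:
    node-role.kubernetes.io/worker: ""
----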
