Commit ce17dec

Merge pull request #88622 from aireilly/core-rds-418
TELCODOCS-2045 - Telco core RDS 4.18 docs
2 parents 80cb678 + 66585fb commit ce17dec

52 files changed (+1119 additions, -499 deletions)


_topic_maps/_topic_map.yml

Lines changed: 6 additions & 20 deletions
@@ -3279,30 +3279,16 @@ Topics:
       File: recommended-infrastructure-practices
     - Name: Recommended etcd practices
       File: recommended-etcd-practices
+  - Name: Telco core reference design
+    Dir: telco_core_ref_design_specs
+    Topics:
+    - Name: Telco core reference design specification
+      File: telco-core-rds
   - Name: Telco RAN DU reference design
     Dir: telco_ran_du_ref_design_specs
     Topics:
-    - Name: Telco RAN DU RDS
+    - Name: Telco RAN DU reference design specification
       File: telco-ran-du-rds
-  - Name: Reference design specifications
-    Dir: telco_ref_design_specs
-    Distros: openshift-origin,openshift-enterprise
-    Topics:
-    - Name: Telco reference design specifications
-      File: telco-ref-design-specs-overview
-    - Name: Telco core reference design specification
-      Dir: core
-      Topics:
-      - Name: Telco core reference design overview
-        File: telco-core-rds-overview
-      - Name: Telco core use model overview
-        File: telco-core-rds-use-cases
-      - Name: Core reference design components
-        File: telco-core-ref-design-components
-      - Name: Core reference design configuration CRs
-        File: telco-core-ref-crs
-      - Name: Telco core software specifications
-        File: telco-core-ref-software-artifacts
   - Name: Comparing cluster configurations
     Dir: cluster-compare
     Distros: openshift-origin,openshift-enterprise
Binary image file (81.9 KB)
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-about-the-telco-core-cluster-use-model_{context}"]
= About the telco core cluster use model

The telco core cluster use model is designed for clusters that run on commodity hardware.
Telco core clusters support large-scale telco applications, including control plane functions such as signaling, aggregation, and session border controller (SBC); and centralized data plane functions such as 5G user plane functions (UPF).
Telco core cluster functions require scalability, complex networking support, and resilient software-defined storage, and have performance requirements that are less stringent and less constrained than those of far-edge RAN deployments.

.Telco core RDS cluster service-based architecture and networking topology
image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]

Networking requirements for telco core functions vary widely across a range of networking features and performance points.
IPv6 is a requirement and dual-stack is common.
Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking.
Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.

Telco core clusters are configured as standard with three control plane nodes and two or more worker nodes configured with the stock (non-RT) kernel.
In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CRs), for example, for non-user data plane or high-throughput use cases.
In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
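For illustration, the following is a minimal sketch of a `MachineConfigPool` CR that segments a set of worker nodes for a high-throughput use case; the pool name and node-role label are hypothetical and are not part of the reference configuration:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-high-throughput          # hypothetical pool for user data plane workers
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - worker
      - worker-high-throughput
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-high-throughput: ""   # label applied to the targeted worker nodes
----

Nodes join the pool when they carry the matching node-role label, which allows a dedicated performance or networking configuration to be applied only to that segment of workers.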
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-additional-storage-solutions_{context}"]
= Additional storage solutions
You can use other storage solutions to provide persistent storage for telco core clusters.
The configuration and integration of these solutions is outside the scope of the reference design specification (RDS).

Integration of the storage solution into the telco core cluster must include proper sizing and performance analysis to ensure the storage meets overall performance and resource usage requirements.
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-agent-based-installer_{context}"]
= Agent-based Installer

New in this release::
* No reference design updates in this release

Description::
+
--
Telco core clusters can be installed by using the Agent-based Installer.
This method allows you to install OpenShift on bare-metal servers without requiring additional servers or VMs for managing the installation.
The Agent-based Installer can be run on any system (for example, from a laptop) to generate an ISO installation image.
The ISO is used as the installation media for the cluster supervisor nodes.
Installation progress can be monitored by using the ABI tool from any system with network connectivity to the supervisor nodes' API interfaces.

ABI supports the following:

* Installation from declarative CRs
* Installation in disconnected environments
* Installation with no additional supporting install or bastion servers required to complete the installation
--

Limits and requirements::
* Disconnected installation requires a registry that is reachable from the installed host, with all required content mirrored in that registry.

Engineering considerations::
* Networking configuration should be applied as NMState configuration during installation.
Day 2 networking configuration using the NMState Operator is not supported.
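As an illustration of applying NMState configuration at installation time, the following is a minimal sketch of an `agent-config.yaml` host entry; the cluster name, hostname, interface name, MAC address, and IP addresses are placeholders, not validated reference values:

[source,yaml]
----
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: telco-core-cluster            # placeholder cluster name
rendezvousIP: 192.0.2.10              # placeholder rendezvous address
hosts:
- hostname: master-0                  # placeholder hostname
  interfaces:
  - name: eno1
    macAddress: 00:11:22:33:44:55     # placeholder MAC address
  networkConfig:                      # NMState configuration applied during installation
    interfaces:
    - name: eno1
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 192.0.2.10
          prefix-length: 24
----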
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-application-workloads_{context}"]
= Application workloads

Application workloads running on telco core clusters can include a mix of high-performance cloud-native network functions (CNFs) and traditional best-effort or burstable pod workloads.

Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements.
Typically, pods that run high-performance or latency-sensitive CNFs by using user plane networking (for example, DPDK) require exclusive use of dedicated whole CPUs, achieved through node tuning and guaranteed QoS scheduling.
When creating pod configurations that require exclusive CPUs, be aware of the potential implications of hyper-threaded systems.
Pods should request multiples of 2 CPUs when the entire core (2 hyper-threads) must be allocated to the pod.

Pods running network functions that do not require high throughput or low latency networking should be scheduled as best-effort or burstable QoS pods and do not require dedicated or isolated CPU cores.

Engineering considerations::
+
--
Use the following information to plan telco core workloads and cluster resources:

* CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
* Use a mix of best-effort and burstable QoS pods as required by your applications.
** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node.
** Guaranteed QoS pods must include annotations for fully isolating CPUs.
** Best-effort and burstable pods are not guaranteed exclusive CPU use.
Workloads can be preempted by other workloads, operating system daemons, or kernel tasks.
* Use exec probes sparingly and only when no other suitable option is available.
** Do not use exec probes if a CNF uses CPU pinning.
Use other probe implementations, for example, `httpGet` or `tcpSocket`.
** When you need to use exec probes, limit the exec probe frequency and quantity.
The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
** You can use startup probes, because they do not use significant resources at steady-state operation.
The limitation on exec probes applies primarily to liveness and readiness probes.
Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking.
--
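The following is a minimal sketch of a guaranteed QoS pod that follows these guidelines; the pod name, image, runtime class name, and probe endpoint are hypothetical, and the CRI-O isolation annotations assume a node that is already tuned with a suitable `PerformanceProfile`:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-dpdk-cnf                       # hypothetical workload name
  annotations:
    cpu-load-balancing.crio.io: "disable"      # annotations for fully isolating the pinned CPUs
    cpu-quota.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-telco-core-worker   # hypothetical, derived from the PerformanceProfile name
  containers:
  - name: cnf
    image: registry.example.com/example-cnf:latest  # placeholder image
    resources:
      requests:
        cpu: "4"              # multiple of 2 so whole cores (both hyper-threads) are allocated
        memory: 2Gi
      limits:
        cpu: "4"              # requests equal to limits gives guaranteed QoS
        memory: 2Gi
    readinessProbe:
      httpGet:                # httpGet rather than an exec probe
        path: /healthz
        port: 8080
      periodSeconds: 10
----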
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-cluster-common-use-model-engineering-considerations_{context}"]
= Telco core cluster common use model engineering considerations

* Cluster workloads are detailed in "Application workloads".
* Worker nodes should run on either of the following CPUs:
** Intel 3rd Generation Xeon (IceLake) CPUs or better when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
** AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) or better when supported by {product-title}.
+
[NOTE]
====
Currently, per-pod power management is not available for AMD CPUs.
====
* IRQ balancing is enabled on worker nodes.
The `PerformanceProfile` CR sets `globallyDisableIrqLoadBalancing` to `false`.
Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".

* All cluster nodes have the following characteristics:
** Hyper-Threading is enabled
** x86_64 CPU architecture
** The stock (non-realtime) kernel is enabled
** Nodes are not configured for workload partitioning

* The balance between power management and maximum performance varies between machine config pools in the cluster.
The following configurations should be consistent for all nodes in a machine config pool group.
* Cluster scaling.
See "Scalability" for more information.
Clusters should be able to scale to at least 120 nodes.

* CPU partitioning is configured using a `PerformanceProfile` CR and is applied to nodes on a per-`MachineConfigPool` basis.
See "CPU partitioning and performance tuning" for additional considerations.
* CPU requirements for {product-title} depend on the configured feature set and application workload characteristics.
For a cluster configured according to the reference configuration running a simulated workload of 3000 pods as created by the kube-burner node-density test, the following CPU requirements are validated:
** The minimum number of reserved CPUs for control plane and worker nodes is 2 CPUs (4 hyper-threads) per NUMA node.
** The NICs used for non-DPDK network traffic should be configured to use at least 16 RX/TX queues.
** Nodes with large numbers of pods or other resources might require additional reserved CPUs.
The remaining CPUs are available for user workloads.

+
[NOTE]
====
Variations in {product-title} configuration, workload size, and workload characteristics require additional analysis to determine the effect on the number of required CPUs for the OpenShift platform.
====
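The following is a minimal sketch of a `PerformanceProfile` CR reflecting these considerations; the profile name and the reserved and isolated CPU sets are illustrative and must be sized for the actual hardware and NUMA layout:

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: telco-core-worker                    # hypothetical profile name
spec:
  cpu:
    reserved: "0-1,32-33"                    # illustrative: at least 2 CPUs (4 hyper-threads) per NUMA node reserved
    isolated: "2-31,34-63"                   # remaining CPUs are available for user workloads
  globallyDisableIrqLoadBalancing: false     # keep IRQ balancing enabled on worker nodes
  realTimeKernel:
    enabled: false                           # stock (non-realtime) kernel
  nodeSelector:
    node-role.kubernetes.io/worker: ""
----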
Lines changed: 24 additions & 16 deletions
@@ -1,6 +1,6 @@
 // Module included in the following assemblies:
 //
-// * scalability_and_performance/telco_ref_design_specs/core/telco-core-ref-design-components.adoc
+// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
 
 :_mod-docs-content-type: REFERENCE
 [id="telco-core-cluster-network-operator_{context}"]
@@ -10,27 +10,35 @@ New in this release::
 * No reference design updates in this release
 
 Description::
-The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during {product-title} cluster installation. It allows configuring primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN.
++
+--
+The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during cluster installation.
+The CNO allows for configuring primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN.
+
+In support of network traffic separation, multiple network interfaces are configured through the CNO.
+Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator.
+To ensure that pod traffic is properly routed, OVN-K is configured with the `routingViaHost` option enabled.
+This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic.
+
+The Whereabouts CNI plugin is used to provide dynamic IPv4 and IPv6 addressing for additional pod network interfaces without the use of a DHCP server.
+--
 
 Limits and requirements::
 * OVN-Kubernetes is required for IPv6 support.
-
 * Large MTU cluster support requires connected network equipment to be set to the same or larger value.
-
+MTU size up to 8900 is supported.
+//https://issues.redhat.com/browse/CNF-10593
 * MACVLAN and IPVLAN cannot co-locate on the same main interface due to their reliance on the same underlying kernel mechanism, specifically the `rx_handler`.
 This handler allows a third-party module to process incoming packets before the host processes them, and only one such handler can be registered per network interface.
 Since both MACVLAN and IPVLAN need to register their own `rx_handler` to function, they conflict and cannot coexist on the same interface.
-See link:https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/ipvlan/ipvlan_main.c#L82[ipvlan/ipvlan_main.c#L82] and link:https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/macvlan.c#L1260[net/macvlan.c#L1260] for details.
-
-* Alternative NIC configurations include splitting the shared NIC into multiple NICs or using a single dual-port NIC.
-+
-[IMPORTANT]
-====
-Splitting the shared NIC into multiple NICs or using a single dual-port NIC has not been validated with the telco core reference design.
-====
-
-* Single-stack IP cluster not validated.
-
+Review the source code for more details:
+** https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/ipvlan/ipvlan_main.c#L82[linux/v6.10.2/source/drivers/net/ipvlan/ipvlan_main.c#L82]
+** https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/macvlan.c#L1260[linux/v6.10.2/source/drivers/net/macvlan.c#L1260]
+* Alternative NIC configurations include splitting the shared NIC into multiple NICs or using a single dual-port NIC, though they have not been tested and validated.
+* Clusters with single-stack IP configuration are not validated.
+* The `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR configures the `EgressIP` node reachability check total timeout in seconds.
+The recommended value is `1` second.
 
 Engineering considerations::
-* Pod egress traffic is handled by kernel routing table with the `routingViaHost` option. Appropriate static routes must be configured in the host.
+* Pod egress traffic is handled by kernel routing table using the `routingViaHost` option.
+Appropriate static routes must be configured in the host.
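As an illustration of the configuration described above, the following is a minimal sketch of the cluster `Network` CR with `routingViaHost` enabled and the recommended `reachabilityTotalTimeoutSeconds` value; treat it as a sketch rather than the validated reference CR:

[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: true                   # pod egress uses the kernel routing table and static routes
      egressIPConfig:
        reachabilityTotalTimeoutSeconds: 1     # recommended EgressIP node reachability check timeout
----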
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-common-baseline-model_{context}"]
= Telco core common baseline model

The following configurations and use models are applicable to all telco core use cases.
The telco core use cases build on this common baseline of features.

Cluster topology::
Telco core clusters conform to the following requirements:

* High availability control plane (three or more control plane nodes)
* Non-schedulable control plane nodes
* Multiple machine config pools

Storage::
Telco core use cases require persistent storage as provided by {rh-storage-first}.

Networking::
Telco core cluster networking conforms to the following requirements:

* Dual stack IPv4/IPv6 (IPv4 primary).
* Fully disconnected: clusters do not have access to public networking at any point in their lifecycle.
* Supports multiple networks.
Segmented networking provides isolation between operations, administration and maintenance (OAM), signaling, and storage traffic.
* Cluster network type is OVN-Kubernetes as required for IPv6 support.
* Telco core clusters have multiple layers of networking supported by underlying RHCOS, the SR-IOV Network Operator, the load balancer, and other components.
These layers include the following:
** Cluster networking layer.
The cluster network configuration is defined and applied through the installation configuration.
Update the configuration during Day 2 operations with the NMState Operator.
Use the initial configuration to establish the following:
*** Host interface configuration
*** Active/active bonding (LACP)
** Secondary/additional network layer.
Configure the {product-title} CNI through network `additionalNetwork` or `NetworkAttachmentDefinition` CRs.
Use the initial configuration to configure MACVLAN virtual network interfaces.
** Application workload layer.
User plane networking runs in cloud-native network functions (CNFs).

Service Mesh::
Telco CNFs can use Service Mesh.
All telco core clusters require a Service Mesh implementation.
The choice of implementation and configuration is outside the scope of this specification.
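To illustrate the secondary/additional network layer, the following is a minimal sketch of a `NetworkAttachmentDefinition` CR for a MACVLAN interface with Whereabouts IPAM; the name, namespace, master interface, and address range are placeholders:

[source,yaml]
----
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: oam-macvlan              # placeholder name for an OAM secondary network
  namespace: example-cnf         # placeholder namespace
spec:
  config: |-
    {
      "cniVersion": "0.4.0",
      "name": "oam-macvlan",
      "type": "macvlan",
      "master": "bond0.100",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.0.2.0/24"
      }
    }
----

The Whereabouts IPAM plugin assigns addresses from the configured range without requiring a DHCP server; the `master` value here assumes a VLAN interface on the LACP bond established by the cluster networking layer.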
