
Commit 900e9ee

committed
TELCODOCS-2247-core updating modules from gitlab 419 core
1 parent 992a6bd commit 900e9ee


45 files changed: +439, -235 lines changed


modules/nodes-cluster-worker-latency-profiles-about.adoc

Lines changed: 0 additions & 2 deletions
@@ -15,12 +15,10 @@ Setting these parameters manually is not supported. Incorrect parameter settings
 
 All worker latency profiles configure the following parameters:
 
---
 node-status-update-frequency:: Specifies how often the kubelet posts node status to the API server.
 node-monitor-grace-period:: Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint to the node.
 default-not-ready-toleration-seconds:: Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
 default-unreachable-toleration-seconds:: Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.
---
 
 The following Operators monitor the changes to the worker latency profiles and respond accordingly:
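For reference, these parameters are applied as a group by selecting a worker latency profile in the cluster-scoped `Node` configuration resource rather than by setting them individually. A minimal sketch, assuming the `MediumUpdateAverageReaction` profile is the desired choice:

[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  # Selecting a profile sets node-status-update-frequency, node-monitor-grace-period,
  # and the default-*-toleration-seconds values as one tested, consistent set.
  workerLatencyProfile: MediumUpdateAverageReaction
----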

modules/telco-core-about-the-telco-core-cluster-use-model.adoc

Lines changed: 7 additions & 5 deletions
@@ -7,17 +7,19 @@
 = About the telco core cluster use model
 
 The telco core cluster use model is designed for clusters that run on commodity hardware.
-Telco core clusters support large scale telco applications including control plane functions such as signaling, aggregation, and session border controller (SBC); and centralized data plane functions such as 5G user plane functions (UPF).
+Telco core clusters support large scale telco applications including control plane functions like signaling, aggregation, session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF).
 Telco core cluster functions require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge RAN deployments.
 
-.Telco core RDS cluster service-based architecture and networking topology
-image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]
-
 Networking requirements for telco core functions vary widely across a range of networking features and performance points.
 IPv6 is a requirement and dual-stack is common.
 Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking.
 Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.
 
-Telco core clusters are configured as standard with three control plane and two or more worker nodes configured with the stock (non-RT) kernel.
+Telco core clusters are configured as standard with three control plane and one or more worker nodes configured with the stock (non-RT) kernel.
 In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CR), for example, for non-user data plane or high-throughput use cases.
 In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
+
+
+.Telco core RDS cluster service-based architecture and networking topology
+image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]
+
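The worker segmentation mentioned above is done with `MachineConfigPool` CRs. A minimal sketch of a custom pool for high-throughput data plane nodes, with the role name `worker-dpdk` used here only as an illustrative placeholder:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-dpdk
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-dpdk]
  nodeSelector:
    matchLabels:
      # Nodes carrying this label are rendered with the worker-dpdk pool configuration.
      node-role.kubernetes.io/worker-dpdk: ""
----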

modules/telco-core-additional-storage-solutions.adoc

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ You can use other storage solutions to provide persistent storage for telco core
 The configuration and integration of these solutions is outside the scope of the reference design specifications (RDS).
 
 Integration of the storage solution into the telco core cluster must include proper sizing and performance analysis to ensure the storage meets overall performance and resource usage requirements.
+

modules/telco-core-agent-based-installer.adoc

Lines changed: 11 additions & 11 deletions
@@ -7,27 +7,27 @@
 = Agent-based Installer
 
 New in this release::
-* No reference design updates in this release
+* No reference design updates in this release.
 
 Description::
 +
 --
-Telco core clusters can be installed by using the Agent-based Installer.
-This method allows you to install OpenShift on bare-metal servers without requiring additional servers or VMs for managing the installation.
+Telco core clusters can be installed using the Agent-based Installer.
+This method allows you to install {product-title} on bare-metal servers without requiring additional servers or VMs for managing the installation.
 The Agent-based Installer can be run on any system (for example, from a laptop) to generate an ISO installation image.
 The ISO is used as the installation media for the cluster supervisor nodes.
-Installation progress can be monitored using the ABI tool from any system with network connectivity to the supervisor node's API interfaces.
+Progress can be monitored using the Agent-based Installer from any system with network connectivity to the supervisor node's API interfaces.
 
-ABI supports the following:
+Agent-based Installer supports the following:
 
-* Installation from declarative CRs
-* Installation in disconnected environments
-* Installation with no additional supporting install or bastion servers required to complete the installation
+* Installation from declarative CRs.
+* Installation in disconnected environments.
+* Installation without the use of additional servers to support installation, for example, the bastion node.
 --
 
 Limits and requirements::
-* Disconnected installation requires a registry that is reachable from the installed host, with all required content mirrored in that registry.
+* Disconnected installation requires a registry with all required content mirrored and reachable from the installed host.
 
 Engineering considerations::
-* Networking configuration should be applied as NMState configuration during installation.
-Day 2 networking configuration using the NMState Operator is not supported.
+* Networking configuration should be applied as NMState configuration during installation. Day 2 networking configuration using the NMState Operator is not supported.
+
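As a sketch of the engineering consideration above, install-time networking is supplied as NMState configuration in `agent-config.yaml`; the hostname, interface names, and addresses below are placeholders:

[source,yaml]
----
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: telco-core
rendezvousIP: 192.0.2.10
hosts:
  - hostname: master-0
    interfaces:
      - name: eno1
        macAddress: "00:11:22:33:44:55"
    networkConfig:   # NMState document applied during installation
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
              - ip: 192.0.2.10
                prefix-length: 24
----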

modules/telco-core-application-workloads.adoc

Lines changed: 3 additions & 4 deletions
@@ -20,8 +20,7 @@ Engineering considerations::
 --
 Use the following information to plan telco core workloads and cluster resources:
 
-include::snippets/nodes-cgroup-vi-removed.adoc[]
-
+* As of {product-title} 4.19, cgroup v1 is no longer supported and has been removed. All workloads must now be compatible with cgroup v2. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads].
 * CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
 * Use a mix of best-effort and burstable QoS pods as required by your applications.
 ** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node.
@@ -34,6 +33,6 @@ Use other probe implementations, for example, `httpGet` or `tcpSocket`.
 ** When you need to use exec probes, limit the exec probe frequency and quantity.
 The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
 ** You can use startup probes, because they do not use significant resources at steady-state operation.
-This limitation on exec probes applies primarily to liveness and readiness probes.
+The limitation on exec probes applies primarily to liveness and readiness probes.
 Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking.
---
+--
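In line with the probe guidance in this module, a liveness check can avoid exec probes entirely. A minimal sketch using `httpGet`, assuming the container serves a `/healthz` endpoint on port 8080 (placeholder values):

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: cnf-app
spec:
  containers:
    - name: app
      image: registry.example.com/cnf/app:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        # Keep the probe interval at 10 seconds or more to limit management CPU usage.
        periodSeconds: 10
----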

modules/telco-core-cluster-common-use-model-engineering-considerations.adoc

Lines changed: 6 additions & 10 deletions
@@ -8,16 +8,11 @@
 
 * Cluster workloads are detailed in "Application workloads".
 * Worker nodes should run on either of the following CPUs:
-** Intel 3rd Generation Xeon (IceLake) CPUs or better when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
-Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
-** AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) or better when supported by {product-title}.
-+
-[NOTE]
-====
-Currently, per-pod power management is not available for AMD CPUs.
-====
+** Intel 3rd Generation Xeon (IceLake) CPUs or newer when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
+Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled. When Skylake and older CPUs change power states, this can cause latency.
+** AMD EPYC Zen 4 CPUs (Genoa, Bergamo).
 ** IRQ balancing is enabled on worker nodes.
-The `PerformanceProfile` CR sets `globallyDisableIrqLoadBalancing` to false.
+The `PerformanceProfile` CR sets the `globallyDisableIrqLoadBalancing` parameter to a value of `false`.
 Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".
 
 * All cluster nodes should have the following features:
@@ -37,7 +32,7 @@ See "CPU partitioning and performance tuning" for additional considerations.
 * CPU requirements for {product-title} depend on the configured feature set and application workload characteristics.
 For a cluster configured according to the reference configuration running a simulated workload of 3000 pods as created by the kube-burner node-density test, the following CPU requirements are validated:
 ** The minimum number of reserved CPUs for control plane and worker nodes is 2 CPUs (4 hyper-threads) per NUMA node.
-** The NICs used for non-DPDK network traffic should be configured to use at least 16 RX/TX queues.
+** The NICs used for non-DPDK network traffic should be configured to use at most 32 RX/TX queues.
 ** Nodes with large numbers of pods or other resources might require additional reserved CPUs.
 The remaining CPUs are available for user workloads.
 
@@ -46,3 +41,4 @@ The remaining CPUs are available for user workloads.
 ====
 Variations in {product-title} configuration, workload size, and workload characteristics require additional analysis to determine the effect on the number of required CPUs for the OpenShift platform.
 ====
+
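The note about annotating guaranteed QoS pods for isolation can be sketched as follows; the runtime class name, image, and CPU counts are placeholders, and the annotations assume the node is tuned by a `PerformanceProfile`:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-cnf
  annotations:
    # Keep device interrupts and CPU load balancing away from the pod's isolated CPUs.
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-openshift-node-performance-profile
  containers:
    - name: app
      image: registry.example.com/cnf/app:1.0
      resources:
        requests:
          cpu: "4"
          memory: 4Gi
        limits:
          cpu: "4"      # requests equal to limits gives the pod guaranteed QoS
          memory: 4Gi
----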

modules/telco-core-cluster-network-operator.adoc

Lines changed: 2 additions & 0 deletions
@@ -38,7 +38,9 @@ Review the source code for more details:
 * Clusters with single-stack IP configuration are not validated.
 * The `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR configures the `EgressIP` node reachability check total timeout in seconds.
 The recommended value is `1` second.
+* Pod-level SR-IOV bonding mode must be set to `active-backup` and a value in `miimon` must be set (`100` is recommended).
 
 Engineering considerations::
 * Pod egress traffic is handled by kernel routing table using the `routingViaHost` option.
 Appropriate static routes must be configured in the host.
+
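The new SR-IOV bonding requirement can be illustrated with a bond network attachment; the names, namespace, and the secondary SR-IOV interfaces (`net1`, `net2`) are placeholders:

[source,yaml]
----
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-bond
  namespace: core-cnf
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bond",
      "name": "sriov-bond",
      "mode": "active-backup",
      "miimon": "100",
      "links": [
        { "name": "net1" },
        { "name": "net2" }
      ],
      "ipam": { "type": "static" }
    }
----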

modules/telco-core-common-baseline-model.adoc

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,7 @@ Telco core clusters conform to the following requirements:
 * Multiple machine config pools
 
 Storage::
-Telco core use cases require persistent storage as provided by {rh-storage-first}.
+Telco core use cases require persistent storage as provided by {rh-storage}.
 
 Networking::
 Telco core cluster networking conforms to the following requirements:
@@ -45,3 +45,4 @@ Service Mesh::
 Telco CNFs can use Service Mesh.
 All telco core clusters require a Service Mesh implementation.
 The choice of implementation and configuration is outside the scope of this specification.
+
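Persistent storage from {rh-storage} is consumed through a storage class; a minimal claim sketch, assuming the commonly used default block storage class name `ocs-storagecluster-ceph-rbd` and placeholder namespace and size:

[source,yaml]
----
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cnf-data
  namespace: core-cnf
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ocs-storagecluster-ceph-rbd   # assumed default ODF block storage class
  resources:
    requests:
      storage: 100Gi
----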

modules/telco-core-cpu-partitioning-and-performance-tuning.adoc

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@
 = CPU partitioning and performance tuning
 
 New in this release::
-* No reference design updates in this release
+* No reference design updates in this release.
 
 Description::
 CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues.
@@ -24,10 +24,8 @@ Limits and requirements::
 For more information, see "Creating a performance profile".
 
 Engineering considerations::
-
-include::snippets/nodes-cgroup-vi-removed.adoc[]
-
-* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OpenShift 4 nodes?] Knowledgebase article.
+* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed. All workloads must now be compatible with `cgroup v2`. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads](Red Hat Knowledgebase).
+* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OCP 4 nodes?]
 * The actual required reserved CPU capacity depends on the cluster configuration and workload attributes.
 * The reserved CPU value must be rounded up to a full core (2 hyper-threads) alignment.
 * Changes to CPU partitioning cause the nodes contained in the relevant machine config pool to be drained and rebooted.
@@ -44,8 +42,10 @@ You do not need to reserve an additional CPU for handling high network throughput
 * If workloads running on the cluster use kernel level networking, the RX/TX queue count for the participating NICs should be set to 16 or 32 queues if the hardware permits it.
 Be aware of the default queue count.
 With no configuration, the default queue count is one RX/TX queue per online CPU; which can result in too many interrupts being allocated.
+* The irdma kernel module may result in the allocation of too many interrupt vectors on systems with high core counts. To prevent this condition the reference configuration excludes this kernel module from loading through a kernel commandline argument in the `PerformanceProfile`. Typically core workloads do not require this kernel module.
 +
 [NOTE]
 ====
 Some drivers do not deallocate the interrupts even after reducing the queue count.
 ====
+
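Several of these considerations come together in the `PerformanceProfile` CR; a minimal sketch with placeholder CPU ranges, showing IRQ load balancing left enabled and the irdma module excluded at the kernel command line:

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    reserved: "0-1,32-33"     # rounded up to full cores (2 hyper-threads each)
    isolated: "2-31,34-63"
  globallyDisableIrqLoadBalancing: false
  additionalKernelArgs:
    - "module_blacklist=irdma"   # keep irdma from allocating excess interrupt vectors
  nodeSelector:
    node-role.kubernetes.io/worker: ""
----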
