
Commit e81808a

TELCODOCS-1785 Improving NROP documentation
Trying to break up the wall of text with level 3 headings
1 parent df70980 commit e81808a

13 files changed: +359 -114 lines

modules/cnf-about-collecting-nro-data.adoc

Lines changed: 1 addition & 1 deletion
@@ -20,5 +20,5 @@ You can use the `oc adm must-gather` CLI command to collect information about yo
+
[source,terminal,subs="attributes+"]
----
- $ oc adm must-gather --image=registry.redhat.io/numaresources-must-gather/numaresources-must-gather-rhel9:{product-version}
+ $ oc adm must-gather --image=registry.redhat.io/numaresources-must-gather/numaresources-must-gather-rhel9:v{product-version}
----
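
For reference (an illustrative usage note, not part of this commit), the collected data can be written to a chosen local directory with the standard `--dest-dir` flag of `oc adm must-gather`; the `must-gather-nrop` directory name below is only a placeholder:

[source,terminal,subs="attributes+"]
----
$ oc adm must-gather --image=registry.redhat.io/numaresources-must-gather/numaresources-must-gather-rhel9:v{product-version} --dest-dir=must-gather-nrop
----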

modules/cnf-about-numa-aware-scheduling.adoc

Lines changed: 30 additions & 3 deletions
@@ -6,13 +6,40 @@
[id="cnf-about-numa-aware-scheduling_{context}"]
= About NUMA-aware scheduling

- Non-Uniform Memory Access (NUMA) is a compute platform architecture that allows different CPUs to access different regions of memory at different speeds. NUMA resource topology refers to the locations of CPUs, memory, and PCI devices relative to each other in the compute node. Co-located resources are said to be in the same _NUMA zone_. For high-performance applications, the cluster needs to process pod workloads in a single NUMA zone.
+ [discrete]
+ [id="introduction-to-numa_{context}"]
+ == Introduction to NUMA

- NUMA architecture allows a CPU with multiple memory controllers to use any available memory across CPU complexes, regardless of where the memory is located. This allows for increased flexibility at the expense of performance. A CPU processing a workload using memory that is outside its NUMA zone is slower than a workload processed in a single NUMA zone. Also, for I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, such as telecommunications workloads, cannot operate to specification under these conditions. NUMA-aware scheduling aligns the requested cluster compute resources (CPUs, memory, devices) in the same NUMA zone to process latency-sensitive or high-performance workloads efficiently. NUMA-aware scheduling also improves pod density per compute node for greater resource efficiency.
+ Non-Uniform Memory Access (NUMA) is a compute platform architecture that allows different CPUs to access different regions of memory at different speeds. NUMA resource topology refers to the locations of CPUs, memory, and PCI devices relative to each other in the compute node. Colocated resources are said to be in the same _NUMA zone_. For high-performance applications, the cluster needs to process pod workloads in a single NUMA zone.

+ [discrete]
+ [id="performance-considerations_{context}"]
+ == Performance considerations

+ NUMA architecture allows a CPU with multiple memory controllers to use any available memory across CPU complexes, regardless of where the memory is located. This allows for increased flexibility at the expense of performance. A CPU processing a workload using memory that is outside its NUMA zone is slower than a workload processed in a single NUMA zone. Also, for I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, such as telecommunications workloads, cannot operate to specification under these conditions.

+ [discrete]
+ [id="numa-aware-scheduling_{context}"]
+ == NUMA-aware scheduling

+ NUMA-aware scheduling aligns the requested cluster compute resources (CPUs, memory, devices) in the same NUMA zone to process latency-sensitive or high-performance workloads efficiently. NUMA-aware scheduling also improves pod density per compute node for greater resource efficiency.

+ [discrete]
+ [id="integration-with-node-tuning-operator_{context}"]
+ == Integration with Node Tuning Operator

By integrating the Node Tuning Operator's performance profile with NUMA-aware scheduling, you can further configure CPU affinity to optimize performance for latency-sensitive workloads.

- The default {product-title} pod scheduler scheduling logic considers the available resources of the entire compute node, not individual NUMA zones. If the most restrictive resource alignment is requested in the kubelet topology manager, error conditions can occur when admitting the pod to a node. Conversely, if the most restrictive resource alignment is not requested, the pod can be admitted to the node without proper resource alignment, leading to worse or unpredictable performance. For example, runaway pod creation with `Topology Affinity Error` statuses can occur when the pod scheduler makes suboptimal scheduling decisions for guaranteed pod workloads by not knowing if the pod's requested resources are available. Scheduling mismatch decisions can cause indefinite pod startup delays. Also, depending on the cluster state and resource allocation, poor pod scheduling decisions can cause extra load on the cluster because of failed startup attempts.
+ [discrete]
+ [id="default-scheduling-logic_{context}"]
+ == Default scheduling logic

+ The default {product-title} pod scheduler scheduling logic considers the available resources of the entire compute node, not individual NUMA zones. If the most restrictive resource alignment is requested in the kubelet topology manager, error conditions can occur when admitting the pod to a node. Conversely, if the most restrictive resource alignment is not requested, the pod can be admitted to the node without proper resource alignment, leading to worse or unpredictable performance. For example, runaway pod creation with `Topology Affinity Error` statuses can occur when the pod scheduler makes suboptimal scheduling decisions for guaranteed pod workloads without knowing if the pod's requested resources are available. Scheduling mismatch decisions can cause indefinite pod startup delays. Also, depending on the cluster state and resource allocation, poor pod scheduling decisions can cause extra load on the cluster because of failed startup attempts.

+ [discrete]
+ [id="numa-aware-pod-scheduling-diagram_{context}"]
+ == NUMA-aware pod scheduling diagram

The NUMA Resources Operator deploys a custom NUMA resources secondary scheduler and other resources to mitigate against the shortcomings of the default {product-title} pod scheduler. The following diagram provides a high-level overview of NUMA-aware pod scheduling.

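As background for the "Default scheduling logic" section above (an illustrative sketch, not part of this commit): a pod counts as a guaranteed workload only when every container's resource requests equal its limits, as in the following hypothetical spec. The pod name and image are placeholders.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: numa-guaranteed-pod   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/sample/app:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 4Gi
      limits:
        cpu: "4"       # requests equal limits, so the pod gets the Guaranteed QoS class
        memory: 4Gi
----

With a `single-numa-node` topology manager policy, it is pods like this one whose requested CPUs, memory, and devices must all fit in one NUMA zone for admission to succeed.
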
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_module-type: PROCEDURE
[id="cnf-configuring-kubelet-config-nro_{context}"]
= Creating a KubeletConfig CRD

The recommended way to configure a single NUMA node policy is to apply a performance profile. Another way is by creating and applying a `KubeletConfig` custom resource (CR), as shown in the following procedure.

.Procedure

. Create the `KubeletConfig` custom resource (CR) that configures the pod admittance policy for the machine profile:

.. Save the following YAML in the `nro-kubeletconfig.yaml` file:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-tuning
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" <1>
  kubeletConfig:
    cpuManagerPolicy: "static" <2>
    cpuManagerReconcilePeriod: "5s"
    reservedSystemCPUs: "0,1" <3>
    memoryManagerPolicy: "Static" <4>
    evictionHard:
      memory.available: "100Mi"
    kubeReserved:
      memory: "512Mi"
    reservedMemory:
      - numaNode: 0
        limits:
          memory: "1124Mi"
    systemReserved:
      memory: "512Mi"
    topologyManagerPolicy: "single-numa-node" <5>
----
<1> Adjust this label to match the `machineConfigPoolSelector` in the `NUMAResourcesOperator` CR.
<2> For `cpuManagerPolicy`, `static` must use a lowercase `s`.
<3> Adjust this based on the CPUs on your nodes.
<4> For `memoryManagerPolicy`, `Static` must use an uppercase `S`.
<5> `topologyManagerPolicy` must be set to `single-numa-node`.

.. Create the `KubeletConfig` CR by running the following command:
+
[source,terminal]
----
$ oc create -f nro-kubeletconfig.yaml
----
+
[NOTE]
====
Applying a performance profile or a `KubeletConfig` CR automatically triggers rebooting of the nodes. If no reboot is triggered, you can troubleshoot the issue by looking at the labels in the `KubeletConfig` that address the node group.
====

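As an illustrative follow-up (not part of this commit), assuming the `worker-tuning` name and the `worker` pool from the example above, you can inspect the applied `KubeletConfig` and the labels on the targeted machine config pool with standard commands such as:

[source,terminal]
----
$ oc describe kubeletconfig worker-tuning
$ oc get machineconfigpool worker --show-labels
----
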
modules/cnf-configuring-node-groups-for-the-numaresourcesoperator.adoc

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ spec:
----
<1> Valid values are `Periodic`, `Events`, `PeriodicAndEvents`. Use `Periodic` to poll the kubelet at intervals that you define in `infoRefreshPeriod`. Use `Events` to poll the kubelet at every pod lifecycle event. Use `PeriodicAndEvents` to enable both methods.
<2> Define the polling interval for `Periodic` or `PeriodicAndEvents` refresh modes. The field is ignored if the refresh mode is `Events`.
- <3> Valid values are `Enabled` or `Disabled`. Setting to `Enabled` is a requirement for the `cacheResyncPeriod` specification in the `NUMAResourcesScheduler`.
+ <3> Valid values are `Enabled`, `Disabled`, and `EnabledExclusiveResources`. Setting to `Enabled` is a requirement for the `cacheResyncPeriod` specification in the `NUMAResourcesScheduler`.

.Verification

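For context (an illustrative sketch, not part of this commit), the callouts above describe fields of the node group `config` stanza in the `NUMAResourcesOperator` CR. Assuming the `worker` machine config pool and the `nodetopology.openshift.io` API group used elsewhere in these modules, the stanza might look like the following; verify the exact API version on your cluster, for example with `oc api-resources`:

[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1   # assumed version; confirm with oc api-resources
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
    config:
      infoRefreshMode: Periodic     # or Events, or PeriodicAndEvents
      infoRefreshPeriod: 10s        # ignored when infoRefreshMode is Events
      podsFingerprinting: Enabled   # must be Enabled to use cacheResyncPeriod in the NUMAResourcesScheduler CR
----
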
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_module-type: PROCEDURE
[id="cnf-configuring-single-numa-policy_{context}"]
= Configuring a single NUMA node policy

The NUMA Resources Operator requires a single NUMA node policy to be configured on the cluster. This can be achieved in two ways: by creating and applying a performance profile, or by configuring a KubeletConfig.

[NOTE]
====
The preferred way to configure a single NUMA node policy is to apply a performance profile. You can use the Performance Profile Creator (PPC) tool to create the performance profile. If a performance profile is created on the cluster, it automatically creates other tuning components like `KubeletConfig` and the `tuned` profile.
====

For more information about creating a performance profile, see "About the Performance Profile Creator" in the "Additional Resources" section.

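As an illustration only (a sketch adapted from the performance profile example that this commit removes from the scheduler deployment module, with CPU ranges that you must adapt to your hardware), a performance profile that enforces a single NUMA node policy might look like this:

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: perfprof-nrop
spec:
  cpu:
    isolated: "4-51,56-103"           # adjust to the isolated CPUs on your nodes
    reserved: "0,1,2,3,52,53,54,55"   # adjust to the reserved CPUs on your nodes
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numa:
    topologyPolicy: single-numa-node  # enforces the single NUMA node policy
----
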
modules/cnf-creating-nrop-cr.adoc

Lines changed: 28 additions & 4 deletions
@@ -18,7 +18,7 @@ When you have installed the NUMA Resources Operator, then create the `NUMAResour

. Create the `NUMAResourcesOperator` custom resource:

- .. Save the following YAML in the `nrop.yaml` file:
+ .. Save the following minimal required YAML file example as `nrop.yaml`:
+
[source,yaml]
----
@@ -30,19 +30,26 @@ spec:
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
-         pools.operator.machineconfiguration.openshift.io/worker: ""
+         pools.operator.machineconfiguration.openshift.io/worker: "" <1>
----
+ +
+ <1> This should match the `MachineConfigPool` that you want to configure the NUMA Resources Operator on. For example, you might have created a `MachineConfigPool` named `worker-cnf` that designates a set of nodes expected to run telecommunications workloads.

.. Create the `NUMAResourcesOperator` CR by running the following command:
+
[source,terminal]
----
$ oc create -f nrop.yaml
----
+ +
+ [NOTE]
+ ====
+ Creating the `NUMAResourcesOperator` triggers a reboot on the corresponding machine config pool and therefore the affected node.
+ ====

.Verification

- * Verify that the NUMA Resources Operator deployed successfully by running the following command:
+ . Verify that the NUMA Resources Operator deployed successfully by running the following command:
+
[source,terminal]
----
@@ -53,5 +60,22 @@ $ oc get numaresourcesoperators.nodetopology.openshift.io
[source,terminal]
----
NAME                      AGE
- numaresourcesoperator     10m
+ numaresourcesoperator     27s
+ ----
+
+ . After a few minutes, run the following command to verify that the required resources deployed successfully:
+ +
+ [source,terminal]
+ ----
+ $ oc get all -n openshift-numaresources
+ ----
+ +
+ .Example output
+ [source,terminal]
----
+ NAME                                                 READY   STATUS    RESTARTS   AGE
+ pod/numaresources-controller-manager-7d9d84c58d-qk2mr   1/1     Running   0       12m
+ pod/numaresourcesoperator-worker-7d96r                   2/2     Running   0       97s
+ pod/numaresourcesoperator-worker-crsht                   2/2     Running   0       97s
+ pod/numaresourcesoperator-worker-jp9mw                   2/2     Running   0       97s
+ ----

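As a usage sketch (not part of this commit), because creating the `NUMAResourcesOperator` CR reboots the nodes in the selected machine config pool, you can watch the rollout finish before continuing by checking the pool status with a standard command such as:

[source,terminal]
----
$ oc get machineconfigpools
----
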
modules/cnf-deploying-the-numa-aware-scheduler.adoc

Lines changed: 15 additions & 77 deletions
@@ -8,59 +8,11 @@

After you install the NUMA Resources Operator, do the following to deploy the NUMA-aware secondary pod scheduler:

- * Configure the performance profile.
-
- * Deploy the NUMA-aware secondary scheduler.
-
- .Prerequisites
-
- * Install the OpenShift CLI (`oc`).
-
- * Log in as a user with `cluster-admin` privileges.
-
- * Create the required machine config pool.
-
- * Install the NUMA Resources Operator.
-
.Procedure

- . Create the `PerformanceProfile` custom resource (CR):
-
- .. Save the following YAML in the `nro-perfprof.yaml` file:
- +
- [source,yaml]
- ----
- apiVersion: performance.openshift.io/v2
- kind: PerformanceProfile
- metadata:
-   name: perfprof-nrop
- spec:
-   cpu: <1>
-     isolated: "4-51,56-103"
-     reserved: "0,1,2,3,52,53,54,55"
-   nodeSelector:
-     node-role.kubernetes.io/worker: ""
-   numa:
-     topologyPolicy: single-numa-node
- ----
- <1> The `cpu.isolated` and `cpu.reserved` specifications define ranges for isolated and reserved CPUs. Enter valid values for your CPU configuration. See the _Additional resources_ section for more information about configuring a performance profile.
-
- .. Create the `PerformanceProfile` CR by running the following command:
- +
- [source,terminal]
- ----
- $ oc create -f nro-perfprof.yaml
- ----
- +
- .Example output
- [source,terminal]
- ----
- performanceprofile.performance.openshift.io/perfprof-nrop created
- ----
-
. Create the `NUMAResourcesScheduler` custom resource that deploys the NUMA-aware custom pod scheduler:

- .. Save the following YAML in the `nro-scheduler.yaml` file:
+ .. Save the following minimal required YAML in the `nro-scheduler.yaml` file:
+
[source,yaml,subs="attributes+"]
----
@@ -70,16 +22,7 @@ metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v{product-version}"
-   cacheResyncPeriod: "5s" <1>
----
- <1> Enter an interval value in seconds for synchronization of the scheduler cache. A value of `5s` is typical for most implementations.
- +
- [NOTE]
- ====
- * Enable the `cacheResyncPeriod` specification to help the NUMA Resource Operator report more exact resource availability by monitoring pending resources on nodes and synchronizing this information in the scheduler cache at a defined interval. This also helps to minimize `Topology Affinity Error` errors because of sub-optimal scheduling decisions. The lower the interval the greater the network load. The `cacheResyncPeriod` specification is disabled by default.
-
- * Setting a value of `Enabled` for the `podsFingerprinting` specification in the `NUMAResourcesOperator` CR is a requirement for the implementation of the `cacheResyncPeriod` specification.
- ====

.. Create the `NUMAResourcesScheduler` CR by running the following command:
+
@@ -88,16 +31,7 @@ spec:
$ oc create -f nro-scheduler.yaml
----

- .Verification
-
- . Verify that the performance profile was applied by running the following command:
- +
- [source,terminal]
- ----
- $ oc describe performanceprofile <performance-profile-name>
- ----
-
- . Verify that the required resources deployed successfully by running the following command:
+ . After a few seconds, run the following command to confirm the successful deployment of the required resources:
+
[source,terminal]
----
@@ -108,16 +42,20 @@ $ oc get all -n openshift-numaresources
[source,terminal]
----
NAME                                                 READY   STATUS    RESTARTS   AGE
- pod/numaresources-controller-manager-7575848485-bns4s   1/1     Running   0       13m
- pod/numaresourcesoperator-worker-dvj4n                   2/2     Running   0       16m
- pod/numaresourcesoperator-worker-lcg4t                   2/2     Running   0       16m
- pod/secondary-scheduler-56994cf6cf-7qf4q                 1/1     Running   0       16m
+ pod/numaresources-controller-manager-7d9d84c58d-qk2mr   1/1     Running   0       12m
+ pod/numaresourcesoperator-worker-7d96r                   2/2     Running   0       97s
+ pod/numaresourcesoperator-worker-crsht                   2/2     Running   0       97s
+ pod/numaresourcesoperator-worker-jp9mw                   2/2     Running   0       97s
+ pod/secondary-scheduler-847cb74f84-9whlm                 1/1     Running   0       10m
+
NAME                                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
- daemonset.apps/numaresourcesoperator-worker   2         2         2       2            2           node-role.kubernetes.io/worker=   16m
+ daemonset.apps/numaresourcesoperator-worker   3         3         3       3            3           node-role.kubernetes.io/worker=   98s
+
NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
- deployment.apps/numaresources-controller-manager   1/1     1            1           13m
- deployment.apps/secondary-scheduler                1/1     1            1           16m
+ deployment.apps/numaresources-controller-manager   1/1     1            1           12m
+ deployment.apps/secondary-scheduler                1/1     1            1           10m
+
NAME                                                           DESIRED   CURRENT   READY   AGE
- replicaset.apps/numaresources-controller-manager-7575848485   1         1         1       13m
- replicaset.apps/secondary-scheduler-56994cf6cf                1         1         1       16m
+ replicaset.apps/numaresources-controller-manager-7d9d84c58d   1         1         1       12m
+ replicaset.apps/secondary-scheduler-847cb74f84                1         1         1       10m
----
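
As a downstream usage sketch (not part of this commit), workloads opt in to the secondary scheduler by setting `schedulerName` in the pod template. The Deployment name and image below are placeholders, and the scheduler name should be taken from the `NUMAResourcesScheduler` CR status on your cluster; it is commonly `topo-aware-scheduler`:

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: numa-workload                  # hypothetical name
  namespace: openshift-numaresources
spec:
  replicas: 1
  selector:
    matchLabels:
      app: numa-workload
  template:
    metadata:
      labels:
        app: numa-workload
    spec:
      schedulerName: topo-aware-scheduler   # assumed name; check the NUMAResourcesScheduler CR status
      containers:
      - name: app
        image: registry.example.com/sample/app:latest   # placeholder image
        resources:
          requests:
            cpu: "2"
            memory: 2Gi
          limits:
            cpu: "2"                   # requests equal limits for Guaranteed QoS
            memory: 2Gi
----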

modules/cnf-installing-numa-resources-operator-console.adoc

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ As a cluster administrator, you can install the NUMA Resources Operator using th

.. In the {product-title} web console, click *Operators* -> *OperatorHub*.

- .. Choose *NUMA Resources Operator* from the list of available Operators, and then click *Install*.
+ .. Choose *numaresources-operator* from the list of available Operators, and then click *Install*.

.. In the *Installed Namespaces* field, select the `openshift-numaresources` namespace, and then click *Install*.
