Commit fdc7c6e

Merge pull request #76483 from SNiemann15/cpu_manager_ibmz

[MULTIARCH-4234] Add further steps to cpu manager setup

2 parents fc5ce86 + ab1adc4

1 file changed: 68 additions, 15 deletions

modules/setting-up-cpu-manager.adoc
@@ -7,23 +7,25 @@
 [id="setting_up_cpu_manager_{context}"]
 = Setting up CPU Manager

+To configure CPU manager, create a `KubeletConfig` custom resource (CR) and apply it to the desired set of nodes.
+
 .Procedure

-. Optional: Label a node:
+. Label a node by running the following command:
 +
 [source,terminal]
 ----
 # oc label node perf-node.example.com cpumanager=true
 ----

-. Edit the `MachineConfigPool` of the nodes where CPU Manager should be enabled. In this example, all workers have CPU Manager enabled:
+. To enable CPU Manager for all compute nodes, edit the `MachineConfigPool` CR by running the following command:
 +
 [source,terminal]
 ----
 # oc edit machineconfigpool worker
 ----

-. Add a label to the worker machine config pool:
+. Add the `custom-kubelet: cpumanager-enabled` label to the `metadata.labels` section.
 +
 [source,yaml]
 ----
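
For reference, the labeled `MachineConfigPool` could begin like the following sketch; only the `custom-kubelet: cpumanager-enabled` label comes from this procedure, and the other metadata values are illustrative:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker                            # pool edited above with oc edit machineconfigpool worker
  labels:
    custom-kubelet: cpumanager-enabled    # label added in this step
----
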
@@ -55,7 +57,7 @@ spec:
 * `static`. This policy allows containers in guaranteed pods with integer CPU requests. It also limits access to exclusive CPUs on the node. If `static`, you must use a lowercase `s`.
 <2> Optional. Specify the CPU Manager reconcile frequency. The default is `5s`.

-. Create the dynamic kubelet config:
+. Create the dynamic kubelet config by running the following command:
 +
 [source,terminal]
 ----
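
A minimal sketch of the `KubeletConfig` CR that callouts `<1>` and `<2>` describe, assuming the CR name `cpumanager-enabled` and using only the label and fields named elsewhere in this procedure:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled                # illustrative name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled  # matches the MachineConfigPool label added earlier
  kubeletConfig:
    cpuManagerPolicy: static              # <1> must use a lowercase "s"
    cpuManagerReconcilePeriod: 5s         # <2> optional; the default is 5s
----

Applying a CR like this, for example with `oc create -f`, is what the step above calls creating the dynamic kubelet config.
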
@@ -64,7 +66,7 @@ spec:
 +
 This adds the CPU Manager feature to the kubelet config and, if needed, the Machine Config Operator (MCO) reboots the node. To enable CPU Manager, a reboot is not needed.

-. Check for the merged kubelet config:
+. Check for the merged kubelet config by running the following command:
 +
 [source,terminal]
 ----
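
One plausible form of this check is to query the generated machine config and inspect its owner references; the machine config name here is purely illustrative:

[source,terminal]
----
# oc get machineconfig 99-worker-generated-kubelet -o json | grep ownerReference -A7   # machine config name is illustrative
----
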
@@ -84,7 +86,7 @@ This adds the CPU Manager feature to the kubelet config and, if needed, the Mach
 ]
 ----

-. Check the worker for the updated `kubelet.conf`:
+. Check the compute node for the updated `kubelet.conf` file by running the following command:
 +
 [source,terminal]
 ----
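
A sketch of one way to inspect the file on the node, reusing the example node labeled in step 1; the `grep` pattern and the `/host` path are assumptions:

[source,terminal]
----
# oc debug node/perf-node.example.com
sh-4.2# grep cpuManager /host/etc/kubernetes/kubelet.conf   # pattern is an assumption
----
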
@@ -101,6 +103,13 @@ cpuManagerReconcilePeriod: 5s <2>
 <1> `cpuManagerPolicy` is defined when you create the `KubeletConfig` CR.
 <2> `cpuManagerReconcilePeriod` is defined when you create the `KubeletConfig` CR.

+. Create a project by running the following command:
++
+[source,terminal]
+----
+$ oc new-project <project_name>
+----
+
 . Create a pod that requests a core or multiple cores. Both limits and requests must have their CPU value set to a whole integer. That is the number of cores that will be dedicated to this pod:
 +
 [source,terminal]
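
A minimal sketch of such a pod, assuming the name `cpumanager` to match `cpumanager-pod.yaml` in the next hunk; the image is illustrative, and requests equal limits so the pod lands in the `Guaranteed` QoS tier:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: cpumanager            # illustrative; matches cpumanager-pod.yaml below
spec:
  containers:
  - name: cpumanager
    image: registry.example.com/ubi:latest   # illustrative image
    resources:
      requests:
        cpu: 1                # whole integer, equal to the limit
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"        # schedules onto the node labeled in step 1
----
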
@@ -145,7 +154,9 @@ spec:
 # oc create -f cpumanager-pod.yaml
 ----

-. Verify that the pod is scheduled to the node that you labeled:
+.Verification
+
+. Verify that the pod is scheduled to the node that you labeled by running the following command:
 +
 [source,terminal]
 ----
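
A sketch of the verification command, assuming the pod name `cpumanager`:

[source,terminal]
----
# oc describe pod cpumanager   # pod name is an assumption
----

The output should report `QoS Class: Guaranteed` and `Node-Selectors: cpumanager=true`.
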
@@ -172,34 +183,73 @@ QoS Class: Guaranteed
 Node-Selectors: cpumanager=true
 ----

-. Verify that the `cgroups` are set up correctly. Get the process ID (PID) of the `pause` process:
+. Verify that a CPU has been exclusively assigned to the pod by running the following command:
 +
 [source,terminal]
 ----
+# oc describe node --selector='cpumanager=true' | grep -i cpumanager- -B2
+----
++
+.Example output
+[source,terminal]
+----
+NAMESPACE  NAME              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
+cpuman     cpumanager-mlrrz  1 (28%)       1 (28%)     1G (13%)         1G (13%)       27m
+----
+
+. Verify that the `cgroups` are set up correctly. Get the process ID (PID) of the `pause` process by running the following commands:
++
+[source,terminal]
+----
+# oc debug node/perf-node.example.com
+----
++
+[source,terminal]
+----
+sh-4.2# systemctl status | grep -B5 pause
+----
++
+[NOTE]
+====
+If the output returns multiple pause process entries, you must identify the correct pause process.
+====
++
+.Example output
+[source,terminal]
+----
 # ├─init.scope
 │ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 17
 └─kubepods.slice
   ├─kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice
   │ ├─crio-b5437308f1a574c542bdf08563b865c0345c8f8c0b0a655612c.scope
   │ └─32706 /pause
 ----
+
+. Verify that pods of quality of service (QoS) tier `Guaranteed` are placed within the `kubepods.slice` subdirectory by running the following commands:
 +
-Pods of quality of service (QoS) tier `Guaranteed` are placed within the `kubepods.slice`. Pods of other QoS tiers end up in child `cgroups` of `kubepods`:
+[source,terminal]
+----
+# cd /sys/fs/cgroup/kubepods.slice/kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice/crio-b5437308f1ad1a7db0574c542bdf08563b865c0345c86e9585f8c0b0a655612c.scope
+----
 +
 [source,terminal]
 ----
-# cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice/crio-b5437308f1ad1a7db0574c542bdf08563b865c0345c86e9585f8c0b0a655612c.scope
-# for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done
+# for i in `ls cpuset.cpus cgroup.procs` ; do echo -n "$i "; cat $i ; done
 ----
 +
+[NOTE]
+====
+Pods of other QoS tiers end up in child `cgroups` of the parent `kubepods`.
+====
++
 .Example output
 [source,terminal]
 ----
 cpuset.cpus 1
-tasks 32706
+cgroup.procs 32706
 ----

-. Check the allowed CPU list for the task:
+. Check the allowed CPU list for the task by running the following command:
 +
 [source,terminal]
 ----
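
A sketch of this check, reusing the `pause` PID `32706` from the earlier output:

[source,terminal]
----
# grep ^Cpus_allowed_list /proc/32706/status   # PID taken from the systemctl output above
----
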
@@ -212,12 +262,15 @@ tasks 32706
 Cpus_allowed_list: 1
 ----

-. Verify that another pod (in this case, the pod in the `burstable` QoS tier) on the system cannot run on the core allocated for the `Guaranteed` pod:
+. Verify that another pod on the system cannot run on the core allocated for the `Guaranteed` pod. For example, to verify the pod in the `besteffort` QoS tier, run the following commands:
++
+[source,terminal]
+----
+# cat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc494a073_6b77_11e9_98c0_06bba5c387ea.slice/crio-c56982f57b75a2420947f0afc6cafe7534c5734efc34157525fa9abbf99e3849.scope/cpuset.cpus
+----
 +
 [source,terminal]
 ----
-# cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc494a073_6b77_11e9_98c0_06bba5c387ea.slice/crio-c56982f57b75a2420947f0afc6cafe7534c5734efc34157525fa9abbf99e3849.scope/cpuset.cpus
-0
 # oc describe node perf-node.example.com
 ----
 +
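
On a two-CPU node where CPU 1 is dedicated to the `Guaranteed` pod, the `besteffort` pod's `cpuset.cpus` would be expected to contain only the remaining CPU, for example:

.Example output
[source,terminal]
----
0
----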
