
OCPBUGS-54188: Update Pod interactions with Topology Manager policies #95111

Open: wants to merge 1 commit into the base branch main

12 changes: 7 additions & 5 deletions modules/pod-interactions-with-topology-manager.adoc
@@ -5,7 +5,7 @@
[id="pod-interactions-with-topology-manager_{context}"]
= Pod interactions with Topology Manager policies

The example `Pod` specs below help illustrate pod interactions with Topology Manager.
The example `Pod` specs illustrate pod interactions with Topology Manager.

The following pod runs in the `BestEffort` QoS class because no resource requests or limits are specified.

@@ -32,9 +32,11 @@ spec:
memory: "100Mi"
----

If the selected policy is anything other than `none`, Topology Manager would not consider either of these `Pod` specifications.
If the selected policy is anything other than `none`, Topology Manager would consider either of the `BestEffort` or the `Burstable` QoS class `Pod` specifications.

Review comment:

not sure here. When the topology manager policy is not None, it will indeed try to align all pods, but for pods whose QoS class is not Guaranteed, all the alignment logic degrades into a no-op. So, yes, we will do all the dance, but the result will be "no pinning, no alignment".
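The example specs are distinguished by QoS class, and the preceding review comment hinges on whether a pod is `Guaranteed`. A minimal Python sketch of the standard Kubernetes QoS rules, which look only at `cpu` and `memory`, may help when reading the three examples. `qos_class` and the dictionary shape of a container are hypothetical illustrations, not a kubelet or OpenShift API, and quantities are compared as plain strings for simplicity.

```python
from typing import Dict, List

# Hypothetical shape: each container is {"requests": {...}, "limits": {...}}.
Container = Dict[str, Dict[str, str]]

TRACKED = ("cpu", "memory")  # the QoS class is computed from cpu and memory only


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as BestEffort, Burstable, or Guaranteed."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if any(r in requests or r in limits for r in TRACKED):
            any_set = True
        for r in TRACKED:
            # An unset request defaults to the limit, so only the limit is mandatory.
            # Quantities are compared as strings here, so "1" and "1000m" differ.
            if r not in limits or requests.get(r, limits[r]) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

For instance, a container that sets only `requests: {"memory": "100Mi"}` is classified as `Burstable`, while one whose `cpu` and `memory` requests equal its limits is `Guaranteed`.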

When the Topology Manager policy is set to `none`, the relevant containers are pinned to any available CPU without considering NUMA affinity. This is the default behavior and does not optimize for performance-sensitive workloads.

Review comment:

we usually use "pinning" to mean "run on a precise set of resources", so I'm not sure the terminology is best here. "Pinned to anything" is something I don't see used much, but I'm also not a native English speaker.

@amolnar-rh (Contributor Author), Jul 17, 2025:

What about:

the relevant containers are assigned to run on any available set of CPUs...

Or should we keep it vague and instead of specifying CPU say resources?

Review comment:

"the relevant containers are assigned to run on any available set of CPUs..." seems fine to me

Other values enable the use of topology awareness information from device plugins. The Topology Manager attempts to align the CPU, memory, and device allocations according to the topology of the node when the policy is set to other values than `none`. For more information about the available values, see _Additional resources_.

Review comment:

device plugins and core resources (cpu, memory)


The last example pod below runs in the Guaranteed QoS class because requests are equal to limits.
The following example pod runs in the `Guaranteed` QoS class because requests are equal to limits.

[source,yaml]
----
@@ -53,6 +55,6 @@ spec:
example.com/device: "1"
----

Topology Manager would consider this pod. The Topology Manager would consult the hint providers, which are CPU Manager and Device Manager, to get topology hints for the pod.
Topology Manager would consider this pod. The Topology Manager would consult the Hint Providers, which are CPU Manager and Device Manager, to get topology hints for the pod.

Review comment:

CPU Manager, Device Manager and Memory Manager


Topology Manager will use this information to store the best topology for this container. In the case of this pod, CPU Manager and Device Manager will use this stored information at the resource allocation stage.
Topology Manager will use this information to store the best topology for this container. In the case of this pod, CPU Manager and Device Manager will use this stored information at the resource allocation stage.
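To make "consult the hint providers, then store the best topology" more concrete, here is a simplified Python model of hint merging. It assumes each provider (CPU Manager, Device Manager and, per the review comment, Memory Manager) returns a list of candidate NUMA node sets, each flagged as preferred when it uses the minimum number of nodes. The model takes one hint from each provider, intersects their node sets, and keeps the best result: preferred over non-preferred, then fewer nodes, then lower node IDs. `Hint`, `merge_hints`, and `_better` are hypothetical names; the kubelet's Go implementation differs in details such as how a provider that returns no hints is handled.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Sequence


class Hint(NamedTuple):
    numa_nodes: FrozenSet[int]  # NUMA nodes the resource could come from
    preferred: bool             # True if this placement uses the minimum number of nodes


def _better(a: Hint, b: Hint) -> bool:
    """True if hint `a` should replace the current best hint `b`."""
    if a.preferred != b.preferred:
        return a.preferred  # preferred always beats non-preferred
    # Same preference: fewer NUMA nodes wins, then the lower node IDs.
    return (len(a.numa_nodes), sorted(a.numa_nodes)) < (len(b.numa_nodes), sorted(b.numa_nodes))


def merge_hints(provider_hints: Sequence[List[Hint]], all_nodes: FrozenSet[int]) -> Hint:
    """Pick one hint per provider, intersect them, and keep the best merged hint."""
    best = Hint(all_nodes, False)  # fallback when no combination can be aligned
    for combo in product(*provider_hints):
        nodes = all_nodes.intersection(*(h.numa_nodes for h in combo))
        if not nodes:
            continue  # these hints share no NUMA node, so they cannot be aligned
        candidate = Hint(nodes, all(h.preferred for h in combo))
        if _better(candidate, best):
            best = candidate
    return best
```

In this model, the `best-effort` policy as described in this PR stores whatever hint comes back and admits the pod, while `restricted` rejects the pod when the returned hint is not preferred.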
11 changes: 4 additions & 7 deletions modules/topology-manager-policies.adoc
@@ -3,7 +3,7 @@
// * scaling_and_performance/using-topology-manager.adoc
// * post_installation_configuration/node-tasks.adoc

[id="topology_manager_policies_{context}"]
[id="topology-manager-policies_{context}"]
= Topology Manager policies

Topology Manager aligns `Pod` resources of all Quality of Service (QoS) classes by collecting topology hints from Hint Providers, such as CPU Manager and Device Manager, and using the collected hints to align the `Pod` resources.
@@ -16,15 +16,12 @@ This is the default policy and does not perform any topology alignment.

`best-effort` policy::

For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.
For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.

Review comment:

this is technically correct but maybe too low level. The observable behavior of the best-effort policy is that the kubelet will try to align all the required resources on a NUMA node, but if the allocation is impossible (not enough resources) the allocation will spill into other NUMA nodes unpredictably. The pod will always be admitted.

@amolnar-rh (Contributor Author), Jul 17, 2025:

I tried to rephrase it. WDYT?

Kubelet tries to align all the required resources on a NUMA node according to the preferred NUMA node affinity for that container. Even if the allocation is not possible due to insufficient resources, the Topology Manager still admits the pod but the allocation is shared with other NUMA nodes.

Contributor Author:

The only reason I'm leaving out "unpredictably" is because I feel like we'd need to explain what that means exactly.

Review comment:

Your rephrasing seems fine to me, thanks
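A minimal sketch of the observable `best-effort` behavior discussed in this thread, reduced to a single resource (exclusive CPUs) for readability. `place_best_effort` and its inputs are hypothetical. The largest-first spill order is an assumption made only so the sketch is deterministic, because the thread deliberately leaves the spill order unspecified.

```python
from typing import Dict, List, Tuple


def place_best_effort(cpus: int, free: Dict[int, int]) -> Tuple[bool, List[int]]:
    """Return (admitted, NUMA nodes used) for a hypothetical exclusive-CPU request."""
    if cpus > sum(free.values()):
        raise ValueError("the node does not have enough free CPUs in total")
    # First choice: the lowest-numbered NUMA node that can hold the whole request.
    for node in sorted(free):
        if free[node] >= cpus:
            return True, [node]
    # Otherwise spill across NUMA nodes (largest free capacity first, an assumption).
    used: List[int] = []
    remaining = cpus
    for node in sorted(free, key=lambda n: -free[n]):
        if remaining <= 0:
            break
        if free[node] > 0:
            used.append(node)
            remaining -= free[node]
    # best-effort never rejects the pod for topology reasons.
    return True, sorted(used)
```

With 4 free CPUs on node 0 and 2 on node 1, a 3-CPU request stays on node 0, while a 5-CPU request spills onto both nodes and is still admitted.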


`restricted` policy::

For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not
preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.
For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.

Review comment:

The observable behavior here is that the kubelet will determine the theoretical minimal number of NUMA nodes that can fulfill the request, and reject the admission if the actual allocation would take more than that number of NUMA nodes; otherwise the pod will go running.

Contributor Author:

What do you mean that the "pod will go running"? Do you mean that the pod is admitted and it will run/operate?

Except for that part, I rephrased it:

kubelet determines the theoretical minimum number of NUMA nodes that can fulfill the request. If the actual allocation requires more than that number of NUMA nodes, the Topology Manager rejects the admission, resulting in a pod in a Terminated state with a pod admission failure.

Review comment:

What do you mean that the "pod will go running"? Do you mean that the pod is admitted and it will run/operate?

yes, precisely.
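The `restricted` behavior agreed on in this thread can be modeled as "compute the theoretical minimum number of NUMA nodes for the request, then reject any allocation that spans more". This Python sketch assumes integer free capacities per NUMA node and brute-forces node combinations, which is cheap for the small NUMA counts of typical servers. `fits`, `min_numa_nodes`, and `admit_restricted` are hypothetical names, not kubelet functions.

```python
from itertools import combinations
from typing import Dict, Iterable

# Hypothetical model: free capacity per NUMA node, for example
# {0: {"cpu": 4, "example.com/device": 0}, 1: {"cpu": 4, "example.com/device": 1}}
Resources = Dict[str, int]
Free = Dict[int, Resources]


def fits(request: Resources, nodes: Iterable[int], free: Free) -> bool:
    """True if the combined free capacity of `nodes` covers every requested resource."""
    chosen = list(nodes)
    return all(
        sum(free[n].get(res, 0) for n in chosen) >= qty for res, qty in request.items()
    )


def min_numa_nodes(request: Resources, free: Free) -> int:
    """Theoretical minimum number of NUMA nodes that can fulfill the request."""
    for k in range(1, len(free) + 1):
        if any(fits(request, combo, free) for combo in combinations(sorted(free), k)):
            return k
    raise ValueError("the node cannot fulfill this request at all")


def admit_restricted(allocation_nodes: Iterable[int], request: Resources, free: Free) -> bool:
    """Admit only if the actual allocation spans no more than the minimum number of nodes."""
    return len(set(allocation_nodes)) <= min_numa_nodes(request, free)
```

A rejected pod corresponds to the `Terminated` state with a pod admission failure; an admitted pod goes on to run.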


`single-numa-node` policy::

For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.
For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a `Terminated` state with a pod admission failure.

Review comment:

The observable behavior is that the kubelet will admit the pod if and only if all the resources required by the pod itself can be allocated on the same NUMA node. Arguably, it's the same as Restricted with minimal number of NUMA nodes = 1.

Contributor Author:

PTAL:

kubelet admits the pod if all the resources required by the pod can be allocated on the same NUMA node. If a single NUMA node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.

Review comment:

LGTM
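As the review notes, `single-numa-node` behaves like `restricted` with the minimum fixed at one NUMA node. A minimal Python sketch of that admission check, assuming integer free capacities per NUMA node (memory in GiB here) and hypothetical function names:

```python
from typing import Dict, Optional

# Hypothetical model: free capacity per NUMA node, with memory expressed in GiB.
Resources = Dict[str, int]
Free = Dict[int, Resources]


def pick_single_numa_node(request: Resources, free: Free) -> Optional[int]:
    """Return the first NUMA node that can hold every requested resource, or None."""
    for node in sorted(free):
        if all(free[node].get(res, 0) >= qty for res, qty in request.items()):
            return node
    return None


def admit_single_numa_node(request: Resources, free: Free) -> bool:
    # Admit only if CPU, memory, and devices all fit on one NUMA node.
    return pick_single_numa_node(request, free) is not None
```

Note that a node with plenty of CPUs but no free device does not qualify: the check is per node across all requested resources, not per resource.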

5 changes: 5 additions & 0 deletions scalability_and_performance/using-cpu-manager.adoc
@@ -31,3 +31,8 @@ include::modules/topology-manager-policies.adoc[leveloffset=+1]
include::modules/setting-up-topology-manager.adoc[leveloffset=+1]

include::modules/pod-interactions-with-topology-manager.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../scalability_and_performance/using-cpu-manager.adoc#topology-manager-policies_using-cpu-manager-and-topology_manager[Topology Manager policies]