OCPBUGS-54188: Update Pod interactions with Topology Manager policies #95111
@@ -5,7 +5,7 @@
[id="pod-interactions-with-topology-manager_{context}"]
= Pod interactions with Topology Manager policies
The example `Pod` specs below help illustrate pod interactions with Topology Manager.
The example `Pod` specs illustrate pod interactions with Topology Manager.

The following pod runs in the `BestEffort` QoS class because no resource requests or limits are specified.
@@ -32,9 +32,11 @@ spec:
memory: "100Mi"
----
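The diff shows only the tail of these example specs. A minimal sketch of the `BestEffort` and `Burstable` pods under discussion might look like the following; the pod names, the container image, and the memory limit value are illustrative assumptions, since only the `100Mi` request appears in the diff.

[source,yaml]
----
# BestEffort QoS: no resource requests or limits are specified for the container.
apiVersion: v1
kind: Pod
metadata:
  name: example-besteffort # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
---
# Burstable QoS: limits are set and requests are lower than the limits.
apiVersion: v1
kind: Pod
metadata:
  name: example-burstable # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi" # assumed limit value
      requests:
        memory: "100Mi"
----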
If the selected policy is anything other than `none`, Topology Manager would not consider either of these `Pod` specifications.
If the selected policy is anything other than `none`, Topology Manager would consider either of the `BestEffort` or the `Burstable` QoS class `Pod` specifications.
When the Topology Manager policy is set to `none`, the relevant containers are pinned to any available CPU without considering NUMA affinity. This is the default behavior and does not optimize for performance-sensitive workloads.

We usually mean "pinning" as "run on a precise set of resources", so not sure the terminology is best here. "Pinned to anything" is something I don't see used much, but I'm also not a native English speaker.

What about: ... Or should we keep it vague and, instead of specifying CPU, say resources?

"The relevant containers are assigned to run on any available set of CPUs..." seems fine to me.
Other values enable the use of topology awareness information from device plugins. The Topology Manager attempts to align the CPU, memory, and device allocations according to the topology of the node when the policy is set to values other than `none`. For more information about the available values, see _Additional resources_.

Device plugins and core resources (cpu, memory).
The last example pod below runs in the Guaranteed QoS class because requests are equal to limits.
The following example pod runs in the `Guaranteed` QoS class because requests are equal to limits.
[source,yaml]
----

@@ -53,6 +55,6 @@ spec:
example.com/device: "1"
----
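Again, only the tail of the `Guaranteed` example is visible in the diff. A minimal sketch of a spec in which every request equals its limit, which is what places the pod in the `Guaranteed` QoS class, might look like this; the pod name, image, and the CPU and memory values are illustrative assumptions, while the `example.com/device: "1"` line is taken from the diff.

[source,yaml]
----
# Guaranteed QoS: every resource request equals its corresponding limit,
# including the extended resource advertised by a device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: example-guaranteed # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
----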
Topology Manager would consider this pod. The Topology Manager would consult the hint providers, which are CPU Manager and Device Manager, to get topology hints for the pod.
Topology Manager would consider this pod. The Topology Manager would consult the Hint Providers, which are CPU Manager and Device Manager, to get topology hints for the pod.

CPU Manager, Device Manager and Memory Manager.
Topology Manager will use this information to store the best topology for this container. In the case of this pod, CPU Manager and Device Manager will use this stored information at the resource allocation stage.
@@ -3,7 +3,7 @@
// * scaling_and_performance/using-topology-manager.adoc
// * post_installation_configuration/node-tasks.adoc

[id="topology_manager_policies_{context}"]
[id="topology-manager-policies_{context}"]
= Topology Manager policies

Topology Manager aligns `Pod` resources of all Quality of Service (QoS) classes by collecting topology hints from Hint Providers, such as CPU Manager and Device Manager, and using the collected hints to align the `Pod` resources.

@@ -16,15 +16,12 @@ This is the default policy and does not perform any topology alignment.
`best-effort` policy::
For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.
For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.

This is technically correct but maybe too low level. The observable behavior of the best-effort policy is that the kubelet will try to align all the required resources on a NUMA node, but if the allocation is impossible (not enough resources), the allocation will spill into other NUMA nodes unpredictably. The pod will always be admitted.

I tried to rephrase it. WDYT?

The only reason I'm leaving out "unpredictably" is because I feel like we'd need to explain what that means exactly.

Your rephrasing seems fine to me, thanks.
`restricted` policy::
For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not
preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.
For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.

The observable behavior here is that the kubelet will determine the theoretical minimal number of NUMA nodes that can fulfil the request, and reject the admission if the actual allocation would take more than that number of NUMA nodes; otherwise the pod will go running.

What do you mean that the "pod will go running"? Do you mean that the pod is admitted and it will run/operate? Except for that part, I rephrased it:

Yes, precisely.
`single-numa-node` policy::
For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.
For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a `Terminated` state with a pod admission failure.

The observable behavior is that the kubelet will admit the pod if and only if all the resources required by the pod itself can be allocated on the same NUMA node. Arguably, it's the same as Restricted with

PTAL:

LGTM
Not sure here. When the topology manager policy is not None, it will indeed try to align all pods, but for pods whose QoS class is not Guaranteed, all the alignment logic will degrade into a no-operation. So, yes, we will do all the dance, but the result will be "no pinning, no alignment".
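For reference, the policy discussed in this module is selected through the kubelet configuration. A minimal sketch, assuming the policy is set with an OpenShift `KubeletConfig` custom resource and that a `custom-kubelet: cpumanager-enabled` label (an assumption here) is already applied to the target machine config pool:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled # assumed label on the target MachineConfigPool
  kubeletConfig:
    cpuManagerPolicy: static # exclusive CPUs require the static CPU Manager policy
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: single-numa-node # or none, best-effort, restricted
----

With `single-numa-node` set, only pods whose requested resources can all be satisfied from one NUMA node are admitted, as described in the policy descriptions above.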