
Commit c1365ab

new assemblies and modules for cluster resources
1 parent 85ed202 commit c1365ab

24 files changed: +1289 -0 lines changed

_topic_map.yml

Lines changed: 6 additions & 0 deletions
@@ -311,6 +311,12 @@ Topics:
     File: nodes-containers-port-forwarding
   - Name: Viewing system event information in a cluster
     File: nodes-containers-events
+  - Name: Analyzing cluster resource levels
+    File: nodes-cluster-resource-levels
+  - Name: Configuring cluster memory to meet container memory and risk requirements
+    File: nodes-cluster-resource-configure
+  - Name: Configuring your cluster to place pods on overcommitted nodes
+    File: nodes-cluster-overcommit
 ---
 Name: Logging
 Dir: logging
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-about_{context}']
= Understanding overcommitment in {product-title}

Requests and limits enable administrators to allow and manage the overcommitment of resources on a node. The scheduler uses requests to schedule your container and to provide a minimum service guarantee. Limits constrain the amount of compute resource that can be consumed on your node.

{product-title} administrators can control the level of overcommit and manage container density on nodes by configuring masters to override the ratio between request and limit set on developer containers. In conjunction with a per-project LimitRange object that specifies limits and defaults, this adjusts the container limit and request to achieve the desired level of overcommit.

[NOTE]
====
These overrides have no effect if no limits have been set on containers. Create a LimitRange object with default limits, per individual project or in the project template, to ensure that the overrides apply.
====

After these overrides, the container limits and requests must still be validated by any LimitRange objects in the project. It is possible, for example, for developers to specify a limit close to the minimum limit and then have the request overridden below the minimum limit, causing the pod to be forbidden. This unfortunate user experience should be addressed in future work, but for now, configure this capability and LimitRange objects with caution.
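
For reference, a minimal sketch of such a *LimitRange* object with default container limits; the object name and the values here are illustrative only:

[source,yaml]
----
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range <1>
spec:
  limits:
  - type: Container
    default: <2>
      cpu: "1"
      memory: 1Gi
    defaultRequest: <3>
      cpu: 500m
      memory: 256Mi
----
<1> An illustrative name; use whatever fits your project conventions.
<2> Default limits applied to containers that do not set their own, which is what allows the overrides described above to take effect.
<3> Default requests applied to containers that do not set their own.
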
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-configure-masters_{context}']
= Configuring masters for overcommitment

{product-title} administrators control overcommit by configuring masters to override the ratio between request and limit set on developer containers.

.Prerequisites

Because these overrides have no effect if no limits have been set on containers, you must create a *LimitRange* object with default limits, per individual project or in the project template, to ensure that the overrides apply.

.Procedure

To configure a master for overcommit:

. Configure the `*ClusterResourceOverride*` admission controller in the *_master-config.yaml_* file as in the following example (reuse the existing configuration tree if it exists, or introduce absent elements as needed):
+
[source,yaml]
----
admissionConfig:
  pluginConfig:
    ClusterResourceOverride: <1>
      configuration:
        apiVersion: v1
        kind: ClusterResourceOverrideConfig
        memoryRequestToLimitPercent: 25 <2>
        cpuRequestToLimitPercent: 25 <3>
        limitCPUToMemoryPercent: 200 <4>
----
<1> The plug-in name. Case matters, and anything but an exact match for a plug-in name is ignored.
<2> (optional, 1-100) If a container memory limit has been specified or defaulted, the memory request is overridden to this percentage of the limit.
<3> (optional, 1-100) If a container CPU limit has been specified or defaulted, the CPU request is overridden to this percentage of the limit.
<4> (optional, positive integer) If a container memory limit has been specified or defaulted, the CPU limit is overridden to a percentage of the memory limit, with a 100 percentage scaling 1Gi of RAM to equal 1 CPU core. This is processed prior to overriding the CPU request (if configured).

. Restart the master services:
+
[source,bash]
----
# master-restart api
# master-restart controllers
----
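
For illustration, assuming the example percentages above, a container that specifies only a 512Mi memory limit would be adjusted roughly as follows. The figures are derived from that example configuration, not from product defaults:

[source,yaml]
----
# Container resources as submitted by the developer (only a memory limit is set):
resources:
  limits:
    memory: 512Mi

# Container resources after the override:
resources:
  limits:
    memory: 512Mi
    cpu: "1"        # limitCPUToMemoryPercent: 200 -> 200% of 0.5 cores (512Mi scaled at 1Gi per core)
  requests:
    memory: 128Mi   # memoryRequestToLimitPercent: 25 -> 25% of the 512Mi limit
    cpu: 250m       # cpuRequestToLimitPercent: 25 -> 25% of the overridden 1-core CPU limit
----
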
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-configure-nodes_{context}']
= Configuring nodes for overcommitment

In an overcommitted environment, it is important to properly configure your node to provide the best system behavior.

When the node starts, it ensures that the kernel tunable flags for memory management are set properly. The kernel should never fail memory allocations unless it runs out of physical memory.

To ensure this behavior, the node instructs the kernel to always overcommit memory:

[source,bash]
----
$ sysctl -w vm.overcommit_memory=1
----

The node also instructs the kernel not to panic when it runs out of memory. Instead, the kernel OOM killer should kill processes based on priority:

[source,bash]
----
$ sysctl -w vm.panic_on_oom=0
----

[NOTE]
====
These flags should already be set on nodes, and no further action is required.
====
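
If you want to confirm the current values on a node rather than assume they are set, you can query both tunables directly. This is a quick check, not a required step:

[source,bash]
----
$ sysctl vm.overcommit_memory vm.panic_on_oom
vm.overcommit_memory = 1
vm.panic_on_oom = 0
----
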

You can also perform the following configurations for each node:

* Disable or enforce CPU limits using CPU CFS quotas

* Reserve resources for system processes

* Reserve memory across quality of service tiers
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-master-disable_{context}']
= Disabling overcommitment for a project

When configured, overcommitment can be disabled per-project. For example, you can allow infrastructure components to be configured independently of overcommitment.

.Procedure

To disable overcommitment in a project:

. Edit the project object file.

. Add the following annotation:
+
[source,yaml]
----
quota.openshift.io/cluster-resource-override-enabled: "false"
----

. Create the project object:
+
[source,bash]
----
$ oc create -f <file-name>.yaml
----
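
As an illustrative sketch of where the annotation lives (the project name is made up, and the exact object kind you edit can vary with how the project is created):

[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: infra-components <1>
  annotations:
    quota.openshift.io/cluster-resource-override-enabled: "false" <2>
----
<1> An illustrative project name.
<2> Disables the overcommit override adjustments for pods created in this project.
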
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-disabling-swap_{context}']
= Disabling swap memory

You can disable swap across all nodes in your cluster with a single command.

.Procedure

* Run the following command to disable swap:
+
[source,bash]
----
$ swapoff -a
----

* Run the following command to enable swap:
+
[source,bash]
----
$ swapon -a
----
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-node-enforcing_{context}']
= Disabling or enforcing CPU limits using CPU CFS quotas

By default, nodes enforce specified CPU limits using the Completely Fair Scheduler (CFS) quota support in the Linux kernel.

.Procedure

If you do not want to enforce CPU limits on the node, you can disable enforcement by modifying the appropriate node configuration map to include the following parameters:

[source,yaml]
----
kubeletArguments:
  cpu-cfs-quota:
  - "false"
----

If CPU limit enforcement is disabled, it is important to understand the impact on your node:

- If a container makes a request for CPU, the request continues to be enforced by CFS shares in the Linux kernel.
- If a container makes no explicit request for CPU but does specify a limit, the request defaults to the specified limit and is enforced by CFS shares in the Linux kernel.
- If a container specifies both a request and a limit for CPU, the request is enforced by CFS shares in the Linux kernel, and the limit has no impact on the node.
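
As a purely illustrative example of the last case, consider a container `resources` stanza like the following. With CPU limit enforcement disabled, only the request influences the CFS shares, and the stated limit is not enforced:

[source,yaml]
----
resources:
  requests:
    cpu: 500m    # enforced through CFS shares (relative weighting) in the kernel
  limits:
    cpu: "2"     # with cpu-cfs-quota set to "false", this limit is not enforced by CFS quota
----
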
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-node-memory_{context}']
= Reserving memory across quality of service tiers

You can use the `qos-reserved` parameter to specify a percentage of memory to be reserved by a pod in a particular QoS level. This feature attempts to reserve requested resources to exclude pods in lower QoS classes from using resources requested by pods in higher QoS classes.

By reserving resources for higher QoS levels, pods that do not have resource limits are prevented from encroaching on the resources requested by pods at higher QoS levels.

.Procedure

To configure the `qos-reserved` parameter, edit the appropriate node configuration map:

[source,yaml]
----
kubeletArguments:
  cgroups-per-qos:
  - true
  cgroup-driver:
  - 'systemd'
  cgroup-root:
  - '/'
  qos-reserved: <1>
  - 'memory=50%'
----
<1> Specifies how pod resource requests are reserved at the QoS level.

{product-title} uses the `qos-reserved` parameter as follows:

- A value of `qos-reserved=memory=100%` prevents the `Burstable` and `BestEffort` QoS classes from consuming memory that was requested by a higher QoS class. This increases the risk of inducing OOM on `BestEffort` and `Burstable` workloads in favor of increasing memory resource guarantees for `Guaranteed` and `Burstable` workloads.
- A value of `qos-reserved=memory=50%` allows the `Burstable` and `BestEffort` QoS classes to consume half of the memory requested by a higher QoS class.
- A value of `qos-reserved=memory=0%` allows the `Burstable` and `BestEffort` QoS classes to consume up to the full node allocatable amount if available, but increases the risk that a `Guaranteed` workload will not have access to requested memory. This condition effectively disables this feature.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-node-resources_{context}']
= Reserving resources for system processes

To provide more reliable scheduling and minimize node resource overcommitment, each node can reserve a portion of its resources for use by system daemons that are required to run on your node for your cluster to function (*sshd*, *docker*, and so on). In particular, it is recommended that you reserve resources for incompressible resources such as memory.

.Procedure

To explicitly reserve resources for non-pod processes, allocate node resources by specifying resources available for scheduling. See Allocating Node Resources for more details.
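
As a rough sketch only, assuming the `system-reserved` and `kube-reserved` kubelet arguments described in Allocating Node Resources, such a reservation in the node configuration map might look like the following; the values are illustrative:

[source,yaml]
----
kubeletArguments:
  system-reserved: <1>
  - 'cpu=500m,memory=1Gi'
  kube-reserved: <2>
  - 'cpu=500m,memory=1Gi'
----
<1> Resources set aside for node system daemons; values shown are illustrative.
<2> Resources set aside for Kubernetes node components; values shown are illustrative.
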
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
// Module included in the following assemblies:
//
// * nodes/nodes-cluster-overcommit.adoc

[id='nodes-cluster-overcommit-qos-about_{context}']
= Understanding overcommitment and quality of service classes

A node is _overcommitted_ when it has a pod scheduled that makes no request, or when the sum of limits across all pods on that node exceeds available machine capacity.

In an overcommitted environment, it is possible that the pods on the node will attempt to use more compute resource than is available at any given point in time. When this occurs, the node must give priority to one pod over another. The facility used to make this decision is referred to as a Quality of Service (QoS) class.

For each compute resource, a container is divided into one of three QoS classes with decreasing order of priority:

.Quality of Service Classes
[options="header",cols="1,1,5"]
|===
|Priority |Class Name |Description

|1 (highest)
|*Guaranteed*
|If limits and optionally requests are set (not equal to 0) for all resources and they are equal, then the container is classified as *Guaranteed*.

|2
|*Burstable*
|If requests and optionally limits are set (not equal to 0) for all resources, and they are not equal, then the container is classified as *Burstable*.

|3 (lowest)
|*BestEffort*
|If requests and limits are not set for any of the resources, then the container is classified as *BestEffort*.
|===
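
For illustration only, the following container `resources` stanzas would fall into each class; the values are arbitrary:

[source,yaml]
----
# Guaranteed: requests and limits are set for all resources and are equal
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi

# Burstable: requests are set, and limits (where set) differ from the requests
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 512Mi

# BestEffort: no requests or limits are set for any resource
resources: {}
----
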

Memory is an incompressible resource, so in low memory situations, containers that have the lowest priority are terminated first:

- *Guaranteed* containers are considered top priority, and are guaranteed to only be terminated if they exceed their limits, or if the system is under memory pressure and there are no lower priority containers that can be evicted.
- *Burstable* containers under system memory pressure are more likely to be terminated once they exceed their requests and no other *BestEffort* containers exist.
- *BestEffort* containers are treated with the lowest priority. Processes in these containers are first to be terminated if the system runs out of memory.

[[nodes-cluster-overcommit-qos-about-reserve]]
== Understanding how to reserve memory across quality of service tiers

You can use the `qos-reserved` parameter to specify a percentage of memory to be reserved by a pod in a particular QoS level. This feature attempts to reserve requested resources to exclude pods in lower QoS classes from using resources requested by pods in higher QoS classes.

{product-title} uses the `qos-reserved` parameter as follows:

- A value of `qos-reserved=memory=100%` prevents the `Burstable` and `BestEffort` QoS classes from consuming memory that was requested by a higher QoS class. This increases the risk of inducing OOM on `BestEffort` and `Burstable` workloads in favor of increasing memory resource guarantees for `Guaranteed` and `Burstable` workloads.

- A value of `qos-reserved=memory=50%` allows the `Burstable` and `BestEffort` QoS classes to consume half of the memory requested by a higher QoS class.

- A value of `qos-reserved=memory=0%` allows the `Burstable` and `BestEffort` QoS classes to consume up to the full node allocatable amount if available, but increases the risk that a `Guaranteed` workload will not have access to requested memory. This condition effectively disables this feature.

[[nodes-cluster-overcommit-qos-about-swap]]
== Understanding swap memory and QoS

You can disable swap by default on your nodes to preserve quality of service (QoS) guarantees. Otherwise, physical resources on a node can be oversubscribed, affecting the resource guarantees the Kubernetes scheduler makes during pod placement.

For example, if two guaranteed pods have reached their memory limit, each container could start using swap memory. Eventually, if there is not enough swap space, processes in the pods can be terminated due to the system being oversubscribed.

Failing to disable swap results in nodes not recognizing that they are experiencing *MemoryPressure*, resulting in pods not receiving the memory they requested in their scheduling request. As a result, additional pods are placed on the node, further increasing memory pressure and ultimately increasing your risk of experiencing a system out of memory (OOM) event.

[IMPORTANT]
====
If swap is enabled, any out-of-resource handling eviction thresholds for available memory will not work as expected. Take advantage of out-of-resource handling to allow pods to be evicted from a node when it is under memory pressure, and rescheduled on an alternative node that has no such pressure.
====
