Commit 6b4ccf1

Merge tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Update the documentation of cpuidle governors that does not match the
  code any more after previous functional changes (Rafael Wysocki) and
  fix up the cpufreq Kconfig file broken inadvertently by a previous
  update (Viresh Kumar)"

* tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Move endif to the end of Kconfig file
  cpuidle: teo: Update documentation after previous changes
  cpuidle: menu: Update documentation after previous changes
2 parents 5d5c478 + 3744b08 commit 6b4ccf1

3 files changed, +81 -86 lines changed

Documentation/admin-guide/pm/cpuidle.rst

Lines changed: 31 additions & 41 deletions
@@ -269,27 +269,7 @@ Namely, when invoked to select an idle state for a CPU (i.e. an idle state that
 the CPU will ask the processor hardware to enter), it attempts to predict the
 idle duration and uses the predicted value for idle state selection.
 
-It first obtains the time until the closest timer event with the assumption
-that the scheduler tick will be stopped. That time, referred to as the *sleep
-length* in what follows, is the upper bound on the time before the next CPU
-wakeup. It is used to determine the sleep length range, which in turn is needed
-to get the sleep length correction factor.
-
-The ``menu`` governor maintains two arrays of sleep length correction factors.
-One of them is used when tasks previously running on the given CPU are waiting
-for some I/O operations to complete and the other one is used when that is not
-the case. Each array contains several correction factor values that correspond
-to different sleep length ranges organized so that each range represented in the
-array is approximately 10 times wider than the previous one.
-
-The correction factor for the given sleep length range (determined before
-selecting the idle state for the CPU) is updated after the CPU has been woken
-up and the closer the sleep length is to the observed idle duration, the closer
-to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
-The sleep length is multiplied by the correction factor for the range that it
-falls into to obtain the first approximation of the predicted idle duration.
-
-Next, the governor uses a simple pattern recognition algorithm to refine its
+It first uses a simple pattern recognition algorithm to obtain a preliminary
 idle duration prediction. Namely, it saves the last 8 observed idle duration
 values and, when predicting the idle duration next time, it computes the average
 and variance of them. If the variance is small (smaller than 400 square
@@ -301,29 +281,39 @@ Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
 interval" is determined or too many data points are disregarded, in which case
 the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value). The "typical interval" computed this way is compared with the
-sleep length multiplied by the correction factor and the minimum of the two is
-taken as the predicted idle duration.
-
-Then, the governor computes an extra latency limit to help "interactive"
-workloads. It uses the observation that if the exit latency of the selected
-idle state is comparable with the predicted idle duration, the total time spent
-in that state probably will be very short and the amount of energy to save by
-entering it will be relatively small, so likely it is better to avoid the
-overhead related to entering that state and exiting it. Thus selecting a
-shallower state is likely to be a better option then. The first approximation
-of the extra latency limit is the predicted idle duration itself which
-additionally is divided by a value depending on the number of tasks that
-previously ran on the given CPU and now they are waiting for I/O operations to
-complete. The result of that division is compared with the latency limit coming
-from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
-framework and the minimum of the two is taken as the limit for the idle states'
-exit latency.
+integer value).
+
+If the "typical interval" computed this way is long enough, the governor obtains
+the time until the closest timer event with the assumption that the scheduler
+tick will be stopped. That time, referred to as the *sleep length* in what follows,
+is the upper bound on the time before the next CPU wakeup. It is used to determine
+the sleep length range, which in turn is needed to get the sleep length correction
+factor.
+
+The ``menu`` governor maintains an array containing several correction factor
+values that correspond to different sleep length ranges organized so that each
+range represented in the array is approximately 10 times wider than the previous
+one.
+
+The correction factor for the given sleep length range (determined before
+selecting the idle state for the CPU) is updated after the CPU has been woken
+up and the closer the sleep length is to the observed idle duration, the closer
+to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
+The sleep length is multiplied by the correction factor for the range that it
+falls into to obtain an approximation of the predicted idle duration that is
+compared to the "typical interval" determined previously and the minimum of
+the two is taken as the idle duration prediction.
+
+If the "typical interval" value is small, which means that the CPU is likely
+to be woken up soon enough, the sleep length computation is skipped as it may
+be costly and the idle duration is simply predicted to equal the "typical
+interval" value.
 
 Now, the governor is ready to walk the list of idle states and choose one of
 them. For this purpose, it compares the target residency of each state with
-the predicted idle duration and the exit latency of it with the computed latency
-limit. It selects the state with the target residency closest to the predicted
+the predicted idle duration and the exit latency of it with the with the latency
+limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
+framework. It selects the state with the target residency closest to the predicted
 idle duration, but still below it, and exit latency that does not exceed the
 limit.
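
For illustration, the prediction flow described by the updated ``menu`` text can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the kernel implementation. All names (`typical_interval`, `predict_idle_us`, `select_state`, `struct idle_state`) are hypothetical. The variance threshold is passed in as `var_limit` because its unit is cut off in the excerpt above, and the choice to discard the largest remaining sample, the `min_samples` cut-off, the `short_limit_us` threshold, the fallback to state 0, and reading "below" as "not above" are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SAMPLES 8           /* the text says the last 8 idle durations are saved */
#define NO_TYPICAL UINT32_MAX  /* "infinity": the maximum unsigned integer value */

/* Hypothetical idle state description; all values in microseconds. */
struct idle_state {
	uint32_t target_residency_us;
	uint32_t exit_latency_us;
};

/*
 * Look for a "typical interval": accept the average if the variance is below
 * var_limit, otherwise drop the largest remaining value(s) and retry.
 * Assumed here: which samples get dropped and when to give up (min_samples).
 */
static uint32_t typical_interval(const uint32_t samples[NR_SAMPLES],
				 double var_limit, size_t min_samples)
{
	uint32_t cutoff = UINT32_MAX;  /* samples above the cutoff are ignored */

	assert(min_samples >= 1);

	for (;;) {
		double sum = 0.0, sq_dev = 0.0, avg;
		uint32_t max = 0;
		size_t n = 0;

		for (size_t i = 0; i < NR_SAMPLES; i++) {
			if (samples[i] > cutoff)
				continue;
			sum += samples[i];
			if (samples[i] > max)
				max = samples[i];
			n++;
		}
		if (n < min_samples)
			return NO_TYPICAL;  /* too many data points disregarded */

		avg = sum / n;
		for (size_t i = 0; i < NR_SAMPLES; i++) {
			if (samples[i] <= cutoff)
				sq_dev += (samples[i] - avg) * (samples[i] - avg);
		}
		if (sq_dev / n < var_limit)
			return (uint32_t)avg;
		if (max == 0)
			return NO_TYPICAL;
		cutoff = max - 1;  /* disregard the largest value(s) next time */
	}
}

/*
 * Combine the typical interval with the corrected sleep length. In a real
 * governor the sleep length would only be computed on the second path,
 * because obtaining it may be costly.
 */
static uint32_t predict_idle_us(uint32_t typical_us, uint32_t short_limit_us,
				uint32_t sleep_length_us, double correction)
{
	uint32_t corrected;

	assert(correction >= 0.0 && correction <= 1.0);

	if (typical_us <= short_limit_us)
		return typical_us;

	corrected = (uint32_t)(sleep_length_us * correction);
	return typical_us < corrected ? typical_us : corrected;
}

/*
 * Pick the state whose target residency is closest to the prediction without
 * exceeding it and whose exit latency respects the PM QoS limit. Falls back
 * to state 0 (an assumption) when nothing deeper qualifies.
 */
static size_t select_state(const struct idle_state *states, size_t n,
			   uint32_t predicted_us, uint32_t latency_limit_us)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (states[i].target_residency_us > predicted_us)
			continue;
		if (states[i].exit_latency_us > latency_limit_us)
			continue;
		if (states[i].target_residency_us >= states[best].target_residency_us)
			best = i;
	}
	return best;
}
```

With seven samples of 100 and one outlier of 5000, the first pass rejects the average because of the large variance, the outlier is dropped, and the second pass yields a typical interval of 100. If that value is at or below `short_limit_us`, `predict_idle_us()` returns it without looking at the sleep length at all, which mirrors the shortcut the new text describes.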
drivers/cpufreq/Kconfig

Lines changed: 2 additions & 2 deletions
@@ -325,8 +325,6 @@ config QORIQ_CPUFREQ
 This adds the CPUFreq driver support for Freescale QorIQ SoCs
 which are capable of changing the CPU's frequency dynamically.
 
-endif
-
 config ACPI_CPPC_CPUFREQ
 tristate "CPUFreq driver based on the ACPI CPPC spec"
 depends on ACPI_PROCESSOR
@@ -355,4 +353,6 @@ config ACPI_CPPC_CPUFREQ_FIE
 
 If in doubt, say N.
 
+endif
+
 endmenu

drivers/cpuidle/governors/teo.c

Lines changed: 48 additions & 43 deletions
@@ -10,25 +10,27 @@
 * DOC: teo-description
 *
 * The idea of this governor is based on the observation that on many systems
-* timer events are two or more orders of magnitude more frequent than any
-* other interrupts, so they are likely to be the most significant cause of CPU
-* wakeups from idle states. Moreover, information about what happened in the
-* (relatively recent) past can be used to estimate whether or not the deepest
-* idle state with target residency within the (known) time till the closest
-* timer event, referred to as the sleep length, is likely to be suitable for
-* the upcoming CPU idle period and, if not, then which of the shallower idle
-* states to choose instead of it.
+* timer interrupts are two or more orders of magnitude more frequent than any
+* other interrupt types, so they are likely to dominate CPU wakeup patterns.
+* Moreover, in principle, the time when the next timer event is going to occur
+* can be determined at the idle state selection time, although doing that may
+* be costly, so it can be regarded as the most reliable source of information
+* for idle state selection.
 *
-* Of course, non-timer wakeup sources are more important in some use cases
-* which can be covered by taking a few most recent idle time intervals of the
-* CPU into account. However, even in that context it is not necessary to
-* consider idle duration values greater than the sleep length, because the
-* closest timer will ultimately wake up the CPU anyway unless it is woken up
-* earlier.
+* Of course, non-timer wakeup sources are more important in some use cases,
+* but even then it is generally unnecessary to consider idle duration values
+* greater than the time time till the next timer event, referred as the sleep
+* length in what follows, because the closest timer will ultimately wake up the
+* CPU anyway unless it is woken up earlier.
 *
-* Thus this governor estimates whether or not the prospective idle duration of
-* a CPU is likely to be significantly shorter than the sleep length and selects
-* an idle state for it accordingly.
+* However, since obtaining the sleep length may be costly, the governor first
+* checks if it can select a shallow idle state using wakeup pattern information
+* from recent times, in which case it can do without knowing the sleep length
+* at all. For this purpose, it counts CPU wakeup events and looks for an idle
+* state whose target residency has not exceeded the idle duration (measured
+* after wakeup) in the majority of relevant recent cases. If the target
+* residency of that state is small enough, it may be used right away and the
+* sleep length need not be determined.
 *
 * The computations carried out by this governor are based on using bins whose
 * boundaries are aligned with the target residency parameter values of the CPU
@@ -39,7 +41,11 @@
 * idle state 2, the third bin spans from the target residency of idle state 2
 * up to, but not including, the target residency of idle state 3 and so on.
 * The last bin spans from the target residency of the deepest idle state
-* supplied by the driver to infinity.
+* supplied by the driver to the scheduler tick period length or to infinity if
+* the tick period length is less than the target residency of that state. In
+* the latter case, the governor also counts events with the measured idle
+* duration between the tick period length and the target residency of the
+* deepest idle state.
 *
 * Two metrics called "hits" and "intercepts" are associated with each bin.
 * They are updated every time before selecting an idle state for the given CPU
@@ -49,47 +55,46 @@
 * sleep length and the idle duration measured after CPU wakeup fall into the
 * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
 * length). In turn, the "intercepts" metric reflects the relative frequency of
-* situations in which the measured idle duration is so much shorter than the
-* sleep length that the bin it falls into corresponds to an idle state
-* shallower than the one whose bin is fallen into by the sleep length (these
-* situations are referred to as "intercepts" below).
+* non-timer wakeup events for which the measured idle duration falls into a bin
+* that corresponds to an idle state shallower than the one whose bin is fallen
+* into by the sleep length (these events are also referred to as "intercepts"
+* below).
 *
 * In order to select an idle state for a CPU, the governor takes the following
 * steps (modulo the possible latency constraint that must be taken into account
 * too):
 *
-* 1. Find the deepest CPU idle state whose target residency does not exceed
-* the current sleep length (the candidate idle state) and compute 2 sums as
-* follows:
+* 1. Find the deepest enabled CPU idle state (the candidate idle state) and
+* compute 2 sums as follows:
 *
-* - The sum of the "hits" and "intercepts" metrics for the candidate state
-* and all of the deeper idle states (it represents the cases in which the
-* CPU was idle long enough to avoid being intercepted if the sleep length
-* had been equal to the current one).
+* - The sum of the "hits" metric for all of the idle states shallower than
+* the candidate one (it represents the cases in which the CPU was likely
+* woken up by a timer).
 *
-* - The sum of the "intercepts" metrics for all of the idle states shallower
-* than the candidate one (it represents the cases in which the CPU was not
-* idle long enough to avoid being intercepted if the sleep length had been
-* equal to the current one).
+* - The sum of the "intercepts" metric for all of the idle states shallower
+* than the candidate one (it represents the cases in which the CPU was
+* likely woken up by a non-timer wakeup source).
 *
-* 2. If the second sum is greater than the first one the CPU is likely to wake
-* up early, so look for an alternative idle state to select.
+* 2. If the second sum computed in step 1 is greater than a half of the sum of
+* both metrics for the candidate state bin and all subsequent bins(if any),
+* a shallower idle state is likely to be more suitable, so look for it.
 *
-* - Traverse the idle states shallower than the candidate one in the
+* - Traverse the enabled idle states shallower than the candidate one in the
 * descending order.
 *
 * - For each of them compute the sum of the "intercepts" metrics over all
 * of the idle states between it and the candidate one (including the
 * former and excluding the latter).
 *
-* - If each of these sums that needs to be taken into account (because the
-* check related to it has indicated that the CPU is likely to wake up
-* early) is greater than a half of the corresponding sum computed in step
-* 1 (which means that the target residency of the state in question had
-* not exceeded the idle duration in over a half of the relevant cases),
-* select the given idle state instead of the candidate one.
+* - If this sum is greater than a half of the second sum computed in step 1,
+* use the given idle state as the new candidate one.
 *
-* 3. By default, select the candidate state.
+* 3. If the current candidate state is state 0 or its target residency is short
+* enough, return it and prevent the scheduler tick from being stopped.
+*
+* 4. Obtain the sleep length value and check if it is below the target
+* residency of the current candidate state, in which case a new shallower
+* candidate state needs to be found, so look for it.
 */
 
 #include <linux/cpuidle.h>
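
For illustration, the four selection steps in the updated teo comment can be sketched in C as follows. This is a minimal sketch under stated assumptions and not the code in `teo.c`. All names (`struct teo_bin_sketch`, `bin_index`, `pick_candidate`, `teo_select_sketch`) are hypothetical. It keeps one bin per idle state and ignores the tick-period handling of the last bin and the latency constraint. Stopping at the first (deepest) state that passes the check in step 2, the `short_limit_us` threshold in step 3, and the way a shallower state is found in step 4 are assumptions, since the comment does not spell them out. The "hits" sum from step 1 is left out because the rewritten step 2 text does not use it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state data: one bin per idle state, shallowest first. */
struct teo_bin_sketch {
	uint32_t target_residency_us;
	unsigned int hits;        /* wakeups that matched the sleep length bin */
	unsigned int intercepts;  /* non-timer wakeups that landed in this bin */
	bool enabled;
};

/* Bin i covers [target residency of state i, target residency of state i+1). */
static size_t bin_index(const struct teo_bin_sketch *bins, size_t n,
			uint32_t duration_us)
{
	size_t idx = 0;

	for (size_t i = 1; i < n; i++) {
		if (bins[i].target_residency_us <= duration_us)
			idx = i;
	}
	return idx;
}

/* Steps 1 and 2: choose a candidate from wakeup statistics alone. */
static size_t pick_candidate(const struct teo_bin_sketch *bins, size_t n)
{
	unsigned int intercept_sum = 0, deeper_sum = 0, partial = 0;
	size_t cand = 0;

	assert(n > 0);

	/* Step 1: the deepest enabled state is the initial candidate. */
	for (size_t i = 0; i < n; i++) {
		if (bins[i].enabled)
			cand = i;
	}
	for (size_t i = 0; i < cand; i++)
		intercept_sum += bins[i].intercepts;
	for (size_t i = cand; i < n; i++)
		deeper_sum += bins[i].hits + bins[i].intercepts;

	/* Step 2: too many intercepts below the candidate, go shallower. */
	if (2 * intercept_sum > deeper_sum) {
		for (size_t i = cand; i-- > 0; ) {
			partial += bins[i].intercepts;
			if (!bins[i].enabled)
				continue;
			if (2 * partial > intercept_sum) {
				cand = i;  /* assumed: stop at the first match */
				break;
			}
		}
	}
	return cand;
}

/* Steps 3 and 4: only consult the sleep length when it is worth the cost. */
static size_t teo_select_sketch(const struct teo_bin_sketch *bins, size_t n,
				uint32_t short_limit_us,
				uint32_t sleep_length_us, bool *keep_tick)
{
	size_t cand = pick_candidate(bins, n);

	*keep_tick = false;

	/* Step 3: shallow enough, return without using the sleep length. */
	if (cand == 0 || bins[cand].target_residency_us <= short_limit_us) {
		*keep_tick = true;
		return cand;
	}

	/* Step 4: the next timer is too close for the candidate, go shallower. */
	while (cand > 0 && (!bins[cand].enabled ||
			    bins[cand].target_residency_us > sleep_length_us))
		cand--;
	return cand;
}
```

In a real governor the sleep length would be obtained lazily inside step 4 instead of being passed in, which is the point of the reordering described in the comment: when the statistics already point to a shallow state, the potentially costly timer lookup is never made.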
