Commit f20af84

cpufreq: intel_pstate: Document hybrid processor support
Describe the support for hybrid processors in intel_pstate, including the CAS and EAS support, in the admin-guide documentation.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1935040.CQOukoFCf9@rjwysocki.net
1 parent 05cf8b8 commit f20af84

1 file changed (+102 -2 lines changed)

Documentation/admin-guide/pm/intel_pstate.rst

Lines changed: 102 additions & 2 deletions
@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]


Support for Hybrid Processors
=============================

Some processors supported by ``intel_pstate`` contain two or more types of CPU
cores that differ in the maximum turbo P-state, performance vs power
characteristics, cache sizes, and possibly other properties. They are commonly
referred to as hybrid processors. To support them, ``intel_pstate`` requires
HWP to be enabled and it assumes the HWP performance units to be the same for
all CPUs in the system, so a given HWP performance level always represents
approximately the same physical performance regardless of the core (CPU) type.

Hybrid Processors with SMT
--------------------------

On systems where SMT (Simultaneous Multithreading), also referred to as
HyperThreading (HT) in the context of Intel processors, is enabled on at least
one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
the priority of a given CPU reflects its highest HWP performance level, which
causes the CPU scheduler to generally prefer more performant CPUs, so the less
performant CPUs are used when the other ones are fully loaded. However, SMT
siblings (that is, logical CPUs sharing one physical core) are treated in a
special way such that if one of them is in use, the effective priority of the
other ones is lowered below the priorities of the CPUs located in the other
physical cores.

This approach maximizes performance in the majority of cases, but unfortunately
it also leads to excessive energy usage in some important scenarios, like video
playback, which is not generally desirable. While there is no other viable
choice with SMT enabled because the effective capacity and utilization of SMT
siblings are hard to determine, hybrid processors without SMT can be handled in
more energy-efficient ways.

.. _CAS:

Capacity-Aware Scheduling Support
---------------------------------

The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
``intel_pstate`` by default on hybrid processors without SMT. CAS generally
causes the scheduler to place a task on a CPU as long as that CPU has a
sufficient amount of spare capacity, and if the utilization of a given task is
too high for the CPU, the task will need to go somewhere else.
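
The placement rule described above can be pictured with a small sketch. It is
an illustration only and not the scheduler's code: the ``fits`` and
``candidate_cpus`` helpers, the capacity and utilization numbers, and the plain
"utilization must not exceed spare capacity" test are all assumptions made for
the example.

```python
def fits(task_util, cpu_capacity, cpu_util):
    """Return True if the task's utilization fits in the CPU's spare capacity."""
    spare = cpu_capacity - cpu_util
    return task_util <= spare


def candidate_cpus(task_util, cpus):
    """Return the IDs of CPUs with enough spare capacity for the task.

    `cpus` maps a CPU ID to a (capacity, current utilization) pair.
    """
    return [cpu for cpu, (cap, util) in sorted(cpus.items())
            if fits(task_util, cap, util)]


# Hypothetical system: CPU 0 is a more performant core, CPUs 1-2 are less
# performant ones. All numbers are made up for illustration.
cpus = {0: (1024, 300), 1: (512, 100), 2: (512, 450)}
light = candidate_cpus(200, cpus)   # a low-utilization task
heavy = candidate_cpus(600, cpus)   # a high-utilization task
```

With these made-up numbers, the light task has candidates among both CPU types,
while the heavy task can only go to the more performant CPU.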

Since CAS takes CPU capacities into account, it does not require CPU
prioritization and it allows tasks to be distributed more symmetrically among
the more performant and less performant CPUs. Once placed on a CPU with enough
capacity to accommodate it, a task may just continue to run there regardless of
whether or not the other CPUs are fully loaded, so on average CAS reduces the
utilization of the more performant CPUs, which causes the energy usage to be
more balanced because the more performant CPUs are generally less
energy-efficient than the less performant ones.

In order to use CAS, the scheduler needs to know the capacity of each CPU in
the system and it needs to be able to compute scale-invariant utilization of
CPUs, so ``intel_pstate`` provides it with the requisite information.

First of all, the capacity of each CPU is represented by the ratio of its highest
HWP performance level, multiplied by 1024, to the highest HWP performance level
of the most performant CPU in the system, which works because the HWP performance
units are the same for all CPUs. Second, the frequency-invariance computations,
carried out by the scheduler to always express CPU utilization in the same units
regardless of the frequency the CPU is currently running at, are adjusted to
take the CPU capacity into account. All of this happens when ``intel_pstate``
has registered itself with the ``CPUFreq`` core and it has figured out that it
is running on a hybrid processor without SMT.
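
The capacity formula above can be worked through in a few lines. This is a
sketch under stated assumptions rather than the driver's code: the
``cpu_capacities`` helper, the example HWP performance levels, and the use of
integer division are all chosen for illustration.

```python
SCHED_CAPACITY_SCALE = 1024  # the factor of 1024 mentioned in the text


def cpu_capacities(highest_perf):
    """Map each CPU to its capacity relative to the most performant CPU.

    `highest_perf` maps a CPU ID to its highest HWP performance level.
    Integer division is an assumption of this sketch.
    """
    max_perf = max(highest_perf.values())
    return {cpu: perf * SCHED_CAPACITY_SCALE // max_perf
            for cpu, perf in highest_perf.items()}


# Hypothetical highest HWP performance levels: two more performant CPUs
# and two less performant ones.
caps = cpu_capacities({0: 64, 1: 64, 2: 40, 3: 40})
```

In this made-up example the most performant CPUs get a capacity of 1024 and the
other ones get 40 * 1024 / 64 = 640.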

Energy-Aware Scheduling Support
-------------------------------

If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
``schedutil`` is used as the ``CPUFreq`` governor, which requires
``intel_pstate`` to operate in the `passive mode <Passive Mode_>`_.
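
As a rough aid, the two run-time conditions named above (passive mode and the
``schedutil`` governor) can be checked from user space. The sketch below
assumes the usual ``sysfs`` locations of the ``intel_pstate`` ``status``
attribute and the per-policy ``scaling_governor`` attributes; the
``eas_prerequisites`` helper is hypothetical, and it does not check
``CONFIG_ENERGY_MODEL``, SMT, or whether the processor is hybrid.

```python
from pathlib import Path


def eas_prerequisites(sysfs_cpu="/sys/devices/system/cpu"):
    """Report whether intel_pstate is passive and every policy uses schedutil."""
    root = Path(sysfs_cpu)
    status_file = root / "intel_pstate" / "status"
    status = status_file.read_text().strip() if status_file.exists() else None
    # Collect the governor of every CPUFreq policy found under the root.
    governors = {
        p.parent.name: p.read_text().strip()
        for p in sorted(root.glob("cpufreq/policy*/scaling_governor"))
    }
    return {
        "passive_mode": status == "passive",
        "all_schedutil": bool(governors) and all(
            g == "schedutil" for g in governors.values()),
    }
```

Calling ``eas_prerequisites()`` on a running system returns a dictionary with
two booleans; both being true is necessary, but not sufficient, for EAS to be
in use.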

The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
based on abstract cost values and it does not include any real power numbers)
and it is relatively simple to avoid unnecessary computations in the scheduler.
There is a performance domain in it for every CPU in the system and the cost
values for these performance domains have been chosen so that running a task on
a less performant (small) CPU appears to be always cheaper than running that
task on a more performant (big) CPU. However, for two CPUs of the same type,
the cost difference depends on their current utilization, and the CPU whose
current utilization is higher generally appears to be a more expensive
destination for a given task. This helps to balance the load among CPUs of the
same type.
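
The two properties of the cost values described above can be mimicked with a
toy model. The numbers and the ``placement_cost`` and ``cheapest`` helpers
below are invented for illustration and are not the cost values registered by
``intel_pstate``; the sketch only shows one way to make a small CPU always look
cheaper than a big one while letting utilization order CPUs of the same type.

```python
def placement_cost(cpu_type, utilization):
    """Abstract cost of placing a task on a CPU (hypothetical numbers)."""
    # The gap between the base costs is larger than the utilization term
    # can ever be, so any small CPU beats any big CPU.
    base = {"small": 100, "big": 1000}[cpu_type]
    # Utilization is assumed to be in the 0..1024 range; the added term
    # (at most 100) only orders CPUs of the same type.
    return base + utilization * 100 // 1024


def cheapest(cpus):
    """Return the ID of the CPU with the lowest abstract cost.

    `cpus` maps a CPU ID to a (type, current utilization) pair.
    """
    return min(sorted(cpus),
               key=lambda c: placement_cost(cpus[c][0], cpus[c][1]))


mixed = cheapest({0: ("big", 100), 1: ("small", 800), 2: ("small", 200)})
same_type = cheapest({0: ("big", 500), 1: ("big", 100)})
```

In the mixed case the less loaded small CPU wins, and between two big CPUs the
one with the lower utilization is picked.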

Since EAS works on top of CAS, high-utilization tasks are always migrated to
CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
tasks tend to be placed on the CPUs that look less expensive to the scheduler.
Effectively, this causes the less performant and less loaded CPUs to be
preferred as long as they have enough spare capacity to run the given task,
which generally leads to reduced energy usage.

The Energy Model created by ``intel_pstate`` can be inspected by looking at
the ``energy_model`` directory in ``debugfs`` (typically mounted on
``/sys/kernel/debug/``).
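
One way to look through that directory is a small script that walks it and
collects whatever files it finds. The sketch below makes no assumption about
the names of the entries inside ``energy_model``; the ``dump_energy_model``
helper is hypothetical, the default path assumes ``debugfs`` is mounted on
``/sys/kernel/debug/``, and reading it typically requires root privileges.

```python
import os


def dump_energy_model(root="/sys/kernel/debug/energy_model"):
    """Return {relative file path: contents} for every readable file under root."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path) as f:
                    entries[os.path.relpath(path, root)] = f.read().strip()
            except OSError:
                continue  # entries that cannot be read are skipped
    return entries
```

Printing the sorted items of ``dump_energy_model()`` gives a flat listing of
the model. If the directory does not exist, the result is an empty dictionary.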


User Space Interface in ``sysfs``
=================================

@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
Limits`_ for details).

``no_cas``
-	Do not enable capacity-aware scheduling (CAS) which is enabled by
-	default on hybrid systems.
+	Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
+	default on hybrid systems without SMT.

Diagnostics and Tuning
======================
