@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]


+ Support for Hybrid Processors
+ =============================
+
+ Some processors supported by ``intel_pstate`` contain two or more types of CPU
+ cores differing by the maximum turbo P-state, performance vs power characteristics,
+ cache sizes, and possibly other properties. They are commonly referred to as
+ hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled
+ and it assumes the HWP performance units to be the same for all CPUs in the
+ system, so a given HWP performance level always represents approximately the
+ same physical performance regardless of the core (CPU) type.
+
+ Hybrid Processors with SMT
+ --------------------------
+
+ On systems where SMT (Simultaneous Multithreading), also referred to as
+ HyperThreading (HT) in the context of Intel processors, is enabled on at least
+ one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
+ the priority of a given CPU reflects its highest HWP performance level which
+ causes the CPU scheduler to generally prefer more performant CPUs, so the less
+ performant CPUs are used when the other ones are fully loaded. However, SMT
+ siblings (that is, logical CPUs sharing one physical core) are treated in a
+ special way such that if one of them is in use, the effective priority of the
+ other ones is lowered below the priorities of the CPUs located in the other
+ physical cores.
+
+ This approach maximizes performance in the majority of cases, but unfortunately
+ it also leads to excessive energy usage in some important scenarios, like video
+ playback, which is not generally desirable. While there is no other viable
+ choice with SMT enabled because the effective capacity and utilization of SMT
+ siblings are hard to determine, hybrid processors without SMT can be handled in
+ more energy-efficient ways.
+
+ .. _CAS:
+
+ Capacity-Aware Scheduling Support
+ ---------------------------------
+
+ The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
+ ``intel_pstate`` by default on hybrid processors without SMT. CAS generally
+ causes the scheduler to put tasks on a CPU so long as there is a sufficient
+ amount of spare capacity on it, and if the utilization of a given task is too
+ high for it, the task will need to go somewhere else.
+
+ Since CAS takes CPU capacities into account, it does not require CPU
+ prioritization and it allows tasks to be distributed more symmetrically among
+ the more performant and less performant CPUs. Once placed on a CPU with enough
+ capacity to accommodate it, a task may just continue to run there regardless of
+ whether or not the other CPUs are fully loaded, so on average CAS reduces the
+ utilization of the more performant CPUs which causes the energy usage to be more
+ balanced because the more performant CPUs are generally less energy-efficient
+ than the less performant ones.
+
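As an illustration only (it is not part of the patch and it is not the scheduler's actual code), the placement behavior described above can be sketched roughly as follows. The function names, the `headroom` factor, and the fallback rule are assumptions made for this sketch; utilization and capacity are both expressed on the 0..1024 scale.

```python
def fits_capacity(task_util, cpu_util, cpu_capacity, headroom=0.8):
    """Return True if the task fits into the spare capacity of a CPU.

    cpu_util is the utilization of the CPU without the task in question.
    The headroom factor is an assumption of this sketch, not a kernel value.
    """
    return cpu_util + task_util <= cpu_capacity * headroom


def pick_cpu(task_util, cpus, prev_cpu):
    """Pick a CPU for a task.

    cpus maps a CPU number to a (utilization, capacity) tuple.
    """
    # A task that still fits on its previous CPU simply stays there,
    # regardless of how loaded the other CPUs are.
    util, capacity = cpus[prev_cpu]
    if fits_capacity(task_util, util, capacity):
        return prev_cpu

    # Otherwise the task has to go somewhere else: take the first CPU
    # (in CPU number order) with enough spare capacity.
    for cpu, (util, capacity) in sorted(cpus.items()):
        if fits_capacity(task_util, util, capacity):
            return cpu

    # Hypothetical fallback: nothing fits, so use the CPU with the most
    # spare capacity.
    return max(cpus, key=lambda cpu: cpus[cpu][1] - cpus[cpu][0])
```

With one big CPU of capacity 1024 and one small CPU of capacity 600, a light task that last ran on the small CPU stays there, while a heavy task is moved to the big one.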
+ In order to use CAS, the scheduler needs to know the capacity of each CPU in
+ the system and it needs to be able to compute scale-invariant utilization of
+ CPUs, so ``intel_pstate`` provides it with the requisite information.
+
+ First of all, the capacity of each CPU is represented by the ratio of its highest
+ HWP performance level, multiplied by 1024, to the highest HWP performance level
+ of the most performant CPU in the system, which works because the HWP performance
+ units are the same for all CPUs. Second, the frequency-invariance computations,
+ carried out by the scheduler to always express CPU utilization in the same units
+ regardless of the frequency it is currently running at, are adjusted to take the
+ CPU capacity into account. All of this happens when ``intel_pstate`` has
+ registered itself with the ``CPUFreq`` core and it has figured out that it is
+ running on a hybrid processor without SMT.
+
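The capacity formula above can be worked through with a short sketch. It is an illustration, not the driver code: the function name, the integer rounding, and the HWP performance values in the usage note are assumptions.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity assigned to the most performant CPU


def cpu_capacities(highest_perf):
    """Map each CPU to its capacity on the 0..1024 scale.

    highest_perf maps a CPU number to its highest HWP performance level.
    Integer (floor) division is an assumption of this sketch.
    """
    top = max(highest_perf.values())
    return {
        cpu: perf * SCHED_CAPACITY_SCALE // top
        for cpu, perf in highest_perf.items()
    }
```

For example, with hypothetical highest HWP performance levels of 64 for two big CPUs and 40 for one small CPU, `cpu_capacities({0: 64, 1: 64, 2: 40})` gives the big CPUs a capacity of 1024 and the small one 40 * 1024 / 64 = 640.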
+ Energy-Aware Scheduling Support
+ -------------------------------
+
+ If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
+ ``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
+ `CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
+ Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
+ ``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
+ to operate in the `passive mode <Passive Mode_>`_.
+
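The run-time conditions listed above can be checked from user space with a sketch like the one below. It is illustrative only: the function names and the result keys are made up for this example, and it does not check `CONFIG_ENERGY_MODEL` or whether the processor is hybrid. It assumes the usual `sysfs` locations of the `intel_pstate` status attribute, the SMT `active` attribute, and the per-policy `scaling_governor` attributes.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def eas_prerequisites(sysfs_cpu="/sys/devices/system/cpu"):
    """Report which run-time prerequisites for EAS appear to be met."""
    root = Path(sysfs_cpu)
    status = _read(root / "intel_pstate" / "status")
    smt_active = _read(root / "smt" / "active")
    governors = {
        path.parent.name: _read(path)
        for path in sorted(root.glob("cpufreq/policy*/scaling_governor"))
    }
    return {
        # schedutil can only be used with intel_pstate in the passive mode.
        "passive_mode": status == "passive",
        "smt_inactive": smt_active == "0",
        # Requiring schedutil for every policy is an assumption of this sketch.
        "schedutil_everywhere": bool(governors)
        and all(gov == "schedutil" for gov in governors.values()),
    }
```

Calling `eas_prerequisites()` with no arguments inspects the running system; passing another directory makes it possible to try the function on a copy of the relevant files.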
+ The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
+ based on abstract cost values and it does not include any real power numbers)
+ and it is relatively simple to avoid unnecessary computations in the scheduler.
+ There is a performance domain in it for every CPU in the system and the cost
+ values for these performance domains have been chosen so that running a task on
+ a less performant (small) CPU appears to be always cheaper than running that
+ task on a more performant (big) CPU. However, for two CPUs of the same type,
+ the cost difference depends on their current utilization, and the CPU whose
+ current utilization is higher generally appears to be a more expensive
+ destination for a given task. This helps to balance the load among CPUs of the
+ same type.
+
+ Since EAS works on top of CAS, high-utilization tasks are always migrated to
+ CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
+ tasks tend to be placed on the CPUs that look less expensive to the scheduler.
+ Effectively, this causes the less performant and less loaded CPUs to be
+ preferred as long as they have enough spare capacity to run the given task
+ which generally leads to reduced energy usage.
+
+ The Energy Model created by ``intel_pstate`` can be inspected by looking at
+ the ``energy_model`` directory in ``debugfs`` (typically mounted on
+ ``/sys/kernel/debug/``).
+
+
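One possible way to inspect that directory is sketched below. The function name is made up for this example and nothing is assumed about the layout under `energy_model`: the sketch simply reads every regular file it finds. Reading `debugfs` normally requires root privileges.

```python
from pathlib import Path


def read_energy_model(debugfs="/sys/kernel/debug"):
    """Return {relative path: contents} for files under energy_model.

    An empty dictionary is returned if the directory is missing or
    cannot be accessed.
    """
    root = Path(debugfs) / "energy_model"
    values = {}
    try:
        files = sorted(path for path in root.rglob("*") if path.is_file())
    except OSError:
        return values
    for path in files:
        key = path.relative_to(root).as_posix()
        try:
            values[key] = path.read_text().strip()
        except OSError:
            values[key] = None  # present, but not readable
    return values
```

Printing the items of `read_energy_model()` one per line gives a flat listing of everything the Energy Model exposes for each performance domain.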
User Space Interface in ``sysfs``
=================================

@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
Limits`_ for details).

``no_cas``
- Do not enable capacity-aware scheduling (CAS) which is enabled by
- default on hybrid systems.
+ Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
+ default on hybrid systems without SMT.
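To see whether this option was passed to the running kernel, the kernel command line can be parsed as in the sketch below. The function names are made up for this example; the only thing taken from the documentation is that each option carries the ``intel_pstate=`` prefix.

```python
PREFIX = "intel_pstate="


def intel_pstate_options(cmdline):
    """Return the intel_pstate options found in a kernel command line."""
    return [
        token[len(PREFIX):]
        for token in cmdline.split()
        if token.startswith(PREFIX)
    ]


def cas_disabled(cmdline):
    """Return True if intel_pstate=no_cas is on the command line."""
    return "no_cas" in intel_pstate_options(cmdline)
```

On a running system the string to parse would typically be read from `/proc/cmdline`, for example `cas_disabled(open("/proc/cmdline").read())`.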

Diagnostics and Tuning
======================