Commit 02b82b0

Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:

"These are mostly fixes and cleanups all over the code and a new piece of documentation for Intel uncore frequency scaling.

Functionality-wise, the intel_idle driver will support Sapphire Rapids Xeons natively now (with some extra facilities for controlling C-states more precisely on those systems), virtual guests will take the ACPI S4 hardware signature into account by default, the intel_pstate driver will take the default EPP value from the firmware, cpupower utility will support the AMD P-state driver added in the previous cycle, and there is a new tracer utility for that driver.

Specifics:

- Allow device_pm_check_callbacks() to be called from interrupt context without issues (Dmitry Baryshkov).
- Modify devm_pm_runtime_enable() to automatically handle pm_runtime_dont_use_autosuspend() at driver exit time (Douglas Anderson).
- Make the schedutil cpufreq governor use to_gov_attr_set() instead of open coding it (Kevin Hao).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the cpufreq longhaul driver (Rafael Wysocki).
- Unify show() and store() naming in cpufreq and make it use __ATTR_XX (Lianjie Zhang).
- Make the intel_pstate driver use the EPP value set by the firmware by default (Srinivas Pandruvada).
- Re-order the init checks in the powernow-k8 cpufreq driver (Mario Limonciello).
- Make the ACPI processor idle driver check for architectural support for LPI to avoid using it on x86 by mistake (Mario Limonciello).
- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem Bityutskiy).
- Add 'preferred_cstates' module argument to the intel_idle driver to work around C1 and C1E handling issue on Sapphire Rapids (Artem Bityutskiy).
- Add core C6 optimization on Sapphire Rapids to the intel_idle driver (Artem Bityutskiy).
- Optimize the haltpoll cpuidle driver a bit (Li RongQing).
- Remove leftover text from intel_idle() kerneldoc comment and fix up white space in intel_idle (Rafael Wysocki).
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup handling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng Chong).
- Fix __setup handler error handling in system-wide suspend and hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by default (David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
- Add DTPM hierarchy description (Daniel Lezcano).
- Change the locking scheme in DTPM (Daniel Lezcano).
- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer release (Daniel Lezcano).
- Make dtpm_node_callback[] static (kernel test robot).
- Fix spelling mistake "initialze" -> "initialize" in dtpm_create_hierarchy() (Colin Ian King).
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui)"

* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
  cpufreq: powernow-k8: Re-order the init checks
  cpuidle: intel_idle: Drop redundant backslash at line end
  cpuidle: intel_idle: Update intel_idle() kerneldoc comment
  PM: hibernate: Honour ACPI hardware signature by default for virtual guests
  cpufreq: intel_pstate: Use firmware default EPP
  cpufreq: unify show() and store() naming and use __ATTR_XX
  PM: core: keep irq flags in device_pm_check_callbacks()
  cpuidle: haltpoll: Call cpuidle_poll_state_init() later
  Documentation: amd-pstate: add tracer tool introduction
  tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
  tools/power/x86/intel_pstate_tracer: make tracer as a module
  cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
  PM: sleep: Add device name to suspend_report_result()
  turbostat: fix PC6 displaying on some systems
  intel_idle: add core C6 optimization for SPR
  intel_idle: add 'preferred_cstates' module argument
  intel_idle: add SPR support
  PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
  ACPI: processor idle: Check for architectural support for LPI
  cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
  ...

2 parents 242ba66 + ec3d8b8 commit 02b82b0

63 files changed, 1888 insertions(+), 418 deletions(-)

Documentation/admin-guide/pm/amd-pstate.rst

Lines changed: 26 additions & 0 deletions
@@ -369,6 +369,32 @@ governor (for the policies it is attached to), or by the ``CPUFreq`` core (for t
 policies with other scaling governors).
 
 
+Tracer Tool
+-------------
+
+``amd_pstate_tracer.py`` can record and parse ``amd-pstate`` trace log, then
+generate performance plots. This utility can be used to debug and tune the
+performance of ``amd-pstate`` driver. The tracer tool needs to import intel
+pstate tracer.
+
+Tracer tool located in ``linux/tools/power/x86/amd_pstate_tracer``. It can be
+used in two ways. If trace file is available, then directly parse the file
+with command ::
+
+    ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>
+
+Or generate trace file with root privilege, then parse and plot with command ::
+
+    sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]
+
+The test result can be found in ``results/test_name``. Following is the example
+about part of the output. ::
+
+ common_cpu  common_secs  common_usecs  min_perf  des_perf  max_perf  freq    mperf    apef     tsc       load   duration_ms  sample_num  elapsed_time  common_comm
+ CPU_005     712          116384        39        49        166       0.7565  9645075  2214891  38431470  25.1   11.646       469         2.496         kworker/5:0-40
+ CPU_006     712          116408        39        49        166       0.6769  8950227  1839034  37192089  24.06  11.272       470         2.496         kworker/6:0-1264
+
+
 Reference
 ===========
 
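The sample tracer output in the hunk above is whitespace-separated, so it can be post-processed with a few lines of Python. The following is a minimal sketch and not part of the tracer tool: the function name `parse_tracer_rows` is hypothetical, and it assumes the column order shown in the documentation sample, with `common_comm` as the last field.

```python
"""Sketch: parse whitespace-separated amd-pstate tracer output rows."""

# Column names copied from the sample output in the documentation hunk.
COLUMNS = [
    "common_cpu", "common_secs", "common_usecs", "min_perf", "des_perf",
    "max_perf", "freq", "mperf", "apef", "tsc", "load", "duration_ms",
    "sample_num", "elapsed_time", "common_comm",
]

SAMPLE = """\
common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm
CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40
CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264
"""


def parse_tracer_rows(text):
    """Return one dict per data row; values are kept as strings."""
    rows = []
    for line in text.splitlines():
        # Limit the split so the trailing common_comm field stays in one piece.
        fields = line.strip().split(None, len(COLUMNS) - 1)
        if len(fields) != len(COLUMNS) or fields[0] == COLUMNS[0]:
            continue  # skip blank, short, or header lines
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


if __name__ == "__main__":
    for row in parse_tracer_rows(SAMPLE):
        print(row["common_cpu"], row["freq"], row["load"])
```

Values are left as strings so that the caller decides which columns to convert to numbers.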
Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==============================
+Intel Uncore Frequency Scaling
+==============================
+
+:Copyright: |copy| 2022 Intel Corporation
+
+:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+
+Introduction
+------------
+
+The uncore can consume significant amount of power in Intel's Xeon servers based
+on the workload characteristics. To optimize the total power and improve overall
+performance, SoCs have internal algorithms for scaling uncore frequency. These
+algorithms monitor workload usage of uncore and set a desirable frequency.
+
+It is possible that users have different expectations of uncore performance and
+want to have control over it. The objective is similar to allowing users to set
+the scaling min/max frequencies via cpufreq sysfs to improve CPU performance.
+Users may have some latency sensitive workloads where they do not want any
+change to uncore frequency. Also, users may have workloads which require
+different core and uncore performance at distinct phases and they may want to
+use both cpufreq and the uncore scaling interface to distribute power and
+improve overall performance.
+
+Sysfs Interface
+---------------
+
+To control uncore frequency, a sysfs interface is provided in the directory:
+`/sys/devices/system/cpu/intel_uncore_frequency/`.
+
+There is one directory for each package and die combination as the scope of
+uncore scaling control is per die in multiple die/package SoCs or per
+package for single die per package SoCs. The name represents the
+scope of control. For example: 'package_00_die_00' is for package id 0 and
+die 0.
+
+Each package_*_die_* contains the following attributes:
+
+``initial_max_freq_khz``
+	Out of reset, this attribute represent the maximum possible frequency.
+	This is a read-only attribute. If users adjust max_freq_khz,
+	they can always go back to maximum using the value from this attribute.
+
+``initial_min_freq_khz``
+	Out of reset, this attribute represent the minimum possible frequency.
+	This is a read-only attribute. If users adjust min_freq_khz,
+	they can always go back to minimum using the value from this attribute.
+
+``max_freq_khz``
+	This attribute is used to set the maximum uncore frequency.
+
+``min_freq_khz``
+	This attribute is used to set the minimum uncore frequency.
+
+``current_freq_khz``
+	This attribute is used to get the current uncore frequency.
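
The sysfs layout documented above can be explored from user space. The following is a minimal sketch, not an official tool: the helper names `read_uncore_limits` and `reset_max_to_initial` are hypothetical, and it assumes each attribute file holds a single decimal integer, which the documentation does not state explicitly. Only the directory and attribute names come from the text.

```python
"""Sketch: read Intel uncore frequency attributes from sysfs."""
from pathlib import Path

# Directory and attribute names are taken from the documentation above.
UNCORE_ROOT = Path("/sys/devices/system/cpu/intel_uncore_frequency")
ATTRIBUTES = (
    "initial_max_freq_khz",
    "initial_min_freq_khz",
    "max_freq_khz",
    "min_freq_khz",
    "current_freq_khz",
)


def read_uncore_limits(root=UNCORE_ROOT):
    """Map each package_*_die_* directory name to its attribute values."""
    result = {}
    root = Path(root)
    if not root.is_dir():
        return result  # directory absent, e.g. driver not loaded
    for die_dir in sorted(root.glob("package_*_die_*")):
        values = {}
        for name in ATTRIBUTES:
            try:
                # Assumption: the file contains one decimal integer.
                values[name] = int((die_dir / name).read_text().strip())
            except (OSError, ValueError):
                values[name] = None  # attribute missing or unreadable
        result[die_dir.name] = values
    return result


def reset_max_to_initial(die_dir):
    """Copy initial_max_freq_khz into max_freq_khz (needs write access)."""
    die_dir = Path(die_dir)
    initial = (die_dir / "initial_max_freq_khz").read_text().strip()
    (die_dir / "max_freq_khz").write_text(initial)


if __name__ == "__main__":
    for name, values in read_uncore_limits().items():
        print(name, values)
```

On a system without the interface the reader returns an empty mapping instead of raising, so the script can run unprivileged as a quick probe.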

Documentation/admin-guide/pm/working-state.rst

Lines changed: 1 addition & 0 deletions
@@ -15,3 +15,4 @@ Working-State Power Management
    cpufreq_drivers
    intel_epb
    intel-speed-select
+   intel_uncore_frequency_scaling

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -1002,6 +1002,7 @@ L: linux-pm@vger.kernel.org
 S:	Supported
 F:	Documentation/admin-guide/pm/amd-pstate.rst
 F:	drivers/cpufreq/amd-pstate*
+F:	tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
 
 AMD PTDMA DRIVER
 M:	Sanjay R Mehta <sanju.mehta@amd.com>

arch/arm64/kernel/cpuidle.c

Lines changed: 3 additions & 3 deletions
@@ -54,16 +54,16 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
 	struct acpi_lpi_state *lpi;
 	struct acpi_processor *pr = per_cpu(processors, cpu);
 
+	if (unlikely(!pr || !pr->flags.has_lpi))
+		return -EINVAL;
+
 	/*
 	 * If the PSCI cpu_suspend function hook has not been initialized
 	 * idle states must not be enabled, so bail out
 	 */
 	if (!psci_ops.cpu_suspend)
 		return -EOPNOTSUPP;
 
-	if (unlikely(!pr || !pr->flags.has_lpi))
-		return -EINVAL;
-
 	count = pr->power.count - 1;
 	if (count <= 0)
 		return -ENODEV;

arch/x86/kernel/acpi/sleep.c

Lines changed: 21 additions & 2 deletions
@@ -15,6 +15,7 @@
 #include <asm/desc.h>
 #include <asm/cacheflush.h>
 #include <asm/realmode.h>
+#include <asm/hypervisor.h>
 
 #include <linux/ftrace.h>
 #include "../../realmode/rm/wakeup.h"
@@ -140,9 +141,9 @@ static int __init acpi_sleep_setup(char *str)
 			acpi_realmode_flags |= 4;
 #ifdef CONFIG_HIBERNATION
 		if (strncmp(str, "s4_hwsig", 8) == 0)
-			acpi_check_s4_hw_signature(1);
+			acpi_check_s4_hw_signature = 1;
 		if (strncmp(str, "s4_nohwsig", 10) == 0)
-			acpi_check_s4_hw_signature(0);
+			acpi_check_s4_hw_signature = 0;
 #endif
 		if (strncmp(str, "nonvs", 5) == 0)
 			acpi_nvs_nosave();
@@ -160,3 +161,21 @@ static int __init acpi_sleep_setup(char *str)
 }
 
 __setup("acpi_sleep=", acpi_sleep_setup);
+
+#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HYPERVISOR_GUEST)
+static int __init init_s4_sigcheck(void)
+{
+	/*
+	 * If running on a hypervisor, honour the ACPI specification
+	 * by default and trigger a clean reboot when the hardware
+	 * signature in FACS is changed after hibernation.
+	 */
+	if (acpi_check_s4_hw_signature == -1 &&
+	    !hypervisor_is_type(X86_HYPER_NATIVE))
+		acpi_check_s4_hw_signature = 1;
+
+	return 0;
+}
+/* This must happen before acpi_init() which is a subsys initcall */
+arch_initcall(init_s4_sigcheck);
+#endif

drivers/acpi/processor_idle.c

Lines changed: 10 additions & 5 deletions
@@ -1080,6 +1080,11 @@ static int flatten_lpi_states(struct acpi_processor *pr,
 	return 0;
 }
 
+int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
+{
+	return -EOPNOTSUPP;
+}
+
 static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
 {
 	int ret, i;
@@ -1088,6 +1093,11 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
 	struct acpi_device *d = NULL;
 	struct acpi_lpi_states_array info[2], *tmp, *prev, *curr;
 
+	/* make sure our architecture has support */
+	ret = acpi_processor_ffh_lpi_probe(pr->id);
+	if (ret == -EOPNOTSUPP)
+		return ret;
+
 	if (!osc_pc_lpi_support_confirmed)
 		return -EOPNOTSUPP;
 
@@ -1139,11 +1149,6 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
 	return 0;
 }
 
-int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
-{
-	return -ENODEV;
-}
-
 int __weak acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
 {
 	return -ENODEV;

drivers/acpi/sleep.c

Lines changed: 3 additions & 8 deletions
@@ -871,12 +871,7 @@ static inline void acpi_sleep_syscore_init(void) {}
 #ifdef CONFIG_HIBERNATION
 static unsigned long s4_hardware_signature;
 static struct acpi_table_facs *facs;
-static int sigcheck = -1; /* Default behaviour is just to warn */
-
-void __init acpi_check_s4_hw_signature(int check)
-{
-	sigcheck = check;
-}
+int acpi_check_s4_hw_signature = -1; /* Default behaviour is just to warn */
 
 static int acpi_hibernation_begin(pm_message_t stage)
 {
@@ -1001,7 +996,7 @@ static void acpi_sleep_hibernate_setup(void)
 	hibernation_set_ops(old_suspend_ordering ?
 			&acpi_hibernation_ops_old : &acpi_hibernation_ops);
 	sleep_states[ACPI_STATE_S4] = 1;
-	if (!sigcheck)
+	if (!acpi_check_s4_hw_signature)
 		return;
 
 	acpi_get_table(ACPI_SIG_FACS, 1, (struct acpi_table_header **)&facs);
@@ -1013,7 +1008,7 @@ static void acpi_sleep_hibernate_setup(void)
 		 */
 		s4_hardware_signature = facs->hardware_signature;
 
-		if (sigcheck > 0) {
+		if (acpi_check_s4_hw_signature > 0) {
 			/*
 			 * If we're actually obeying the ACPI specification
 			 * then the signature is written out as part of the
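
The two sleep.c changes together turn `acpi_check_s4_hw_signature` into a three-valued setting: -1 (warn only, the initial value), 0 (`s4_nohwsig`), and 1 (`s4_hwsig`), with guests promoted from -1 to 1 at arch initcall time. The following is a user-space model of that decision, written only to make the precedence visible. It is not kernel code: the function names are hypothetical, and it assumes a kernel built with both `CONFIG_HIBERNATION` and `CONFIG_HYPERVISOR_GUEST`.

```python
"""User-space model of the S4 hardware-signature policy (not kernel code)."""

WARN_ONLY = -1  # initial value: only warn when the signature changed
DISABLED = 0    # set by acpi_sleep=s4_nohwsig
ENFORCED = 1    # set by acpi_sleep=s4_hwsig


def parse_acpi_sleep(option, value=WARN_ONLY):
    """Mimic the two strncmp() prefix checks on each comma-separated token."""
    for token in option.split(","):
        if token.startswith("s4_hwsig"):
            value = ENFORCED
        if token.startswith("s4_nohwsig"):
            value = DISABLED
    return value


def resolve_default(value, is_guest):
    """Mimic init_s4_sigcheck(): guests enforce unless the user chose."""
    if value == WARN_ONLY and is_guest:
        return ENFORCED
    return value


if __name__ == "__main__":
    for option in ("", "s4_hwsig", "s4_nohwsig"):
        for is_guest in (False, True):
            result = resolve_default(parse_acpi_sleep(option), is_guest)
            print(f"acpi_sleep={option!r} guest={is_guest} -> {result}")
```

An explicit `acpi_sleep=` choice always wins, because the guest default only applies while the value is still -1.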

drivers/base/power/domain.c

Lines changed: 26 additions & 16 deletions
@@ -636,6 +636,18 @@ static int genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
 			atomic_read(&genpd->sd_count) > 0)
 		return -EBUSY;
 
+	/*
+	 * The children must be in their deepest (powered-off) states to allow
+	 * the parent to be powered off. Note that, there's no need for
+	 * additional locking, as powering on a child, requires the parent's
+	 * lock to be acquired first.
+	 */
+	list_for_each_entry(link, &genpd->parent_links, parent_node) {
+		struct generic_pm_domain *child = link->child;
+		if (child->state_idx < child->state_count - 1)
+			return -EBUSY;
+	}
+
 	list_for_each_entry(pdd, &genpd->dev_list, list_node) {
 		enum pm_qos_flags_status stat;
 
@@ -1073,6 +1085,13 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
 	    || atomic_read(&genpd->sd_count) > 0)
 		return;
 
+	/* Check that the children are in their deepest (powered-off) state. */
+	list_for_each_entry(link, &genpd->parent_links, parent_node) {
+		struct generic_pm_domain *child = link->child;
+		if (child->state_idx < child->state_count - 1)
+			return;
+	}
+
 	/* Choose the deepest state when suspending */
 	genpd->state_idx = genpd->state_count - 1;
 	if (_genpd_power_off(genpd, false))
@@ -2058,9 +2077,9 @@ static int genpd_remove(struct generic_pm_domain *genpd)
 		kfree(link);
 	}
 
-	genpd_debug_remove(genpd);
 	list_del(&genpd->gpd_list_node);
 	genpd_unlock(genpd);
+	genpd_debug_remove(genpd);
 	cancel_work_sync(&genpd->power_off_work);
 	if (genpd_is_cpu_domain(genpd))
 		free_cpumask_var(genpd->cpus);
@@ -2248,12 +2267,8 @@ int of_genpd_add_provider_simple(struct device_node *np,
 	/* Parse genpd OPP table */
 	if (genpd->set_performance_state) {
 		ret = dev_pm_opp_of_add_table(&genpd->dev);
-		if (ret) {
-			if (ret != -EPROBE_DEFER)
-				dev_err(&genpd->dev, "Failed to add OPP table: %d\n",
-					ret);
-			return ret;
-		}
+		if (ret)
+			return dev_err_probe(&genpd->dev, ret, "Failed to add OPP table\n");
 
 		/*
 		 * Save table for faster processing while setting performance
@@ -2312,9 +2327,8 @@ int of_genpd_add_provider_onecell(struct device_node *np,
 		if (genpd->set_performance_state) {
 			ret = dev_pm_opp_of_add_table_indexed(&genpd->dev, i);
 			if (ret) {
-				if (ret != -EPROBE_DEFER)
-					dev_err(&genpd->dev, "Failed to add OPP table for index %d: %d\n",
-						i, ret);
+				dev_err_probe(&genpd->dev, ret,
+					      "Failed to add OPP table for index %d\n", i);
 				goto error;
 			}
 
@@ -2672,12 +2686,8 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 	ret = genpd_add_device(pd, dev, base_dev);
 	mutex_unlock(&gpd_list_lock);
 
-	if (ret < 0) {
-		if (ret != -EPROBE_DEFER)
-			dev_err(dev, "failed to add to PM domain %s: %d",
-				pd->name, ret);
-		return ret;
-	}
+	if (ret < 0)
+		return dev_err_probe(dev, ret, "failed to add to PM domain %s\n", pd->name);
 
 	dev->pm_domain->detach = genpd_dev_pm_detach;
 	dev->pm_domain->sync = genpd_dev_pm_sync;

drivers/base/power/main.c

Lines changed: 9 additions & 7 deletions
@@ -485,7 +485,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
 	trace_device_pm_callback_start(dev, info, state.event);
 	error = cb(dev);
 	trace_device_pm_callback_end(dev, error);
-	suspend_report_result(cb, error);
+	suspend_report_result(dev, cb, error);
 
 	initcall_debug_report(dev, calltime, cb, error);
 
@@ -1568,7 +1568,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
 	trace_device_pm_callback_start(dev, info, state.event);
 	error = cb(dev, state);
 	trace_device_pm_callback_end(dev, error);
-	suspend_report_result(cb, error);
+	suspend_report_result(dev, cb, error);
 
 	initcall_debug_report(dev, calltime, cb, error);
 
@@ -1855,7 +1855,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
 	device_unlock(dev);
 
 	if (ret < 0) {
-		suspend_report_result(callback, ret);
+		suspend_report_result(dev, callback, ret);
 		pm_runtime_put(dev);
 		return ret;
 	}
@@ -1960,10 +1960,10 @@ int dpm_suspend_start(pm_message_t state)
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_start);
 
-void __suspend_report_result(const char *function, void *fn, int ret)
+void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret)
 {
 	if (ret)
-		pr_err("%s(): %pS returns %d\n", function, fn, ret);
+		dev_err(dev, "%s(): %pS returns %d\n", function, fn, ret);
 }
 EXPORT_SYMBOL_GPL(__suspend_report_result);
 
@@ -2018,7 +2018,9 @@ static bool pm_ops_is_empty(const struct dev_pm_ops *ops)
 
 void device_pm_check_callbacks(struct device *dev)
 {
-	spin_lock_irq(&dev->power.lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
 	dev->power.no_pm_callbacks =
 		(!dev->bus || (pm_ops_is_empty(dev->bus->pm) &&
 		 !dev->bus->suspend && !dev->bus->resume)) &&
@@ -2027,7 +2029,7 @@ void device_pm_check_callbacks(struct device *dev)
 		(!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) &&
 		(!dev->driver || (pm_ops_is_empty(dev->driver->pm) &&
 		 !dev->driver->suspend && !dev->driver->resume));
-	spin_unlock_irq(&dev->power.lock);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
 }
 
 bool dev_pm_skip_suspend(struct device *dev)
