Commit 7d20aa5

Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:

"These are dominated by cpufreq updates which in turn are dominated by updates related to boost support in the core and drivers and amd-pstate driver optimizations.

Apart from the above, there are some cpuidle updates including a rework of the most recent idle intervals handling in the venerable menu governor that leads to significant improvements in some performance benchmarks, as the governor is now more likely to predict a shorter idle duration in some cases, and there are updates of the core device power management code, mostly related to system suspend and resume, that should help to avoid potential issues arising when the drivers of devices depending on one another want to use different optimizations.

There is also a usual collection of assorted fixes and cleanups, including removal of some unused code.

Specifics:

 - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

 - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian)

 - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai)

 - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

 - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

 - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar)

 - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki)

 - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan)

 - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki)

 - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki)

 - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai)

 - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

 - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing)

 - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba)

 - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki)

 - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao)

 - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki)

 - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki)

 - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues that may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King)

 - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki)

 - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu)

 - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang)

 - Remove needless return in three void functions related to system wakeup (Zijun Hu)

 - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver)

 - Remove unused helper functions related to system sleep (David Alan Gilbert)

 - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson)

 - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven)

 - Update the cpupower utility to fix lib versioning in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
2 parents 21e0ff5 + c5a55e4 commit 7d20aa5
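
The dbs_update() item in the commit message concerns a general C pitfall: a negative int that is silently converted to unsigned int turns into a very large positive value. The following is a minimal sketch of that class of bug, not the actual governor code. The function names `sketch_load_naive()` and `sketch_load_guarded()` and the load formula are hypothetical and exist only for illustration.

```c
/*
 * Hypothetical illustration only: these helpers are NOT taken from the
 * cpufreq governor code. They compute a busy percentage from an elapsed
 * time and an idle-time delta that may come out negative.
 */

/* Buggy variant: a negative delta wraps around to a huge unsigned value. */
unsigned int sketch_load_naive(unsigned int elapsed, int idle_delta)
{
	unsigned int idle = idle_delta;	/* implicit int -> unsigned conversion */

	if (!elapsed)
		return 0;

	/* With a wrapped idle value the "busy" time can exceed elapsed. */
	return 100 * (elapsed - idle) / elapsed;
}

/* Guarded variant: clamp the signed value before mixing it with unsigned. */
unsigned int sketch_load_guarded(unsigned int elapsed, int idle_delta)
{
	unsigned int idle = idle_delta > 0 ? (unsigned int)idle_delta : 0;

	if (!elapsed)
		return 0;
	if (idle > elapsed)
		idle = elapsed;

	return 100 * (elapsed - idle) / elapsed;
}
```

Assuming a 32-bit unsigned int, `sketch_load_naive(1000, -10)` yields a load of 101 percent, while the guarded variant stays at 100.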

93 files changed: +1144 -1191 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 3 additions & 0 deletions
@@ -2314,6 +2314,9 @@
 			per_cpu_perf_limits
 			  Allow per-logical-CPU P-State performance control limits using
 			  cpufreq sysfs interface
+			no_cas
+			  Do not enable capacity-aware scheduling (CAS) on
+			  hybrid systems
 
 	intremap=	[X86-64,Intel-IOMMU,EARLY]
 			on	enable Interrupt Remapping (default)

Documentation/admin-guide/pm/cpuidle.rst

Lines changed: 17 additions & 12 deletions
@@ -275,20 +275,25 @@ values and, when predicting the idle duration next time, it computes the average
 and variance of them. If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 that 6 times the standard deviation), the average is regarded as the "typical
-interval" value. Otherwise, the longest of the saved observed idle duration
+interval" value. Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).
-
-If the "typical interval" computed this way is long enough, the governor obtains
-the time until the closest timer event with the assumption that the scheduler
-tick will be stopped. That time, referred to as the *sleep length* in what follows,
-is the upper bound on the time before the next CPU wakeup. It is used to determine
-the sleep length range, which in turn is needed to get the sleep length correction
-factor.
+interval" is determined or too many data points are disregarded. In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration. Finally, if the set of data points still under
+consideration is too small, no prediction is made.
+
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped. That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup. It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.
 
 The ``menu`` governor maintains an array containing several correction factor
 values that correspond to different sleep length ranges organized so that each
@@ -302,7 +307,7 @@ to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
 The sleep length is multiplied by the correction factor for the range that it
 falls into to obtain an approximation of the predicted idle duration that is
 compared to the "typical interval" determined previously and the minimum of
-the two is taken as the idle duration prediction.
+the two is taken as the final idle duration prediction.
 
 If the "typical interval" value is small, which means that the CPU is likely
 to be woken up soon enough, the sleep length computation is skipped as it may
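
The updated "typical interval" procedure described in the documentation hunk above can be sketched in plain C as follows. This is a userspace illustration under stated assumptions, not the menu governor's implementation: the function name, the history size of 8, and the two set-size cutoffs (`SKETCH_MIN_FOR_AVG`, `SKETCH_MIN_FOR_MAX`) are hypothetical, since the text only says "too many data points" and "sufficiently large". The variance threshold of 400 and the factor of 6 come from the text. Sample values are assumed to be small enough that the 64-bit intermediate sums do not overflow.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INTERVALS	8	/* assumed history size */
#define SKETCH_MIN_FOR_AVG	6	/* assumed: at or below this, stop discarding */
#define SKETCH_MIN_FOR_MAX	4	/* assumed: below this, make no prediction */
#define SKETCH_SMALL_VARIANCE	400	/* "small variance" threshold from the text */
#define SKETCH_NO_PREDICTION	UINT_MAX

/* Hypothetical helper illustrating the documented steps. */
unsigned int sketch_typical_interval(const unsigned int *samples, size_t n)
{
	unsigned int data[SKETCH_INTERVALS];
	size_t count = n < SKETCH_INTERVALS ? n : SKETCH_INTERVALS;
	size_t i;

	for (i = 0; i < count; i++)
		data[i] = samples[i];

	while (count > 0) {
		uint64_t sum = 0, sq_sum = 0, avg, variance;
		size_t min_i = 0, max_i = 0;

		for (i = 0; i < count; i++) {
			sum += data[i];
			if (data[i] < data[min_i])
				min_i = i;
			if (data[i] > data[max_i])
				max_i = i;
		}
		avg = sum / count;

		for (i = 0; i < count; i++) {
			uint64_t diff = data[i] > avg ? data[i] - avg : avg - data[i];

			sq_sum += diff * diff;
		}
		variance = sq_sum / count;

		/* Small variance, or average greater than 6 standard deviations. */
		if (variance < SKETCH_SMALL_VARIANCE || avg * avg > 36 * variance)
			return (unsigned int)avg;

		/* Too many points already discarded: fall back or give up. */
		if (count <= SKETCH_MIN_FOR_AVG) {
			if (count >= SKETCH_MIN_FOR_MAX)
				return data[max_i];
			return SKETCH_NO_PREDICTION;
		}

		/* Discard whichever extreme is farther from the average. */
		if (data[max_i] - avg > avg - data[min_i])
			data[max_i] = data[count - 1];
		else
			data[min_i] = data[count - 1];
		count--;
	}

	return SKETCH_NO_PREDICTION;
}
```

The point of discarding either extreme is visible with a history of seven long intervals and one very short one: the short outlier is dropped and the long average is kept, whereas always dropping the longest value would discard useful samples.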

Documentation/admin-guide/pm/intel_idle.rst

Lines changed: 13 additions & 5 deletions
@@ -192,11 +192,19 @@ even if they have been enumerated (see :ref:`cpu-pm-qos` in
 Documentation/admin-guide/pm/cpuidle.rst).
 Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.
 
-The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
-if the kernel has been configured with ACPI support) can be set to make the
-driver ignore the system's ACPI tables entirely or use them for all of the
-recognized processor models, respectively (they both are unset by default and
-``use_acpi`` has no effect if ``no_acpi`` is set).
+The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
+recognized by ``intel_idle`` if the kernel has been configured with ACPI
+support. In the case that ACPI is not configured these flags have no impact
+on functionality.
+
+``no_acpi`` - Do not use ACPI at all. Only native mode is available, no
+ACPI mode.
+
+``use_acpi`` - No-op in ACPI mode, the driver will consult ACPI tables for
+C-states on/off status in native mode.
+
+``no_native`` - Work only in ACPI mode, no native mode available (ignore
+all custom tables).
 
 The value of the ``states_off`` module parameter (0 by default) represents a
 list of idle states to be disabled by default in the form of a bitmask.

Documentation/admin-guide/pm/intel_pstate.rst

Lines changed: 3 additions & 0 deletions
@@ -696,6 +696,9 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
 	Use per-logical-CPU P-State limits (see `Coordination of P-state
 	Limits`_ for details).
 
+``no_cas``
+	Do not enable capacity-aware scheduling (CAS) which is enabled by
+	default on hybrid systems.
 
 Diagnostics and Tuning
 ======================

Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml

Lines changed: 31 additions & 4 deletions
@@ -34,6 +34,7 @@ properties:
       - description: v2 of CPUFREQ HW (EPSS)
         items:
           - enum:
+              - qcom,qcs8300-cpufreq-epss
               - qcom,qdu1000-cpufreq-epss
               - qcom,sa8255p-cpufreq-epss
               - qcom,sa8775p-cpufreq-epss
@@ -111,22 +112,20 @@ allOf:
             enum:
               - qcom,qcm2290-cpufreq-hw
               - qcom,sar2130p-cpufreq-epss
+              - qcom,sdx75-cpufreq-epss
     then:
       properties:
         reg:
-          minItems: 1
           maxItems: 1
 
         reg-names:
-          minItems: 1
           maxItems: 1
 
         interrupts:
-          minItems: 1
           maxItems: 1
 
         interrupt-names:
-          minItems: 1
+          maxItems: 1
 
   - if:
       properties:
@@ -135,6 +134,7 @@ allOf:
             enum:
               - qcom,qdu1000-cpufreq-epss
               - qcom,sa8255p-cpufreq-epss
+              - qcom,sa8775p-cpufreq-epss
               - qcom,sc7180-cpufreq-hw
               - qcom,sc8180x-cpufreq-hw
               - qcom,sc8280xp-cpufreq-epss
@@ -160,12 +160,14 @@ allOf:
 
         interrupt-names:
           minItems: 2
+          maxItems: 2
 
   - if:
       properties:
         compatible:
           contains:
             enum:
+              - qcom,qcs8300-cpufreq-epss
               - qcom,sc7280-cpufreq-epss
               - qcom,sm8250-cpufreq-epss
               - qcom,sm8350-cpufreq-epss
@@ -187,6 +189,7 @@ allOf:
 
         interrupt-names:
           minItems: 3
+          maxItems: 3
 
   - if:
       properties:
@@ -211,7 +214,31 @@ allOf:
 
         interrupt-names:
           minItems: 2
+          maxItems: 2
 
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - qcom,sm8650-cpufreq-epss
+    then:
+      properties:
+        reg:
+          minItems: 4
+          maxItems: 4
+
+        reg-names:
+          minItems: 4
+          maxItems: 4
+
+        interrupts:
+          minItems: 4
+          maxItems: 4
+
+        interrupt-names:
+          minItems: 4
+          maxItems: 4
 
 examples:
   - |

MAINTAINERS

Lines changed: 14 additions & 3 deletions
@@ -8541,6 +8541,15 @@ M: Maxim Levitsky <maximlevitsky@gmail.com>
 S:	Maintained
 F:	drivers/media/rc/ene_ir.*
 
+ENERGY MODEL
+M:	Lukasz Luba <lukasz.luba@arm.com>
+M:	"Rafael J. Wysocki" <rafael@kernel.org>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	kernel/power/energy_model.c
+F:	include/linux/energy_model.h
+F:	Documentation/power/energy-model.rst
+
 EPAPR HYPERVISOR BYTE CHANNEL DEVICE DRIVER
 M:	Laurentiu Tudor <laurentiu.tudor@nxp.com>
 L:	linuxppc-dev@lists.ozlabs.org
@@ -11690,12 +11699,14 @@ F: Documentation/driver-api/crypto/iaa/iaa-crypto.rst
 F:	drivers/crypto/intel/iaa/*
 
 INTEL IDLE DRIVER
-M:	Jacob Pan <jacob.jun.pan@linux.intel.com>
-M:	Len Brown <lenb@kernel.org>
+M:	Rafael J. Wysocki <rafael@kernel.org>
+M:	Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
+M:	Artem Bityutskiy <dedekind1@gmail.com>
+R:	Len Brown <lenb@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 F:	drivers/idle/intel_idle.c
 
 INTEL IDXD DRIVER

arch/x86/include/asm/msr-index.h

Lines changed: 11 additions & 9 deletions
@@ -703,15 +703,17 @@
 #define MSR_AMD_CPPC_REQ		0xc00102b3
 #define MSR_AMD_CPPC_STATUS		0xc00102b4
 
-#define AMD_CPPC_LOWEST_PERF(x)		(((x) >> 0) & 0xff)
-#define AMD_CPPC_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
-#define AMD_CPPC_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
-#define AMD_CPPC_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)
-
-#define AMD_CPPC_MAX_PERF(x)		(((x) & 0xff) << 0)
-#define AMD_CPPC_MIN_PERF(x)		(((x) & 0xff) << 8)
-#define AMD_CPPC_DES_PERF(x)		(((x) & 0xff) << 16)
-#define AMD_CPPC_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)
+/* Masks for use with MSR_AMD_CPPC_CAP1 */
+#define AMD_CPPC_LOWEST_PERF_MASK	GENMASK(7, 0)
+#define AMD_CPPC_LOWNONLIN_PERF_MASK	GENMASK(15, 8)
+#define AMD_CPPC_NOMINAL_PERF_MASK	GENMASK(23, 16)
+#define AMD_CPPC_HIGHEST_PERF_MASK	GENMASK(31, 24)
+
+/* Masks for use with MSR_AMD_CPPC_REQ */
+#define AMD_CPPC_MAX_PERF_MASK		GENMASK(7, 0)
+#define AMD_CPPC_MIN_PERF_MASK		GENMASK(15, 8)
+#define AMD_CPPC_DES_PERF_MASK		GENMASK(23, 16)
+#define AMD_CPPC_EPP_PERF_MASK		GENMASK(31, 24)
 
 /* AMD Performance Counter Global Status and Control MSRs */
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300

arch/x86/kernel/acpi/cppc.c

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ int amd_get_highest_perf(unsigned int cpu, u32 *highest_perf)
 		if (ret)
 			goto out;
 
-		val = AMD_CPPC_HIGHEST_PERF(val);
+		val = FIELD_GET(AMD_CPPC_HIGHEST_PERF_MASK, val);
 	} else {
 		ret = cppc_get_highest_perf(cpu, &val);
 		if (ret)
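
The msr-index.h and cppc.c hunks above replace open-coded shift-and-mask macros with GENMASK() field masks used through FIELD_GET(). The sketch below is a userspace approximation that checks the two styles extract and place the same bits. The `SKETCH_*` macros are simplified stand-ins written for this example, not the kernel's definitions from `<linux/bits.h>` and `<linux/bitfield.h>`, which add compile-time checks. The `sketch_*` function names are likewise hypothetical.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for GENMASK(), FIELD_GET(), FIELD_PREP(). */
#define SKETCH_GENMASK(h, l) \
	(((~UINT64_C(0)) << (l)) & ((~UINT64_C(0)) >> (63 - (h))))
#define SKETCH_LOW_BIT(mask)		((mask) & (~(mask) + 1))
#define SKETCH_FIELD_GET(mask, reg)	(((reg) & (mask)) / SKETCH_LOW_BIT(mask))
#define SKETCH_FIELD_PREP(mask, val) \
	(((uint64_t)(val) * SKETCH_LOW_BIT(mask)) & (mask))

/* Same bit ranges as the masks introduced by the diff. */
#define SKETCH_HIGHEST_PERF_MASK	SKETCH_GENMASK(31, 24)
#define SKETCH_DES_PERF_MASK		SKETCH_GENMASK(23, 16)

/* Old style: extract bits 31:24 with an explicit shift and mask. */
uint64_t sketch_highest_perf_old(uint64_t cap1)
{
	return (cap1 >> 24) & 0xff;
}

/* New style: the mask alone describes the field position and width. */
uint64_t sketch_highest_perf_new(uint64_t cap1)
{
	return SKETCH_FIELD_GET(SKETCH_HIGHEST_PERF_MASK, cap1);
}

/* Old style: place an 8-bit value into bits 23:16. */
uint64_t sketch_des_perf_old(uint64_t perf)
{
	return (perf & 0xff) << 16;
}

/* New style: prepare the same field from its mask. */
uint64_t sketch_des_perf_new(uint64_t perf)
{
	return SKETCH_FIELD_PREP(SKETCH_DES_PERF_MASK, perf);
}
```

A design point worth noting is that each mask names the register it belongs to (CAP1 or REQ in the diff's comments), so one definition can serve both reading and writing a field.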

drivers/acpi/device_pm.c

Lines changed: 2 additions & 2 deletions
@@ -1161,7 +1161,7 @@ EXPORT_SYMBOL_GPL(acpi_subsys_complete);
  */
 int acpi_subsys_suspend(struct device *dev)
 {
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	if (!dev_pm_smart_suspend(dev) ||
 	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
 		pm_runtime_resume(dev);
 
@@ -1320,7 +1320,7 @@ EXPORT_SYMBOL_GPL(acpi_subsys_restore_early);
  */
 int acpi_subsys_poweroff(struct device *dev)
 {
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	if (!dev_pm_smart_suspend(dev) ||
 	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
 		pm_runtime_resume(dev);
 

drivers/base/power/clock_ops.c

Lines changed: 0 additions & 73 deletions
@@ -259,39 +259,6 @@ int pm_clk_add_clk(struct device *dev, struct clk *clk)
 }
 EXPORT_SYMBOL_GPL(pm_clk_add_clk);
 
-
-/**
- * of_pm_clk_add_clk - Start using a device clock for power management.
- * @dev: Device whose clock is going to be used for power management.
- * @name: Name of clock that is going to be used for power management.
- *
- * Add the clock described in the 'clocks' device-tree node that matches
- * with the 'name' provided, to the list of clocks used for the power
- * management of @dev. On success, returns 0. Returns a negative error
- * code if the clock is not found or cannot be added.
- */
-int of_pm_clk_add_clk(struct device *dev, const char *name)
-{
-	struct clk *clk;
-	int ret;
-
-	if (!dev || !dev->of_node || !name)
-		return -EINVAL;
-
-	clk = of_clk_get_by_name(dev->of_node, name);
-	if (IS_ERR(clk))
-		return PTR_ERR(clk);
-
-	ret = pm_clk_add_clk(dev, clk);
-	if (ret) {
-		clk_put(clk);
-		return ret;
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(of_pm_clk_add_clk);
-
 /**
  * of_pm_clk_add_clks - Start using device clock(s) for power management.
  * @dev: Device whose clock(s) is going to be used for power management.
@@ -376,46 +343,6 @@ static void __pm_clk_remove(struct pm_clock_entry *ce)
 	kfree(ce);
 }
 
-/**
- * pm_clk_remove - Stop using a device clock for power management.
- * @dev: Device whose clock should not be used for PM any more.
- * @con_id: Connection ID of the clock.
- *
- * Remove the clock represented by @con_id from the list of clocks used for
- * the power management of @dev.
- */
-void pm_clk_remove(struct device *dev, const char *con_id)
-{
-	struct pm_subsys_data *psd = dev_to_psd(dev);
-	struct pm_clock_entry *ce;
-
-	if (!psd)
-		return;
-
-	pm_clk_list_lock(psd);
-
-	list_for_each_entry(ce, &psd->clock_list, node) {
-		if (!con_id && !ce->con_id)
-			goto remove;
-		else if (!con_id || !ce->con_id)
-			continue;
-		else if (!strcmp(con_id, ce->con_id))
-			goto remove;
-	}
-
-	pm_clk_list_unlock(psd);
-	return;
-
- remove:
-	list_del(&ce->node);
-	if (ce->enabled_when_prepared)
-		psd->clock_op_might_sleep--;
-	pm_clk_list_unlock(psd);
-
-	__pm_clk_remove(ce);
-}
-EXPORT_SYMBOL_GPL(pm_clk_remove);
-
 /**
  * pm_clk_remove_clk - Stop using a device clock for power management.
  * @dev: Device whose clock should not be used for PM any more.
