
Commit 923c256

Merge branches 'pm-cpuidle' and 'pm-em'

Merge cpuidle and Energy Model changes for 6.13-rc1:

 - Add a built-in idle states table for Granite Rapids Xeon D to the
   intel_idle driver (Artem Bityutskiy).

 - Fix some typos in comments in the cpuidle core and drivers (Shen Lichuan).

 - Remove iowait influence from the menu cpuidle governor (Christian Loehle).

 - Add min/max available performance state limits to the Energy Model
   management code (Lukasz Luba).

* pm-cpuidle:
  intel_idle: add Granite Rapids Xeon D support
  cpuidle: Correct some typos in comments
  cpuidle: menu: Remove iowait influence

* pm-em:
  PM: EM: Add min/max available performance state limits
3 parents: a8aaea4 + f557e0d + 5609296

File tree

8 files changed (+135, -80 lines)

drivers/cpuidle/cpuidle-arm.c

Lines changed: 1 addition & 1 deletion

@@ -139,7 +139,7 @@ static int __init arm_idle_init_cpu(int cpu)
  *
  * Initializes arm cpuidle driver for all CPUs, if any CPU fails
  * to register cpuidle driver then rollback to cancel all CPUs
- * registeration.
+ * registration.
  */
 static int __init arm_idle_init(void)
 {

drivers/cpuidle/cpuidle-qcom-spm.c

Lines changed: 1 addition & 1 deletion

@@ -48,7 +48,7 @@ static int qcom_cpu_spc(struct spm_driver_data *drv)
 	ret = cpu_suspend(0, qcom_pm_collapse);
 	/*
 	 * ARM common code executes WFI without calling into our driver and
-	 * if the SPM mode is not reset, then we may accidently power down the
+	 * if the SPM mode is not reset, then we may accidentally power down the
 	 * cpu when we intended only to gate the cpu clock.
 	 * Ensure the state is set to standby before returning.
 	 */

drivers/cpuidle/cpuidle.c

Lines changed: 1 addition & 1 deletion

@@ -406,7 +406,7 @@ void cpuidle_reflect(struct cpuidle_device *dev, int index)
 	 * Min polling interval of 10usec is a guess. It is assuming that
 	 * for most users, the time for a single ping-pong workload like
 	 * perf bench pipe would generally complete within 10usec but
-	 * this is hardware dependant. Actual time can be estimated with
+	 * this is hardware dependent. Actual time can be estimated with
 	 *
 	 * perf bench sched pipe -l 10000
 	 *

drivers/cpuidle/driver.c

Lines changed: 2 additions & 2 deletions

@@ -261,7 +261,7 @@ static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
  * @drv: a pointer to a valid struct cpuidle_driver
  *
  * Register the driver under a lock to prevent concurrent attempts to
- * [un]register the driver from occuring at the same time.
+ * [un]register the driver from occurring at the same time.
  *
  * Returns 0 on success, a negative error code (returned by
  * __cpuidle_register_driver()) otherwise.

@@ -296,7 +296,7 @@ EXPORT_SYMBOL_GPL(cpuidle_register_driver);
  * @drv: a pointer to a valid struct cpuidle_driver
  *
  * Unregisters the cpuidle driver under a lock to prevent concurrent attempts
- * to [un]register the driver from occuring at the same time. @drv has to
+ * to [un]register the driver from occurring at the same time. @drv has to
  * match the currently registered driver.
  */
 void cpuidle_unregister_driver(struct cpuidle_driver *drv)

drivers/cpuidle/governors/menu.c

Lines changed: 9 additions & 67 deletions

@@ -19,7 +19,7 @@
 
 #include "gov.h"
 
-#define BUCKETS 12
+#define BUCKETS 6
 #define INTERVAL_SHIFT 3
 #define INTERVALS (1UL << INTERVAL_SHIFT)
 #define RESOLUTION 1024

@@ -29,12 +29,11 @@
 /*
  * Concepts and ideas behind the menu governor
  *
- * For the menu governor, there are 3 decision factors for picking a C
+ * For the menu governor, there are 2 decision factors for picking a C
  * state:
  * 1) Energy break even point
- * 2) Performance impact
- * 3) Latency tolerance (from pmqos infrastructure)
- * These three factors are treated independently.
+ * 2) Latency tolerance (from pmqos infrastructure)
+ * These two factors are treated independently.
  *
  * Energy break even point
  * -----------------------

@@ -75,30 +74,6 @@
  * intervals and if the stand deviation of these 8 intervals is below a
  * threshold value, we use the average of these intervals as prediction.
  *
- * Limiting Performance Impact
- * ---------------------------
- * C states, especially those with large exit latencies, can have a real
- * noticeable impact on workloads, which is not acceptable for most sysadmins,
- * and in addition, less performance has a power price of its own.
- *
- * As a general rule of thumb, menu assumes that the following heuristic
- * holds:
- *     The busier the system, the less impact of C states is acceptable
- *
- * This rule-of-thumb is implemented using a performance-multiplier:
- * If the exit latency times the performance multiplier is longer than
- * the predicted duration, the C state is not considered a candidate
- * for selection due to a too high performance impact. So the higher
- * this multiplier is, the longer we need to be idle to pick a deep C
- * state, and thus the less likely a busy CPU will hit such a deep
- * C state.
- *
- * Currently there is only one value determining the factor:
- * 10 points are added for each process that is waiting for IO on this CPU.
- * (This value was experimentally determined.)
- * Utilization is no longer a factor as it was shown that it never contributed
- * significantly to the performance multiplier in the first place.
- *
 */
 
 struct menu_device {

@@ -112,19 +87,10 @@ struct menu_device {
 	int interval_ptr;
 };
 
-static inline int which_bucket(u64 duration_ns, unsigned int nr_iowaiters)
+static inline int which_bucket(u64 duration_ns)
 {
 	int bucket = 0;
 
-	/*
-	 * We keep two groups of stats; one with no
-	 * IO pending, one without.
-	 * This allows us to calculate
-	 * E(duration)|iowait
-	 */
-	if (nr_iowaiters)
-		bucket = BUCKETS/2;
-
 	if (duration_ns < 10ULL * NSEC_PER_USEC)
 		return bucket;
 	if (duration_ns < 100ULL * NSEC_PER_USEC)

@@ -138,19 +104,6 @@ static inline int which_bucket(u64 duration_ns, unsigned int nr_iowaiters)
 	return bucket + 5;
 }
 
-/*
- * Return a multiplier for the exit latency that is intended
- * to take performance requirements into account.
- * The more performance critical we estimate the system
- * to be, the higher this multiplier, and thus the higher
- * the barrier to go to an expensive C state.
- */
-static inline int performance_multiplier(unsigned int nr_iowaiters)
-{
-	/* for IO wait tasks (per cpu!) we add 10x each */
-	return 1 + 10 * nr_iowaiters;
-}
-
 static DEFINE_PER_CPU(struct menu_device, menu_devices);
 
 static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);

@@ -258,8 +211,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	u64 predicted_ns;
-	u64 interactivity_req;
-	unsigned int nr_iowaiters;
 	ktime_t delta, delta_tick;
 	int i, idx;

@@ -268,8 +219,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		data->needs_update = 0;
 	}
 
-	nr_iowaiters = nr_iowait_cpu(dev->cpu);
-
 	/* Find the shortest expected idle interval. */
 	predicted_ns = get_typical_interval(data) * NSEC_PER_USEC;
 	if (predicted_ns > RESIDENCY_THRESHOLD_NS) {

@@ -283,7 +232,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	}
 
 	data->next_timer_ns = delta;
-	data->bucket = which_bucket(data->next_timer_ns, nr_iowaiters);
+	data->bucket = which_bucket(data->next_timer_ns);
 
 	/* Round up the result for half microseconds. */
 	timer_us = div_u64((RESOLUTION * DECAY * NSEC_PER_USEC) / 2 +

@@ -301,7 +250,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 */
 		data->next_timer_ns = KTIME_MAX;
 		delta_tick = TICK_NSEC / 2;
-		data->bucket = which_bucket(KTIME_MAX, nr_iowaiters);
+		data->bucket = which_bucket(KTIME_MAX);
 	}
 
 	if (unlikely(drv->state_count <= 1 || latency_req == 0) ||

@@ -328,15 +277,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 */
 		if (predicted_ns < TICK_NSEC)
 			predicted_ns = data->next_timer_ns;
-	} else {
-		/*
-		 * Use the performance multiplier and the user-configurable
-		 * latency_req to determine the maximum exit latency.
-		 */
-		interactivity_req = div64_u64(predicted_ns,
-					      performance_multiplier(nr_iowaiters));
-		if (latency_req > interactivity_req)
-			latency_req = interactivity_req;
+	} else if (latency_req > predicted_ns) {
+		latency_req = predicted_ns;
 	}
 
 	/*

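With the performance multiplier gone, menu keeps a single set of six buckets and the predicted idle duration itself caps the allowed exit latency. For reference, here is a minimal standalone sketch of the resulting which_bucket() helper, reconstructed from the hunks above; the thresholds between 100 usec and the final return fall outside the diff context and are assumed, not confirmed by this commit:

/*
 * Sketch of the simplified bucket mapping: one bucket per decade of
 * expected idle duration, with no separate iowait group. The 1 ms,
 * 10 ms and 100 ms thresholds are assumptions filled in from the
 * surrounding kernel source, not shown in the hunks above.
 */
static inline int which_bucket(u64 duration_ns)
{
	int bucket = 0;

	if (duration_ns < 10ULL * NSEC_PER_USEC)
		return bucket;
	if (duration_ns < 100ULL * NSEC_PER_USEC)
		return bucket + 1;
	if (duration_ns < 1000ULL * NSEC_PER_USEC)
		return bucket + 2;
	if (duration_ns < 10000ULL * NSEC_PER_USEC)
		return bucket + 3;
	if (duration_ns < 100000ULL * NSEC_PER_USEC)
		return bucket + 4;
	return bucket + 5;
}

This is consistent with BUCKETS dropping from 12 to 6: the old code doubled the bucket count only to keep a second, iowait-specific copy of the same six statistics.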
drivers/idle/intel_idle.c

Lines changed: 48 additions & 0 deletions

@@ -1069,6 +1069,47 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
 		.enter = NULL }
 };
 
+static struct cpuidle_state gnrd_cstates[] __initdata = {
+	{
+		.name = "C1",
+		.desc = "MWAIT 0x00",
+		.flags = MWAIT2flg(0x00),
+		.exit_latency = 1,
+		.target_residency = 1,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C1E",
+		.desc = "MWAIT 0x01",
+		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
+		.exit_latency = 4,
+		.target_residency = 4,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C6",
+		.desc = "MWAIT 0x20",
+		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED |
+					   CPUIDLE_FLAG_INIT_XSTATE |
+					   CPUIDLE_FLAG_PARTIAL_HINT_MATCH,
+		.exit_latency = 220,
+		.target_residency = 650,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C6P",
+		.desc = "MWAIT 0x21",
+		.flags = MWAIT2flg(0x21) | CPUIDLE_FLAG_TLB_FLUSHED |
+					   CPUIDLE_FLAG_INIT_XSTATE |
+					   CPUIDLE_FLAG_PARTIAL_HINT_MATCH,
+		.exit_latency = 240,
+		.target_residency = 750,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.enter = NULL }
+};
+
 static struct cpuidle_state atom_cstates[] __initdata = {
 	{
 		.name = "C1E",

@@ -1508,6 +1549,12 @@ static const struct idle_cpu idle_cpu_gnr __initconst = {
 	.use_acpi = true,
 };
 
+static const struct idle_cpu idle_cpu_gnrd __initconst = {
+	.state_table = gnrd_cstates,
+	.disable_promotion_to_c1e = true,
+	.use_acpi = true,
+};
+
 static const struct idle_cpu idle_cpu_avn __initconst = {
 	.state_table = avn_cstates,
 	.disable_promotion_to_c1e = true,

@@ -1593,6 +1640,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
 	X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, &idle_cpu_spr),
 	X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, &idle_cpu_spr),
 	X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, &idle_cpu_gnr),
+	X86_MATCH_VFM(INTEL_GRANITERAPIDS_D, &idle_cpu_gnrd),
 	X86_MATCH_VFM(INTEL_XEON_PHI_KNL, &idle_cpu_knl),
 	X86_MATCH_VFM(INTEL_XEON_PHI_KNM, &idle_cpu_knl),
 	X86_MATCH_VFM(INTEL_ATOM_GOLDMONT, &idle_cpu_bxt),

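A note on the .flags encoding used in the table above: the MWAIT hint (the EAX value named in each .desc string) rides in the upper byte of the cpuidle state flags. A minimal sketch, assuming the in-tree definition from intel_idle.c; the macro body is not part of this diff:

/* Assumed from intel_idle.c: the MWAIT hint occupies bits 31:24 of flags. */
#define MWAIT2flg(eax) ((eax & 0xFF) << 24)

Under that assumption, the C6P entry's MWAIT2flg(0x21) yields 0x21000000 before the CPUIDLE_FLAG_* bits are OR-ed in, and the driver recovers the hint from the flags word when entering the state.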
include/linux/energy_model.h

Lines changed: 21 additions & 8 deletions

@@ -55,6 +55,8 @@ struct em_perf_table {
  * struct em_perf_domain - Performance domain
  * @em_table:		Pointer to the runtime modifiable em_perf_table
  * @nr_perf_states:	Number of performance states
+ * @min_perf_state:	Minimum allowed Performance State index
+ * @max_perf_state:	Maximum allowed Performance State index
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
  *			for performance reasons to avoid potential cache

@@ -70,6 +72,8 @@ struct em_perf_table {
 struct em_perf_domain {
 	struct em_perf_table __rcu *em_table;
 	int nr_perf_states;
+	int min_perf_state;
+	int max_perf_state;
 	unsigned long flags;
 	unsigned long cpus[];
 };

@@ -173,13 +177,14 @@ void em_table_free(struct em_perf_table __rcu *table);
 int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
 			 int nr_states);
 int em_dev_update_chip_binning(struct device *dev);
+int em_update_performance_limits(struct em_perf_domain *pd,
+				 unsigned long freq_min_khz, unsigned long freq_max_khz);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
  * @table:		List of performance states, in ascending order
- * @nr_perf_states:	Number of performance states
+ * @pd:			performance domain for which this must be done
  * @max_util:		Max utilization to map with the EM
- * @pd_flags:		Performance Domain flags
  *
  * It is called from the scheduler code quite frequently and as a consequence
  * doesn't implement any check.

@@ -188,13 +193,16 @@ int em_dev_update_chip_binning(struct device *dev);
  * requirement.
  */
 static inline int
-em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
-			  unsigned long max_util, unsigned long pd_flags)
+em_pd_get_efficient_state(struct em_perf_state *table,
+			  struct em_perf_domain *pd, unsigned long max_util)
 {
+	unsigned long pd_flags = pd->flags;
+	int min_ps = pd->min_perf_state;
+	int max_ps = pd->max_perf_state;
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < nr_perf_states; i++) {
+	for (i = min_ps; i <= max_ps; i++) {
 		ps = &table[i];
 		if (ps->performance >= max_util) {
 			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&

@@ -204,7 +212,7 @@ em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
 		}
 	}
 
-	return nr_perf_states - 1;
+	return max_ps;
 }

@@ -253,8 +261,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * requested performance.
 	 */
 	em_table = rcu_dereference(pd->em_table);
-	i = em_pd_get_efficient_state(em_table->state, pd->nr_perf_states,
-				      max_util, pd->flags);
+	i = em_pd_get_efficient_state(em_table->state, pd, max_util);
 	ps = &em_table->state[i];

@@ -391,6 +398,12 @@ static inline int em_dev_update_chip_binning(struct device *dev)
 {
 	return -EINVAL;
 }
+static inline
+int em_update_performance_limits(struct em_perf_domain *pd,
+				 unsigned long freq_min_khz, unsigned long freq_max_khz)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif

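The new em_update_performance_limits() is what narrows the [min_perf_state, max_perf_state] window that em_pd_get_efficient_state() now walks. A hypothetical caller might look like this; em_cpu_get() is an existing EM helper, but the CPU number and the frequency window below are purely illustrative:

#include <linux/energy_model.h>

/*
 * Illustrative sketch only: clamp CPU 0's EM perf domain to an
 * 800 MHz..2.4 GHz window so that energy estimates only ever map
 * utilization onto performance states inside that range.
 */
static int example_clamp_perf_window(void)
{
	struct em_perf_domain *pd = em_cpu_get(0);

	if (!pd)
		return -ENODEV;

	/* Limits are passed in kHz, matching the new API's signature. */
	return em_update_performance_limits(pd, 800000, 2400000);
}

With no limits applied, the window presumably spans the full 0..nr_perf_states-1 range, so em_pd_get_efficient_state() behaves as before, with max_ps replacing nr_perf_states - 1 as its fallback return.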