Commit 922efd2

Frederic Weisbecker authored and KAGA-KOKO committed
timers/migration: Annotate accesses to ignore flag
The group's ignore flag is:

	_ read under the group's lock (idle entry, remote expiry)

	_ turned on/off under the group's lock (idle entry, remote expiry)

	_ turned on locklessly on idle exit

When idle entry or remote expiry clear the "ignore" flag of a group, the
operation must be synchronized against other concurrent idle entry or
remote expiry to make sure the related group timer is never missed. To
enforce this synchronization, both "ignore" clear and read are performed
under the group lock.

On the contrary, whether idle entry or remote expiry manage to observe
the "ignore" flag turned on by a CPU exiting idle is a matter of
optimization. If that flag set is missed or cleared concurrently, the
worst outcome is a migrator wasting time remotely handling a "ghost"
timer. This is why the ignore flag can be set locklessly.

Unfortunately, the related lockless accesses are bare and miss
appropriate annotations. KCSAN rightfully complains:

	BUG: KCSAN: data-race in __tmigr_cpu_activate / print_report

	write to 0xffff88842fc28004 of 1 bytes by task 0 on cpu 0:
	 __tmigr_cpu_activate
	 tmigr_cpu_activate
	 timer_clear_idle
	 tick_nohz_restart_sched_tick
	 tick_nohz_idle_exit
	 do_idle
	 cpu_startup_entry
	 kernel_init
	 do_initcalls
	 clear_bss
	 reserve_bios_regions
	 common_startup_64

	read to 0xffff88842fc28004 of 1 bytes by task 0 on cpu 1:
	 print_report
	 kcsan_report_known_origin
	 kcsan_setup_watchpoint
	 tmigr_next_groupevt
	 tmigr_update_events
	 tmigr_inactive_up
	 __walk_groups+0x50/0x77
	 walk_groups
	 __tmigr_cpu_deactivate
	 tmigr_cpu_deactivate
	 __get_next_timer_interrupt
	 timer_base_try_to_set_idle
	 tick_nohz_stop_tick
	 tick_nohz_idle_stop_tick
	 cpuidle_idle_call
	 do_idle

Although the relevant accesses could be marked as data_race(), the
"ignore" flag being read several times within the same
tmigr_update_events() function is confusing and error prone. Prefer
reading it once in that function and make use of similar/paired accesses
elsewhere with appropriate comments when necessary.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250114231507.21672-4-frederic@kernel.org
Closes: https://lore.kernel.org/oe-lkp/202501031612.62e0c498-lkp@intel.com
1 parent de3ced7

1 file changed: +15 −6 lines changed

kernel/time/timer_migration.c

Lines changed: 15 additions & 6 deletions

@@ -569,7 +569,7 @@ static struct tmigr_event *tmigr_next_groupevt(struct tmigr_group *group)
 	while ((node = timerqueue_getnext(&group->events))) {
 		evt = container_of(node, struct tmigr_event, nextevt);
 
-		if (!evt->ignore) {
+		if (!READ_ONCE(evt->ignore)) {
 			WRITE_ONCE(group->next_expiry, evt->nextevt.expires);
 			return evt;
 		}
@@ -665,7 +665,7 @@ static bool tmigr_active_up(struct tmigr_group *group,
 	 * lock is held while updating the ignore flag in idle path. So this
 	 * state change will not be lost.
 	 */
-	group->groupevt.ignore = true;
+	WRITE_ONCE(group->groupevt.ignore, true);
 
 	return walk_done;
 }
@@ -726,6 +726,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	union tmigr_state childstate, groupstate;
 	bool remote = data->remote;
 	bool walk_done = false;
+	bool ignore;
 	u64 nextexp;
 
 	if (child) {
@@ -744,11 +745,19 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 		nextexp = child->next_expiry;
 		evt = &child->groupevt;
 
-		evt->ignore = (nextexp == KTIME_MAX) ? true : false;
+		/*
+		 * This can race with concurrent idle exit (activate).
+		 * If the current writer wins, a useless remote expiration may
+		 * be scheduled. If the activate wins, the event is properly
+		 * ignored.
+		 */
+		ignore = (nextexp == KTIME_MAX) ? true : false;
+		WRITE_ONCE(evt->ignore, ignore);
 	} else {
 		nextexp = data->nextexp;
 
 		first_childevt = evt = data->evt;
+		ignore = evt->ignore;
 
 		/*
 		 * Walking the hierarchy is required in any case when a
@@ -774,7 +783,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	 * first event information of the group is updated properly and
 	 * also handled properly, so skip this fast return path.
 	 */
-	if (evt->ignore && !remote && group->parent)
+	if (ignore && !remote && group->parent)
 		return true;
 
 	raw_spin_lock(&group->lock);
@@ -788,7 +797,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	 * queue when the expiry time changed only or when it could be ignored.
 	 */
 	if (timerqueue_node_queued(&evt->nextevt)) {
-		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+		if ((evt->nextevt.expires == nextexp) && !ignore) {
 			/* Make sure not to miss a new CPU event with the same expiry */
 			evt->cpu = first_childevt->cpu;
 			goto check_toplvl;
@@ -798,7 +807,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 		WRITE_ONCE(group->next_expiry, KTIME_MAX);
 	}
 
-	if (evt->ignore) {
+	if (ignore) {
 		/*
 		 * When the next child event could be ignored (nextexp is
 		 * KTIME_MAX) and there was no remote timer handling before or
