-
Notifications
You must be signed in to change notification settings - Fork 7.7k
kernel: assert no spinlock is held on swap when !USE_SWITCH #93440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
kernel: assert no spinlock is held on swap when !USE_SWITCH #93440
Conversation
The do_swap() routine used when CONFIG_USE_SWITCH=y asserts that caller thread does not hold any spinlock when CONFIG_SPIN_VALIDATE is enabled. However, there is no similar check in place when CONFIG_USE_SWITCH=n. Copy this assertion in the USE_SWITCH=n implementation of z_swap_irqlock(). Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity: this is useful for architectures like arm32 that have an arch_swap() and miss the check elsewhere. The error being detected is always wrong.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this caught the bug in my case where a thread holding the spinlock tries to sleep (mutex).
For the record, this patch breaks part of the Lines 242 to 251 in 1f69b91
Using QEMU, I verified that the existing assertion (when diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index c550ab461cb..be87d15501c 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,10 +5,13 @@
*/
#include <stdio.h>
+#include <zephyr/kernel.h>
int main(void)
{
+ irq_lock();
printf("Hello World! %s\n", CONFIG_BOARD_TARGET);
+ k_msleep(1000);
return 0;
} $ # with upstream
$ west build -b qemu_x86_64 samples/hello_world/ -DCONFIG_USE_SWITCH=y -DCONFIG_ASSERT=y -DCONFIG_SPIN_VALIDATE=y -p
$ west build -t run
-- west build: running target run
[3/4] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: qemu64,+x2apic
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
SeaBIOS (version zephyr-v1.0.0-0-g31d4e0e-dirty-20200714_234759-fv-az50-zephyr)
Booting from ROM..
*** Booting Zephyr OS build v4.2.0-334-g124fb897b490 ***
Hello World! qemu_x86_64/atom
ASSERTION FAIL [arch_irq_unlocked(key) || z_smp_current_get()->base.thread_state & (((1UL << (0))) | ((1UL << (3))))] @ WEST_TOPDIR/zephyr/kernel/include/kswap.h:98
Context switching while holding lock!
RAX: 0x0000000000000004 RBX: 0x000000000011fd90 RCX: 0x0000000000000001 RDX: 0x0000000000000000
RSI: 0x0000000000000062 RDI: 0x00000000001090d2 RBP: 0x000000000012bd70 RSP: 0x000000000012bd48
R8: 0x0000000000000000 R9: 0x0000000000000000 R10: 0x0000000000000002 R11: 0x0000000000000000
R12: 0x000000000010b500 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x000000000012be48
RSP: 0x000000000012bd48 RFLAGS: 0x0000000000000002 CS: 0x0018 CR3: 0x0000000000136000
RIP: 0x000000000010046f
call trace:
0: 0x00000000001051d5
1: 0x000000000010524d
2: 0x0000000000100025
3: 0x0000000000102f2d
4: 0x000000000010045a On a related note, the zephyr/include/zephyr/spinlock.h Lines 154 to 181 in 1f69b91
The options I see are:
|
My vote would just be to remove that section of the docs and rip the bandaid off with the warning (which can always be disabled via kconfig anyway). There are precisely zero circumstances where breaking a critical section (!) lock with a context switch (!!) cannot lead to tears. Code that does that is not only breaking its own carefully curated lock in an immensely clever way that it clearly thought carefully about and knows has no unexpected interactions. Because of the recursive behavior of irq_lock() it's also breaking the locks taken by all the callers up the stack, who had no idea that calling into this obnoxiously clever subsystem was a synchronization trap. It just can't work. We never should have documented it (and I didn't even know we had!). |
The
do_swap()
routine used whenCONFIG_USE_SWITCH=y
asserts that caller thread does not hold any spinlock whenCONFIG_SPIN_VALIDATE
is enabled. However, there is no similar check in place whenCONFIG_USE_SWITCH=n
.Copy this assertion in the
USE_SWITCH=n
implementation ofz_swap_irqlock()
.