kernel: assert no spinlock is held on swap when !USE_SWITCH #93440


Open · wants to merge 1 commit into main

Conversation

mathieuchopstm (Contributor)

The do_swap() routine used when CONFIG_USE_SWITCH=y asserts that the caller thread does not hold any spinlock when CONFIG_SPIN_VALIDATE is enabled. However, there is no similar check in place when CONFIG_USE_SWITCH=n.

Copy this assertion into the USE_SWITCH=n implementation of z_swap_irqlock().


Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
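
For illustration, here is a minimal sketch of what the mirrored check could look like in the CONFIG_USE_SWITCH=n path. The surrounding z_swap_irqlock() body and the exact helper names are assumptions based on kswap.h and the assertion text visible in the QEMU log further down, not the verbatim patch:

/* kernel/include/kswap.h, CONFIG_USE_SWITCH=n path (sketch) */
static inline int z_swap_irqlock(unsigned int key)
{
	int ret;

#ifdef CONFIG_SPIN_VALIDATE
	/* Mirror of the do_swap() check: swapping while the interrupt
	 * key is still "locked" means a spinlock (or irq_lock()) is
	 * held across the context switch, which is only acceptable for
	 * dummy or dying threads.
	 */
	__ASSERT(arch_irq_unlocked(key) ||
		 _current->base.thread_state & (_THREAD_DUMMY | _THREAD_DEAD),
		 "Context switching while holding lock!");
#endif

	z_check_stack_sentinel();
	ret = arch_swap(key);
	return ret;
}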
@andyross andyross (Contributor) left a comment


For clarity: this is useful for architectures like arm32 that have an arch_swap() and miss the check elsewhere. The condition being detected is always a bug.

@krish2718 krish2718 (Contributor) left a comment


Thanks, this caught the bug in my case, where a thread holding a spinlock tried to sleep (waiting on a mutex).
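
For context, the failing pattern was roughly the following (lock and mutex names are hypothetical); k_mutex_lock() can pend and reschedule, so with CONFIG_SPIN_VALIDATE=y the swap now asserts because the spinlock is still held:

#include <zephyr/kernel.h>

static struct k_spinlock my_lock;  /* hypothetical lock */
K_MUTEX_DEFINE(my_mutex);          /* hypothetical mutex */

void buggy_path(void)
{
	k_spinlock_key_t key = k_spin_lock(&my_lock);

	/* May block and reschedule: a context switch while my_lock is
	 * held, which the new assertion catches.
	 */
	k_mutex_lock(&my_mutex, K_FOREVER);

	k_spin_unlock(&my_lock, key);
}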

@mathieuchopstm (Contributor, Author)

For the record, this patch breaks part of the irq_lock() specification when CONFIG_SPIN_VALIDATE=y, since the validation has no way to distinguish a key taken via k_spin_lock() from one taken via irq_lock():

zephyr/include/zephyr/irq.h, lines 242 to 251 at 1f69b91:

* @note
* This routine can be called by ISRs or by threads. If it is called by a
* thread, the interrupt lock is thread-specific; this means that interrupts
* remain disabled only while the thread is running. If the thread performs an
* operation that allows another thread to run (for example, giving a semaphore
* or sleeping for N milliseconds), the interrupt lock no longer applies and
* interrupts may be re-enabled while other processing occurs. When the thread
* once again becomes the current thread, the kernel re-establishes its
* interrupt lock; this ensures the thread won't be interrupted until it has
* explicitly released the interrupt lock it established.

Using QEMU, I verified that the existing assertion (from bd07756, active when CONFIG_USE_SWITCH=y) does panic if irq_lock() is held upon context switch:

diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index c550ab461cb..be87d15501c 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,10 +5,13 @@
  */
 
 #include <stdio.h>
+#include <zephyr/kernel.h>
 
 int main(void)
 {
+       irq_lock();
        printf("Hello World! %s\n", CONFIG_BOARD_TARGET);
+       k_msleep(1000);
 
        return 0;
 }
$ # with upstream
$ west build -b qemu_x86_64 samples/hello_world/ -DCONFIG_USE_SWITCH=y -DCONFIG_ASSERT=y -DCONFIG_SPIN_VALIDATE=y -p
$ west build -t run
-- west build: running target run
[3/4] To exit from QEMU enter: 'CTRL+a, x'
[QEMU] CPU: qemu64,+x2apic
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
SeaBIOS (version zephyr-v1.0.0-0-g31d4e0e-dirty-20200714_234759-fv-az50-zephyr)
Booting from ROM..
*** Booting Zephyr OS build v4.2.0-334-g124fb897b490 ***
Hello World! qemu_x86_64/atom
ASSERTION FAIL [arch_irq_unlocked(key) || z_smp_current_get()->base.thread_state & (((1UL << (0))) | ((1UL << (3))))] @ WEST_TOPDIR/zephyr/kernel/include/kswap.h:98
        Context switching while holding lock!
RAX: 0x0000000000000004 RBX: 0x000000000011fd90 RCX: 0x0000000000000001 RDX: 0x0000000000000000
RSI: 0x0000000000000062 RDI: 0x00000000001090d2 RBP: 0x000000000012bd70 RSP: 0x000000000012bd48
 R8: 0x0000000000000000  R9: 0x0000000000000000 R10: 0x0000000000000002 R11: 0x0000000000000000
R12: 0x000000000010b500 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x000000000012be48
RSP: 0x000000000012bd48 RFLAGS: 0x0000000000000002 CS: 0x0018 CR3: 0x0000000000136000
RIP: 0x000000000010046f
call trace:
     0: 0x00000000001051d5
     1: 0x000000000010524d
     2: 0x0000000000100025
     3: 0x0000000000102f2d
     4: 0x000000000010045a

On a related note, the k_spin_lock() documentation does not mention whether holding a spinlock across a reschedule is allowed:

/**
* @brief Lock a spinlock
*
* This routine locks the specified spinlock, returning a key handle
* representing interrupt state needed at unlock time. Upon
* returning, the calling thread is guaranteed not to be suspended or
* interrupted on its current CPU until it calls k_spin_unlock(). The
* implementation guarantees mutual exclusion: exactly one thread on
* one CPU will return from k_spin_lock() at a time. Other CPUs
* trying to acquire a lock already held by another CPU will enter an
* implementation-defined busy loop ("spinning") until the lock is
* released.
*
* Separate spin locks may be nested. It is legal to lock an
* (unlocked) spin lock while holding a different lock. Spin locks
* are not recursive, however: an attempt to acquire a spin lock that
* the CPU already holds will deadlock.
*
* In circumstances where only one CPU exists, the behavior of
* k_spin_lock() remains as specified above, though obviously no
* spinning will take place. Implementations may be free to optimize
* in uniprocessor contexts such that the locking reduces to an
* interrupt mask operation.
*
* @param l A pointer to the spinlock to lock
* @return A key value that must be passed to k_spin_unlock() when the
* lock is released.
*/
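
As an aside, the nesting rule quoted above is easy to illustrate (lock_a and lock_b are hypothetical):

#include <zephyr/kernel.h>

static struct k_spinlock lock_a;
static struct k_spinlock lock_b;

void nested_ok(void)
{
	/* Legal: taking a different (unlocked) spinlock while holding one. */
	k_spinlock_key_t ka = k_spin_lock(&lock_a);
	k_spinlock_key_t kb = k_spin_lock(&lock_b);

	k_spin_unlock(&lock_b, kb);
	k_spin_unlock(&lock_a, ka);
}

void recursive_bad(void)
{
	k_spinlock_key_t k1 = k_spin_lock(&lock_a);

	/* Illegal: spinlocks are not recursive. On SMP this never
	 * returns; with CONFIG_SPIN_VALIDATE=y it trips an assertion
	 * instead.
	 */
	k_spinlock_key_t k2 = k_spin_lock(&lock_a);

	k_spin_unlock(&lock_a, k2);
	k_spin_unlock(&lock_a, k1);
}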

The options I see are:

  1. don't merge this patch ( 😢 )
  2. merge the patch as-is
     • holders of irq_lock() on context switch will inexplicably panic if CONFIG_SPIN_VALIDATE=y
     • ...and it should be remarked that CONFIG_SPIN_VALIDATE usually defaults to y if CONFIG_ASSERT=y
  3. update the documentation of irq_lock() to indicate that holding irq_lock() on context switch will panic when CONFIG_SPIN_VALIDATE=y
  4. update the documentation of irq_lock() to no longer allow holding irq_lock() across a context switch (breaking change?)

@andyross (Contributor)

My vote would just be to remove that section of the docs and rip the bandaid off with the warning (which can always be disabled via kconfig anyway). There are precisely zero circumstances where breaking a critical section (!) lock with a context switch (!!) cannot lead to tears.

Code that does that is not only breaking its own carefully curated lock (in an immensely clever way that it clearly thought carefully about and knows has no unexpected interactions, of course); because of the recursive behavior of irq_lock(), it's also breaking the locks taken by all the callers up the stack, who had no idea that calling into this obnoxiously clever subsystem was a synchronization trap. A sketch of that hazard follows.
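
To make the stack-breaking concrete, a hypothetical example (function names invented): irq_lock() nests, so a sleep in the callee re-enables interrupts for the caller's critical section as well:

#include <zephyr/kernel.h>

static void clever_subsystem(void)
{
	unsigned int key = irq_lock();  /* nested: key records "already locked" */

	k_msleep(10);  /* reschedules; interrupts are re-enabled while sleeping */

	irq_unlock(key);  /* restores the nested state, not the caller's */
}

static void caller(void)
{
	unsigned int key = irq_lock();

	clever_subsystem();  /* caller's critical section is silently broken here */

	irq_unlock(key);
}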

It just can't work. We never should have documented it (and I didn't even know we had!).
