Description
Describe the bug
`udc_submit_ep_event()` events are queued to `usbd_msgq` inside `usbd_event_carrier()`. The problem is that the function can silently fail to queue the event because `k_msgq_put()` is called with `K_NO_WAIT`:
zephyr/subsys/usb/device_next/usbd_core.c
Lines 34 to 38 in 0781caf
This can lead to all sorts of issues, but two major ones have been reported so far:
- Endpoint events like `udc_submit_ep_event(dev, buf, -ECONNABORTED);` can be silently dropped, never making it to the class. This was observed on the MSC class, which would not requeue the OUT transfer because it never received the `-ECONNABORTED` on the request it submitted. The only solution was to reset the CPU (a bus reset was not enough).
- A VBUS ready/removed sequence can fill the queue, leading to e.g. VBUS removed being processed last while the last event that was submitted but silently dropped was VBUS ready (or vice versa).
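The failure mode can be illustrated outside Zephyr with a small stand-in for a bounded message queue. This is a minimal sketch in plain C, not the code from `usbd_core.c`: `model_msgq`, `MODEL_DEPTH`, `model_put_nowait()`, `submit_unchecked()` and `submit_checked()` are hypothetical names, and the depth of 4 is arbitrary. The only behaviour borrowed from Zephyr is that `k_msgq_put()` with `K_NO_WAIT` returns `-ENOMSG` instead of waiting when the queue is full.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_DEPTH 4 /* hypothetical depth, chosen only for illustration */

/* Hypothetical stand-in for a bounded message queue. */
struct model_msgq {
	int msgs[MODEL_DEPTH];
	size_t used;
};

/* Non-blocking put: fails immediately when the queue is full. */
static int model_put_nowait(struct model_msgq *q, int msg)
{
	if (q->used == MODEL_DEPTH) {
		return -ENOMSG;
	}

	q->msgs[q->used++] = msg;
	return 0;
}

/* Submitter that ignores the result: a drop leaves no trace. */
static void submit_unchecked(struct model_msgq *q, int msg)
{
	(void)model_put_nowait(q, msg);
}

/* Submitter that propagates the result so a drop is at least visible. */
static int submit_checked(struct model_msgq *q, int msg)
{
	return model_put_nowait(q, msg);
}
```

Once `MODEL_DEPTH` messages are pending, `submit_unchecked()` returns normally although nothing was stored, while `submit_checked()` reports `-ENOMSG`. Making the drop visible does not by itself satisfy the expectation in this report, since the event is still lost.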
To Reproduce
The following ways to fill the message queue were identified so far:
- A task prevents context switches for more than 1.25 ms while the device is connected to the host at high-speed. The SOF events are submitted in the interrupt handler routine, filling the whole queue and leaving no room for other events. If `udc_submit_ep_event` is called before the USB stack pops the SOFs from `usbd_msgq`, the events are silently dropped.
- Rapid VBUS ready/removed sequence.
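The 1.25 ms figure in the first scenario follows from the high-speed microframe interval: a SOF arrives every 125 µs, so a 1.25 ms stall produces 10 SOF events. The sketch below spells out that arithmetic. It assumes a queue depth of 10, which is inferred from the numbers in this report rather than read from the configuration, and that the queue is empty when the stall begins. `ASSUMED_QUEUE_DEPTH`, `sofs_during_stall()` and `ep_event_dropped()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define HS_SOF_PERIOD_US    125u /* USB 2.0 high-speed microframe interval */
#define ASSUMED_QUEUE_DEPTH 10u  /* assumption inferred from the 1.25 ms figure */

/* Number of SOF events submitted from the ISR while the consumer is stalled. */
static uint32_t sofs_during_stall(uint32_t stall_us)
{
	return stall_us / HS_SOF_PERIOD_US;
}

/*
 * True if an endpoint event submitted right after the stall finds no free
 * slot, assuming the queue was empty when the stall began.
 */
static bool ep_event_dropped(uint32_t stall_us)
{
	return sofs_during_stall(stall_us) >= ASSUMED_QUEUE_DEPTH;
}
```

Under these assumptions a 1 ms stall leaves two free slots, while a stall of 1.25 ms or longer leaves none.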
Expected behavior
Queued endpoint events are never dropped, i.e. there is always a matching `struct usbd_class_api` `request` call for every successful `usbd_ep_enqueue()`.
Impact
If the issue triggers, the stack can cease to work until the device is rebooted.
Logs and console output
Console output is irrelevant; the underlying issue is that events that must never be silently dropped can be dropped.
Environment (please complete the following information):
- OS: Linux
- Toolchain: Zephyr SDK 0.16.5
- Commit SHA or Version used: 2efc447
Additional context
"Catching up" on unhandled SOFs does not make much sense and can lead to an avalanche effect if too many SOFs are queued for handling.
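One possible direction, shown here only as a sketch and not as what the stack does or should do, is to coalesce SOFs into a single pending flag so that they never occupy queue slots needed by endpoint events. Everything in this model is hypothetical: `ev_queue`, `EV_DEPTH`, `submit_coalesced()` and `take_sof()` are illustrative names, and the code is plain C with no Zephyr APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum ev_type { EV_SOF, EV_EP };

#define EV_DEPTH 4 /* hypothetical depth, chosen only for illustration */

/* Hypothetical event queue with a separate flag for pending SOFs. */
struct ev_queue {
	enum ev_type slots[EV_DEPTH];
	size_t used;
	bool sof_pending;
};

/* SOFs collapse into one flag; other events take a slot or fail loudly. */
static int submit_coalesced(struct ev_queue *q, enum ev_type type)
{
	if (type == EV_SOF) {
		q->sof_pending = true;
		return 0;
	}

	if (q->used == EV_DEPTH) {
		return -ENOMSG;
	}

	q->slots[q->used++] = type;
	return 0;
}

/* Consumer side: report at most one SOF, however many were submitted. */
static bool take_sof(struct ev_queue *q)
{
	bool pending = q->sof_pending;

	q->sof_pending = false;
	return pending;
}
```

In this model any number of SOFs submitted during a stall results in a single SOF being handled afterwards, and an endpoint event submitted in the meantime still finds a free slot.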