Commit 1c6e5cc

raiden00pl authored and acassis committed

Documentation: migrate "Work Queue Deadlocks" from wiki
link: https://cwiki.apache.org/confluence/display/NUTTX/Work+Queue+Deadlocks

1 parent 2d8c4e6 commit 1c6e5cc

File tree: 2 files changed, +175 -0 lines changed

Documentation/components/net/index.rst

Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@ Network Support
    socketcan.rst
    netguardsize.rst
    slip.rst
+   wqueuedeadlocks.rst
 
 ``net`` Directory Structure ::

Documentation/components/net/wqueuedeadlocks.rst

Lines changed: 174 additions & 0 deletions

@@ -0,0 +1,174 @@

====================
Work Queue Deadlocks
====================

Use of Work Queues
==================

Most network drivers use a work queue to handle network events. This is done for
two reasons: (1) most of the example code to leverage from does it that way, and
(2) it is easier and makes more efficient use of memory resources to use the work
queue rather than creating a dedicated task/thread to service the network.

High and Low Priority Work Queues
=================================

There are two kinds of work queue: a single, high priority work queue that is
intended only to service back-end interrupt processing in a semi-normal tasking
context, and low priority work queue(s) that are similar but, as the name implies,
lower in priority and not dedicated to time-critical back-end interrupt processing.

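Work is deferred to either queue with ``work_queue()``. Below is a minimal sketch
of a driver deferring its receive processing to the low priority queue; the worker
function and variable names are illustrative and not part of any real driver:

.. code-block:: c

   #include <nuttx/wqueue.h>

   static struct work_s g_rxwork;    /* Must persist until the worker runs */

   /* Runs later on a work queue thread, not in the interrupt handler */

   static void rxavail_work(FAR void *arg)
   {
     /* Process received packets here */
   }

   /* From the driver's interrupt logic: defer the real work.  HPWORK is the
    * single high priority queue; LPWORK is the low priority thread pool.
    */

   static void rxavail_interrupt(FAR void *arg)
   {
     work_queue(LPWORK, &g_rxwork, rxavail_work, NULL, 0);
   }
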
Downsides of Work Queues
========================

There are two important downsides to the use of work queues. First, the work queues
are inherently non-deterministic. The delay from the point at which you schedule
work to the time at which the work is actually performed is highly random, and that
delay is due not only to the strict priority scheduling but also to whatever work
has been queued ahead of you.

Why do you bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?

A second problem is related: Only one work queue job can be performed at a time.
Each job should be brief so that it makes the work queue available again for the
next job as soon as possible. And a job should never block waiting for resources!
If a job blocks, it blocks the entire work queue and makes the whole work queue
unavailable for the duration of the wait.

Networking on Work Queues
=========================

As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue... YIKES!). Most
network operations are not really suited for execution on a work queue: The
networking operations can be quite extended and can also block waiting for the
availability of resources. So, at a minimum, networking should never use the
high priority work queue.

Deadlocks
=========

If there is only a single instance of a work queue, then it is easy to create a
deadlock if a job blocks while running on that work queue. Here is the generic
work queue deadlock scenario (a concrete sketch follows the list):

* A job runs on a work queue and waits for the availability of a resource.
* The operation that provides that resource also runs on the same work queue.
* But since the work queue is blocked waiting for the resource, the job that
  provides the resource cannot run and a deadlock results.

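The following is a minimal sketch of that scenario, assuming the low priority work
queue is configured with a single thread (``CONFIG_SCHED_LPNTHREADS=1``); the names
are illustrative:

.. code-block:: c

   #include <nuttx/wqueue.h>
   #include <nuttx/semaphore.h>

   static struct work_s g_waiter_work;
   static struct work_s g_poster_work;
   static sem_t g_resource;

   static void poster_worker(FAR void *arg)
   {
     nxsem_post(&g_resource);   /* Would make the resource available... */
   }

   static void waiter_worker(FAR void *arg)
   {
     /* Blocks the (single) work queue thread.  poster_worker() is queued
      * behind this job on the same queue, so it can never run: deadlock.
      */

     nxsem_wait(&g_resource);
   }

   static void trigger_deadlock(void)
   {
     nxsem_init(&g_resource, 0, 0);
     work_queue(LPWORK, &g_waiter_work, waiter_worker, NULL, 0);
     work_queue(LPWORK, &g_poster_work, poster_worker, NULL, 0);
   }
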
IOBs
====

IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains to
efficiently buffer variable-sized network packet data. This is a much more
efficient use of buffering space than full packet buffers since a packet's
content is often much smaller than the full packet size (the MSS).

The network allocates IOBs to support TCP and UDP read-ahead buffering and write
buffering. Read-ahead buffering is used when TCP/UDP data is received and there is
no receiver in place waiting to accept the data. In this case, the received
payload is buffered in the IOB-based read-ahead buffers. When the application
next calls ``recv()`` or ``recvfrom()``, the data will be removed from the read-ahead
buffer and returned to the caller immediately.

Write buffering refers to the similar feature on the outgoing side. When the
application calls ``send()`` or ``sendto()`` and the driver is not available to accept
the new packet data, the data is buffered in IOBs in the write buffer chain. When
the network driver is finally available to take more data, packet data is removed
from the write buffer and provided to the driver.

The IOBs are allocated with a fixed size. A fixed number of IOBs are pre-allocated
when the system starts. If the network runs out of IOBs, additional IOBs will not
be allocated dynamically; rather, the IOB allocator, ``iob_alloc()``, will block
until an IOB is finally returned to the pool of free IOBs. There is also a
non-blocking IOB allocator, ``iob_tryalloc()``.

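A minimal sketch of the two allocation styles is shown below. The helper is
illustrative and assumes the single ``throttled`` boolean argument form of the
allocators; check ``include/nuttx/mm/iob.h`` in your tree for the exact prototypes:

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <nuttx/mm/iob.h>

   static FAR struct iob_s *get_buffer(bool can_block)
   {
     FAR struct iob_s *iob;

     if (!can_block)
       {
         /* Non-blocking: returns NULL at once if the pool is exhausted */

         iob = iob_tryalloc(false);
       }
     else
       {
         /* Blocking: waits until an IOB is returned to the free pool.
          * Never call this from a job running on a singleton work queue!
          */

         iob = iob_alloc(false);
       }

     return iob;
   }
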
Under conditions of high utilization, such as sending or receiving large amounts
of data at high rates, it is inevitable that the system will run out of
pre-allocated IOBs. For read-ahead buffering, the packets are simply dropped in
this case. For TCP this means that there will be a subsequent timeout on the
remote peer because no ACK will be received, and the remote peer will eventually
re-transmit the packet. UDP is a lossy transport and handling of lost or dropped
datagrams must be included in any UDP design.

For write buffering, there are three possible behaviors that can occur when the
IOB pool has been exhausted. First, if there are no available IOBs at the beginning
of a ``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not selected, then the
operation will block until IOBs are again available. This delay can be a
substantial amount of time.

Second, if ``O_NONBLOCK`` is selected, the send will, of course, return immediately,
failing with ``errno`` set to ``EAGAIN`` if the first IOB for the transfer cannot be
allocated.

The third behavior occurs if we run out of IOBs in the middle of the transfer.
Then the send operation will not wait but will instead report the number of bytes
that it has successfully buffered. Applications should always check the return
value from ``send()`` or ``sendto()``. If it is a byte count less than the requested
transfer size, then the send function should be called again, as in the sketch
below.

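A minimal application-side sketch of that retry loop (the helper name is
hypothetical):

.. code-block:: c

   #include <errno.h>
   #include <sys/socket.h>

   /* Send all 'len' bytes, retrying on short writes caused by IOB exhaustion */

   static ssize_t send_all(int sockfd, const void *buf, size_t len)
   {
     const char *ptr  = buf;
     size_t remaining = len;

     while (remaining > 0)
       {
         ssize_t nsent = send(sockfd, ptr, remaining, 0);
         if (nsent < 0)
           {
             return -errno;   /* Includes EAGAIN for O_NONBLOCK sockets */
           }

         ptr       += nsent;
         remaining -= nsent;
       }

     return (ssize_t)len;
   }
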
The blocking ``iob_alloc()`` call is also a common cause of work queue deadlocks.
The scenario again is:

* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
  become available,
* The logic that releases the IOB also runs on the same work queue, but
* That logic cannot execute because the other job is blocked waiting for the IOB
  on the same work queue.

Alternatives to Work Queues
===========================

To avoid network deadlocks, here is the rule: Never run the network on a singleton
work queue!

Most network implementations do just that! Here are a couple of alternatives:

#. Use Multiple Low Priority Work Queues

   Unlike the high priority work queue, the low priority work queue utilizes a
   thread pool. The number of threads in the pool is controlled by
   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater than one,
   then such deadlocks should not be possible: In that case, if one thread is busy
   with some other job (even if it is only waiting for a resource), then the job
   will be assigned to a different thread and the deadlock will be broken. The
   cost of each additional low priority work queue thread is primarily the memory
   set aside for the thread's stack.

#. Use a Dedicated Network Thread

   The best solution would be to write a custom kernel thread to handle driver
   network operations. This would be the highest performing and the most
   manageable. It would also, however, be substantially more work. A sketch of
   this approach follows.

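   The sketch below assumes the driver's interrupt logic posts a semaphore that a
   dedicated kernel thread waits on; the driver name, priority, and stack size are
   illustrative only:

   .. code-block:: c

      #include <nuttx/config.h>
      #include <nuttx/kthread.h>
      #include <nuttx/semaphore.h>

      static sem_t g_netevent;    /* Posted by the driver's interrupt logic */

      static int mydrv_netthread(int argc, FAR char *argv[])
      {
        for (; ; )
          {
            /* Block here, on a dedicated thread rather than a work queue,
             * until the driver signals a network event.
             */

            nxsem_wait_uninterruptible(&g_netevent);

            /* Process RX packets, TX completions, timeouts, etc. */
          }

        return 0;
      }

      static int mydrv_start(void)
      {
        nxsem_init(&g_netevent, 0, 0);
        return kthread_create("mydrv-net", 100, 2048,
                              mydrv_netthread, NULL);
      }
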
#. Interactions with Network Locks

   The network lock is a re-entrant mutex that enforces mutually exclusive access
   to the network. The network lock can also cause deadlocks and can also interact
   with the work queues to degrade performance. Consider this scenario:

   * Some network logic, perhaps running on the application thread, takes the
     network lock then waits for an IOB to become available (on the application
     thread, not a work queue).
   * Some network-related event runs on the work queue but is blocked waiting for
     the network lock.
   * Another job is queued behind that network job. This is the one that provides
     the IOB, but it cannot run because the other job on the work queue is blocked
     waiting for the network lock.

   But the network will not be unlocked because the application logic holds the
   network lock and is waiting for the IOB, which can never be released.

   Within the network, this deadlock condition is avoided using a special function,
   ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the blocking
   ``iob_alloc()`` that momentarily releases the network lock while waiting for the
   IOB to become available. A sketch of this lock-release pattern appears at the
   end of this item.

   Similarly, the network functions ``net_lockedwait()`` and ``net_timedwait()`` are
   wrappers around ``nxsem_wait()`` and ``nxsem_timedwait()``, respectively, and also
   release the network lock for the duration of the wait.

   Caution should be used with any of these wrapper functions. Because the network
   lock is relinquished during the wait, there could be changes in the network
   state that occur before the lock is recovered. Your design should account for
   this possibility.

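   The following is a minimal sketch of the release-while-waiting pattern that
   ``net_ioballoc()`` implements; the wrapper name here is hypothetical and the
   real implementation differs in detail:

   .. code-block:: c

      #include <stdbool.h>
      #include <nuttx/net/net.h>
      #include <nuttx/mm/iob.h>

      static FAR struct iob_s *my_ioballoc_unlocked(bool throttled)
      {
        FAR struct iob_s *iob;

        /* Try the non-blocking allocator first */

        iob = iob_tryalloc(throttled);
        if (iob == NULL)
          {
            net_unlock();                /* Let other network jobs progress */
            iob = iob_alloc(throttled);  /* May block until an IOB is free */
            net_lock();                  /* Re-acquire before touching state */
          }

        return iob;
      }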
