====================
Work Queue Deadlocks
====================

Use of Work Queues
==================
| 7 | + |
| 8 | +Most network drivers use a work queue to handle network events. This is done for |
| 9 | +two reason: (1) Most of the example code to leverage from does it that way, and (2) |
| 10 | +it is easier and is a more efficient use memory resources to use the work queue |
| 11 | +rather than creating a dedicated task/thread to service the network. |
| 12 | + |

High and Low Priority Work Queues
=================================

There are two types of work queues: a single, high priority work queue that is
intended only to service back-end interrupt processing in a semi-normal tasking
context, and low priority work queue(s) that are similar but, as the name implies,
lower in priority and not dedicated to time-critical back-end interrupt
processing.

Downsides of Work Queues
========================

There are two important downsides to the use of work queues. First, the work queues
are inherently non-deterministic. The time delay from the point at which you
schedule work to the time at which the work is performed is highly random, and
that delay is due not only to the strict priority scheduling but also to whatever
work has been queued ahead of you.

Why bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?

A second problem is related: only one work queue job can be performed at a time.
Each job should be brief so that it can make the work queue available again for
the next work queue job as soon as possible. And a job should never block
waiting for resources! If the job blocks, then it blocks the entire work queue
and makes the whole work queue unavailable for the duration of the wait.

Networking on Work Queues
=========================

As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue... yikes!). Most
network operations are not really suited for execution on a work queue:
networking operations can run for extended periods and can also block waiting
for the availability of resources. So, at a minimum, networking should never use
the high priority work queue.

Deadlocks
=========

If there is only a single instance of a work queue, then it is easy to create a
deadlock if a job running on that work queue blocks. Here is the generic work
queue deadlock scenario:

* A job runs on a work queue and waits for the availability of a resource.
* The operation that provides that resource also runs on the same work queue.
* But since the work queue is blocked waiting for the resource, the job that
  provides the resource cannot run and a deadlock results.

IOBs
====

IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains to
efficiently buffer variable-sized network packet data. This is a much more
efficient use of buffering space than full packet buffers since the packet
content is often much smaller than the full packet size (the MSS).

The network allocates IOBs to support TCP and UDP read-ahead buffering and write
buffering. Read-ahead buffering is used when TCP/UDP data is received and there is
no receiver in place waiting to accept the data. In this case, the received
payload is buffered in the IOB-based, read-ahead buffers. When the application
next calls ``recv()`` or ``recvfrom()``, the data will be removed from the read-ahead
buffer and returned to the caller immediately.

Write-buffering refers to the similar feature on the outgoing side. When the
application calls ``send()`` or ``sendto()`` and the driver is not available to accept
the new packet data, the data is buffered in IOBs in the write buffer chain. When
the network driver is finally available to take more data, packet data is removed
from the write buffer and provided to the driver.

The IOBs are allocated with a fixed size. A fixed number of IOBs are pre-allocated
when the system starts. If the network runs out of IOBs, additional IOBs will not
be allocated dynamically; rather, the blocking IOB allocator, ``iob_alloc()``, will
wait until an IOB is finally returned to the pool of free IOBs. There is also a
non-blocking IOB allocator, ``iob_tryalloc()``.

Under conditions of high utilization, such as sending or receiving large amounts
of data at high rates, it is inevitable that the system will run out of
pre-allocated IOBs. For read-ahead buffering, the packets are simply dropped in
this case. For TCP this means that there will be a subsequent timeout on the
remote peer because no ACK will be received, and the remote peer will eventually
re-transmit the packet. UDP is a lossy transport, and handling of lost or dropped
datagrams must be included in any UDP design.

For write-buffering, there are three possible behaviors that can occur when the
IOB pool has been exhausted. First, if there are no available IOBs at the beginning
of a ``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not selected, then the
operation will block until IOBs are again available. This delay can be a
substantial amount of time.

Second, if ``O_NONBLOCK`` is selected, the send will, of course, return immediately,
failing with errno set to ``EAGAIN`` if we cannot allocate the first IOB for the
transfer.

The third behavior occurs if we run out of IOBs in the middle of the transfer.
Then the send operation will not wait but will instead send the number of bytes
that it has successfully buffered. Applications should always check the return
value from ``send()`` or ``sendto()``. If it is a byte count less than the requested
transfer size, then the send function should be called again.

The blocking ``iob_alloc()`` call is also a common cause of work queue deadlocks.
The scenario again is:

* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
  become available,
* The logic that releases the IOB also runs on the same work queue, but
* That logic cannot execute, because the other job is blocked on the same
  work queue waiting for the IOB.

Alternatives to Work Queues
===========================

To avoid network deadlocks, here is the rule: never run the network on a singleton
work queue!

Most network implementations do just that! Here are a couple of alternatives:

#. Use Multiple Low Priority Work Queues
   Unlike the high priority work queues, the low priority work queues utilize a
   thread pool. The number of threads in the pool is controlled by
   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater than one,
   then such deadlocks should not be possible: in that case, if a thread is busy with
   some other job (even if it is only waiting for a resource), then the job will be
   assigned to a different thread and the deadlock will be broken. The cost of each
   additional low priority work queue thread is primarily the memory set aside for
   the thread's stack.

#. Use a Dedicated Network Thread
   The best solution would be to write a custom kernel thread to handle driver
   network operations. This would be the highest performing and the most
   manageable. It would also, however, be substantially more work.

#. Interactions with Network Locks
   The network lock is a re-entrant mutex that enforces mutually exclusive access to
   the network. The network lock can also cause deadlocks and can also interact with
   the work queues to degrade performance. Consider this scenario:

   * Some network logic, running on the application thread (not a work queue),
     takes the network lock and then waits for an IOB to become available.
   * Some network-related event runs on the work queue but is blocked waiting for
     the network lock.
   * Another job is queued behind that network job. This is the one that provides
     the IOB, but it cannot run because the job ahead of it on the work queue is
     blocked waiting for the network lock.

   But the network will never be unlocked, because the application logic holds the
   network lock and is waiting for an IOB that can never be released.

   Within the network, this deadlock condition is avoided using a special function,
   ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the blocking ``iob_alloc()``
   that momentarily releases the network lock while waiting for the IOB to become
   available.

   Similarly, the network functions ``net_lockedwait()`` and ``net_timedwait()`` are
   wrappers around ``nxsem_wait()`` and ``nxsem_timedwait()``, respectively, and also
   release the network lock for the duration of the wait.

   Caution should be used with any of these wrapper functions: because the network
   lock is relinquished during the wait, there could be changes in the network
   state that occur before the lock is recovered. Your design should account for
   this possibility.