====================
Work Queue Deadlocks
====================

Use of Work Queues
==================
| 7 | + |
| 8 | +Most network drivers use a work queue to handle network events. This is done for |
| 9 | +two reason: (1) Most of the example code to leverage from does it that way, and (2) |
| 10 | +it is easier and is a more efficient use memory resources to use the work queue |
| 11 | +rather than creating a dedicated task/thread to service the network. |
| 12 | + |

High and Low Priority Work Queues
=================================

There are two types of work queues: a single, high priority work queue that is
intended only to service back-end interrupt processing in a semi-normal tasking
context, and low priority work queue(s) that are similar but, as the name implies,
lower in priority and not dedicated to time-critical back-end interrupt
processing.

Downsides of Work Queues
========================

There are two important downsides to the use of work queues. First, the work queues
are inherently non-deterministic. The time delay from the point at which you
schedule work to the time at which the work is performed is highly random, and
that delay is due not only to the strict priority scheduling but also to whatever
work has been queued ahead of you.

Why bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?

A second problem is related: only one work queue job can be performed at a time.
Each job should be brief so that it can make the work queue available again for
the next work queue job as soon as possible. And a job should never block
waiting for resources! If the job blocks, then it blocks the entire work queue
and makes the whole work queue unavailable for the duration of the wait.

Networking on Work Queues
=========================

As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue... yikes!). Most
network operations are not really suited for execution on a work queue:
networking operations can run for extended periods and can also block waiting
for the availability of resources. So, at a minimum, networking should never use
the high priority work queue.

Deadlocks
=========

If there is only a single instance of a work queue, then it is easy to create a
deadlock if a job running on that work queue blocks. Here is the generic work
queue deadlock scenario:

* A job runs on a work queue and waits for the availability of a resource.
* The operation that provides that resource also runs on the same work queue.
* But since the work queue is blocked waiting for the resource, the job that
  provides the resource cannot run and a deadlock results.

IOBs
====

IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains to
efficiently buffer variable-sized network packet data. This is a much more
efficient use of buffering space than full packet buffers since the packet
content is often much smaller than the full packet size (the MSS).

The network allocates IOBs to support TCP and UDP read-ahead buffering and write
buffering. Read-ahead buffering is used when TCP/UDP data is received and there is
no receiver in place waiting to accept the data. In this case, the received
payload is buffered in the IOB-based, read-ahead buffers. When the application
next calls ``recv()`` or ``recvfrom()``, the data will be removed from the read-ahead
buffer and returned to the caller immediately.

Write-buffering refers to the similar feature on the outgoing side. When the
application calls ``send()`` or ``sendto()`` and the driver is not available to accept
the new packet data, the data is buffered in IOBs in the write buffer chain. When
the network driver is finally available to take more data, packet data is removed
from the write buffer and provided to the driver.

The IOBs are allocated with a fixed size. A fixed number of IOBs are pre-allocated
when the system starts. If the network runs out of IOBs, additional IOBs will not
be allocated dynamically; rather, the blocking IOB allocator, ``iob_alloc()``, will
wait until an IOB is finally returned to the pool of free IOBs. There is also a
non-blocking IOB allocator, ``iob_tryalloc()``.

Under conditions of high utilization, such as sending or receiving large amounts
of data at high rates, it is inevitable that the system will run out of
pre-allocated IOBs. For read-ahead buffering, the packets are simply dropped in
this case. For TCP this means that there will be a subsequent timeout on the
remote peer because no ACK will be received, and the remote peer will eventually
re-transmit the packet. UDP is a lossy transport, and handling of lost or dropped
datagrams must be included in any UDP design.

For write-buffering, there are three possible behaviors that can occur when the
IOB pool has been exhausted. First, if there are no available IOBs at the beginning
of a ``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not selected, then the
operation will block until IOBs are again available. This delay can be a
substantial amount of time.

Second, if ``O_NONBLOCK`` is selected, the send will, of course, return immediately,
failing with errno set to ``EAGAIN`` if we cannot allocate the first IOB for the
transfer.

The third behavior occurs if we run out of IOBs in the middle of the transfer.
Then the send operation will not wait but will instead send the number of bytes
that it has successfully buffered. Applications should always check the return
value from ``send()`` or ``sendto()``. If it is a byte count less than the requested
transfer size, then the send function should be called again.

The blocking ``iob_alloc()`` call is also a common cause of work queue deadlocks.
The scenario again is:

* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
  become available,
* The logic that releases the IOB also runs on the same work queue, but
* That logic cannot execute, because the other job is blocked on the same
  work queue waiting for the IOB.

Alternatives to Work Queues
===========================

To avoid network deadlocks, here is the rule: never run the network on a singleton
work queue!

Most network implementations do just that! Here are a couple of alternatives:

#. Use Multiple Low Priority Work Queues
   Unlike the high priority work queues, the low priority work queues utilize a
   thread pool. The number of threads in the pool is controlled by
   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater than one,
   then such deadlocks should not be possible: in that case, if a thread is busy with
   some other job (even if it is only waiting for a resource), then the job will be
   assigned to a different thread and the deadlock will be broken. The cost of each
   additional low priority work queue thread is primarily the memory set aside for
   the thread's stack.

#. Use a Dedicated Network Thread
   The best solution would be to write a custom kernel thread to handle driver
   network operations. This would be the highest performing and the most
   manageable. It would also, however, be substantially more work.

#. Interactions with Network Locks
   The network lock is a re-entrant mutex that enforces mutually exclusive access to
   the network. The network lock can also cause deadlocks and can also interact with
   the work queues to degrade performance. Consider this scenario:

   * Some network logic, running on the application thread (not a work queue),
     takes the network lock and then waits for an IOB to become available.
   * Some network-related event runs on the work queue but is blocked waiting for
     the network lock.
   * Another job is queued behind that network job. This is the one that provides
     the IOB, but it cannot run because the job ahead of it on the work queue is
     blocked waiting for the network lock.

   But the network will never be unlocked, because the application logic holds the
   network lock and is waiting for an IOB that can never be released.

   Within the network, this deadlock condition is avoided using a special function,
   ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the blocking ``iob_alloc()``
   that momentarily releases the network lock while waiting for the IOB to become
   available.

   Similarly, the network functions ``net_lockedwait()`` and ``net_timedwait()`` are
   wrappers around ``nxsem_wait()`` and ``nxsem_timedwait()``, respectively, and also
   release the network lock for the duration of the wait.

   Caution should be used with any of these wrapper functions: because the network
   lock is relinquished during the wait, there could be changes in the network
   state that occur before the lock is recovered. Your design should account for
   this possibility.