-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
After investigating it in depth we have found a bad design in DepUsrSCTP
class that allows a thread A make calls directly in thread B.
Overview of the usrsctp library integration in mediasoup
usrsctp was integrated in mediasoup once it merged the "Single thread" feature (PR sctplab/usrsctp#339). That PR added the usrsctp_init_nothreads()
API to run usrsctp in mono thread mode, meaning that usrsctp no longer runs its own timer into an internal thread but instead exposes an API for the application to be aware of when it should "ping" usrsctp so it can send pending SCTP messages (basically SCTP retransmissions due to non received SCTP ACK from the remote endpoint).
The way we integrated it was by wrapping the usrsctp API within the DepUsrSCTP
and SctpAssociation
class in mediasoup.
It's important to notice that this was designed having mediasoup for Node usage, this is: each mediasoup worker runs in a separate process with a single thread. From now on, this overview section describes the behavior in mediasoup for Node:
- The application creates a
Worker
so a new process is created with a single libuv loop. - During the
Worker
creation,DepUsrSCTP::ClassInit()
is created which callsusrsctp_init_nothreads()
by passing a staticonSendSctpData()
callback as argument, which will be later invoked by usrsctp when it needs to send SCTP retransmissions to remote endpoints. - The the
Worker
C++ constructor calls toDepUsrSctp::CreateChecker()
which creates a singleton instance of theDepUsrSCTP::Checker
class which is stored as astatic
member inDepUsrSCTP
. Such aChecker
inherits fromTimerHandler
since it's intended to "ping" usrsctp every 10ms so usrsctp can send pending SCTP retransmissions. - Of course, since mediasoup for Node launches a separate process for each mediasoup worker, there is no multi threading at all.
- When the first
XxxxTransport
with SCTP support is created in the worker, itsSctpAssociation
instance calls toDepUsrSCTP::RegisterSctpAssociation(this)
to register itself inDepUsrSCTP
. If it's the first SCTP association then theChecker
is started, meaning that it runs a periodic timer every 10ms. - So every 10ms
usrsctp_handle_timers()
is called and usrsctp may invoke theonSendSctpData
callback to send pending SCTP retransmissions. In such a callback, the address of the associatedSctpAssociation
is given so we callsctpAssociation->OnUsrSctpSendSctpData(data, len)
to send the SCTP data to the endpoint.
There is a single thread in the whole process (this is mediasoup for Node, remember) so everything is good. And it doesn't matter that the Node app creates many Workers
since each Worker
is a separate process with a single thread.
Problem in Rust with multi threaded Workers
When in mefiasoup for Rust, each Worker
doesn't run in a separate process but in a separate native thread. However usrsctp has global state so problems start showing up. Here an example:
- The Rust app creates first
Worker
1 which creates a new thread 1 with its own libuv loop 1. DepUsrSCTP::ClassInit()
is called. Here aGlobalInstances
static member was added which is incremented every timeClassInit()
is called. If it was 0 thenusrsctp_init_nothreads()
is called by passing the staticonSendSctpData
callback as argument.- Then when the first
SctpAssociation
is created,DepUsrSCTP::RegisterSctpAssociation()
is called from it. It increments a staticnumSctpAssociations
member and, if it was 0, then theChecker
is started. - Here the thing: The
Checker::Start()
method runs a periodic timer. In which thread? In the thread from where the method was called. In this case the thread 1 and its associated libuv loop 1. - But later the Rust app may create more
Workers
. It createsWorker
2 which creates a new thread 2 with its own libuv loop 2. DepUsrSCTP::ClassInit()
is called but sinceGlobalInstances
static member was not 0 it won't callusrsctp_init_nothreads()
again.- Then the first
SctpAssociation
inWorker
2 is created,DepUsrSCTP::RegisterSctpAssociation()
is called from it. It increments a staticnumSctpAssociations
member and, since it was not 0, theChecker
is not started again (it's already started). - Now there are SCTP traffic in both
Worker
1 andWorker 2
. Let's imagine there is packet loss so some SCTP messages created by usrsctp didn't reach the remote endpoints. - Eventually the periodic timer (which is running in thread 1 under libuv loop 1 under
Worker
1) fires, sousrsctp_handle_timers()
is called. usrsctp_handle_timers()
checks pending SCTP messages to be sent or resent and invokes theonSendSctpData()
static callback. Again: such a callback is invoked within thread 1.- Such a callback may be called multiple times and within its function body it checks the associated
StcpAssociation
to which the SCTP message should be sent. - Problem here is that such a
SctpAssociation
could belong to aWebRtcTransport
inWorker
2. In that case, it callssctpAssociation->OnUsrSctpSendSctpData(data, len)
from thread 1, but such aSctpAssociation
belongs to thread 2 (which runs the libuv loop 2). This is a violation of libuv design since no other threads should directly invoke libuv methods on another thread. And it's also a problem in mediasoup C++ code since now we have thread 1 and thread 2 invoking methods in parallel in mediasoup C++ class instances inWorker
2. Memory corruption or whatever can happen in this case.
So this is the problem.
Solution
- When the first
Worker
is created (and hence its thread 2)DepUsrSCTP::ClassInit()
its called and it must create a separate thread X, create a separate libuv loop X into it, callusrsctp_init_nothreads(onSendSctpData)
from that thread X and create aChecker
singleton in that thread X. - Later more
Workers
could be created (so we have thread 2, 3, etc). - When the first
SctpAssociation
is created in any of thoseWorkers
and it callsDepUsrSCTP::RegisterSctpAssociation()
, such a method callsDepUsrSCTP::checker->Start()
as usual, but it should useuv_async_send()
to make such aStart()
method run within the thread X (instead than running it within the callling thread 1 or 2 or whatever). - Eventually the static callback
onSendSctpData()
will be called by usrsctp (from different threads X, 1, 2, etc). - Within
onSendSctpData()
callback, we get the targetSctpAssociation
and from it we need to get its associated libuv loop, or perhaps create auv_async_t
handle within the previously calledDepUsrSCTP::RegisterSctpAssociation()
, and retrieve it among theSctpAssociation
instance itself. - And we should use
uv_async_send()
to callsctpAssociation->OnUsrSctpSendSctpData(data, len)
within the thread associated to thatuv_async_t
handle. This way, each call toSctpAssociation::OnUsrSctpSendSctpData()
is guaranteed to be called within the same thread in which such aSctpAssociation
lives. - When
DepUsrSCTP::ClassDestroy()
is called by the last remainingWorker
(soGlobalInstances
was 1), we must again useuv_async_send()
to invokeusrsctp_finish()
and callchecker->Stop()
and should also destroy/close/free the thread X. All those things will be created again if a newWorker
is created later.