Description
Current Behavior:
RGS randomly hangs after some time.
Expected Behavior:
RGS does not hang if the database is slow.
Steps To Reproduce:
- Build a version with 2 worker threads (or run on a machine with two cores)
#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
- Run on testnet with
LN_PEERS=02eadbd9e7557375161df8b646776a547c5cbc2e95b3071ec81553f8ec2cea3b8c@18.191.253.246:9735,03bae2db4b57738c1ec1ffa1c5e5a4423968cc592b3b39cddf7d495e72919d6431@18.202.91.172:9735,038863cf8ab91046230f561cd5b386cbff8309fa02e3f0c3ed161a3aeb64a643b9@203.132.94.196:9735
(the more peers, the higher the chance of a hang)
What Happens
The problem is with the mpsc::channel(100) used for GossipMessage. One task in GossipPersister receives elements from it, and tasks in GossipRouter send elements to it.
GossipRouter uses the try_send() method to send without blocking the thread, but if that fails (because the channel is full) it falls back to a blocking send(), which blocks the thread. That thread is a tokio executor thread, so if all tokio executor threads get blocked in GossipRouter::new_channel_announcement() or GossipRouter::new_channel_update(), the task that receives elements from the channel never gets to run. Deadlock.
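The mechanism can be reproduced in miniature without tokio. The sketch below is only a model, not RGS code: a fixed pool of plain threads stands in for the tokio executor threads, std's sync_channel stands in for the bounded tokio channel (capacity 1 instead of 100), and consumer_finishes is a hypothetical helper name. Producers try a non-blocking send first and fall back to a blocking one, while the single consumer job sits in the queue behind them.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// A unit of work for the toy pool; stands in for a tokio task.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Returns true if the consumer job drained every message before `timeout`,
/// false if the pool stalled. Hypothetical helper, not part of RGS.
fn consumer_finishes(workers: usize, producers: usize, timeout: Duration) -> bool {
    // Job queue shared by all workers (stand-in for the executor's run queue).
    let (job_tx, job_rx) = channel::<Job>();
    let job_rx: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(job_rx));
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so a job never runs while the queue is locked.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok(job) => job(),
                Err(_) => break, // queue closed, worker exits
            }
        });
    }

    // Bounded channel: assumption, capacity 1 to make the stall easy to hit.
    let (gossip_tx, gossip_rx) = sync_channel::<u32>(1);
    let (done_tx, done_rx) = channel::<()>();

    // Producers are queued first; each sends two messages.
    for id in 0..producers as u32 {
        let tx = gossip_tx.clone();
        let job: Job = Box::new(move || {
            for n in 0..2 {
                let msg = id * 2 + n;
                // Non-blocking attempt first, then the blocking fallback.
                if let Err(TrySendError::Full(msg)) = tx.try_send(msg) {
                    let _ = tx.send(msg); // parks this worker until there is room
                }
            }
        });
        job_tx.send(job).unwrap();
    }

    // The only consumer is queued behind the producers.
    let expected = producers * 2;
    let consumer: Job = Box::new(move || {
        for _ in 0..expected {
            if gossip_rx.recv().is_err() {
                return;
            }
        }
        let _ = done_tx.send(());
    });
    job_tx.send(consumer).unwrap();

    // If every worker is parked in a blocking send, nobody signals `done`.
    done_rx.recv_timeout(timeout).is_ok()
}
```

With two workers and two producers, both workers end up parked in the blocking send and the consumer job is never picked up, so the call returns false once the timeout expires. With a third worker the consumer runs, drains the channel, and the call returns true. In the stalled case the parked threads are simply leaked, which is acceptable for a demonstration only.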
Note that there is code in GossipRouter which tries to minimize the risk of deadlock, but unfortunately it does not eliminate it:
tokio::task::block_in_place(move || {
    tokio::runtime::Handle::current().block_on(async move {
        self.sender.send(gossip_message).await.unwrap();
    })
});