RGS randomly hangs after some time #32

@andrei-21

Description

Current Behavior:

RGS randomly hangs after some time.

Expected Behavior:

RGS does not hang if the database is slow.

Steps To Reproduce:

  1. Build a version with 2 worker threads (or run on a machine with two cores)
    #[tokio::main(flavor = "multi_thread", worker_threads = 2)]
  2. Run on testnet with
LN_PEERS=02eadbd9e7557375161df8b646776a547c5cbc2e95b3071ec81553f8ec2cea3b8c@18.191.253.246:9735,03bae2db4b57738c1ec1ffa1c5e5a4423968cc592b3b39cddf7d495e72919d6431@18.202.91.172:9735,038863cf8ab91046230f561cd5b386cbff8309fa02e3f0c3ed161a3aeb64a643b9@203.132.94.196:9735

(the more peers, the higher the chance of a hang)

What Happens

The problem is with the mpsc::channel(100) used for GossipMessage. One task in GossipPersister receives elements from the channel, while tasks in GossipRouter send elements to it.
GossipRouter first calls try_send() to send without blocking the thread, but if that fails because the channel is full, it falls back to a blocking send(), which blocks the thread. That thread is a tokio executor thread. If all tokio executor threads get blocked in GossipRouter::new_channel_announcement() or GossipRouter::new_channel_update(), the task that receives elements from the channel is never executed. Deadlock.
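
To illustrate the mechanism described above, here is a minimal sketch that uses only the standard library. It is an analogy, not the RGS code: plain OS threads stand in for tokio executor threads, std::sync::mpsc::sync_channel stands in for the tokio mpsc channel, and the name consumer_finishes, the job queue, and the channel capacity of 1 are all made up for the example. Two producer jobs block on a full channel, and the consumer job sits in the queue behind them.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// A unit of work for the toy pool (hypothetical, stands in for a tokio task).
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Queues two blocking producers followed by one consumer on a pool of
/// `workers` threads. Returns true if the consumer finished within the timeout.
fn consumer_finishes(workers: usize) -> bool {
    // Shared FIFO job queue; each worker thread pulls the next job from it.
    let (job_tx, job_rx) = channel::<Job>();
    let job_rx: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(job_rx));
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        thread::spawn(move || loop {
            // The lock is released at the end of this statement, before the job runs.
            let next = job_rx.lock().unwrap().recv();
            match next {
                Ok(job) => job(),
                Err(_) => break, // queue closed and empty
            }
        });
    }

    // Bounded channel playing the role of the GossipMessage channel.
    let (gossip_tx, gossip_rx) = sync_channel::<u32>(1);
    let (done_tx, done_rx) = channel::<()>();

    // Two producers, each sending more messages than the channel can hold.
    for id in 0..2u32 {
        let tx = gossip_tx.clone();
        job_tx
            .send(Box::new(move || {
                for n in 0..2 {
                    // Blocks the worker thread while the channel is full.
                    tx.send(id * 10 + n).unwrap();
                }
            }))
            .unwrap();
    }
    drop(gossip_tx);

    // The consumer is queued last, so it needs a free worker thread to start.
    job_tx
        .send(Box::new(move || {
            for _msg in gossip_rx.iter() {}
            done_tx.send(()).unwrap();
        }))
        .unwrap();
    drop(job_tx);

    done_rx.recv_timeout(Duration::from_millis(500)).is_ok()
}
```

With two workers both threads end up blocked in send() and the consumer never starts, so consumer_finishes(2) returns false. With a third worker the consumer gets a thread, drains the channel, and consumer_finishes(3) returns true.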

Note that there is code in GossipRouter that tries to minimize the risk of deadlock, but unfortunately it does not eliminate it:

    tokio::task::block_in_place(move || {
        tokio::runtime::Handle::current().block_on(async move {
            self.sender.send(gossip_message).await.unwrap();
        })
    });
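
The blocking fallback is only reached once try_send() reports a full channel. The sketch below shows that condition using the standard library's bounded channel as a stand-in for the tokio one. The function name sends_until_full is hypothetical, and this is not the GossipRouter code.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fills a bounded channel that nobody drains and returns how many
/// messages were accepted before `try_send` reported `Full`.
fn sends_until_full(capacity: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected but is never read.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut accepted = 0usize;
    loop {
        match tx.try_send(accepted as u32) {
            Ok(()) => accepted += 1,
            // This is the point at which a caller would have to choose between
            // dropping the message and falling back to a blocking send.
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
}
```

For a capacity of 100, as in the issue, sends_until_full(100) returns 100: the 101st undelivered message is the first one that would take the blocking path.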
