Hello Paul,
I have an Erlang cluster of 3 nodes.
A few seconds after startup, my code calls `GenServer.call({:via, :swarm, "echo-be"}, ...)`, where the `"echo-be"` process does not exist yet.
In 24% of new node startups, this leads to `GenServer.call` hanging forever (actually inside `Swarm.whereis_name`, called internally), and Swarm never becomes functional on this node.
I'm using Swarm version 3.4.0.
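For illustration, here is a minimal sketch of how the calling side could be kept from blocking forever while the underlying problem is investigated. It is an assumption-laden stopgap, not a fix: the module name `GuardedCall`, the function `call/3`, and the timeout values are hypothetical and not part of Swarm. It only relies on the standard `Task.async/1`, `Task.yield/2`, and `Task.shutdown/2` functions, and it does nothing about the tracker itself being stuck.

```elixir
defmodule GuardedCall do
  # Hypothetical helper (not part of Swarm): runs GenServer.call in a
  # separate task so that a lookup that never returns cannot block the
  # caller indefinitely.
  @default_timeout 5_000

  def call(server, request, timeout \\ @default_timeout) do
    task =
      Task.async(fn ->
        try do
          {:ok, GenServer.call(server, request)}
        catch
          # GenServer.call exits (e.g. with :noproc) when the target
          # process cannot be found; turn that into a return value.
          :exit, reason -> {:error, reason}
        end
      end)

    # Wait up to `timeout` ms; if there is no result, kill the task.
    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end
```

With this in place, the call from the report would look like `GuardedCall.call({:via, :swarm, "echo-be"}, request, 10_000)`, where `request` stands for whatever message the application sends. A result of `{:error, :timeout}` would then indicate that the name lookup or the call itself did not return in time.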
Attached is a full `:erlang.dbg` trace of the `Swarm.Tracker` process where the hang happens: repro.log
I would appreciate it if you could investigate this issue and come up with a fix or a workaround. We were very close to adopting Swarm before this issue was discovered.
Please let me know if you need any additional information/traces. I can reliably repro this.
Thank you, Dmitry.
P.S. The 24% number was derived from 910 test runs, of which only 691 were successful (no hang); the remaining 219 runs hung.