Open
Labels
bug (Something isn't working)
Description
When running the convergence test case in Testground with more than 48 nodes, the execution fails. At times during the run, nodes are unable to dial another peer.
Testground command: testground run single --plan=casm --testcase="pex-convergence" --runner=local:docker --builder=docker:go --instances=48
Error output is:
failed to dial QmbwgXDBnCvDC1XYP7SzfckdScYWFpQNhPUJQmEPgcQwsK:
* [/ip4/192.18.0.41/tcp/46245] dial tcp4 192.18.0.41:46245: connect: network is unreachable
* [/ip4/16.0.0.41/tcp/46245] dial tcp4 0.0.0.0:34309->16.0.0.41:46245: i/o timeout
* [/ip4/127.0.0.1/tcp/46245] dial tcp4 0.0.0.0:34309->127.0.0.1:46245: i/o timeout
Ideas
The working hypothesis is that the network and the containers get overloaded. The convergence test case uses Redis barriers in every iteration to synchronize the nodes. However, Redis barriers are known to have a significant overhead.