Unable to run Testground pex-convergence with 48 nodes #2

@aratz-lasa

Description

When running the pex-convergence test case in Testground with 48 or more nodes, the execution fails. At times during the run, nodes are unable to dial another peer.

Testground command: testground run single --plan=casm --testcase="pex-convergence" --runner=local:docker --builder=docker:go --instances=48

Error output is:

failed to dial QmbwgXDBnCvDC1XYP7SzfckdScYWFpQNhPUJQmEPgcQwsK:
  * [/ip4/192.18.0.41/tcp/46245] dial tcp4 192.18.0.41:46245: connect: network is unreachable
  * [/ip4/16.0.0.41/tcp/46245] dial tcp4 0.0.0.0:34309->16.0.0.41:46245: i/o timeout
  * [/ip4/127.0.0.1/tcp/46245] dial tcp4 0.0.0.0:34309->127.0.0.1:46245: i/o timeout

Ideas

The working hypothesis is that the network and the containers get overloaded. The convergence test case uses Redis barriers in every iteration to synchronize the nodes, and Redis barriers are known to have a large overhead.
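
If barrier overhead is the cause, one option to explore is synchronizing only every few iterations instead of every one. The sketch below simulates this in a single process: the `barrier` type, `runNodes`, and the `syncEvery` parameter are all hypothetical, and the in-memory barrier only stands in for the Redis barrier used by the test plan. It shows how the number of barrier rounds drops, not how much time is saved in a real run.

```go
package main

import (
	"fmt"
	"sync"
)

// barrier is a reusable in-memory barrier for n participants.
// It is a stand-in for the Redis barrier, for illustration only.
type barrier struct {
	mu      sync.Mutex
	cond    *sync.Cond
	n       int
	waiting int
	round   int // number of completed barrier rounds
}

func newBarrier(n int) *barrier {
	b := &barrier{n: n}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// wait blocks until n callers have arrived, then releases them all.
func (b *barrier) wait() {
	b.mu.Lock()
	defer b.mu.Unlock()
	round := b.round
	b.waiting++
	if b.waiting == b.n {
		// Last arrival: start a new round and wake everyone.
		b.waiting = 0
		b.round++
		b.cond.Broadcast()
		return
	}
	for round == b.round {
		b.cond.Wait()
	}
}

// runNodes simulates nodes running the given number of iterations and
// synchronizing every syncEvery iterations (syncEvery must be >= 1).
// It returns the number of barrier rounds that took place.
func runNodes(nodes, iterations, syncEvery int) int {
	b := newBarrier(nodes)
	var wg sync.WaitGroup
	for i := 0; i < nodes; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for it := 1; it <= iterations; it++ {
				// One round of the test case would run here.
				if it%syncEvery == 0 {
					b.wait()
				}
			}
		}()
	}
	wg.Wait()
	return b.round
}

func main() {
	fmt.Println(runNodes(48, 100, 1), runNodes(48, 100, 10)) // prints: 100 10
}
```

Whether the convergence measurement tolerates nodes drifting apart between barriers is an open question, so the interval would need to be validated against the test's goals.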

Metadata

Labels: bug (Something isn't working)