Node version v0.8.13 as leader gets stuck every 2-3 hours after restart #1887

**Describe the bug**
My node does not stay stable for longer than 2-3 hours. I have tried tweaking my node config and several different configurations to solve this, but I don't think the node config is the cause.

**Mandatory Information**
1. `jcli --full-version` jcli 0.8.13 (HEAD-241b3a59, release, linux [x86_64]) - [rustc 1.41.0 (5e1a79984 2020-01-27)];
2. `jormungandr --full-version` jormungandr 0.8.13 (HEAD-241b3a59, release, linux [x86_64]) - [rustc 1.41.0 (5e1a79984 2020-01-27)];

**To Reproduce**
Steps to reproduce the behavior:
1. When the `jormungandr` process has stopped, I restart it with `start_leader` using the following configuration:
```yaml
log:
- output: stderr
  format: plain
  level: info
http_fetch_block0_service:
  - "https://github.com/input-output-hk/jormungandr-block0/raw/master/data/"
skip_bootstrap: false
bootstrap_from_trusted_peers: false
p2p:
  listen_address: "/ip4/0.0.0.0/tcp/17900"
  public_address: "/ip4/x.x.x.x/tcp/17900"
  topics_of_interest:
    blocks: high
    messages: high
  max_connections: 600
  max_bootstrap_attempts: 3
  max_unreachable_nodes_to_connect_per_event: 18
  gossip_interval: 5s
  policy:
    quarantine_duration: 10m
  trusted_peers:
    - address: "/ip4/3.231.168.222/tcp/3000"
      id: ff2aaaac6cab77d3fb72bf3cb9079246eca323c60b2fd68a
    - address: "/ip4/13.56.0.226/tcp/3000"
      id: 7ddf203c86a012e8863ef19d96aabba23d2445c492d86267
    - address: "/ip4/52.28.91.178/tcp/3000"
      id: 23b3ca09c644fe8098f64c24d75d9f79c8e058642e63a28c
    - address: "/ip4/3.125.75.156/tcp/3000"
      id: 22fb117f9f72f38b21bca5c0f069766c0d4327925d967791
    - address: "/ip4/13.112.181.42/tcp/3000"
      id: 52762c49a84699d43c96fdfe6de18079fb2512077d6aa5bc
    - address: "/ip4/13.114.196.228/tcp/3000"
      id: 7e1020c2e2107a849a8353876d047085f475c9bc646e42e9
    - address: "/ip4/52.8.15.52/tcp/3000"
      id: 18bf81a75e5b15a49b843a66f61602e14d4261fb5595b5f5
    - address: "/ip4/52.9.132.248/tcp/3000"
      id: 671a9e7a5c739532668511bea823f0f5c5557c99b813456c
    - address: "/ip4/3.125.183.71/tcp/3000"
      id: 9d15a9e2f1336c7acda8ced34e929f697dc24ea0910c3e67
    - address: "/ip4/18.184.35.137/tcp/3000"
      id: 06aa98b0ab6589f464d08911717115ef354161f0dc727858
    - address: "/ip4/18.182.115.51/tcp/3000"
      id: 8529e334a39a5b6033b698be2040b1089d8f67e0102e2575
    - address: "/ip4/3.115.154.161/tcp/3000"
      id: 35bead7d45b3b8bda5e74aa12126d871069e7617b7f4fe62
    - address: "/ip4/18.177.78.96/tcp/3000"
      id: fc89bff08ec4e054b4f03106f5312834abdf2fcb444610e9
    - address: "/ip4/52.9.77.197/tcp/3000"
      id: fcdf302895236d012635052725a0cdfc2e8ee394a1935b63
    - address: "/ip4/54.183.149.167/tcp/3000"
      id: df02383863ae5e14fea5d51a092585da34e689a73f704613
    - address: "/ip4/3.124.116.145/tcp/3000"
      id: 99cb10f53185fbef110472d45a36082905ee12df8a049b74
rest:
  listen: "127.0.0.1:3100"
storage: /home/ada/storage
explorer:
  enabled: false
mempool:
  pool_max_entries: 10000
  log_max_entries: 100000
leadership:
  logs_capacity: 4096
```
2. The node runs for a while, then gets stuck, and the number of open connections shoots up toward `max_connections`. That is usually a good indication that it is no longer registering any new blocks.
3. In the logs I see the following `stuck_notifier` warnings:
```
Mar 05 16:02:43.091 WARN blockchain is not moving up, system-date=82.37473, the last tip c19b40e7-000426db-82.24939 was 25068 seconds ago, task: stuck_notifier
Mar 05 16:03:43.093 WARN blockchain is not moving up, system-date=82.37503, the last tip c19b40e7-000426db-82.24939 was 25128 seconds ago, task: stuck_notifier
Mar 05 16:04:43.093 WARN blockchain is not moving up, system-date=82.37533, the last tip c19b40e7-000426db-82.24939 was 25188 seconds ago, task: stuck_notifier
Mar 05 16:05:43.091 WARN blockchain is not moving up, system-date=82.37563, the last tip c19b40e7-000426db-82.24939 was 25248 seconds ago, task: stuck_notifier
Mar 05 16:06:43.092 WARN blockchain is not moving up, system-date=82.37593, the last tip c19b40e7-000426db-82.24939 was 25308 seconds ago, task: stuck_notifier
Mar 05 16:07:43.092 WARN blockchain is not moving up, system-date=82.37623, the last tip c19b40e7-000426db-82.24939 was 25368 seconds ago, task: stuck_notifier
Mar 05 16:08:43.093 WARN blockchain is not moving up, system-date=82.37653, the last tip c19b40e7-000426db-82.24939 was 25428 seconds ago, task: stuck_notifier
```
4. I use a restart script that checks whether the node is stuck, but that does not fully help either: sometimes the restarted node hangs during the bootstrapping process.
5. As a result, the leader slot scheduled for `82.31711` was rejected:
```
Has this node been scheduled to be leader?
---
- created_at_time: "2020-03-05T07:33:47.290214282+00:00"
  enclave_leader_id: 1
  finished_at_time: "2020-03-05T12:50:41.000850225+00:00"
  scheduled_at_date: "82.31711"
  scheduled_at_time: "2020-03-05T12:50:39+00:00"
  status:
    Rejected:
      reason: Failed to compute the schedule within time boundaries
  wake_at_time: "2020-03-05T12:50:39.001462736+00:00"
```
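A small Python sketch can help quantify the symptoms from steps 3 and 5. It parses a `stuck_notifier` line to get the gap between the system date and the tip date in slots, and measures how late the rejected leader block finished relative to its `scheduled_at_time`. The regex, the helper names `parse_stuck_line` and `parse_ts`, and the constants `SLOT_SECONDS = 2` and `SLOTS_PER_EPOCH = 43200` are assumptions for illustration, not part of jormungandr. The 2-second slot is inferred from the logs above (a gap of 12534 slots matches the reported 25068 seconds).

```python
import re
from datetime import datetime

# Assumptions (not taken from jormungandr): 2-second slots, which match the
# numbers in the stuck_notifier log, and 43200 slots per epoch.
SLOT_SECONDS = 2
SLOTS_PER_EPOCH = 43200

# Matches only the stuck_notifier line format quoted in this report.
STUCK_RE = re.compile(
    r"system-date=(?P<sys>\d+\.\d+), the last tip (?P<tip>\S+) "
    r"was (?P<ago>\d+) seconds ago"
)


def slot_index(date: str) -> int:
    """Convert an 'epoch.slot' date such as '82.37473' to an absolute slot."""
    epoch, slot = (int(part) for part in date.split("."))
    return epoch * SLOTS_PER_EPOCH + slot


def parse_stuck_line(line: str) -> dict:
    """Extract system date, tip date, slot gap and reported tip age."""
    m = STUCK_RE.search(line)
    if m is None:
        raise ValueError("not a stuck_notifier line")
    # The last '-'-separated field of the tip is its 'epoch.slot' date.
    tip_date = m.group("tip").rsplit("-", 1)[1]
    return {
        "system_date": m.group("sys"),
        "tip_date": tip_date,
        "slot_gap": slot_index(m.group("sys")) - slot_index(tip_date),
        "seconds_ago": int(m.group("ago")),
    }


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, trimming nanoseconds to microseconds."""
    return datetime.fromisoformat(re.sub(r"(\.\d{6})\d+", r"\1", ts))


if __name__ == "__main__":
    line = ("Mar 05 16:02:43.091 WARN blockchain is not moving up, "
            "system-date=82.37473, the last tip c19b40e7-000426db-82.24939 "
            "was 25068 seconds ago, task: stuck_notifier")
    info = parse_stuck_line(line)
    print(info["slot_gap"], info["slot_gap"] * SLOT_SECONDS, info["seconds_ago"])
    # prints: 12534 25068 25068

    late = (parse_ts("2020-03-05T12:50:41.000850225+00:00")
            - parse_ts("2020-03-05T12:50:39+00:00"))
    print(late.total_seconds())  # prints: 2.00085
```

Under these assumptions, the slot gap times 2 s reproduces the reported tip age exactly, and the rejected block finished about 2 s (one slot) after its scheduled time, which would be consistent with the "within time boundaries" rejection reason.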

**Expected behavior**
The node should run stably as a leader. It runs on a dedicated 4-CPU, 16 GB machine with Ubuntu Server that does nothing else.