Open
Labels: A-jormungandr (Area: Issues affecting jörmungandr), Priority - High, bug (Something isn't working), subsys-threading (code related issue with threading)

Description
Describe the bug
My node does not stay stable for longer than 2-3 hours. I have tried several different node-config settings to fix this, but I don't think the node-config is the cause.
Mandatory Information
jcli --full-version
jcli 0.8.13 (HEAD-241b3a59, release, linux [x86_64]) - [rustc 1.41.0 (5e1a79984 2020-01-27)]
jormungandr --full-version
jormungandr 0.8.13 (HEAD-241b3a59, release, linux [x86_64]) - [rustc 1.41.0 (5e1a79984 2020-01-27)]
To Reproduce
Steps to reproduce the behavior:
- When the jormungandr process has stopped, I run it with start_leader and the following configuration.
log:
  - output: stderr
    format: plain
    level: info
http_fetch_block0_service:
  - "https://github.com/input-output-hk/jormungandr-block0/raw/master/data/"
skip_bootstrap: false
bootstrap_from_trusted_peers: false
p2p:
  listen_address: "/ip4/0.0.0.0/tcp/17900"
  public_address: "/ip4/x.x.x.x/tcp/17900"
  topics_of_interest:
    blocks: high
    messages: high
  max_connections: 600
  max_bootstrap_attempts: 3
  max_unreachable_nodes_to_connect_per_event: 18
  gossip_interval: 5s
  policy:
    quarantine_duration: 10m
  trusted_peers:
    - address: "/ip4/3.231.168.222/tcp/3000"
      id: ff2aaaac6cab77d3fb72bf3cb9079246eca323c60b2fd68a
    - address: "/ip4/13.56.0.226/tcp/3000"
      id: 7ddf203c86a012e8863ef19d96aabba23d2445c492d86267
    - address: "/ip4/52.28.91.178/tcp/3000"
      id: 23b3ca09c644fe8098f64c24d75d9f79c8e058642e63a28c
    - address: "/ip4/3.125.75.156/tcp/3000"
      id: 22fb117f9f72f38b21bca5c0f069766c0d4327925d967791
    - address: "/ip4/13.112.181.42/tcp/3000"
      id: 52762c49a84699d43c96fdfe6de18079fb2512077d6aa5bc
    - address: "/ip4/13.114.196.228/tcp/3000"
      id: 7e1020c2e2107a849a8353876d047085f475c9bc646e42e9
    - address: "/ip4/52.8.15.52/tcp/3000"
      id: 18bf81a75e5b15a49b843a66f61602e14d4261fb5595b5f5
    - address: "/ip4/52.9.132.248/tcp/3000"
      id: 671a9e7a5c739532668511bea823f0f5c5557c99b813456c
    - address: "/ip4/3.125.183.71/tcp/3000"
      id: 9d15a9e2f1336c7acda8ced34e929f697dc24ea0910c3e67
    - address: "/ip4/18.184.35.137/tcp/3000"
      id: 06aa98b0ab6589f464d08911717115ef354161f0dc727858
    - address: "/ip4/18.182.115.51/tcp/3000"
      id: 8529e334a39a5b6033b698be2040b1089d8f67e0102e2575
    - address: "/ip4/3.115.154.161/tcp/3000"
      id: 35bead7d45b3b8bda5e74aa12126d871069e7617b7f4fe62
    - address: "/ip4/18.177.78.96/tcp/3000"
      id: fc89bff08ec4e054b4f03106f5312834abdf2fcb444610e9
    - address: "/ip4/52.9.77.197/tcp/3000"
      id: fcdf302895236d012635052725a0cdfc2e8ee394a1935b63
    - address: "/ip4/54.183.149.167/tcp/3000"
      id: df02383863ae5e14fea5d51a092585da34e689a73f704613
    - address: "/ip4/3.124.116.145/tcp/3000"
      id: 99cb10f53185fbef110472d45a36082905ee12df8a049b74
rest:
  listen: "127.0.0.1:3100"
storage: /home/ada/storage
explorer:
  enabled: false
mempool:
  pool_max_entries: 10000
  log_max_entries: 100000
leadership:
  logs_capacity: 4096
- The node runs for a while, then gets stuck, and the number of connections quickly climbs toward max_connections. That is usually a good indication that it is no longer registering any blocks.
- In the logs I see the following stuck-notifier messages:
Mar 05 16:02:43.091 WARN blockchain is not moving up, system-date=82.37473, the last tip c19b40e7-000426db-82.24939 was 25068 seconds ago, task: stuck_notifier
Mar 05 16:03:43.093 WARN blockchain is not moving up, system-date=82.37503, the last tip c19b40e7-000426db-82.24939 was 25128 seconds ago, task: stuck_notifier
Mar 05 16:04:43.093 WARN blockchain is not moving up, system-date=82.37533, the last tip c19b40e7-000426db-82.24939 was 25188 seconds ago, task: stuck_notifier
Mar 05 16:05:43.091 WARN blockchain is not moving up, system-date=82.37563, the last tip c19b40e7-000426db-82.24939 was 25248 seconds ago, task: stuck_notifier
Mar 05 16:06:43.092 WARN blockchain is not moving up, system-date=82.37593, the last tip c19b40e7-000426db-82.24939 was 25308 seconds ago, task: stuck_notifier
Mar 05 16:07:43.092 WARN blockchain is not moving up, system-date=82.37623, the last tip c19b40e7-000426db-82.24939 was 25368 seconds ago, task: stuck_notifier
Mar 05 16:08:43.093 WARN blockchain is not moving up, system-date=82.37653, the last tip c19b40e7-000426db-82.24939 was 25428 seconds ago, task: stuck_notifier
- A restart script that checks whether the node is stuck does not fully solve this: the restart also gets clogged, because the node sometimes stops during the bootstrapping process.
- As a result:
Has this node been scheduled to be leader?
---
- created_at_time: "2020-03-05T07:33:47.290214282+00:00"
  enclave_leader_id: 1
  finished_at_time: "2020-03-05T12:50:41.000850225+00:00"
  scheduled_at_date: "82.31711"
  scheduled_at_time: "2020-03-05T12:50:39+00:00"
  status:
    Rejected:
      reason: Failed to compute the schedule within time boundaries
  wake_at_time: "2020-03-05T12:50:39.001462736+00:00"
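For anyone reproducing this, the stuck_notifier warnings quoted above are regular enough to be checked by a restart script. The following is only a minimal sketch, not the script used in this report: the names `parse_stuck_line` and `should_restart`, and the 600-second threshold, are hypothetical, and the regular expression assumes the log line format stays exactly as shown in the excerpt.

```python
import re
from typing import NamedTuple, Optional

# Matches the stuck_notifier warning format shown in the log excerpt.
# Assumption: the message wording is stable across the node versions in use.
STUCK_RE = re.compile(
    r"blockchain is not moving up, system-date=(\d+)\.(\d+), "
    r"the last tip \S+-(\d+)\.(\d+) was (\d+) seconds ago"
)


class StuckReport(NamedTuple):
    epoch: int           # current epoch from system-date
    slot: int            # current slot from system-date
    tip_epoch: int       # epoch of the last tip
    tip_slot: int        # slot of the last tip
    seconds_behind: int  # seconds since the tip last moved


def parse_stuck_line(line: str) -> Optional[StuckReport]:
    """Return the parsed fields of a stuck_notifier warning, or None."""
    match = STUCK_RE.search(line)
    if match is None:
        return None
    return StuckReport(*(int(group) for group in match.groups()))


def should_restart(line: str, max_seconds_behind: int = 600) -> bool:
    """Hypothetical policy: restart once the tip is older than the threshold."""
    report = parse_stuck_line(line)
    return report is not None and report.seconds_behind > max_seconds_behind


if __name__ == "__main__":
    sample = (
        "Mar 05 16:02:43.091 WARN blockchain is not moving up, "
        "system-date=82.37473, the last tip c19b40e7-000426db-82.24939 "
        "was 25068 seconds ago, task: stuck_notifier"
    )
    print(parse_stuck_line(sample))
    print(should_restart(sample))
```

For the first warning in the excerpt, the parsed slots differ by 37473 - 24939 = 12534 while the tip is reported as 25068 seconds old, which works out to 2 seconds per slot and matches the 30-slot step between warnings logged one minute apart. Note that such a check only detects the stall; it does not address the case described above where the restarted node hangs during bootstrap.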
Expected behavior
A stable leader node. It runs on a dedicated machine with 4 CPUs and 16 GB of RAM, running Ubuntu Server and nothing else.
bobdobs