Unusually high number of inactive channels after upgrade to LND v0.19.0-beta #9870
Replies: 15 comments 20 replies
-
This has happened to me with two channels since the update to 0.19.0. With one of them the problem was solved when the peer's node was restarted, and I suppose that with the second peer it will be solved as soon as he restarts his node. My log (the message repeats every 3 minutes):
peer log:
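For anyone trying to confirm the same symptom, a quick way to see which peers have inactive channels is to filter the JSON from `lncli listchannels`. A minimal sketch; the sample below is made-up data shaped like the RPC output (field names `remote_pubkey`, `active`, `channel_point` are from lnd's ListChannels response):

```python
import json

# Made-up sample shaped like `lncli listchannels` output.
sample = '''
{"channels": [
  {"remote_pubkey": "02aa...", "active": true,  "channel_point": "f0e1...:0"},
  {"remote_pubkey": "03bb...", "active": false, "channel_point": "a9b8...:1"}
]}
'''

def inactive_peers(listchannels_json):
    """Return remote pubkeys of channels reporting active == false."""
    data = json.loads(listchannels_json)
    return [c["remote_pubkey"] for c in data["channels"] if not c["active"]]

# The peers worth investigating first:
print(inactive_peers(sample))
```

In practice you would feed it the real output, e.g. `lncli listchannels > channels.json` and load that file instead of the inline sample.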
-
My node has been on all the 0.19.0 RCs and was upgraded to the release a few days ago without issue. The above-mentioned issues only occurred after channel peers also upgraded to LND 0.19, and those channels then remained inactive. Adding my logs; it looks similar from my side as above.
This looks like the last time the channel connected before the downtime:
-
Interesting. The very surprising issue I saw on my end is that two of my OWN nodes, in the SAME datacenter, showed channels between them as inactive.
-
I'm now focusing on two computers in the same datacenter, both running 0.19.0. Checking just a few minutes ago, I found that once again these 5 (five) channels are inactive. However, the other node DOES show some interesting logs, which DO mention the partner node:
-
One of the nodes, despite CMGR=debug and PEER=debug, showed no mention of the other node in the logs. I have now restarted both the underlying machine and the LND and other Docker containers, and the channels came back online, which is the same behavior I have been seeing for days: the channels operate well at first, but after some number of hours they go inactive. This stuff is tricky to debug, so I think we need some OTHER people to report similar problems (ideally in this thread) to really dig into this. If so far it's only myself and @Filouman reporting issues, it could still be something flaky with our networking or machines.
-
For these channels, are you able to find any sort of errors at the link level? A channel won't ever go active if we don't get the channel reestablishment message for it. A channel may go from active to inactive if we get some error at the link (channel) level that may have us send an error/warning, but not disconnect.
-
OK, good, I will put HSWC=debug and then see if I can catch any useful logs when/if the channels go offline again.
-
@faket0shi are you aware of which node implementation your partner with the problem channel is running? It seems like you are running 0.19, but what about your partner node?
-
Seeing something similar on my node: one node sees the channel as active, the other one as inactive. Trying to pinpoint the bug.
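One way to pin down exactly which channels the two sides disagree about is to dump `lncli listchannels` on each node and diff the `active` flags keyed by `channel_point`. A sketch with made-up sample data (field names follow lnd's ListChannels response):

```python
import json

def mismatched_channels(local_json, remote_json):
    """Return channel points where the two nodes disagree on `active`."""
    def by_point(raw):
        return {c["channel_point"]: c["active"]
                for c in json.loads(raw)["channels"]}
    ours, theirs = by_point(local_json), by_point(remote_json)
    # Only compare channels both sides know about.
    return [cp for cp in ours.keys() & theirs.keys() if ours[cp] != theirs[cp]]

# Made-up data shaped like `lncli listchannels` from each node:
node_a = '{"channels":[{"channel_point":"ab..:0","active":true}]}'
node_b = '{"channels":[{"channel_point":"ab..:0","active":false}]}'
print(mismatched_channels(node_a, node_b))
```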
-
Here is some history of one of the "problem" channels, as seen from one side:
The channel is clearly going up and down a few times per day, while a similar channel with an 0.18.x node in the same datacenter doesn't show any intermittency like this.
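This kind of up-and-down history can also be quantified without external tooling: lnd's `listchannels` output includes per-channel `uptime` and `lifetime` counters (seconds the channel was observed active vs. monitored). A sketch that flags channels with a poor uptime ratio; the sample data is made up, and note that `lncli` serializes these int64 counters as strings:

```python
import json

def flaky_channels(listchannels_json, threshold=0.95):
    """Return (channel_point, uptime_ratio) for channels below threshold."""
    out = []
    for c in json.loads(listchannels_json)["channels"]:
        lifetime = int(c.get("lifetime", 0))
        uptime = int(c.get("uptime", 0))
        if lifetime and uptime / lifetime < threshold:
            out.append((c["channel_point"], round(uptime / lifetime, 3)))
    return out

# Made-up sample: a channel monitored for a day but only up half the time.
sample = '{"channels":[{"channel_point":"cd..:1","lifetime":"86400","uptime":"43200"}]}'
print(flaky_channels(sample))
```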
-
@Roasbeef Please see the attached log file, showing logs where the problem peer (which is my own node, in the same datacenter) is mentioned. I do see a lot of
-
@faket0shi @MegalithicBTC do you have the setting
active?
-
@ziggie1984 I have not set that. I've now restarted both nodes with..
So this should capture sufficient logs that I can see something useful from both sides the next time this happens. Thanks.
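The exact restart settings are elided above; as a hedged example only, per-subsystem log levels of the kind discussed in this thread (PEER, CMGR, HSWC) are configured via the `debuglevel` option in `lnd.conf`, roughly like:

```ini
[Application Options]
; Raise only the subsystems relevant to peer connectivity and the
; switch; everything else stays at the default level.
debuglevel=PEER=debug,CMGR=debug,HSWC=debug
```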
-
This is pretty interesting; my experience very closely matches @Filouman's...
exact same experience here
exact same experience, it was only when I had ANOTHER node ALSO upgraded to 0.19 that I saw the problem...
Exact same experience... the channel went offline again
-
We now have
-
In the last few days there has been some feedback from node-runners about v0.19.0-beta: in this version some channels are showing `is_active: false` where in previous versions the channel was reliably active. We have seen this with two of our nodes which we have updated to v0.19.0-beta, but trying to diagnose it has been tricky.
After first restarting with the new version, we found that a channel between two of our own nodes was down, despite both nodes living in the same datacenter. We solved this by restarting one of the nodes, and that channel is now up.
There are, however, persistent issues we have been unable to solve: a few channels with "clearnet" nodes, where we can see (for example on amboss.space) that their other channels are up, but ours are `is_active: false`.
We've already set `no-disconnect-on-pong-failure: true` in `lnd.conf`, but that didn't fix the issue. We're now looking for a way to get logs from our impacted node to try to understand why the channels have not come back online.
So far, this is what we've tried
Then...
After doing this, nothing of interest appears in LND's log: even with these subsystems set to `debug`, there is no mention of this peer's public key in the logs. So right now we don't have much beyond this anecdotal "it's not working", but I wanted to open this discussion to see if anyone else could provide more useful evidence about what might or might not be going wrong, or suggest commands we could run to help debug this.
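For reference, the pong-failure setting mentioned above is an `lnd.conf` option; a minimal sketch (option name as given in this post):

```ini
[Application Options]
; Keep peers connected even if a pong is not received in time.
no-disconnect-on-pong-failure=true
```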