Skip to content

Conversation

madelinevibes
Copy link
Collaborator

Release 25.09.1

rustyrussell and others added 30 commits October 13, 2025 01:33
…migration.

When we migrate from accounts.db, we use the `account_nonchannel_id`
field.  But we can replay the block chain and the channel involved is
still open, we will use the `account_channel_id` field, and our duplicate
detection fails.

As a result, we can end up with duplicate entries in the database, which
make accounting incorrect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON-RPC: `listchainmoves` could contain bogus duplicate entries after 25.09 bookkeeper migration.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is based on a real database, which values changed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen with other subdaemons too, on ZFS on Linux:

```
2025-09-24T13:51:22.703Z **BROKEN** connectd: Bad checksum on gossmap record @9850670/9851114 should be 3379961343 (01009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aaead1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c58feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
```

Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We might have not read the final entry.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It only gets called for diagnostics when something goes wrong (and we
were going to exit anyway), and it's only useful with mmap (which we now disable
on error) but it shouldn't crash:

```
**BROKEN** gossipd: Truncated gossmap record @7991501/7991523 (len 0): waiting
**BROKEN** gossipd: FATAL SIGNAL 6 (version v25.09)                                            
**BROKEN** gossipd: backtrace: common/daemon.c:41 (send_backtrace) 0x6506817cc529
**BROKEN** gossipd: backtrace: common/daemon.c:78 (crashdump) 0x6506817cc578
**BROKEN** gossipd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x75e8267a032f
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x75e8267a027d
**BROKEN** gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x75e8267838fe
**BROKEN** gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x75e82678381a
**BROKEN** gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x75e826796516
**BROKEN** gossipd: backtrace: common/gossmap.c:111 (map_copy) 0x6506817cea77
**BROKEN** gossipd: backtrace: common/gossmap.c:1870 (gossmap_fetch_tail) 0x6506817d1f93
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1442 (gossmap_manage_get_gossmap) 0x6506817c45fb
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:753 (gossmap_manage_handle_get_txout_reply) 0x6506817c5850
**BROKEN** gossipd: backtrace: gossipd/gossipd.c:574 (recv_req) 0x6506817c172b
```

Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.

Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It was still using private channel announcements, which were removed
in v13.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…D_BIT set.

Mostly this meant running them, then running devtools/convert-gossmap and replacing the code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…E_COMPLETED_BIT set.

Simply ran them through devtools/convert-gossmap, thought for gossip_store-part2 it
had to be appended to gossip_store-part1, converted, then cut off again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…et a read issue.

This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?

In particular, ZFS doesn't just "fix itself":

```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```

Note the record appended on the end *after all the zeroes*.

Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd now uses pwrite(), which is more broadly supported.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't happen yet, since we delete all HTLCs when we close a channel.  But we're
about to change that, so update the wallet_htlcs_first() code to avoid them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…rtup.

For old channels, this can take a while, and it stops everything.  But
we are only doing this to save space; it's not a *functional* necessity.

A quick and dirty test with 50,000 htlcs shows the htlc deletion took
450msec.  I tried adding an index, and changing it to set hstate to
HTLC_STATE_INVALID instead of deleting entries, but it still took about 350ms.

Whereas the "COUNT(*)" only took 1.7msec, so it's worth keeping.

Reported-by: @michael1011
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: lightningd: we defer deletion of old htlcs on channel close, to avoid pausing for a long time (we clean them on startup)
Fixes: ElementsProject#7962
…it for restart.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
lightningd-1 2025-09-22T02:10:10.978Z **BROKEN** plugin-bookkeeper: Unparsable datastore ["bookkeeper","rebalances","1-2"]
```

And, indeed, rebalance is missing:

```
>       outbound_ev = only_one([ev for ev in inc_evs if ev['tag'] == 'rebalance_fee'])

tests/test_bookkeeper.py:825: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = []

    def only_one(arr):
        """Many JSON RPC calls return an array; often we only expect a single entry
        """
>       assert len(arr) == 1
E       AssertionError

```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Parse key correctly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: bookkeeper: failed reload of rebalances on restart.
Cargo utilizes `git ls-remote` to resolve git dependencies specified by commit hashes. GitHub only advertises commits that are reachable from branches, tags, or PR references. The `bip353-plugin` was referencing an orphaned commit in the `bitcoin-payment-instructions` dependency that was unreachable through any advertised reference. This can be resolved by installing the tarball release v0.5.0.

Changelog-None.
all these changelogs only apply to the Docker image.

Changelog-Added: added verification of GPG keys for the bitcoin and litecoin tarballs.
Changelog-Fixed: fixed compilation on all target architectures; each had their own bugs (poetry, missing packages...).
Changelog-Fixed: fixed cargo cross compilation. it was mistakenly using QEMU before.
Changelog-Fixed: fixed CPU compatibility bug described in issue 8456
Changelog-Changed: improve build time by 8.8x
Changelog-Changed: improve image size by 2.07x

more detailed changelog can be found on the PR: ElementsProject#8429
Show the work we're doing (at debug level) and every 10 seconds print
progress (at INFO level):x

```
lightningd-1 2025-10-08T05:13:07.973Z INFO    lightningd: Creating database
lightningd-1 2025-10-08T05:13:10.987Z DEBUG   lightningd: Transferring 6166 chain_events
lightningd-1 2025-10-08T05:13:11.780Z DEBUG   lightningd: Transferring 1660043 channel_events
```

It's the inserting channel_events which takes a long time, slowing
down exponentially:

```
lightningd-1 2025-10-08T05:13:18.034Z INFO    lightningd: Inserted 26690/1660043 channel_events
lightningd-1 2025-10-08T05:13:28.034Z INFO    lightningd: Inserted 47086/1660043 channel_events
lightningd-1 2025-10-08T05:13:38.035Z INFO    lightningd: Inserted 61699/1660043 channel_events
lightningd-1 2025-10-08T05:13:48.035Z INFO    lightningd: Inserted 73743/1660043 channel_events
lightningd-1 2025-10-08T05:13:58.035Z INFO    lightningd: Inserted 83244/1660043 channel_events
...
lightningd-1 2025-10-08T05:35:18.286Z INFO    lightningd: Inserted 466720/1660043 channel_events
lightningd-1 2025-10-08T05:35:29.074Z INFO    lightningd: Inserted 468437/1660043 channel_events
lightningd-1 2025-10-08T05:35:39.079Z INFO    lightningd: Inserted 470130/1660043 channel_events
lightningd-1 2025-10-08T05:35:49.081Z INFO    lightningd: Inserted 471871/1660043 channel_events
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…n migrations.

Before db is complete, ld->wallet->db is NULL.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Testing a large db shows Postgres slowing down exponentially as
it inserts the channel_events.  Rather than updating the index in the
db every time, do it at the end, for spectacular speedup:

```
lightningd-1 2025-10-08T05:39:44.333Z INFO    lightningd: Creating database
lightningd-1 2025-10-08T05:39:47.581Z DEBUG   lightningd: Transferring 6166 chain_events
lightningd-1 2025-10-08T05:39:48.455Z DEBUG   lightningd: Transferring 1660043 channel_events
lightningd-1 2025-10-08T05:39:54.390Z INFO    lightningd: Inserted 103100/1660043 channel_events
lightningd-1 2025-10-08T05:40:04.390Z INFO    lightningd: Inserted 283280/1660043 channel_events
lightningd-1 2025-10-08T05:40:14.390Z INFO    lightningd: Inserted 464065/1660043 channel_events
lightningd-1 2025-10-08T05:40:24.390Z INFO    lightningd: Inserted 629559/1660043 channel_events
lightningd-1 2025-10-08T05:40:34.390Z INFO    lightningd: Inserted 800659/1660043 channel_events
lightningd-1 2025-10-08T05:40:44.390Z INFO    lightningd: Inserted 975433/1660043 channel_events
lightningd-1 2025-10-08T05:40:54.390Z INFO    lightningd: Inserted 1134719/1660043 channel_events
lightningd-1 2025-10-08T05:41:04.390Z INFO    lightningd: Inserted 1290549/1660043 channel_events
lightningd-1 2025-10-08T05:41:14.390Z INFO    lightningd: Inserted 1443304/1660043 channel_events
lightningd-1 2025-10-08T05:41:24.390Z INFO    lightningd: Inserted 1590013/1660043 channel_events
lightningd-1 2025-10-08T05:41:29.148Z INFO    lightningd: bookkeeper migration complete: migrated 6166 chainmoves, 1660043 channelmoves, 132481 descriptions
```

Now we complete the entire migration in 1 minute 45 seconds.

Thanks to @Michael1101 for reporting this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: db: migration from v25.09 on a reasonable size account database could take almost infinite time.
…entonion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…nionmessage.

In this case we have a failmsg, so we should use that.  Otherwise we can have
both failmsg and failonion NULL in the call to injectonion_fail, which is not
valid.

```
DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Removing out HTLC 1 state RCVD_REMOVE_ACK_REVOCATION WIRE_INVALID_ONION_HMAC
**BROKEN** lightningd: FATAL SIGNAL 11 (version v25.09-135-g19a3bbc-modded)
**BROKEN** lightningd: backtrace: common/daemon.c:41 (send_backtrace) 0x6220e8fe0080
**BROKEN** lightningd: backtrace: common/daemon.c:78 (crashdump) 0x6220e8fe00cf
**BROKEN** lightningd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x73614bc4532f
**BROKEN** lightningd: backtrace: lightningd/pay.c:1701 (injectonion_fail) 0x6220e8f951c0
**BROKEN** lightningd: backtrace: lightningd/pay.c:330 (tell_waiters_failed) 0x6220e8f943be
**BROKEN** lightningd: backtrace: lightningd/pay.c:656 (payment_failed) 0x6220e8f98db1
**BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:313 (fail_out_htlc) 0x6220e8fa1d04
**BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:1988 (remove_htlc_out) 0x6220e8fa271b
**BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:2086 (update_out_htlc) 0x6220e8fa2904
**BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:2095 (changed_htlc) 0x6220e8fa2c24
**BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:2608 (peer_got_revoke) 0x6220e8fa6e5a
**BROKEN** lightningd: backtrace: lightningd/channel_control.c:1555 (channel_msg) 0x6220e8f62725
**BROKEN** lightningd: backtrace: lightningd/subd.c:560 (sd_msg_read) 0x6220e8fb2eed
**BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x6220e90a3335
**BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x6220e90a3806
**BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x6220e90a38c3
**BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x6220e90a524f
**BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x6220e8f7d1c7
**BROKEN** lightningd: backtrace: lightningd/lightningd.c:1496 (main) 0x6220e8f82db2
**BROKEN** lightningd: backtrace: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main) 0x73614bc2a1c9
**BROKEN** lightningd: backtrace: ../csu/libc-start.c:360 (__libc_start_main_impl) 0x73614bc2a28a
**BROKEN** lightningd: backtrace: (null):0 ((null)) 0x6220e8f53b64
**BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
```

Reported-by: @michael1011
Changelog-Fixed: lightningd: potential crash when we receive a malformed onion complain from our first peer when using sendonion / injectpaymentonion.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@madelinevibes madelinevibes added the 25.09.1 Point release for 25.09 label Oct 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

25.09.1 Point release for 25.09

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants