Avoid persisting ChannelManager in some cases and separate event from persist notifies #2521

Merged: 11 commits into lightningdevkit:main from the 2023-08-one-less-write branch on Sep 13, 2023

Conversation

Collaborator

@TheBlueMatt TheBlueMatt commented Aug 24, 2023

We currently use the updatable-future from ChannelManager and
ChainMonitor to inform the background-processor that it needs
to wake to process events and persist the ChannelManager. These,
however, are two separate concepts, and just because we have some
messages to send to peers or events to process doesn't necessarily
mean that the ChannelManager needs to be persisted.

This is particularly bad during startup, where we persist the ChannelManager twice for each peer connection: once on peer_connected and once on channel_reestablish. There are also excess persists in more general cases, such as update_add_htlc.
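
For intuition, here's a minimal sketch of the intended split (not the code in this PR; names like `ManagerState` and `get_and_clear_needs_persist` are simplified stand-ins): a wake-up always triggers event/message handling, but the ChannelManager is only re-written when its persisted state actually changed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Simplified stand-in for the state the background processor consults.
struct ManagerState {
    /// Set only when the persisted contents of the ChannelManager change.
    needs_persist: AtomicBool,
}

impl ManagerState {
    /// Returns true (and clears the flag) if a persist is actually required.
    fn get_and_clear_needs_persist(&self) -> bool {
        self.needs_persist.swap(false, Ordering::AcqRel)
    }
}

/// One iteration of a simplified background loop: waking always means
/// "handle events and messages", but persisting is gated on the flag.
fn background_iteration(state: &ManagerState, process_events: impl Fn(), persist: impl Fn()) {
    process_events();
    if state.get_and_clear_needs_persist() {
        persist();
    }
}
```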

Fixes #2502

@TheBlueMatt TheBlueMatt added this to the 0.0.117 milestone Aug 24, 2023
@TheBlueMatt
Collaborator Author

Tagging 117 since we have users suffering substantially from overly-aggressive ChannelManager persistence.

@TheBlueMatt TheBlueMatt linked an issue Aug 24, 2023 that may be closed by this pull request
@codecov-commenter

codecov-commenter commented Aug 28, 2023

Codecov Report

Patch coverage: 93.54% and no project coverage change.

Comparison is base (448b191) 90.45% compared to head (6d99f0a) 90.45%.
Report is 15 commits behind head on main.

❗ Current head 6d99f0a differs from pull request most recent head 32e5903. Consider uploading reports for the commit 32e5903 to get more accurate results


Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2521   +/-   ##
=======================================
  Coverage   90.45%   90.45%           
=======================================
  Files         112      112           
  Lines       58564    58653   +89     
=======================================
+ Hits        52976    53057   +81     
- Misses       5588     5596    +8     
Files Changed Coverage Δ
lightning/src/ln/channelmanager.rs 86.83% <93.28%> (+0.11%) ⬆️
lightning-background-processor/src/lib.rs 81.41% <100.00%> (ø)

... and 6 files with indirect coverage changes


@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from 1e6a675 to 317c735 on August 28, 2023 01:29
@TheBlueMatt
Collaborator Author

Pushed some additional test coverage via the fuzzer (and rebased).

@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from 317c735 to 8e00e06 on August 28, 2023 01:35
@dunxen dunxen self-requested a review August 29, 2023 19:43
@TheBlueMatt
Collaborator Author

Rebased.

@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from 8e00e06 to cd4e89c on September 1, 2023 20:16
@jkczyz jkczyz requested review from jkczyz and wpaulino September 7, 2023 17:15
@jkczyz
Contributor

jkczyz commented Sep 7, 2023

Instead of adding a flag, would using two Notifiers suffice?

@TheBlueMatt
Collaborator Author

TheBlueMatt commented Sep 8, 2023

Maybe. If we move to separate notifiers that's really saying we'll want to be able to capture the case of waking the background processor to persist without waking it to check for events. I don't think we care about that, and even the separation of waking and persisting here sucks - I wouldn't suggest it if we didn't have a good way to test it (yay fuzzers!) and didn't really need it.

That said, if we do want to avoid including the flag in the notifier, we could move to storing the "needs persistence" flag separately in the ChannelManager itself, exposing the (presumably) atomic bool directly. We'd still tie the notifier to it in the sense that we'd guarantee it doesn't flip on unless the notifier also fires, but it would be exposed separately. I'd be okay with this approach too, though disconnecting the two in the API kinda sucks IMO.
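
Roughly, the coupling described here could look like the following sketch (simplified types, not the actual LDK `Notifier`; the method names are made up for illustration):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Very simplified notifier: a real one wakes a future or condvar, here we
/// only record that a wake-up is pending.
#[derive(Default)]
struct Notifier {
    pending_wake: AtomicBool,
}

impl Notifier {
    fn notify(&self) {
        self.pending_wake.store(true, Ordering::Release);
    }
}

#[derive(Default)]
struct ManagerLike {
    event_persist_notifier: Notifier,
    needs_persist: AtomicBool,
}

impl ManagerLike {
    /// Persisted state changed: set the flag *and* notify, so the flag can
    /// never flip on without the notifier also firing.
    fn notify_and_request_persist(&self) {
        self.needs_persist.store(true, Ordering::Release);
        self.event_persist_notifier.notify();
    }

    /// Only events or messages need handling: wake without requesting a persist.
    fn notify_events_only(&self) {
        self.event_persist_notifier.notify();
    }
}
```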

@jkczyz
Contributor

jkczyz commented Sep 8, 2023

Maybe. If we move to separate notifiers that's really saying we'll want to be able to capture the case of waking the background processor to persist without waking it to check for events. I don't think we care about that, and even the separation of waking and persisting here sucks - I wouldn't suggest it if we didn't have a good way to test it (yay fuzzers!) and didn't really need it.

I wasn't thinking about using separate notifiers for persisting and checking for events. Rather simply using one notifier for NotifyOption::DoPersist and another notifier for NotifyOption::SkipPersistHandleEvents.

I suppose currently there's some interaction required between the two with logically OR-ing the flag. Even if we stick with one notifier, it seems having that logic in the notifier is a layer violation, IIUC -- which I may not lol. If we took @wpaulino's approach and parameterized the type (using bool and ()), how would the OR-ing be implemented? Or would simply setting the flag instead of |= be sufficient?

@TheBlueMatt
Collaborator Author

I wasn't thinking about using separate notifiers for persisting and checking for events. Rather simply using one notifier for NotifyOption::DoPersist and another notifier for NotifyOption::SkipPersistHandleEvents.

Right, that's one notifier for handle-events + persist and a separate one for events. Not quite sure I understand your suggestion here.

I suppose currently there's some interaction required between the two with logically OR-ing the flag. Even if we stick with one notifier, it seems having that logic in the notifier is a layer violation

Kinda? I mean yea, the notifier knowing that it's about persisting is def a layer violation, but having an "arbitrary boolean" less so, kinda.

If we took @wpaulino's approach and parameterized the type (using bool and ()), how would the OR-ing be implemented. Or would simply setting the flag instead of |= be sufficient?

I don't think we'd need OR-ing if we had two notifiers - we could just notify the wake-background-processor-for-events and wake-background-processor-for-persist notifiers separately and go from there. (I do think we'd always process events in the BP, just only persist based on one notifier...but then we have to handle "which notifier notified me and make sure that we check if both got woken so we persist then" crap).
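
To make the "which notifier woke me" bookkeeping concrete, here's a rough sketch of what a two-notifier loop would need to track on each wake-up (hypothetical names, not a proposed LDK API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct TwoNotifierState {
    events_pending: AtomicBool,
    persist_pending: AtomicBool,
}

fn two_notifier_iteration(
    state: &TwoNotifierState,
    process_events: impl Fn(),
    persist: impl Fn(),
) {
    // We just woke because *either* notifier fired, so both flags must be
    // checked; otherwise a persist notification racing with an events
    // notification could be missed until some later wake-up.
    let had_events = state.events_pending.swap(false, Ordering::AcqRel);
    let needs_persist = state.persist_pending.swap(false, Ordering::AcqRel);

    if had_events || needs_persist {
        // Re-processing events on any wake-up is cheap and always safe.
        process_events();
    }
    if needs_persist {
        persist();
    }
}
```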

@jkczyz
Contributor

jkczyz commented Sep 8, 2023

Right, that's one notifier for handle-events + persist and a separate one for events. Not quite sure I understand your suggestion here.

Just trying to tease out whether there's a clean way of avoiding the flag in the notifier, but sounds like the OR-ing may not be a concern as you mentioned below. Presumably we wouldn't need the flag at all?

I don't think we'd need OR-ing if we had two notifiers - we could just notify the wake-background-processor-for-events and wake-background-processor-for-persist notifiers separately and go from there. (I do think we'd always process events in the BP, just only persist based on one notifier...but then we have to handle "which notifier notified me and make sure that we check if both got woken so we persist then" crap).

I'm not absolutely wedded to it, but it does seem like this logic belongs in the background processor. Is it much more difficult to set a "needs persist" bool when awaiting on the Selector? Also, why don't we need to do such a check currently when ChainMonitor is the one notifying (i.e., why don't we also need to check if the ChannelManager was notified and wants to persist)?

@wpaulino Any thoughts on this? I think you've been touching / reviewing this code more so than me recently.

@wpaulino
Contributor

wpaulino commented Sep 8, 2023

Also, why don't we need to do such a check currently when ChainMonitor is the one notifying (i.e., why don't we also need to check if the ChannelManager was notified and wants to persist)?

The ChainMonitor mostly only wakes because there was an async monitor update persisted so the ChannelManager has some events to process. Once it processes those events, we'll wake again (via the ChannelManager this time) to persist.

Any thoughts on this?

Yeah I'm also not a fan of the new notifier flag, even if from the notifier's PoV it has nothing to do with persistence. What happens once we need to expose some other value? If we can't make the generic value work (bound on core::ops::BitOr?), then I'd at least like to see a patch for the two notifiers approach so we can properly evaluate.

@TheBlueMatt
Collaborator Author

Just trying to tease out whether there's a clean way of avoiding the flag in the notifier, but sounds like the OR-ing may not be a concern as you mentioned below. Presumably we wouldn't need the flag at all?

I don't think there is, outside of removing the implicit "notify means persist" and instead having a notify from the ChannelManager mean "events or persist, maybe both", with a separate get'er on the ChannelManager that lets you ask if persistence is needed right now.

I'm not absolutely wedded to it, but it does seem like this logic belongs in the background processor. Is it much more difficult to set a "needs persist" bool when awaiting on the Selector?

I think that's what this is? Or are you saying the separate get'er described above?

Also, why don't we need to do such a check currently when ChainMonitor is the one notifying (i.e., why don't we also need to check if the ChannelManager was notified and wants to persist)?

If we get woken by the ChainMonitor, when we next go around the loop we'll immediately get woken by the ChannelManager and persist then.

@TheBlueMatt
Collaborator Author

Yeah I'm also not a fan of the new notifier flag, even if from the notifier's PoV it has nothing to do with persistence. What happens once we need to expose some other value? If we can't make the generic value work (bound on core::ops::BitOr?), then

We could absolutely do a BitOr bound, but that seems strange - like let's say you put in an int, and you notify once with 4 and once with 5, and now you get 4 | 5? But you never put in 4 | 5...

I'd at least like to see a patch for the two notifiers approach so we can properly evaluate.

I'm really not a fan of two separate notifiers - it's annoying to make that work without always going around the BP twice on every notify. Instead, if we want to remove the notifier bool, I think we should move to a notify implying "events and/or persist" with a separate get'er on the ChannelManager to fetch an atomic flag and see if we need persist. I think that'd be a relatively straightforward patch, just awkward that we have two separate get'ers that are tied at the hip but separate in the ChannelManager API.

@wpaulino
Contributor

wpaulino commented Sep 8, 2023

We could absolutely do a BitOr bound, but that seems strange - like let's say you put in an int, and you notify once with 4 and once with 5, and now you get 4 | 5? But you never put in 4 | 5...

Ah, right...

Instead, if we want to remove the notifier bool, I think we should move to a notify implying "events and/or persist" with a separate get'er on the ChannelManager to fetch an atomic flag and see if we need persist. I think that'd be a relatively straightforward patch, just awkward that we have two separate get'ers that are tied at the hip but separate in the ChannelManager API.

Right, this sounds better to me as well. I agree it's a bit awkward, but you could say the same about the ChainMonitor future returning an unused flag -- they're both things you only have to deal with if you're implementing your own background processor. I'll leave it up to @jkczyz; I'm happy moving forward with this suggestion, but can also deal with the current patch.

@jkczyz
Contributor

jkczyz commented Sep 8, 2023

Right, this sounds better to me as well. I agree it's a bit awkward, but you could say the same about the ChainMonitor future returning an unused flag -- they're both things you only have to deal with if you're implementing your own background processor. I'll leave it up to @jkczyz; I'm happy moving forward with this suggestion, but can also deal with the current patch.

Yeah, I agree. Let's go with that suggestion.

@TheBlueMatt
Collaborator Author

Done

@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch 3 times, most recently from 94b420c to f079049 on September 8, 2023 22:18
In the next commit, we separate `ChannelManager`'s concept of
waking a listener to both be persisted and to allow the user to
handle events. Here we rename the future-fetching method in
anticipation of this split.
@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from f079049 to c1546ff on September 11, 2023 00:08
@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from ecbfbd6 to 02d9367 on September 11, 2023 16:45
Contributor

@jkczyz jkczyz left a comment


Feel free to squash, IMO.

@TheBlueMatt
Collaborator Author

Squashed the previous fixups but added three new ones.

@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from 02d9367 to cd93690 on September 11, 2023 18:24
@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from cd93690 to 7393812 on September 11, 2023 21:45
Contributor

@wpaulino wpaulino left a comment


LGTM

@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch 2 times, most recently from 6d99f0a to 79b4b58 on September 12, 2023 15:29
@jkczyz
Contributor

jkczyz commented Sep 12, 2023

LGTM, feel free to squash

Prior to any actions which may generate a `ChannelMonitorUpdate`,
and in general after startup,
`ChannelManager::process_background_events` must be called. This is
mostly accomplished by doing so on taking the
`total_consistency_lock` via the `PersistenceNotifierGuard`. In
order to skip this call in block connection logic, the
`PersistenceNotifierGuard::optionally_notify` constructor did not
call the `process_background_events` method.

However, this is very easy to misuse - `optionally_notify` does not
convey to the reader that they need to call
`process_background_events` at all.

Here we fix this by adding a separate
`optionally_notify_skipping_background_events` method, making the
requirements much clearer to callers.
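
As a heavily simplified sketch of the naming distinction (the real guard also manages the `total_consistency_lock` and the notification on drop; these constructors are illustrative only):

```rust
struct PersistenceNotifierGuardSketch;

impl PersistenceNotifierGuardSketch {
    /// The default constructor runs `process_background_events` for the
    /// caller, which is what nearly every call site wants.
    fn optionally_notify(process_background_events: impl FnOnce()) -> Self {
        process_background_events();
        PersistenceNotifierGuardSketch
    }

    /// The long, explicit name makes it clear at the call site that the
    /// caller (e.g. block-connection logic) has taken on the responsibility
    /// of calling `process_background_events` itself.
    fn optionally_notify_skipping_background_events() -> Self {
        PersistenceNotifierGuardSketch
    }
}
```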
Currently, when a ChannelManager generates a notification for the
background processor, any pending events are handled and the
ChannelManager is always re-persisted.

Many channel related messages don't actually change the channel
state in a way that changes the persisted channel. For example,
an `update_add_htlc` or `update_fail_htlc` message simply adds the
change to a queue, changing the channel state when we receive a
`commitment_signed` message.

In these cases we shouldn't be re-persisting the ChannelManager as
it hasn't changed (persisted) state at all. In anticipation of
doing so in the next few commits, here we make the public API
handle the two concepts (somewhat) separately. The notification
still goes out via a single waker, however whether or not to
persist is now handled via a separate atomic bool.
As we now signal events-available separately from persistence-needed, the `NotifyOption` enum should include a separate variant for events-but-no-persistence, which we add here.
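
For illustration, the resulting enum has three cases along these lines (`DoPersist` and `SkipPersistHandleEvents` are the names used in the review discussion above; the name of the no-events variant here is an assumption):

```rust
/// What a call site needs from the background processor once it releases
/// the lock: a persist, event handling only, or nothing at all.
enum NotifyOption {
    /// Persisted state changed: wake the background processor and persist.
    DoPersist,
    /// Messages or events were generated, but nothing that changes the
    /// persisted state: wake without persisting.
    SkipPersistHandleEvents,
    /// Nothing observable happened: don't wake the background processor.
    SkipPersistNoEvents,
}
```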
Many channel related messages don't actually change the channel
state in a way that changes the persisted channel. For example,
an `update_add_htlc` or `update_fail_htlc` message simply adds the
change to a queue, changing the channel state when we receive a
`commitment_signed` message.

In these cases there's really no reason to wake the background
processor at all - there's no response message and there's no state
update. However, note that if we close the channel we should
persist the `ChannelManager`. If we send an error message without
closing the channel, we should wake the background processor
without persisting.

Here we move to the appropriate `NotifyOption` on some of the
simpler channel message handlers.
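
A sketch of how a simple handler might pick a variant under that rule (hypothetical outcome type and helper; the `NotifyOption` definition is repeated from the sketch above so the snippet stands alone):

```rust
enum NotifyOption { DoPersist, SkipPersistHandleEvents, SkipPersistNoEvents }

/// Hypothetical summary of what handling an HTLC update message did.
enum HandleOutcome {
    /// The update was queued; channel state only changes on commitment_signed.
    QueuedPendingUpdate,
    /// We must send an error message but the channel stays open.
    SendErrorKeepChannel,
    /// The message forced a channel closure, changing persisted state.
    ChannelClosed,
}

fn notify_option_for(outcome: HandleOutcome) -> NotifyOption {
    match outcome {
        // No response message and no state change: don't wake at all.
        HandleOutcome::QueuedPendingUpdate => NotifyOption::SkipPersistNoEvents,
        // An error message has to go out, so wake the background processor,
        // but the persisted ChannelManager contents are unchanged.
        HandleOutcome::SendErrorKeepChannel => NotifyOption::SkipPersistHandleEvents,
        // A force-close updates persisted state, so persist.
        HandleOutcome::ChannelClosed => NotifyOption::DoPersist,
    }
}
```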
When a peer connects and we send some `channel_reestablish`
messages or create a `per_peer_state` entry there's really no
reason to need to persist the `ChannelManager`. None of the
possible actions we take immediately result in a change to the
persisted contents of a `ChannelManager`, only the peer's later
`channel_reestablish` message does.
If we receive a `ChannelUpdate` message which was invalid, it can
cause us to force-close the channel, which should result in a
`ChannelManager` persistence, though it's not critical to do so.
When we handle an inbound `channel_reestablish` from our peers it
generally doesn't change any state and thus doesn't need a
`ChannelManager` persistence. Here we avoid said persistence where
possible.
When reloading nodes A or C, the chanmon_consistency fuzzer
currently calls `get_and_clear_pending_msg_events` on the node,
potentially causing additional `ChannelMonitor` or `ChannelManager`
updates, just to check that no unexpected messages are generated.

There's not much reason to do so - the fuzzer could always swap in a
different command to call the same method, and the additional
checking requires some weird monitor persistence introspection.

Here we simplify the fuzzer by simply removing this logic.
@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from 79b4b58 to e6f564a on September 12, 2023 19:06
@TheBlueMatt
Collaborator Author

Squashed without further changes.

wpaulino previously approved these changes Sep 12, 2023
jkczyz previously approved these changes Sep 12, 2023
In the `chanmon_consistency` fuzz, we currently "persist" the
`ChannelManager` on each loop iteration. With the new logic in the
past few commits to reduce the frequency of `ChannelManager`
persistences, this behavior now leaves a gap in our test coverage -
missing persistence notifications.

In order to catch (common-case) persistence misses, we update the
`chanmon_consistency` fuzzer to no longer persist the
`ChannelManager` unless the waker was woken and signaled to
persist, possibly reloading with a previous `ChannelManager` if we
were not signaled.
@TheBlueMatt TheBlueMatt dismissed stale reviews from jkczyz and wpaulino via 32e5903 September 12, 2023 21:28
@TheBlueMatt TheBlueMatt force-pushed the 2023-08-one-less-write branch from e6f564a to 32e5903 on September 12, 2023 21:28
@TheBlueMatt
Collaborator Author

Grrr, fuzz was still broken, force-pushed the last commit:

$ git diff-tree -U1 e6f564a15 32e5903ef
diff --git a/fuzz/src/chanmon_consistency.rs b/fuzz/src/chanmon_consistency.rs
index 77192bf46..8afc2e151 100644
--- a/fuzz/src/chanmon_consistency.rs
+++ b/fuzz/src/chanmon_consistency.rs
@@ -1298,3 +1298,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {

-		if nodes[0].get_and_clear_needs_persistence() == Ok(true) {
+		if nodes[0].get_and_clear_needs_persistence() == true {
 			node_a_ser.0.clear();
@@ -1302,3 +1302,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
 		}
-		if nodes[1].get_and_clear_needs_persistence() == Ok(true) {
+		if nodes[1].get_and_clear_needs_persistence() == true {
 			node_b_ser.0.clear();
@@ -1306,3 +1306,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
 		}
-		if nodes[2].get_and_clear_needs_persistence() == Ok(true) {
+		if nodes[2].get_and_clear_needs_persistence() == true {
 			node_c_ser.0.clear();
$ 

@TheBlueMatt TheBlueMatt merged commit 286d1db into lightningdevkit:main Sep 13, 2023