Avoid persisting ChannelManager in some cases and separate event from persist notifies #2521
Conversation
Tagging 117 since we have users suffering substantially from overly-aggressive `ChannelManager` persistence.
Codecov Report

```
@@           Coverage Diff            @@
##             main    #2521    +/-   ##
=========================================
  Coverage   90.45%   90.45%
=========================================
  Files         112      112
  Lines       58564    58653      +89
=========================================
+ Hits        52976    53057      +81
- Misses       5588     5596       +8
```

☔ View full report in Codecov by Sentry.
Force-pushed 1e6a675 to 317c735.
Pushed some additional test coverage via the fuzzer (and rebased).
Force-pushed 317c735 to 8e00e06.
Rebased.
Force-pushed 8e00e06 to cd4e89c.
Instead of adding a flag, would using two `Notifier`s work?
Maybe. If we move to separate notifiers, that's really saying we'll want to be able to capture the case of waking the background processor to persist without waking it to check for events. I don't think we care about that, and even the separation of waking and persisting here sucks, and I wouldn't suggest it if we didn't have a good way to test it (yay fuzzers!) and didn't really need it. That said, if we do want to avoid including the flag in the notifier, we could move to storing the "needs persistence" flag separately in the `ChannelManager`.
I wasn't thinking about using separate notifiers for persisting and checking for events. Rather simply using one notifier for … I suppose currently there's some interaction required between the two with logically OR-ing the flag. Even if we stick with one notifier, it seems having that logic in the notifier is a layer violation, IIUC -- which I may not lol. If we took @wpaulino's approach and parameterized the type (using …).
Right, that's one notifier for handle-events + persist and a separate one for events. Not quite sure I understand your suggestion here.
Kinda? I mean yea, the notifier knowing that it's about persisting is def a layer violation, but having an "arbitrary boolean" less so, kinda.
I don't think we'd need OR-ing if we had two notifiers - we could just notify the wake-background-processor-for-events and wake-background-processor-for-persist notifiers separately and go from there. (I do think we'd always process events in the BP, just only persist based on one notifier...but then we have to handle "which notifier notified me and make sure that we check if both got woken so we persist then" crap).
Just trying to tease out whether there's a clean way of avoiding the flag in the notifier, but sounds like the OR-ing may not be a concern as you mentioned below. Presumably we wouldn't need the flag at all?
I'm not absolutely wedded to it, but it does seem like this logic belongs in the background processor. Is it much more difficult to set a "needs persist" bool when awaiting on the `Future`?
@wpaulino Any thoughts on this? I think you've been touching / reviewing this code more so than me recently.
The …
Yeah I'm also not a fan of the new notifier flag, even if from the notifier's PoV it has nothing to do with persistence. What happens once we need to expose some other value? If we can't make the generic value work (bound on …)
I don't think there is, outside of removing the implicit "notify means persist" and instead having notify from `ChannelManager` mean "events or persist, maybe both", with a separate get'er on the `ChannelManager` that lets you ask if persistence is needed right now.
I think that's what this is? Or are you saying the separate get'er described above?
If we get woken by the `ChainMonitor`, when we next go around the loop we'll immediately get woken by the `ChannelManager` and persist then.
We could absolutely do a …
I'm really not a fan of two separate notifiers - it's annoying to make that work without always going around the BP twice on every notify. Instead, if we want to remove the notifier bool, I think we should move to a notify implying "events and/or persist" with a separate get'er on the `ChannelManager` to fetch an atomic flag and see if we need persist. I think that'd be a relatively straightforward patch, just awkward that we have two separate get'ers that are tied at the hip but separate in the `ChannelManager` API.
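To make the shape concrete, here's a minimal sketch of how a background-processor loop could consume the two signals. `get_and_clear_needs_persistence` matches the getter in the fuzzer diff further down; the future-fetching method name and the `persist_manager` call are illustrative stand-ins, not the actual background-processor code.

```rust
// Sketch only: one future wakes us for "events and/or persist, maybe
// both", and a separate atomic getter tells us whether a persist is
// actually needed.
loop {
    // Wait until the ChannelManager signals that *something* happened.
    channel_manager.get_event_or_persistence_needed_future().await;

    // Always handle pending events on wake...
    channel_manager.process_pending_events(&event_handler);

    // ...but only re-persist when the ChannelManager's persisted state
    // actually changed. The getter clears the flag atomically, so one
    // persist services all pending requests.
    if channel_manager.get_and_clear_needs_persistence() {
        persister.persist_manager(&channel_manager)?;
    }
}
```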
Ah, right...
Right, this sounds better to me as well. I agree it's a bit awkward, but you could say the same about the …
Yeah, I agree. Let's go with that suggestion.
Done.
Force-pushed 94b420c to f079049.
In the next commit, we separate `ChannelManager`'s single concept of waking a listener into two: waking it because the `ChannelManager` needs to be persisted, and waking it so the user can handle events. Here we rename the future-fetching method in anticipation of this split.
Force-pushed f079049 to c1546ff.
Force-pushed ecbfbd6 to 02d9367.
Feel free to squash, IMO.
Squashed the previous fixups but added three new ones.
Force-pushed 02d9367 to cd93690.
Force-pushed cd93690 to 7393812.
LGTM
Force-pushed 6d99f0a to 79b4b58.
LGTM, feel free to squash.
Prior to any actions which may generate a `ChannelMonitorUpdate`, and in general after startup, `ChannelManager::process_background_events` must be called. This is mostly accomplished by doing so on taking the `total_consistency_lock` via the `PersistenceNotifierGuard`. In order to skip this call in block connection logic, the `PersistenceNotifierGuard::optionally_notify` constructor did not call the `process_background_events` method. However, this is very easy to misuse - `optionally_notify` does not convey to the reader that they need to call `process_background_events` at all. Here we fix this by adding a separate `optionally_notify_skipping_background_events` method, making the requirements much clearer to callers.
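A minimal sketch of the RAII-guard pattern described above, assuming heavily simplified stand-ins for the real `ChannelManager` internals: only the method names `optionally_notify_skipping_background_events` and `process_background_events` come from the commit; `Notifier`, its fields, and the bodies here are illustrative.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

struct Notifier {
    // Stand-in for the real future/condvar wake machinery.
    woken: AtomicBool,
}

impl Notifier {
    fn notify(&self) {
        self.woken.store(true, Ordering::Release);
    }
}

struct PersistenceNotifierGuard<'a> {
    notifier: &'a Notifier,
    should_notify: bool,
}

impl<'a> PersistenceNotifierGuard<'a> {
    // Common entry point: background events are processed for the caller,
    // so they can't forget to do it before generating monitor updates.
    fn notify_on_drop(notifier: &'a Notifier) -> Self {
        process_background_events();
        PersistenceNotifierGuard { notifier, should_notify: true }
    }

    // The long name makes the skipped call obvious at the call site,
    // which is the point of the commit above.
    fn optionally_notify_skipping_background_events(
        notifier: &'a Notifier, should_notify: bool,
    ) -> Self {
        PersistenceNotifierGuard { notifier, should_notify }
    }
}

impl Drop for PersistenceNotifierGuard<'_> {
    fn drop(&mut self) {
        if self.should_notify {
            self.notifier.notify();
        }
    }
}

fn process_background_events() {
    // In the real code: apply any ChannelMonitorUpdates queued at startup.
}
```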
Currently, when a `ChannelManager` generates a notification for the background processor, any pending events are handled and the `ChannelManager` is always re-persisted. Many channel-related messages don't actually change the channel state in a way that changes the persisted channel. For example, an `update_add_htlc` or `update_fail_htlc` message simply adds the change to a queue, changing the channel state only when we receive a `commitment_signed` message. In these cases we shouldn't be re-persisting the `ChannelManager`, as its persisted state hasn't changed at all. In anticipation of doing so in the next few commits, here we make the public API handle the two concepts (somewhat) separately. The notification still goes out via a single waker, however whether or not to persist is now handled via a separate atomic bool.
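A rough sketch of that split, assuming a condvar-based stand-in for the real waker (the actual `Notifier` wakes registered futures rather than a condvar):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex};

struct EventPersistNotifier {
    // Set only when the persisted ChannelManager state actually changed.
    needs_persist: AtomicBool,
    // A single waker shared by "handle events" and "persist" signals.
    woken: Mutex<bool>,
    condvar: Condvar,
}

impl EventPersistNotifier {
    // Called by message handlers: an update_add_htlc can now wake the
    // background processor for events without requesting a re-persist.
    fn notify(&self, persist: bool) {
        if persist {
            self.needs_persist.store(true, Ordering::Release);
        }
        *self.woken.lock().unwrap() = true;
        self.condvar.notify_all();
    }

    // Called by the background processor after waking. Swapping (rather
    // than loading) clears the flag, so one persist covers all pending
    // requests.
    fn get_and_clear_needs_persistence(&self) -> bool {
        self.needs_persist.swap(false, Ordering::AcqRel)
    }
}
```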
As we now signal events-available separately from persistence-needed, the `NotifyOption` enum should include a separate variant for events-but-no-persistence, which we add here.
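Sketched out, the enum might look something like this; the variant names are illustrative, chosen to match the three behaviors described:

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum NotifyOption {
    /// Persisted state changed: wake the background processor for events
    /// and set the needs-persistence flag.
    DoPersist,
    /// Something happened (e.g. an outbound message), but persisted state
    /// is unchanged: wake for event handling only.
    SkipPersistHandleEvents,
    /// Nothing for the background processor to do at all.
    SkipPersistNoEvents,
}
```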
Many channel related messages don't actually change the channel state in a way that changes the persisted channel. For example, an `update_add_htlc` or `update_fail_htlc` message simply adds the change to a queue, changing the channel state when we receive a `commitment_signed` message. In these cases there's really no reason to wake the background processor at all - there's no response message and there's no state update. However, note that if we close the channel we should persist the `ChannelManager`. If we send an error message without closing the channel, we should wake the background processor without persisting. Here we move to the appropriate `NotifyOption` on some of the simpler channel message handlers.
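As an example of how a handler could pick among the variants under these rules, here's a hypothetical sketch using the `NotifyOption` shape from above; `UpdateError` and the function are stubs, not the actual handler code:

```rust
// Stub standing in for the error a channel update might produce.
struct UpdateError {
    closes_channel: bool,
}

fn notify_for_htlc_update(queue_result: Result<(), UpdateError>) -> NotifyOption {
    match queue_result {
        // The change only landed in the holding queue: no persisted state
        // changed and no response message is due, so there's no reason to
        // wake the background processor at all.
        Ok(()) => NotifyOption::SkipPersistNoEvents,
        // The error force-closed the channel: persisted state changed.
        Err(e) if e.closes_channel => NotifyOption::DoPersist,
        // We send an error message without closing the channel: wake the
        // processor to dispatch it, but skip the persist.
        Err(_) => NotifyOption::SkipPersistHandleEvents,
    }
}
```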
When a peer connects and we send some `channel_reestablish` messages or create a `per_peer_state` entry, there's really no reason to need to persist the `ChannelManager`. None of the possible actions we take immediately result in a change to the persisted contents of a `ChannelManager`; only the peer's later `channel_reestablish` message does.
If we receive a `ChannelUpdate` message which was invalid, it can cause us to force-close the channel, which should result in a `ChannelManager` persistence, though it's not critical to do so.
When we handle an inbound `channel_reestablish` from our peers, it generally doesn't change any state and thus doesn't need a `ChannelManager` persistence. Here we avoid said persistence where possible.
When reloading nodes A or C, the chanmon_consistency fuzzer currently calls `get_and_clear_pending_msg_events` on the node, potentially causing additional `ChannelMonitor` or `ChannelManager` updates, just to check that no unexpected messages are generated. There's not much reason to do so: the fuzzer could always swap for a different command to call the same method, and the additional checking requires some weird monitor persistence introspection. Here we simplify the fuzzer by simply removing this logic.
Force-pushed 79b4b58 to e6f564a.
Squashed without further changes.
In the `chanmon_consistency` fuzz, we currently "persist" the `ChannelManager` on each loop iteration. With the new logic in the past few commits to reduce the frequency of `ChannelManager` persistences, this behavior now leaves a gap in our test coverage - missing persistence notifications. In order to catch (common-case) persistence misses, we update the `chanmon_consistency` fuzzer to no longer persist the `ChannelManager` unless the waker was woken and signaled to persist, possibly reloading with a previous `ChannelManager` if we were not signaled.
Force-pushed e6f564a to 32e5903.
Grrr, fuzz was still broken, force-pushed the last commit:

```diff
$ git diff-tree -U1 e6f564a15 32e5903ef
diff --git a/fuzz/src/chanmon_consistency.rs b/fuzz/src/chanmon_consistency.rs
index 77192bf46..8afc2e151 100644
--- a/fuzz/src/chanmon_consistency.rs
+++ b/fuzz/src/chanmon_consistency.rs
@@ -1298,3 +1298,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
-			if nodes[0].get_and_clear_needs_persistence() == Ok(true) {
+			if nodes[0].get_and_clear_needs_persistence() == true {
 				node_a_ser.0.clear();
@@ -1302,3 +1302,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
 			}
-			if nodes[1].get_and_clear_needs_persistence() == Ok(true) {
+			if nodes[1].get_and_clear_needs_persistence() == true {
 				node_b_ser.0.clear();
@@ -1306,3 +1306,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
 			}
-			if nodes[2].get_and_clear_needs_persistence() == Ok(true) {
+			if nodes[2].get_and_clear_needs_persistence() == true {
 				node_c_ser.0.clear();
```
We currently use the updatable-future from `ChannelManager` and `ChainMonitor` to inform the background-processor that it needs to wake to process events and persist the `ChannelManager`. These, however, are two separate concepts, and just because we have some messages to send to peers or events to process doesn't necessarily mean that the `ChannelManager` needs to be persisted.

This is particularly bad during startup, where we persist a `ChannelManager` twice for each peer connection - once on `peer_connected` and once on `channel_reestablish`. However, there are also excess persists in general cases like `update_add_htlc`.

Fixes #2502