ln: om: add option to internally queue forwards to offline peers #3765

Draft
phlip9 wants to merge 1 commit into main
Conversation

@phlip9 (Contributor) commented May 2, 2025

While implementing BOLT 12 onion-message forwarding to offline user nodes in our LSP, I found that this relatively simple change to OnionMessenger reduced the integration on our end to about +50 LOC. Figured I'd open a draft PR here to see if there's any appetite for this change; if so, I can work on making it more upstreamable :)

Commit

Adds an option to OnionMessenger to have it queue all onion message forwards to offline peers.

Context: when someone requests or pays a BOLT 12 invoice involving one of our user nodes, our LSP may need to first spin up their node before it can forward the onion message.

Why not use `intercept_messages_for_offline_peers`? We'd have to rebuild almost exactly the same `message_recipients` queue outside LDK. In our case there's no reason to persist the onion message forwards, since our user nodes run on demand and should be ready to handle the forward in time. With this change, the logic on our end turned out to be very simple: we just handle the `ConnectionNeeded` event, and all the pending messages forward automatically once the peer connects. If the connection fails, or the target isn't one of our peers, the pending messages naturally age out after a few timer ticks.
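For reference, here's a minimal sketch of what the LSP-side handling looks like under this scheme. Only `Event::ConnectionNeeded` is an LDK type; `is_our_user_node`, `spin_up_user_node`, and `connect_peer` are hypothetical stand-ins for our own infrastructure:

```rust
use bitcoin::secp256k1::PublicKey;
use lightning::events::Event;
use lightning::ln::msgs::SocketAddress;

// Placeholder hooks into our own LSP infrastructure -- not LDK APIs.
fn is_our_user_node(_node_id: &PublicKey) -> bool { true }
fn spin_up_user_node(_node_id: &PublicKey) { /* start the user's node on demand */ }
fn connect_peer(_node_id: &PublicKey, _addrs: &[SocketAddress]) { /* open the p2p connection */ }

// When the messenger queues a forward for an offline peer it surfaces
// `Event::ConnectionNeeded`; we bring the user node up and connect to it,
// and the queued onion messages flush automatically once the peer connects.
fn handle_event(event: Event) {
    if let Event::ConnectionNeeded { node_id, addresses } = event {
        if is_our_user_node(&node_id) {
            spin_up_user_node(&node_id);
            connect_peer(&node_id, &addresses);
        }
        // Otherwise (or if the connection fails) the pending forwards simply
        // age out of the queue after a few timer ticks.
    }
}
```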

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!


codecov bot commented May 2, 2025

Codecov Report

Attention: Patch coverage is 34.48276% with 38 lines in your changes missing coverage. Please review.

Project coverage is 89.31%. Comparing base (8d44e80) to head (6b8b874).

Files with missing lines                     Patch %   Lines
lightning/src/onion_message/messenger.rs     34.48%    38 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3765      +/-   ##
==========================================
- Coverage   89.37%   89.31%   -0.07%     
==========================================
  Files         157      157              
  Lines      124095   124150      +55     
  Branches   124095   124150      +55     
==========================================
- Hits       110914   110881      -33     
- Misses      10470    10547      +77     
- Partials     2711     2722      +11     

☔ View full report in Codecov by Sentry.

@TheBlueMatt (Collaborator)

I imagine we could take this upstream, but we'd have to figure out an anti-DoS policy that makes sense for any intended use cases here. For y'all, that probably means dropping messages if they're queued for more than a minute or three, but maybe for others it doesn't? We'll also have to limit the queues, which makes them fairly DoS-vulnerable. The DoS policy questions (well, plus persistence) are why we didn't do this in the first place, and I'm not really sure what the right answer to them is :/
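To make the "drop if queued for more than a minute or three" idea concrete, here's a rough sketch of tick-based eviction. The types are hypothetical and not the actual `OnionMessenger` internals; this only illustrates the kind of policy being discussed:

```rust
use std::collections::{HashMap, VecDeque};

use bitcoin::secp256k1::PublicKey;

// Illustrative stand-in for a queued onion-message forward; the real queue
// would hold LDK's `OnionMessage` type.
struct PendingForward {
    enqueued_at_tick: u64,
    message_bytes: Vec<u8>,
}

struct OfflinePeerQueues {
    current_tick: u64,
    // Forwards queued longer than this many timer ticks get dropped.
    max_queued_ticks: u64,
    per_peer: HashMap<PublicKey, VecDeque<PendingForward>>,
}

impl OfflinePeerQueues {
    // Called on each timer tick: drop anything that has sat in the queue past
    // the configured lifetime so stale forwards can't accumulate indefinitely.
    fn timer_tick_occurred(&mut self) {
        self.current_tick += 1;
        let cutoff = self.current_tick.saturating_sub(self.max_queued_ticks);
        for queue in self.per_peer.values_mut() {
            queue.retain(|fwd| fwd.enqueued_at_tick >= cutoff);
        }
        self.per_peer.retain(|_, queue| !queue.is_empty());
    }
}
```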

@phlip9 (Contributor, Author) commented May 2, 2025

For us at least, I think (1) a max total buffer size, (2) a max per-peer buffer size, and (3) a max queued lifetime (in timer ticks) are probably Good Enough™ DoS/QoS policies for now, and they're easily tunable.
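For concreteness, knobs (1) and (2) could be enforced at enqueue time along these lines, with (3) handled by tick-based eviction as sketched above. The names here are hypothetical, not an existing LDK API:

```rust
/// Hypothetical DoS/QoS limits for queued forwards to offline peers.
struct OfflineForwardPolicy {
    /// (1) Cap on queued forwards across all offline peers.
    max_total_queued: usize,
    /// (2) Cap on queued forwards for any single offline peer.
    max_queued_per_peer: usize,
    /// (3) Ticks a forward may wait before being dropped (see sketch above).
    max_queued_ticks: u64,
}

impl OfflineForwardPolicy {
    /// Returns whether a new forward should be queued, given the current
    /// total queue depth and the depth of the target peer's queue;
    /// over-limit forwards are simply dropped.
    fn allow_enqueue(&self, total_queued: usize, peer_queued: usize) -> bool {
        total_queued < self.max_total_queued && peer_queued < self.max_queued_per_peer
    }
}
```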

What makes this diff less upstreamable, IMO, is that other LSPs probably can't make the same assumptions we do: their offline peers can't always come online in time, so they need a persistent queue for these messages.
