Skip to content

eth: add logic to drop peers randomly when saturated #31476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Apr 14, 2025
Merged

eth: add logic to drop peers randomly when saturated #31476

merged 27 commits into from
Apr 14, 2025

Conversation

cskiraly
Copy link
Contributor

WIP: this is still wip for a few reasons:

  • conditions of dropping a peer should be improved, including also inbound peer drop.
  • I've observed a deadlock in one test, still to be checked if the new module was the reason or not.

As of now, Geth disconnects peers only on protocol error or timeout, meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in #31321, Geth should occasionally disconnect peers to ensure some churn. What/when to disconnect could depend on:

  • the state of geth (e.g. sync or not)
  • current number of peers
  • peer level metrics

This PR adds a very slow churn using a random drop.

@cskiraly cskiraly requested review from fjl and zsfelfoldi as code owners March 24, 2025 08:26
@fjl fjl changed the title p2p/connmanager: WIP - Add connection manager to drop connections when needed p2p: add connection manager to drop connections when needed Mar 24, 2025
@fjl
Copy link
Contributor

fjl commented Mar 24, 2025

If you want to track the sync status of the node, this code will have to live outside of package p2p. It'll likely need to be created from eth.Ethereum instead.

@cskiraly cskiraly changed the title p2p: add connection manager to drop connections when needed eth/connmanager: add connection manager to drop connections when needed Mar 26, 2025
@fjl fjl changed the title eth/connmanager: add connection manager to drop connections when needed eth: add connection manager to drop connections when needed Mar 26, 2025
cskiraly added 15 commits March 28, 2025 13:12
Dropping peers randomly with a slow pace to create some artificial
churn.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Better positioned in package eth to access relevant data about
connection quality.
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
@fjl
Copy link
Contributor

fjl commented Apr 4, 2025

let's rename type to dropper as well

cskiraly added 7 commits April 7, 2025 16:41
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
@cskiraly cskiraly changed the title eth: add connection manager to drop connections when needed eth: add dropper to drop peers randomly when saturated Apr 11, 2025
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
@cskiraly
Copy link
Contributor Author

image

Dropper in action:

  • red lines are outbound drops, starting when outbound peer pool is saturated,
  • yellow lines are inbound drops, starting much later, when inbound pool is saturated.
    This was with an accelerated drop timer. Default interval between drop events is set higher.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
eth/dropper.go Outdated
// Interval between peer drop events (uniform between min and max)
peerDropIntervalMax = 7 * time.Minute
// Avoid dropping peers for some time after connection
doNotDropBefore = 2 * peerDropIntervalMax
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too high. At the current settings, it means each connection gets at least 14min of being peered before we consider it. I think it should be independent of the interval and a bit smaller, like 5min.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe 10min? Either way it's not intuitive for me why this should be connected to the interval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The protection period is meant for peers to have time to start contribution, but at the moment we have no logic related to "start contribute", so the number is a bit arbitrary.

The logic in connecting the two intervals is from the other side of things: for not protecting a new node too long. What I wanted to avoid is protecting a node for too long, while the whole old peerset can be changed. Actually, for this it would be better to use peerDropIntervalMin ....

Let's make it independent for now, then we can consider again later. I would like this number to be bigger than the time a the dropped peer needs to find a replacement. 10min sounds safe to start with, then we can checked the effect when it is rolled out, and adjust.

// How close to max should we initiate the drop timer. O should be fine,
// dropping when no more peers can be added. Larger numbers result in more
// aggressive drop behavior.
peerDropThreshold = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe best to remove this setting. I wouldn't want people to tinker with this, since a non-zero setting here can lead to weird behavior if peer finding efficiency is bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing this will not change the drop interval, which ultimately drives churn in the system.

Changing this is somewhat similar to changing maxpeercount, which we do allow people to change (even on command line).
The effect is different on the dialed and on the served pool.

  • On the dialed pool, it leads slightly more aggressive dial after drop, because dial slots are adaptive.
  • On the served pool, it leads to more free slots in the system, which is largely needed, but needs careful checks before we turn this up.

So I would prefer keeping it to simplify experimenting with it for us.

@fjl fjl changed the title eth: add dropper to drop peers randomly when saturated eth: add logic to drop peers randomly when saturated Apr 14, 2025
The protection period is meant for peers to have time to start
contribution, but at the moment we have no logic related to
"start contribute", so the number is a bit arbitrary.

The logic in setting this based on the drop interval is from the other
side of things: for not protecting a new node too long. What I wanted
to avoid is protecting a node for too long, while the whole old
peerset can be changed. Actually, for this it would be better to
use peerDropIntervalMin ....

Let's make it independent for now, then we can consider again later.
I would like this number to be bigger than the time a the dropped
peer needs to find a replacement. 10min sounds safe to start with,
then we can checked the effect when it is rolled out, and adjust.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
fjl
fjl previously approved these changes Apr 14, 2025
@fjl fjl merged commit c5c7597 into master Apr 14, 2025
3 of 5 checks passed
@fjl fjl added this to the 1.15.9 milestone Apr 14, 2025
@fjl fjl mentioned this pull request Apr 14, 2025
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in ethereum#31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in ethereum#31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in ethereum#31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in ethereum#31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants