eth: add logic to drop peers randomly when saturated #31476
@@ -0,0 +1,170 @@
// Copyright 2015 The go-ethereum Authors
// This file is part of the go-ethereum library.
//
// The go-ethereum library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// The go-ethereum library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package eth

import (
	mrand "math/rand"
	"slices"
	"sync"
	"time"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/common/mclock"
	"github.com/ethereum/go-ethereum/log"
	"github.com/ethereum/go-ethereum/metrics"
	"github.com/ethereum/go-ethereum/p2p"
)
const (
	// Interval between peer drop events (uniform between min and max).
	peerDropIntervalMin = 3 * time.Minute
	peerDropIntervalMax = 7 * time.Minute
	// Avoid dropping peers for some time after connection.
	doNotDropBefore = 2 * peerDropIntervalMax
	// How close to the max peer count we should be before the drop timer kicks
	// in. 0 should be fine, dropping only when no more peers can be added.
	// Larger numbers result in more aggressive drop behavior.
	peerDropThreshold = 0
)

Review comment on peerDropThreshold: Maybe best to remove this setting. I wouldn't want people to tinker with it, since a non-zero value here can lead to weird behavior if peer-finding efficiency is bad.

Reply: Changing this will not change the drop interval, which ultimately drives churn in the system. Changing it is somewhat similar to changing the max peer count, which we do allow people to change (even on the command line). So I would prefer keeping it, to simplify experimenting with it for us.
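To make the threshold semantics concrete, here is a small standalone sketch; the peer counts are made-up values for illustration, not defaults from this PR:

package main

import "fmt"

func main() {
	const peerDropThreshold = 0 // same default as in this file
	const maxDialPeers = 13     // hypothetical dial-slot limit

	for numDialed := 11; numDialed <= maxDialPeers; numDialed++ {
		headroom := maxDialPeers - numDialed
		// Mirrors the dropRandomPeer condition below: a dyn-dialed peer
		// is protected while headroom > peerDropThreshold, so with the
		// default of 0, drops happen only once the dial slots are full.
		droppable := headroom <= peerDropThreshold
		fmt.Printf("dialed=%d headroom=%d droppable=%v\n", numDialed, headroom, droppable)
	}
}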
var (
	// droppedInbound is the number of inbound peers dropped
	droppedInbound = metrics.NewRegisteredMeter("eth/dropper/inbound", nil)
	// droppedOutbound is the number of outbound peers dropped
	droppedOutbound = metrics.NewRegisteredMeter("eth/dropper/outbound", nil)
)

// dropper monitors the state of the peer pool and makes changes as follows:
// - during sync the Downloader handles peer connections, so dropper is disabled
// - if not syncing and the peer count is close to the limit, it drops peers
//   randomly every peerDropInterval to make space for new peers
// - peers are dropped separately from the inbound pool and from the dialed pool
type dropper struct {
	maxDialPeers    int // maximum number of dialed peers
	maxInboundPeers int // maximum number of inbound peers
	peersFunc       getPeersFunc
	syncingFunc     getSyncingFunc

	// peerDropTimer introduces churn if we are close to limit capacity.
	// We handle dialed and inbound connections separately.
	peerDropTimer *time.Timer

	peerEventCh chan *p2p.PeerEvent // channel for peer event changes

	wg         sync.WaitGroup // wg for graceful shutdown
	shutdownCh chan struct{}
}

// Callback type to get the list of connected peers.
type getPeersFunc func() []*p2p.Peer

// Callback type to get syncing status.
// Returns true while syncing, false when synced.
type getSyncingFunc func() bool

func newDropper(maxDialPeers, maxInboundPeers int) *dropper {
	if peerDropIntervalMin > peerDropIntervalMax {
		panic("peerDropIntervalMin duration must be less than or equal to peerDropIntervalMax duration")
	}
	cm := &dropper{
		maxDialPeers:    maxDialPeers,
		maxInboundPeers: maxInboundPeers,
		peerDropTimer:   time.NewTimer(randomDuration(peerDropIntervalMin, peerDropIntervalMax)),
		peerEventCh:     make(chan *p2p.PeerEvent),
		shutdownCh:      make(chan struct{}),
	}
	return cm
}

// Start the dropper.
func (cm *dropper) Start(srv *p2p.Server, syncingFunc getSyncingFunc) {
	cm.peersFunc = srv.Peers
	cm.syncingFunc = syncingFunc
	cm.wg.Add(1)
	go cm.loop()
}

// Stop the dropper.
func (cm *dropper) Stop() {
	cm.peerDropTimer.Stop()
	close(cm.shutdownCh)
	cm.wg.Wait()
}

// dropRandomPeer selects one of the peers randomly and drops it from the peer pool.
func (cm *dropper) dropRandomPeer() bool {
	peers := cm.peersFunc()
	var numInbound int
	for _, p := range peers {
		if p.Inbound() {
			numInbound++
		}
	}
	numDialed := len(peers) - numInbound

	selectDoNotDrop := func(p *p2p.Peer) bool {
		// Avoid dropping trusted and static peers, or recent peers.
		// Only drop peers if their respective category (dialed/inbound)
		// is close to limit capacity.
		return p.Trusted() || p.StaticDialed() ||
			p.Lifetime() < mclock.AbsTime(doNotDropBefore) ||
			(p.DynDialed() && cm.maxDialPeers-numDialed > peerDropThreshold) ||
			(p.Inbound() && cm.maxInboundPeers-numInbound > peerDropThreshold)
	}

	droppable := slices.DeleteFunc(peers, selectDoNotDrop)
	if len(droppable) > 0 {
		p := droppable[mrand.Intn(len(droppable))]
		log.Debug("Dropping random peer", "inbound", p.Inbound(),
			"id", p.ID(), "duration", common.PrettyDuration(p.Lifetime()), "peercountbefore", len(peers))
		p.Disconnect(p2p.DiscTooManyPeers)

		if p.Inbound() {
			droppedInbound.Mark(1)
		} else {
			droppedOutbound.Mark(1)
		}
		return true
	}
	return false
}

// randomDuration generates a random duration between min and max.
func randomDuration(min, max time.Duration) time.Duration {
	if min > max {
		panic("min duration must be less than or equal to max duration")
	}
	return time.Duration(mrand.Int63n(int64(max-min)) + int64(min))
}

// loop is the main loop of the connection dropper.
func (cm *dropper) loop() {
	defer cm.wg.Done()

	for {
		select {
		case <-cm.peerDropTimer.C:
			// Drop a random peer if we are not syncing and the peer count is close to the limit.
			if !cm.syncingFunc() {
				cm.dropRandomPeer()
			}
			cm.peerDropTimer.Reset(randomDuration(peerDropIntervalMin, peerDropIntervalMax))
		case <-cm.shutdownCh:
			return
		}
	}
}
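For orientation, a minimal sketch of how a caller could wire the dropper up, using only the API in this diff. runDropper itself is a hypothetical helper, not part of the change; how the eth handler actually integrates it is not shown here.

// runDropper starts the dropper and returns a function the caller should
// invoke on shutdown. srv is the node's *p2p.Server; isSyncing should
// report downloader state (true while syncing, false when synced).
func runDropper(srv *p2p.Server, maxDial, maxInbound int, isSyncing func() bool) (stop func()) {
	d := newDropper(maxDial, maxInbound)
	d.Start(srv, isSyncing)
	return d.Stop
}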
Review comment on doNotDropBefore: I think this is too high. At the current settings, it means each connection gets at least 14min of being peered before we consider it. I think it should be independent of the interval and a bit smaller, like 5min.

Follow-up: Or maybe 10min? Either way it's not intuitive for me why this should be connected to the interval.

Reply: Good point. The protection period is meant to give peers time to start contributing, but at the moment we have no logic related to "start contributing", so the number is a bit arbitrary. The logic in connecting the two intervals comes from the other side of things: not protecting a new node for too long. What I wanted to avoid is protecting a node for so long that the whole old peerset can be replaced around it. Actually, for that it would be better to use peerDropIntervalMin.
Let's make it independent for now, then we can reconsider later. I would like this number to be bigger than the time a dropped peer needs to find a replacement. 10min sounds safe to start with; then we can check the effect when it is rolled out, and adjust.
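If the constant is decoupled as agreed above, the change would be a one-line sketch; 10min is the value floated in the thread, not a final decision:

// Proposed: protect fresh connections for a fixed window instead of
// deriving it from the drop interval (currently 2*peerDropIntervalMax = 14min).
doNotDropBefore = 10 * time.Minute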