
Commit 0a55bcd

Merge bitcoin/bitcoin#27981: Fix potential network stalling bug
3388e52 Rework receive buffer pushback (Pieter Wuille)

Pull request description:

See ElementsProject/elements#1233. There, it has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur, where both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent issue for us.

The core issue is that whenever our optimistic send fails to fully send a message, we subsequently do not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded and (still) didn't fully drain the send queue.

This is a significant reduction in how aggressive the "receive pushback" mechanism is, because now it will only mildly push back while sending progress is made; if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though:

* We still have a pushback mechanism at the application buffer level (when the application receive buffer overflows, receiving is paused until messages in the buffer get processed; waiting on our own net_processing thread, not on the remote party).
* There are cases where the existing mechanism is too aggressive; e.g. when the send queue is non-empty, but tiny, and can be sent with a single send() call. In that case, I think we'd prefer to still receive within the same processing loop of the network thread.

ACKs for top commit:
  ajtowns: ACK 3388e52
  naumenkogs: ACK 3388e52
  mzumsande: Tested ACK 3388e52

Tree-SHA512: 28960feb3cd2ff3dfb39622510da62472612f88165ea98fc9fb844bfcb8fa3ed3633f83e7bd72bdbbbd37993ef10181b2e1b34836ebb8f0d83fd1c558921ec17
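For illustration only (not part of the commit): a minimal, self-contained C++ sketch contrasting the old event selection, which stops selecting for receive whenever the send queue is non-empty, with the new policy described above, which selects for both and defers the "skip receive" decision until after a send attempt. PeerState, Events, SelectEventsOld and SelectEventsNew are hypothetical names standing in for CNode state and GenerateWaitSockets logic.

#include <cstdio>
#include <deque>
#include <string>

// Hypothetical stand-ins for CNode's send queue and fPauseRecv flag.
struct PeerState {
    std::deque<std::string> send_queue; // messages not yet (fully) sent
    bool pause_recv{false};             // application receive buffer full
};

struct Events { bool send{false}; bool recv{false}; };

// Old policy: any pending send data suppresses selecting for receive entirely.
Events SelectEventsOld(const PeerState& p)
{
    if (!p.send_queue.empty()) return {/*send=*/true, /*recv=*/false};
    return {/*send=*/false, /*recv=*/!p.pause_recv};
}

// New policy: select for both; receiving is only skipped later, after a send
// attempt that made progress but did not fully drain the queue.
Events SelectEventsNew(const PeerState& p)
{
    return {/*send=*/!p.send_queue.empty(), /*recv=*/!p.pause_recv};
}

int main()
{
    PeerState p;
    p.send_queue.push_back("partially sent message"); // optimistic send left data behind

    const Events o = SelectEventsOld(p);
    const Events n = SelectEventsNew(p);
    std::printf("old: send=%d recv=%d\n", o.send, o.recv); // old: send=1 recv=0
    std::printf("new: send=%d recv=%d\n", n.send, n.recv); // new: send=1 recv=1
    return 0;
}

With the new policy the receive direction is suppressed only in the socket handler, and only when a send attempt actually made progress without draining the queue; the diff below shows where that decision now lives.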
2 parents 7ef2d4e + 3388e52 commit 0a55bcd

2 files changed: +30, -37 lines

src/net.cpp

Lines changed: 27 additions & 36 deletions
@@ -833,7 +833,7 @@ void V1TransportSerializer::prepareForTransport(CSerializedNetMsg& msg, std::vec
     CVectorWriter{SER_NETWORK, INIT_PROTO_VERSION, header, 0, hdr};
 }

-size_t CConnman::SocketSendData(CNode& node) const
+std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
 {
     auto it = node.vSendMsg.begin();
     size_t nSentSize = 0;
@@ -888,7 +888,7 @@ size_t CConnman::SocketSendData(CNode& node) const
         assert(node.nSendSize == 0);
     }
     node.vSendMsg.erase(node.vSendMsg.begin(), it);
-    return nSentSize;
+    return {nSentSize, !node.vSendMsg.empty()};
 }

 /** Try to find a connection to evict when the node is full.
@@ -1226,37 +1226,15 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
     }

     for (CNode* pnode : nodes) {
-        // Implement the following logic:
-        // * If there is data to send, select() for sending data. As this only
-        //   happens when optimistic write failed, we choose to first drain the
-        //   write buffer in this case before receiving more. This avoids
-        //   needlessly queueing received data, if the remote peer is not themselves
-        //   receiving data. This means properly utilizing TCP flow control signalling.
-        // * Otherwise, if there is space left in the receive buffer, select() for
-        //   receiving data.
-        // * Hand off all complete messages to the processor, to be handled without
-        //   blocking here.
-
         bool select_recv = !pnode->fPauseRecv;
-        bool select_send;
-        {
-            LOCK(pnode->cs_vSend);
-            select_send = !pnode->vSendMsg.empty();
-        }
+        bool select_send = WITH_LOCK(pnode->cs_vSend, return !pnode->vSendMsg.empty());
+        if (!select_recv && !select_send) continue;

         LOCK(pnode->m_sock_mutex);
-        if (!pnode->m_sock) {
-            continue;
+        if (pnode->m_sock) {
+            Sock::Event event = (select_send ? Sock::SEND : 0) | (select_recv ? Sock::RECV : 0);
+            events_per_sock.emplace(pnode->m_sock, Sock::Events{event});
         }
-
-        Sock::Event requested{0};
-        if (select_send) {
-            requested = Sock::SEND;
-        } else if (select_recv) {
-            requested = Sock::RECV;
-        }
-
-        events_per_sock.emplace(pnode->m_sock, Sock::Events{requested});
     }

     return events_per_sock;
@@ -1317,6 +1295,24 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
                 errorSet = it->second.occurred & Sock::ERR;
             }
         }
+
+        if (sendSet) {
+            // Send data
+            auto [bytes_sent, data_left] = WITH_LOCK(pnode->cs_vSend, return SocketSendData(*pnode));
+            if (bytes_sent) {
+                RecordBytesSent(bytes_sent);
+
+                // If both receiving and (non-optimistic) sending were possible, we first attempt
+                // sending. If that succeeds, but does not fully drain the send queue, do not
+                // attempt to receive. This avoids needlessly queueing data if the remote peer
+                // is slow at receiving data, by means of TCP flow control. We only do this when
+                // sending actually succeeded to make sure progress is always made; otherwise a
+                // deadlock would be possible when both sides have data to send, but neither is
+                // receiving.
+                if (data_left) recvSet = false;
+            }
+        }
+
         if (recvSet || errorSet)
         {
             // typical socket buffer is 8K-64K
@@ -1363,12 +1359,6 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
             }
         }

-        if (sendSet) {
-            // Send data
-            size_t bytes_sent = WITH_LOCK(pnode->cs_vSend, return SocketSendData(*pnode));
-            if (bytes_sent) RecordBytesSent(bytes_sent);
-        }
-
         if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
     }
 }
@@ -2935,7 +2925,8 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
         if (nMessageSize) pnode->vSendMsg.push_back(std::move(msg.data));

         // If write queue empty, attempt "optimistic write"
-        if (optimisticSend) nBytesSent = SocketSendData(*pnode);
+        bool data_left;
+        if (optimisticSend) std::tie(nBytesSent, data_left) = SocketSendData(*pnode);
     }
     if (nBytesSent) RecordBytesSent(nBytesSent);
 }
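As a usage illustration of the new return value (not part of the commit): a tiny standalone sketch of the decision the reworked SocketHandlerConnected applies to SocketSendData()'s (bytes_sent, data_left) result. ShouldStillReceive is a hypothetical helper name.

#include <cstdio>
#include <cstddef>

// Hypothetical helper mirroring the new handler logic: keep receiving unless
// the send attempt made progress *and* still left data queued. This ensures
// at least one direction always advances, so the mutual stall described in
// the pull request cannot occur.
bool ShouldStillReceive(std::size_t bytes_sent, bool data_left)
{
    return !(bytes_sent > 0 && data_left);
}

int main()
{
    // Send was blocked entirely: keep receiving so some progress is possible.
    std::printf("blocked send  -> receive=%d\n", ShouldStillReceive(0, true));     // 1
    // Partial progress, queue not drained: push back on receiving this pass.
    std::printf("partial send  -> receive=%d\n", ShouldStillReceive(4096, true));  // 0
    // Queue fully drained: resume receiving.
    std::printf("drained queue -> receive=%d\n", ShouldStillReceive(4096, false)); // 1
    return 0;
}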

src/net.h

Lines changed: 3 additions & 1 deletion
@@ -1013,7 +1013,9 @@ class CConnman

     NodeId GetNewNodeId();

-    size_t SocketSendData(CNode& node) const EXCLUSIVE_LOCKS_REQUIRED(node.cs_vSend);
+    /** (Try to) send data from node's vSendMsg. Returns (bytes_sent, data_left). */
+    std::pair<size_t, bool> SocketSendData(CNode& node) const EXCLUSIVE_LOCKS_REQUIRED(node.cs_vSend);
+
     void DumpAddresses();

     // Network stats
