Skip to content

Fix leak in range sync components_by_range_requests #7767

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Jul 21, 2025

Issue Addressed

When a data column range sync fails with coupling error returned below, if the coupling error does not contain a column and peer (i.e. column_and_peer == None)

fn responses_with_custody_columns(
blocks: Vec<Arc<SignedBeaconBlock<E>>>,
data_columns: DataColumnSidecarList<E>,
column_to_peer: HashMap<u64, PeerId>,
expects_custody_columns: &[ColumnIndex],
spec: &ChainSpec,
) -> Result<Vec<RpcBlock<E>>, CouplingError> {

The entire batch gets retried, without the previous context.components_by_range_requests entry being removed, this result in unbounded memory usage increase as shown in the metric:

image

Note: this PR is not tested and I'm not sure if the fix is good - but just wanted to illustrate the issue.

@jimmygchen jimmygchen requested a review from pawanjay176 July 21, 2025 07:53
@@ -885,6 +885,8 @@ impl<T: BeaconChainTypes> SyncingChain<T> {
failing_batch: batch_id,
});
}
// FIXME: hack to remove range request before a retry
network.remove_range_request_by_id(request_id);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this hack didn't fix it

image

The metric i was looking at is sync_active_network_requests, type == components_by_range, recorded here:

(
"components_by_range",
self.components_by_range_requests.len(),
),

This change what caused entries not to be removed i believe, its used by both range and backfill:

if blocks_result.is_ok() {
// remove the entry only if it coupled successfully with
// no errors
entry.remove();
}
// If the request is finished, dequeue everything
Some(blocks_result.map_err(RpcResponseError::BlockComponentCouplingError))

@jimmygchen
Copy link
Member Author

This will be fixed properly by @pawanjay176 in #7762.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant