Block lookup sync technical debt #5549

@dapplion

Tracker issue for multiple technical debt cleanup ideas and simplifications that came up while working on DAS.

PR queue by intended merge order

Handle streams in SyncNetworkContext

Today, each sync component (Range, Backfill, Lookup) handles stream progression and termination itself. None of the components do anything until the stream has completed, so they don't need to be aware of streams at all. Their current API for injecting stream events is:

impl SomeSync {
  /// None = stream termination
  fn block_response(&mut self, block: Option<Block>) {}
  fn block_error(&mut self, error: RPCError) {}
}

Instead, their API could be simplified to:

impl SomeSync {
  fn blocks_response(&mut self, blocks: Result<Vec<Block>, RPCError>) {}
}

SyncNetworkContext would then track and accumulate stream items the same way we do today with range sync blocks and blobs.
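
A rough sketch of what that accumulation could look like (type and field names here are illustrative stand-ins, not the actual Lighthouse API):

use std::collections::HashMap;

// Illustrative stand-ins for the real types; names are assumptions.
type RequestId = u64;
struct Block;
struct RpcError;

enum StreamItem {
    Block(Block),
    StreamTerminated,
    Error(RpcError),
}

// Sketch: buffer stream items per request id and only hand sync a completed
// Result<Vec<Block>, RpcError>, mirroring what range sync does today.
struct SyncNetworkContext {
    active_block_requests: HashMap<RequestId, Vec<Block>>,
}

impl SyncNetworkContext {
    fn on_blocks_stream_item(&mut self, id: RequestId, item: StreamItem) {
        match item {
            // Accumulate each block until the stream terminates.
            StreamItem::Block(block) => {
                self.active_block_requests.entry(id).or_default().push(block);
            }
            // Stream termination: forward the full batch to sync.
            StreamItem::StreamTerminated => {
                if let Some(blocks) = self.active_block_requests.remove(&id) {
                    self.send_to_sync(id, Ok(blocks));
                }
            }
            // RPC error: drop any partial batch and forward the error.
            StreamItem::Error(e) => {
                self.active_block_requests.remove(&id);
                self.send_to_sync(id, Err(e));
            }
        }
    }

    fn send_to_sync(&mut self, _id: RequestId, _result: Result<Vec<Block>, RpcError>) {
        // In the real code this would call something like SomeSync::blocks_response.
    }
}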

Merge BackFillBlocks, BackFillBlockAndBlobs, RangeBlocks, RangeBlockAndBlobs handling

All of those requests return the same type of data, a Vec<RpcBlock>, so they can be handled by a single code path given a proper request id.
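
For illustration, the unified id could look something like this (variant and field names are hypothetical):

// Hypothetical sketch: one "blocks by range" request id that records which
// component issued it, so a single handler can route the resulting
// Vec<RpcBlock> instead of four near-identical code paths.
enum RangeRequestId {
    RangeSync { chain_id: u64, batch_id: u64 },
    BackfillSync { batch_id: u64 },
}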

Merge SingleBlock + ParentLookup request Ids

There's no need for the network router to be aware of the distinction between parent and current block lookups. Nesting the lookup type into SingleLookupReqId de-clutters the router and sync manager code.
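
A possible shape, with illustrative field names:

// Hypothetical sketch: the router only ever sees SingleLookupReqId; whether
// the request serves a parent or a current lookup stays internal to sync.
enum LookupType {
    Current,
    Parent,
}

struct SingleLookupReqId {
    lookup_id: u32,
    req_counter: u32,
    lookup_type: LookupType,
}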

Strict state transitions in SingleLookupRequestState

All properties of SingleLookupRequestState are public and some are mutated in disparate parts of the codebase, making it hard to follow state changes. We should enforce stricter state transitions, like we do in range sync batches, where the current state is checked before mutating it.
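
A minimal sketch of what checked transitions could look like, with illustrative names:

// Minimal sketch: the state field is private and every transition checks the
// current variant first, the same way range sync batches do.
#[derive(Debug)]
enum LookupState {
    AwaitingDownload,
    Downloading,
    Processing,
    Processed,
}

pub struct SingleLookupRequestState {
    state: LookupState,
}

impl SingleLookupRequestState {
    pub fn on_download_success(&mut self) -> Result<(), String> {
        match self.state {
            LookupState::Downloading => {
                self.state = LookupState::Processing;
                Ok(())
            }
            ref other => Err(format!("on_download_success in state {:?}", other)),
        }
    }
}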

Avoid remove + insert pattern in single_block_lookups

It's possible to handle block lookup state transitions with a mutable reference to the items of single_block_lookups. (Subjective) IMO the state transitions in the snippet below are easier to follow, as each processing result is strictly converted to one Action 👇

https://github.com/dapplion/lighthouse/blob/bca1967d1c5a03a55e1075e2c8736e8ed842adf2/beacon_node/network/src/sync/block_lookups/mod.rs#L321

The trade-off is longer functions, as we can't borrow self. IMO it's worth it, but let me know what you think.
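
For reference, a heavily condensed and hypothetical rendering of that shape (the real code is in the link above):

use std::collections::HashMap;

// Illustrative stand-ins; the real types live in block_lookups/mod.rs.
type LookupId = u32;
struct Lookup {
    failed_attempts: u8,
}

enum ProcessResult {
    Ok,
    RetryableError,
    FatalError,
}

// Each processing result is converted to exactly one Action.
enum Action {
    Retry,
    Drop,
    Continue,
}

struct BlockLookups {
    single_block_lookups: HashMap<LookupId, Lookup>,
}

impl BlockLookups {
    fn on_processing_result(&mut self, id: LookupId, result: ProcessResult) {
        // Mutate the entry in place via get_mut; no remove + re-insert.
        let action = match self.single_block_lookups.get_mut(&id) {
            Some(lookup) => match result {
                ProcessResult::Ok => Action::Continue,
                ProcessResult::RetryableError => {
                    lookup.failed_attempts += 1;
                    Action::Retry
                }
                ProcessResult::FatalError => Action::Drop,
            },
            None => return,
        };

        // Apply the single resulting Action.
        match action {
            Action::Retry => { /* re-issue the request for `id` */ }
            Action::Drop => {
                self.single_block_lookups.remove(&id);
            }
            Action::Continue => {}
        }
    }
}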

Re-consider ChildComponents

Won't Somebody Pleeease Think of the Children???

Motivation

The sole purpose of child components is the following:

  • We receive network object A that claims to descend from an unknown block B
  • We attempt to fetch / resolve the parent chain of block B
  • If we do, fetch the block components of the parent chain, which will include A
  • Skip downloading A because it's cached as a child component (but we could just download it again)

The net result is a very slight reduction in bandwidth consumption and slightly faster times to complete recovery via parent lookups. Lighthouse would be just fine without them.

Simplification

A child component can be understood as a single lookup that starts in a Downloaded state. Since we can attribute the object to a sender, there isn't a strict need to treat it differently than a regular download that starts from an empty state.
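
A rough illustration of that idea, with hypothetical names:

// Hypothetical sketch: the lookup's block request can be constructed directly
// in a Downloaded state from a cached gossip object, instead of going through
// a separate ChildComponents cache.
enum BlockRequestState<B> {
    AwaitingDownload,
    Downloaded { block: B, from_peer: u64 },
}

impl<B> BlockRequestState<B> {
    // Regular lookup: nothing downloaded yet.
    fn new_empty() -> Self {
        BlockRequestState::AwaitingDownload
    }

    // "Child component" case: we already hold the object and know its sender.
    fn new_downloaded(block: B, from_peer: u64) -> Self {
        BlockRequestState::Downloaded { block, from_peer }
    }
}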

Handle parent and child lookups together

Parent and child lookups could be handled uniformly if:

  • single block lookups can be "paused" while waiting for their parent to show up
  • we can do accounting to bound the max depth of parent chains

Then we can de-dup the code paths for parent and child block lookups, as sketched below.
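
A very rough sketch of the "paused" idea, purely to make the shape concrete (all names are hypothetical):

// Hypothetical sketch: a lookup can be blocked on an unresolved parent chain
// and is resumed once that chain has been downloaded and processed.
enum AwaitingParent {
    No,
    Yes { parent_root: [u8; 32] },
}

struct Lookup {
    awaiting_parent: AwaitingParent,
}

impl Lookup {
    // A paused lookup issues no requests and sends nothing for processing.
    fn is_paused(&self) -> bool {
        matches!(self.awaiting_parent, AwaitingParent::Yes { .. })
    }

    // Called once the parent chain resolves.
    fn resume(&mut self) {
        self.awaiting_parent = AwaitingParent::No;
    }
}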

Note: I'm not sold on this path, as parent lookups have some fundamental differences. However, it's worth exploring once the low-hanging fruit above is addressed.
