Hitenjain14 edited this page Mar 18, 2025 · 2 revisions

Blobber Repair Protocol

Overview

The repair process ensures that all blobbers in a decentralized storage allocation maintain a consistent version of data. This is necessary when:

  • A blobber misses a commit, leading to data inconsistency.
  • A user adds or replaces a blobber in the storage allocation.

Since data is erasure encoded using Reed-Solomon coding, the original data can be recovered as long as at least data_shards blobbers hold correct data. The repair process synchronizes all blobbers to the same allocation root, ensuring consistency and integrity.
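The recoverability condition can be sketched as a one-line check (illustrative only; the helper name is hypothetical, not part of the SDK):

```go
package main

import "fmt"

// canRecover reports whether the original data is still recoverable:
// Reed-Solomon reconstruction needs at least dataShards correct shards,
// no matter which blobbers hold them. (Hypothetical helper name.)
func canRecover(correctBlobbers, dataShards int) bool {
	return correctBlobbers >= dataShards
}

func main() {
	// A 4+2 allocation (4 data shards, 2 parity shards):
	// any 4 of the 6 blobbers are enough to rebuild the data.
	fmt.Println(canRecover(4, 4)) // true
	fmt.Println(canRecover(3, 4)) // false
}
```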


Problem Statement

Decentralized storage relies on multiple independent blobbers to store data redundantly using erasure encoding. However, failures and membership changes introduce inconsistencies:

  1. A blobber may have missed a commit, leading to data mismatches.
  2. When a new blobber is added or replaced, it starts with an empty or outdated state.
  3. Data integrity needs to be enforced by ensuring all blobbers maintain the same allocation root, representing the latest version of stored data.

To resolve this, a structured repair process is required to restore all blobbers to the same version.


Repair Process

1. Allocation Root Consensus

  • The client fetches the allocation roots from all participating blobbers.
  • The client groups blobbers into sets based on their allocation roots.
  • The set with at least data_shards blobbers that share the same allocation root is considered the master set.
  • Blobbers not in the master set are secondary blobbers that require repair.
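The consensus step above can be sketched as follows, using a hypothetical Blobber type and masterSet function rather than the SDK's actual API:

```go
package main

import "fmt"

// Blobber pairs a blobber's URL with the allocation root it reported.
// These are illustrative types, not the SDK's actual definitions.
type Blobber struct {
	URL  string
	Root string
}

// masterSet groups blobbers by allocation root, picks the largest group
// with at least dataShards members as the master set, and returns the
// rest as secondary blobbers that require repair.
func masterSet(blobbers []Blobber, dataShards int) (master, secondary []Blobber) {
	groups := map[string][]Blobber{}
	for _, b := range blobbers {
		groups[b.Root] = append(groups[b.Root], b)
	}
	for _, g := range groups {
		if len(g) >= dataShards && len(g) > len(master) {
			master = g
		}
	}
	for _, b := range blobbers {
		if len(master) == 0 || b.Root != master[0].Root {
			secondary = append(secondary, b)
		}
	}
	return master, secondary
}

func main() {
	m, s := masterSet([]Blobber{
		{"blobber1", "rootA"}, {"blobber2", "rootA"}, {"blobber3", "rootB"},
	}, 2)
	fmt.Printf("master: %d, secondary: %d\n", len(m), len(s)) // master: 2, secondary: 1
}
```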

2. File Synchronization Using a Lead Blobber

  • A lead blobber is chosen from each set to act as a representative.
  • The lead blobber lists all files in a paginated manner.
  • The client processes file lists using a diff function to determine:
    1. Missing Files: Files present in the master set but absent in secondary blobbers.
    2. Extra Files: Files present in secondary blobbers but missing in the master set.
    3. Modified Files: Files whose hashes differ between the two sets, indicating the file must be updated.
  • Based on this analysis, file operations are queued for execution.
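The diff step can be sketched with path-to-hash maps (illustrative only; the real client works on paginated file lists with richer metadata):

```go
package main

import (
	"fmt"
	"sort"
)

// diffFiles compares path→hash maps from the lead (master) blobber and a
// secondary blobber, classifying files into the three categories above.
func diffFiles(master, secondary map[string]string) (missing, extra, modified []string) {
	for path, hash := range master {
		switch sh, ok := secondary[path]; {
		case !ok:
			missing = append(missing, path) // on master, absent on secondary
		case sh != hash:
			modified = append(modified, path) // hashes differ: needs update
		}
	}
	for path := range secondary {
		if _, ok := master[path]; !ok {
			extra = append(extra, path) // only on secondary: remove
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	sort.Strings(modified)
	return
}

func main() {
	missing, extra, modified := diffFiles(
		map[string]string{"/a.txt": "h1", "/b.txt": "h2"},
		map[string]string{"/b.txt": "h9", "/c.txt": "h3"},
	)
	fmt.Println(missing, extra, modified) // [/a.txt] [/c.txt] [/b.txt]
}
```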

3. Repair Execution

  • Batch processing is used for high throughput.
  • Files requiring repair are downloaded from the master set and uploaded to secondary blobbers.
  • Pipelining: Data is streamed from the master set directly to secondary blobbers, avoiding intermediate disk writes and maximizing throughput.
  • The repair process iterates until all files are processed.

4. Ensuring Synchronization

  • Once all files are synchronized, all blobbers should have the same allocation root as the master set.
  • This ensures that all blobbers in the allocation are fully synchronized and maintain data consistency.
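The final check reduces to verifying that every blobber now reports the same allocation root (illustrative sketch; the helper name is hypothetical):

```go
package main

import "fmt"

// synchronized reports whether every blobber presents the same
// allocation root, i.e. the repair process has converged.
func synchronized(roots []string) bool {
	for _, r := range roots {
		if r != roots[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(synchronized([]string{"rootA", "rootA", "rootA"})) // true
	fmt.Println(synchronized([]string{"rootA", "rootB"}))          // false
}
```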
