Fix get_custody_groups not sorting the result #7711

nethoxa · 2025-07-07T18:14:42Z

Issue Addressed

The function get_custody_groups does not follow the spec as it uses a hashset, which, upon iterated, follows an arbitrary order. The impact, at much, is LH nodes broadcasting and downloading data unorderly, while other nodes would do it in order (kind of LH nodes being misleading). However, as all nodes have the same values, and so the same downloaded data, no punishment in the peer score is applied (at first, would need to dig more).

Nevermind, this contradicts the specs which clearly enforces that the returned custody_groups must be ordered:

link

def get_custody_groups(node_id: NodeID, custody_group_count: uint64) -> Sequence[CustodyIndex]:
    assert custody_group_count <= NUMBER_OF_CUSTODY_GROUPS

    current_id = uint256(node_id)
    custody_groups: List[CustodyIndex] = []
    while len(custody_groups) < custody_group_count:
        custody_group = CustodyIndex(
            bytes_to_uint64(hash(uint_to_bytes(current_id))[0:8]) % NUMBER_OF_CUSTODY_GROUPS
        )
        if custody_group not in custody_groups:
            custody_groups.append(custody_group)
        if current_id == UINT256_MAX:
            # Overflow prevention
            current_id = uint256(0)
        else:
            current_id += 1

    assert len(custody_groups) == len(set(custody_groups))
    return sorted(custody_groups) <===================== HERE

This was not triggered by EF test suite as in the handler, it sorted the array, so even if the function returned the wrong value, the test passed:

link

    fn result(&self, _case_index: usize, _fork_name: ForkName) -> Result<(), Error> {
        let spec = E::default_spec();
        let node_id = U256::from_str_radix(&self.node_id, 10)
            .map_err(|e| Error::FailedToParseTest(format!("{e:?}")))?;
        let raw_node_id = node_id.to_be_bytes::<32>();
        let mut computed = get_custody_groups(raw_node_id, self.custody_group_count, &spec)
            .map(|set| set.into_iter().collect::<Vec<_>>())
            .expect("should compute custody groups");
        computed.sort();

        let expected = &self.result;
        if computed == *expected {
            Ok(())
        } else {
            Err(Error::NotEqual(format!(
                "Got {computed:?}\nExpected {expected:?}"
            )))
        }
    }

Proposed Changes

Change HashSet to BTreeSet, which has the same functions and is assured to be ordered, as the test proves
Remove the call to sort from the handler
Change the types of the functions that call get_custody_groups to handle the correct type

Additional Info

As reference, check Prysm. I think another check for the length of the set should be added as in the specs, even if redundant, just in case.

cla-assistant · 2025-07-07T18:14:55Z

All committers have signed the CLA.

cla-assistant · 2025-07-07T18:14:57Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

nethoxa · 2025-07-08T12:52:09Z

About the length check, it is not needed as it is done to prevent duplicates, but a BTreeSet is a set, so no duplicates allowed.

jimmygchen · 2025-07-23T07:02:34Z

Hi @nethoxa, thanks for the PR and bringing this up!

is LH nodes broadcasting and downloading data unorderly, while other nodes would do it in order (kind of LH nodes being misleading)

I don't fully understand what you mean by this - when broadcasting we only use the custody columns for filtering, and when we download via gossip / rpc, the order is not determined by the get_custody_group ordering. This should not result in any peer penalty. The ordering should have no impact to functionality. Please correct me if I'm wrong.

I'm actually working on an validator-custody related refactor, and I'm now considering to preserve the exact order of the get_custody_group result (ordered but NOT sorted), so that we don't have to maintain multiple set of custody column indices for different CGCs. When the CGC increases, we can just derive the column indices by taking a bigger slice of the full column set.

Consider a scenario where the node's custody column is 8, it custodies the following columns
[32, 79, 125, 35, 10, 112, 52, 56]

Let's say mid way through the epoch, the node's CGC increased due to an extra validator being attached to the beacon node, now:

we immediately update our advertised CGC to 9
the custody columns are now [32, 79, 125, 35, 10, 112, 52, 56, 94], with 94 being the additional column
we immediately subscribe to the additional custody subnet 94
however our DA check still requires the previous 8 for all slots before the next epoch

We now have to maintain two sets of custody groups, one for current epoch, and one for the next epoch. However if we maintain one full ordered list (128 ordered indices), we could easily take slices of it to determine the columns for any custody count.

There are different ways to approach this of course. But I don't see the need to make our internal implementation match the spec implementation, unless it results in undesired behaviour?

nethoxa · 2025-07-23T13:47:42Z

Hi @jimmygchen, as you say, no functionality rn would be affected nor any penalty. About this:

But I don't see the need to make our internal implementation match the spec implementation

I think you should, as on futures upgrades on top may rely on compliance with the specs, so could lead to some issues in the future (has happened recently, that's why I'm saying it). About your work on an validator-custody related refactor, lemme some time to think about it.

Thanks

Fix get_custody_groups not sorting the result

0981513

nethoxa requested a review from jxs as a code owner July 7, 2025 18:14

chong-he added test improvement Improve tests ready-for-review The code is ready for review das Data Availability Sampling labels Jul 9, 2025

Merge branch 'sigp:unstable' into unstable

1b6a089

Merge branch 'sigp:unstable' into unstable

1d3e1b6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix get_custody_groups not sorting the result #7711

Fix get_custody_groups not sorting the result #7711

nethoxa commented Jul 7, 2025

Uh oh!

cla-assistant bot commented Jul 7, 2025 •

edited

Loading

Uh oh!

cla-assistant bot commented Jul 7, 2025

Uh oh!

nethoxa commented Jul 8, 2025

Uh oh!

jimmygchen commented Jul 23, 2025

Uh oh!

nethoxa commented Jul 23, 2025

Uh oh!

Uh oh!

Fix get_custody_groups not sorting the result #7711

Are you sure you want to change the base?

Fix get_custody_groups not sorting the result #7711

Conversation

nethoxa commented Jul 7, 2025

Issue Addressed

Proposed Changes

Additional Info

Uh oh!

cla-assistant bot commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cla-assistant bot commented Jul 7, 2025

Uh oh!

nethoxa commented Jul 8, 2025

Uh oh!

jimmygchen commented Jul 23, 2025

Uh oh!

nethoxa commented Jul 23, 2025

Uh oh!

Uh oh!

cla-assistant bot commented Jul 7, 2025 •

edited

Loading