Skip to content

InvalidSignature errors caused by race between duties & attestation data #5687

@michaelsproul

Description

@michaelsproul

Background

The algorithm the VC uses to attest is:

  1. Regularly fetch attestation duties from the BN(s).
  2. When a validator is assigned to attest, fetch attestation data from the BN, sign it and publish it.

It handles re-orgs by flushing the duties whenever the BN indicates a different dependent_root in its attestation duties response.

The Bug

The problem is, the BN can re-org in the time between the VC requesting duties & requesting the attestation data. This leads to the VC signing attestations at the wrong slots with the wrong validator indices, which when they are validated by the BN correctly trigger InvalidSignature errors.

In detail, the sequence of events leading to InvalidSignature errors is:

  1. The VC requests duties while the BN is still a bit behind. The BN advances the state and serves the duties.
  2. The BN catches up to the head (which changes the duties).
  3. Before the VC has a chance to learn about the new duties, it requests attestation data based on the old duties.
  4. The VC signs the attestation data under the assumption that the old duties are correct. This means it is essentially signing with the wrong shuffling: if validator i was in committee j at slot n under the old duties, then it is unlikely to be at the same position after the duties change.
  5. When the BN receives the signed attestations, it (correctly) uses the beacon_block_root to calculate the committees, and finds that the signatures are wrong.

Steps to resolve

It's not immediately obvious how to fix this.

Unfortunately the beacon API spec does not define the dependent_root as part of the AttestationData response. If it did, the VC could detect the change in duties at the point where it requests the AttestationData, and either abort the request (and rely on it being re-triggered at the next slot), or wait for the new duties to be loaded.

Other clients use the SSE endpoint to be informed of dependent_root changes as soon as they happen, but even if we take this approach, there will still be a race as processing the SSE event and requesting the attestation data can happen concurrently.

Workaround

Once the BN is in sync the errors will disappear.

Additional Info

This issue is possibly more obvious in v5.2.0 and later, as we've made the BN more likely to return out of date duties:

This is both a simplification and a liveness benefit, so I don't think we should revert this change. Particularly on small or struggling networks, being able to produce blocks and attestations when the chain appears stalled is very useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    UX-and-logsbugSomething isn't workingval-clientRelates to the validator client binary

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions