-
Notifications
You must be signed in to change notification settings - Fork 892
Description
Background
The algorithm the VC uses to attest is:
- Regularly fetch attestation duties from the BN(s).
- When a validator is assigned to attest, fetch attestation data from the BN, sign it and publish it.
It handles re-orgs by flushing the duties whenever the BN indicates a different dependent_root
in its attestation duties response.
The Bug
The problem is, the BN can re-org in the time between the VC requesting duties & requesting the attestation data. This leads to the VC signing attestations at the wrong slots with the wrong validator indices, which when they are validated by the BN correctly trigger InvalidSignature
errors.
In detail, the sequence of events leading to InvalidSignature
errors is:
- The VC requests duties while the BN is still a bit behind. The BN advances the state and serves the duties.
- The BN catches up to the head (which changes the duties).
- Before the VC has a chance to learn about the new duties, it requests attestation data based on the old duties.
- The VC signs the attestation data under the assumption that the old duties are correct. This means it is essentially signing with the wrong shuffling: if validator
i
was in committeej
at slotn
under the old duties, then it is unlikely to be at the same position after the duties change. - When the BN receives the signed attestations, it (correctly) uses the
beacon_block_root
to calculate the committees, and finds that the signatures are wrong.
Steps to resolve
It's not immediately obvious how to fix this.
Unfortunately the beacon API spec does not define the dependent_root
as part of the AttestationData
response. If it did, the VC could detect the change in duties at the point where it requests the AttestationData
, and either abort the request (and rely on it being re-triggered at the next slot), or wait for the new duties to be loaded.
Other clients use the SSE endpoint to be informed of dependent_root
changes as soon as they happen, but even if we take this approach, there will still be a race as processing the SSE event and requesting the attestation data can happen concurrently.
Workaround
Once the BN is in sync the errors will disappear.
Additional Info
This issue is possibly more obvious in v5.2.0 and later, as we've made the BN more likely to return out of date duties:
This is both a simplification and a liveness benefit, so I don't think we should revert this change. Particularly on small or struggling networks, being able to produce blocks and attestations when the chain appears stalled is very useful.