Skip to content

feat(code): Implement minimal gossip properties to ensure liveness #997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 51 commits into from
May 16, 2025

Conversation

ancazamfir
Copy link
Collaborator

@ancazamfir ancazamfir commented May 1, 2025

Closes: #998

Implement minimal gossip properties for liveness.

Summary of changes

  1. Liveness messages:

    • Added a new /liveness topic/ channel and message types for network communication
    • Introduced new protobuf definitions in liveness.proto for handling gossip messages
    • Added support for PolkaCertificate and RoundCertificate message types
  2. Network Layer Changes:

    • Added a new Liveness channel type alongside existing channels (Consensus, ProposalParts, Sync)
    • Modified network event handling to distinguish between consensus and liveness messages
    • Updated message encoding/decoding to support the new liveness message types
  3. Certificate Handling:

    • Moved PolkaCertificate and related types from sync proto to liveness proto
    • Added new RoundCertificate type for handling round-specific certificates
  4. Codec Updates:

    • Added new codec implementations for liveness messages
    • Updated protobuf codec to handle the new message types

Still to be done:
- [ ] Remove vote set mode - done in separate PR #1008
- [ ] Move rebroadcast in its own module and outside core-consensus - will do in separate PR

  • Proper verification for gossip messages (in particular the round certificate)
  • Decide if we store the round certificate when it occurs (i.e. we got f+1 quorum or 2f+1 precommit any) similarly with last_prevote and last_precommit or find the certificate when we enter a round > 0 (current approach)
  • Write IT tests

PR author checklist

For all contributors

For external contributors

Copy link

github-actions bot commented May 1, 2025

✅ Semver Check Passed

Great job! All semver violations have been resolved. This PR now complies with semantic versioning rules.

If you made version updates, please ensure that:

  • The version in Cargo.toml accurately reflects the nature of your changes
  • Any breaking changes are properly documented in BREAKING_CHANGES.md

Copy link

codecov bot commented May 2, 2025

Codecov Report

Attention: Patch coverage is 74.31421% with 206 lines in your changes missing coverage. Please review.

Project coverage is 77.64%. Comparing base (acc180b) to head (e0c3353).
Report is 2 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff            @@
##           main     #997       +/-   ##
=========================================
+ Coverage      0   77.64%   +77.64%     
=========================================
  Files         0      159      +159     
  Lines         0    16814    +16814     
  Branches      0    16814    +16814     
=========================================
+ Hits          0    13055    +13055     
- Misses        0     2883     +2883     
- Partials      0      876      +876     
Flag Coverage Δ
integration 77.45% <74.31%> (?)
mbt 7.92% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

romac added a commit that referenced this pull request May 7, 2025
@romac romac changed the title feat(code): Add gossip messages and handling feat(code): Implement minimal gossip properties May 8, 2025
@romac
Copy link
Member

romac commented May 8, 2025

@ancazamfir I just added a basic integration test, looking good so far :)

@ancazamfir
Copy link
Collaborator Author

@nenadmilosevic95 @romac one last thing, we need to decide on (from the round sync spec):

"Open questions:

  • What should be the value we adopt for rebroadcastTimeout?"

@ancazamfir
Copy link
Collaborator Author

ancazamfir commented May 14, 2025

I enabled these rebroadcast mode tests in 0d7321e:

restart_with_byzantine_proposer_1_rebroadcast_parts_only
restart_with_byzantine_proposer_1_rebroadcast_proposal_and_parts
restart_with_byzantine_proposer_2_rebroadcast_parts_only
restart_with_byzantine_proposer_2_rebroadcast_proposal_and_parts

They all passed locally, however on CI we get
Error publishing message: InsufficientPeers channel=/liveness

I will disable them again and open an issue to solve the failures in a separate PR (issue #1029)

@ancazamfir
Copy link
Collaborator Author

Also liveness::round_certificate_rebroadcast fails in commit f7bd600. Passes locally.

@nenadmilosevic95
Copy link
Contributor

@nenadmilosevic95 @romac one last thing, we need to decide on (from the round sync spec):

"Open questions:

* What should be the value we adopt for rebroadcastTimeout?"

What is the value that you currently use?

@ancazamfir
Copy link
Collaborator Author

@nenadmilosevic95 @romac one last thing, we need to decide on (from the round sync spec):
"Open questions:

* What should be the value we adopt for rebroadcastTimeout?"

What is the value that you currently use?

I still need to do the change to reflect the round sync proposal. Currently we use the old timeout mechanism. We have two timers, PrevoteRebroadcast with duration equal to the timeout_prevote and, similarly, PrecommitRebroadcast with duration equal to the timeout_precommit. When we enter Prevote state we start the PrevoteRebroadcast timer and if it expires we resend last prevote and precommit. When we enter Precommit state we start the PrecommitRebroadcast timer and we resent the last votes when it expires. In both cases we restart the timer that just expired.
The timer duration does not take into account the deltas.

We will have only one timer as proposed but what should be the duration?

@nenadmilosevic95
Copy link
Contributor

@nenadmilosevic95 @romac one last thing, we need to decide on (from the round sync spec):
"Open questions:

* What should be the value we adopt for rebroadcastTimeout?"

What is the value that you currently use?

I still need to do the change to reflect the round sync proposal. Currently we use the old timeout mechanism. We have two timers, PrevoteRebroadcast with duration equal to the timeout_prevote and, similarly, PrecommitRebroadcast with duration equal to the timeout_precommit. When we enter Prevote state we start the PrevoteRebroadcast timer and if it expires we resend last prevote and precommit. When we enter Precommit state we start the PrecommitRebroadcast timer and we resent the last votes when it expires. In both cases we restart the timer that just expired. The timer duration does not take into account the deltas.

We will have only one timer as proposed but what should be the duration?

What was the motivation to have two timeouts?
By "does not take into account the deltas" you mean that it does not increase as we progress in higher rounds?

@ancazamfir
Copy link
Collaborator Author

ancazamfir commented May 15, 2025

What was the motivation to have two timeouts?

It was the initial rebroadcast design that starknet proposed. And (in main) we are sending only the last prevote on PrevoteRebroadcast expiry and the last precommit on PrecommitRebroadcast

Copy link
Contributor

@nenadmilosevic95 nenadmilosevic95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some minor comments

ancazamfir and others added 2 commits May 16, 2025 11:00
* Implement solution 2 wrt rebroadcast timer

* Change names according to spec

* Cancel rebroadcast timer when round cert is received

* Increase the rebroadcast timeout each round

* Update the config.toml files used by tests
Co-authored-by: nenadmilosevic95 <50905385+nenadmilosevic95@users.noreply.github.com>
Signed-off-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Copy link
Contributor

@nenadmilosevic95 nenadmilosevic95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ancazamfir and others added 3 commits May 16, 2025 11:46
Co-authored-by: nenadmilosevic95 <50905385+nenadmilosevic95@users.noreply.github.com>
@ancazamfir ancazamfir added this pull request to the merge queue May 16, 2025
Merged via the queue into main with commit 1420a0f May 16, 2025
23 checks passed
@ancazamfir ancazamfir deleted the anca/gossip branch May 16, 2025 17:45
@romac
Copy link
Member

romac commented May 16, 2025

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure Malachite provides the Gossip property for Tendermint liveness
3 participants