-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[consensus] only wait to replay until highest consumer processed commit #22854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
b51d012
to
ab47b1a
Compare
ab47b1a
to
920a29a
Compare
920a29a
to
84af149
Compare
84af149
to
f7ca8b6
Compare
f7ca8b6
to
104aa61
Compare
104aa61
to
9b91e75
Compare
9b91e75
to
6fe223c
Compare
if committed_sub_dag.commit_ref.index | ||
> commit_consumer.consumer_last_processed_commit_index | ||
{ | ||
committed_sub_dag.local_dag_has_finalization_blocks = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to mitigate incorrectly finalize transactions in recovered commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how this fixes the issue? We only persist finalized commits to store so recovering finalized commits should be able to be reprocessed correctly. What information is missing to be able to properly finalize transactions in recovered commits?
sui/consensus/core/src/commit_finalizer.rs
Lines 118 to 124 in 6fe223c
if !finalized_commits.is_empty() { | |
// Commits and committed blocks must be persisted to storage before sending them to Sui | |
// to execute their finalized transactions. | |
// Commit metadata and uncommitted blocks can be persisted more lazily because they are recoverable. | |
// But for simplicity, all unpersisted commits and blocks are flushed to storage. | |
self.dag_state.write().flush(); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline. Today for simplicity both finalized and non-finalized commits are persisted. But a suffix of commits may not have been finalized, and they don't have enough info in local dag to optimistically finalize transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock fixing the original issue of reducing delay in rebuilding checkpoints which looks good, but I still have an open question on how this fixes incorrect finalization
if committed_sub_dag.commit_ref.index | ||
> commit_consumer.consumer_last_processed_commit_index | ||
{ | ||
committed_sub_dag.local_dag_has_finalization_blocks = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how this fixes the issue? We only persist finalized commits to store so recovering finalized commits should be able to be reprocessed correctly. What information is missing to be able to properly finalize transactions in recovered commits?
sui/consensus/core/src/commit_finalizer.rs
Lines 118 to 124 in 6fe223c
if !finalized_commits.is_empty() { | |
// Commits and committed blocks must be persisted to storage before sending them to Sui | |
// to execute their finalized transactions. | |
// Commit metadata and uncommitted blocks can be persisted more lazily because they are recoverable. | |
// But for simplicity, all unpersisted commits and blocks are flushed to storage. | |
self.dag_state.write().flush(); | |
} |
6fe223c
to
5882b35
Compare
5882b35
to
710a9d9
Compare
710a9d9
to
d2b91e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
/// and is false when the commit is received through commit sync. | ||
/// When this is false, CommitFinalizer will not assume there are enough blocks | ||
/// to optimistically finalize transactions in the commit. | ||
pub local_dag_has_finalization_blocks: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this could be something more directly with how this sub dag has been committed? Ex:
committed_with_local_certificate
commit_certificate_exists_locally
leader_certificate_exists_locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good ideas.
// & last_committed_rounds from the commits as needed. | ||
commit_info_to_write: Vec<(CommitRef, CommitInfo)>, | ||
|
||
// Buffers finalized commits and their rejected transactions to be written to storage. | ||
finalized_commits_to_write: Vec<(CommitRef, BTreeMap<BlockRef, Vec<TransactionIndex>>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intend to use somewhere the rejected transaction indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let me follow up with verification. Also they can be used for debugging if needed.
@@ -154,10 +154,10 @@ impl ConsensusManagerTrait for MysticetiManager { | |||
|
|||
let num_prior_commits = protocol_config.consensus_num_requested_prior_commits_at_startup(); | |||
let last_processed_commit = consensus_handler.last_processed_subdag_index() as CommitIndex; | |||
let starting_commit = last_processed_commit.saturating_sub(num_prior_commits); | |||
let restart_after_commit = last_processed_commit.saturating_sub(num_prior_commits); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: restart_after_commit
-> replay_after_commit
?
Description
Currently,
CommitConsumerMonitor::replay_complete()
waits until all local commits found during initialization gets processed throughhandle_consensus_commit()
. But a suffix of local commits may not have been finalized before. These commits can only be finalized with new commits, after consensus starts running. So the waiter can wait for a long time.This change makes the waiter only wait until the last commit index
handle_consensus_commit()
finished processing, which is guaranteed to be finalized in local DAG. Then the waiter should not wait for too long on restart.Also, record finalized commit refs and their rejected transactions. They will be used during recovery to identify the commits which have enough blocks in local DAG to finalize them. In future, they will be used to verify rejected transactions are computed correctly during recovery.
Test plan
CI
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.