Skip to content

Conversation

@samliok
Copy link
Collaborator

@samliok samliok commented Mar 12, 2025

This PR updates the code for replication. Rather than having two separate requests for notarizations and finalizations, this consolidates them into a single ReplicationRequest struct.

type  ReplicationRequest  struct {
	Seqs []uint64  // sequences we are requesting
	LatestRound  uint64  // latest round that we are aware of
}

type  ReplicationResponse  struct {
	Data []QuorumRound
	LatestRound  *QuorumRound
}

type  QuorumRound  struct {
	Block  Block
	Notarization  *Notarization
	FCert  *FinalizationCertificate
	EmptyNotarization  *EmptyNotarization
}

The LatestRound field notifies the responding node, whether the requesting node is behind. This way it knows whether to send its most recent QuorumRound. Receiving a higher latest round, tells the requesting node that it is still behind and may need to send out more replication requests.

Copy link
Collaborator

@yacovm yacovm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made another pass, will make another final pass later.

return q.Notarization.Verify()
}

return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that IsWellFormed() ensures that either we have an fCert or a notarization, but we won't have neither, but... is it possible to return an error here, and return the result of q.FCert.Verify() above?

Like:

	if q.FCert != nil {
		if !bytes.Equal(blockDigest[:], q.FCert.Finalization.Digest[:]) {
			return fmt.Errorf("finalization certificate does not match the block")
		}
		return q.FCert.Verify()
	}

	if q.Notarization != nil {
		if !bytes.Equal(blockDigest[:], q.Notarization.Vote.Digest[:]) {
			return fmt.Errorf("notarization does not match the block")
		}
		return q.Notarization.Verify()
	}

	return fmt.Errorf("QuorumRound is neither an EmptyNotarization, nor a Notarization or a Finalization")

This way it's clear from the code that we can't pass none of them and succeed verification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we need this. IsWellFormed would have returned early if so.

also one of your previous comments suggested that i shouldn't return early after verifying the fCert. i think it makes more sense to go with what you said before, and verify both the fCert and notarization if necessary
#123 (comment)

delete(e.replicationState.receivedFinalizationCertificates, nextSeqToCommit)
e.replicationState.maybeCollectFutureFinalizationCertificates(e.round, e.Storage.Height())
return e.processFinalizedBlock(&finalizedBlock)
// TODO: for this pr include a helper function to allow the node to deduce whether
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for me - better understand this piece in the next review round

epoch.go Outdated
return e.processReplicationState()
}

// TODO: we need to make sure that we do not forget about notarizations missing for rounds < e.round
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a bit puzzled regarding why have e.round here? Are we assuming that processReplicationState gave us sequences that advanced the round to the point where we might be right before an empty notarization?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is an old comment. The point of it was to double check if we would properly replicate notarizations that we receive that are less than our current round. However, i don't think we can ever receive a notarization that is less than e.round since we only advance e.round after we notarize or finalize.


roundDigest := round.block.BlockHeader().Digest
notarizedDigest := notarizedBlock.notarization.Vote.BlockHeader.Digest
if !bytes.Equal(roundDigest[:], notarizedDigest[:]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should rather adopt this block instead of what we have, because this block is the notarized one, so ours is an equivocated block.

However let's take care of this in a new PR, I want to freeze this one for major changes.

if exists {
roundDigest := round.block.BlockHeader().Digest
seqDigest := finalizedBlock.FCert.Finalization.BlockHeader.Digest
seqDigest := fCert.Finalization.BlockHeader.Digest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed this before:

if !bytes.Equal(roundDigest[:], seqDigest[:]) {
			e.Logger.Warn("Received finalized block that is different from the one we have in the rounds map",
				zap.Stringer("roundDigest", roundDigest), zap.Stringer("seqDigest", seqDigest))
			return nil
		}

But this shouldn't happen. we should return an error here instead, and mark the halted error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


normalNode1 := newSimplexNode(t, nodes[0], net, bb, newNodeConfig(nodes[0]))
normalNode2 := newSimplexNode(t, nodes[1], net, bb, newNodeConfig(nodes[1]))
newSimplexNode(t, nodes[2], net, bb, newNodeConfig(nodes[2]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the third instance not given a variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to give it a variable name, it would be unused

msg := &simplex.Message{
EmptyNotarization: emptyNotarization,
}
normalNode2.e.Comm.SendMessage(msg, normalNode1.e.ID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it not possible to make the nodes send the messages from their own will, as done in the failover tests via waitForBlockProposerTimeout or something?

Copy link
Collaborator Author

@samliok samliok Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Do you mind if I also do this in a follow up? I think it would add more to this PR since I'd need to change to change how advanceRound works as well as the number of nodes in the network for this test.

I did some changes locally and got the test passing with a flake, i'd like to look at and create a new PR with a clean diff.


fCert, _ := newFinalizationRecord(t, laggingNode.e.Logger, laggingNode.e.SignatureAggregator, block, nodes)

// we broadcast from the second node so that node 1 will be able to respond
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of network tests is to simulate a real scenario of how nodes behave under certain network conditions of crashes and message omissions.

Forcing them to send messages as we see fit isn't ideal for testing a real scenario.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree this isn't ideal, but our current code for replication is hardcoded to be sent from the first node. I think this test should be updated in the same pr as #82.

@yacovm
Copy link
Collaborator

yacovm commented Mar 17, 2025

production code LGTM, will make another pass on the tests tomorrow.

@yacovm yacovm merged commit 407c708 into main Mar 18, 2025
5 checks passed
@samliok samliok deleted the seq-replication branch March 20, 2025 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants