Replication of Future Sequences & Rounds #123

samliok · 2025-03-12T19:47:23Z

This PR updates the code for replication. Rather than having two separate requests for notarizations and finalizations, this consolidates them into a single ReplicationRequest struct.

type ReplicationRequest struct {
	Seqs        []uint64 // sequences we are requesting
	LatestRound uint64   // latest round that we are aware of
}

type ReplicationResponse struct {
	Data        []QuorumRound
	LatestRound *QuorumRound
}

type QuorumRound struct {
	Block             Block
	Notarization      *Notarization
	FCert             *FinalizationCertificate
	EmptyNotarization *EmptyNotarization
}

The LatestRound field tells the responding node whether the requesting node is behind, so the responder knows whether to send its most recent QuorumRound. Conversely, receiving a higher latest round tells the requesting node that it is still behind and may need to send out more replication requests.
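
To illustrate the LatestRound handshake described above, here is a minimal sketch. It is not the code from this PR: QuorumRound is reduced to a bare round number, and respond and stillBehind are hypothetical helper names introduced only for this example.

```go
package main

import "fmt"

// QuorumRound is a simplified stand-in for the struct in the PR description;
// only the round number is modelled here (assumption for illustration).
type QuorumRound struct {
	Round uint64
}

type ReplicationRequest struct {
	Seqs        []uint64 // sequences we are requesting
	LatestRound uint64   // latest round that we are aware of
}

type ReplicationResponse struct {
	Data        []QuorumRound
	LatestRound *QuorumRound
}

// respond is a hypothetical handler. stored maps a sequence to the quorum
// round the responder holds for it; latest is the responder's most recent one.
func respond(req ReplicationRequest, stored map[uint64]QuorumRound, latest QuorumRound) ReplicationResponse {
	var resp ReplicationResponse
	for _, seq := range req.Seqs {
		if qr, ok := stored[seq]; ok {
			resp.Data = append(resp.Data, qr)
		}
	}
	// Only attach our latest round when the requester is behind us.
	if latest.Round > req.LatestRound {
		resp.LatestRound = &latest
	}
	return resp
}

// stillBehind is a hypothetical check on the requesting side: a higher latest
// round in the response means more replication requests may be needed.
func stillBehind(resp ReplicationResponse, ourRound uint64) bool {
	return resp.LatestRound != nil && resp.LatestRound.Round > ourRound
}

func main() {
	stored := map[uint64]QuorumRound{1: {Round: 1}, 2: {Round: 2}}
	req := ReplicationRequest{Seqs: []uint64{1, 2, 3}, LatestRound: 0}
	resp := respond(req, stored, QuorumRound{Round: 5})
	fmt.Println(len(resp.Data), stillBehind(resp, req.LatestRound))
}
```

In this sketch a requester that is already at the responder's latest round gets a nil LatestRound back, which is one way to signal that no further requests are needed.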

Signed-off-by: Sam Liokumovich <65994425+samliok@users.noreply.github.com>

epoch.go

yacovm

Made another pass, will make another final pass later.

replication.go

epoch.go

yacovm · 2025-03-14T22:01:21Z

msg.go

+		return q.Notarization.Verify()
+	}
+
+	return nil


I know that IsWellFormed() ensures that we have either an fCert or a notarization, so we can't end up with neither, but... is it possible to return an error here, and return the result of q.FCert.Verify() above?

Like:

	if q.FCert != nil {
		if !bytes.Equal(blockDigest[:], q.FCert.Finalization.Digest[:]) {
			return fmt.Errorf("finalization certificate does not match the block")
		}
		return q.FCert.Verify()
	}

	if q.Notarization != nil {
		if !bytes.Equal(blockDigest[:], q.Notarization.Vote.Digest[:]) {
			return fmt.Errorf("notarization does not match the block")
		}
		return q.Notarization.Verify()
	}

	return fmt.Errorf("QuorumRound is neither an EmptyNotarization, nor a Notarization or a Finalization")

This way it's clear from the code that verification cannot succeed when none of them is present.

I don't think we need this. IsWellFormed would have returned early in that case.

Also, one of your previous comments suggested that I shouldn't return early after verifying the fCert. I think it makes more sense to go with what you said before, and verify both the fCert and the notarization if necessary.
#123 (comment)
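
For readers following this thread, the two positions can be combined: verify both the fCert and the notarization when present, and still fail explicitly when neither is. The sketch below is only an illustration under simplifying assumptions. The cert type stands in for both Notarization and FinalizationCertificate, its Valid flag stands in for signature verification, and EmptyNotarization is left out entirely. None of this is the actual msg.go code.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

type Digest [32]byte

// cert is a hypothetical stand-in for both Notarization and
// FinalizationCertificate; Valid replaces real signature checks.
type cert struct {
	Digest Digest
	Valid  bool
}

func (c *cert) Verify() error {
	if !c.Valid {
		return errors.New("invalid signature")
	}
	return nil
}

// QuorumRound is simplified: the block is reduced to its digest.
type QuorumRound struct {
	BlockDigest  Digest
	Notarization *cert
	FCert        *cert
}

// Verify checks every certificate that is present and never succeeds
// when both are missing.
func (q *QuorumRound) Verify() error {
	if q.FCert == nil && q.Notarization == nil {
		return errors.New("quorum round has neither a notarization nor a finalization certificate")
	}
	if q.FCert != nil {
		if !bytes.Equal(q.BlockDigest[:], q.FCert.Digest[:]) {
			return errors.New("finalization certificate does not match the block")
		}
		if err := q.FCert.Verify(); err != nil {
			return fmt.Errorf("finalization certificate: %w", err)
		}
	}
	if q.Notarization != nil {
		if !bytes.Equal(q.BlockDigest[:], q.Notarization.Digest[:]) {
			return errors.New("notarization does not match the block")
		}
		if err := q.Notarization.Verify(); err != nil {
			return fmt.Errorf("notarization: %w", err)
		}
	}
	return nil
}

func main() {
	d := Digest{1}
	good := &QuorumRound{BlockDigest: d, FCert: &cert{Digest: d, Valid: true}}
	fmt.Println(good.Verify())
	fmt.Println((&QuorumRound{BlockDigest: d}).Verify())
}
```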

yacovm · 2025-03-14T23:11:39Z

epoch.go

-	delete(e.replicationState.receivedFinalizationCertificates, nextSeqToCommit)
-	e.replicationState.maybeCollectFutureFinalizationCertificates(e.round, e.Storage.Height())
-	return e.processFinalizedBlock(&finalizedBlock)
+	// TODO: for this pr include a helper function to allow the node to deduce whether


TODO for me - better understand this piece in the next review round

yacovm · 2025-03-14T23:13:17Z

epoch.go

+		return e.processReplicationState()
+	}
+
+	// TODO: we need to make sure that we do not forget about notarizations missing for rounds < e.round


I'm also a bit puzzled about why we have e.round here. Are we assuming that processReplicationState gave us sequences that advanced the round to the point where we might be right before an empty notarization?

I think this is an old comment. The point of it was to double-check that we properly replicate notarizations we receive for rounds lower than our current round. However, I don't think we can ever receive a notarization for a round lower than e.round, since we only advance e.round after we notarize or finalize.

epoch_multinode_test.go

Signed-off-by: Sam Liokumovich <65994425+samliok@users.noreply.github.com>

yacovm · 2025-03-17T21:19:31Z

epoch.go

+
+		roundDigest := round.block.BlockHeader().Digest
+		notarizedDigest := notarizedBlock.notarization.Vote.BlockHeader.Digest
+		if !bytes.Equal(roundDigest[:], notarizedDigest[:]) {


we should rather adopt this block instead of what we have, because this block is the notarized one, so ours is an equivocated block.

However let's take care of this in a new PR, I want to freeze this one for major changes.

yacovm · 2025-03-17T22:08:56Z

epoch.go

 	if exists {
 		roundDigest := round.block.BlockHeader().Digest
-		seqDigest := finalizedBlock.FCert.Finalization.BlockHeader.Digest
+		seqDigest := fCert.Finalization.BlockHeader.Digest


I think I missed this before:

	if !bytes.Equal(roundDigest[:], seqDigest[:]) {
		e.Logger.Warn("Received finalized block that is different from the one we have in the rounds map",
			zap.Stringer("roundDigest", roundDigest), zap.Stringer("seqDigest", seqDigest))
		return nil
	}

But this shouldn't happen. We should return an error here instead, and mark the halted error.
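
A minimal sketch of what the suggestion above could look like, under stated assumptions: the epoch type here is a hypothetical stand-in, and the haltedError field and checkFinalizedDigest helper are names invented for this example, not identifiers confirmed by the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

type Digest [32]byte

// epoch is a hypothetical minimal stand-in for the real Epoch type.
type epoch struct {
	haltedError error // assumed field recording why the epoch halted
}

// checkFinalizedDigest returns an error, and records it as the halted error,
// when the finalized digest differs from the one held in the rounds map.
func (e *epoch) checkFinalizedDigest(roundDigest, seqDigest Digest) error {
	if bytes.Equal(roundDigest[:], seqDigest[:]) {
		return nil
	}
	err := fmt.Errorf("finalized block digest %x differs from digest %x in the rounds map",
		seqDigest[:4], roundDigest[:4])
	e.haltedError = err
	return err
}

func main() {
	e := &epoch{}
	fmt.Println(e.checkFinalizedDigest(Digest{1}, Digest{1}))
	fmt.Println(e.checkFinalizedDigest(Digest{1}, Digest{2}) != nil, e.haltedError != nil)
}
```

The difference from the quoted snippet is that a mismatch is surfaced to the caller instead of being logged and swallowed with a nil return.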

epoch.go

yacovm · 2025-03-17T23:36:55Z

replication_test.go

+
+	normalNode1 := newSimplexNode(t, nodes[0], net, bb, newNodeConfig(nodes[0]))
+	normalNode2 := newSimplexNode(t, nodes[1], net, bb, newNodeConfig(nodes[1]))
+	newSimplexNode(t, nodes[2], net, bb, newNodeConfig(nodes[2]))


why is the third instance not given a variable?

there's no need to give it a variable name, it would be unused

yacovm · 2025-03-17T23:39:19Z

replication_test.go

+		msg := &simplex.Message{
+			EmptyNotarization: emptyNotarization,
+		}
+		normalNode2.e.Comm.SendMessage(msg, normalNode1.e.ID)


Is it not possible to make the nodes send the messages of their own accord, as done in the failover tests via waitForBlockProposerTimeout or something?

Yes. Do you mind if I also do this in a follow-up? I think it would add a lot to this PR, since I'd need to change how advanceRound works as well as the number of nodes in the network for this test.

I made some changes locally and got the test passing, but with a flake. I'd like to look into it and create a new PR with a clean diff.

yacovm · 2025-03-17T23:42:13Z

replication_test.go

+
+	fCert, _ := newFinalizationRecord(t, laggingNode.e.Logger, laggingNode.e.SignatureAggregator, block, nodes)
+
+	// we broadcast from the second node so that node 1 will be able to respond


The point of network tests is to simulate a real scenario of how nodes behave under certain network conditions of crashes and message omissions.

Forcing them to send messages as we see fit isn't ideal for testing a real scenario.

I agree this isn't ideal, but our current replication code is hardcoded to send from the first node. I think this test should be updated in the same PR as #82.

yacovm · 2025-03-17T23:43:19Z

production code LGTM, will make another pass on the tests tomorrow.

replication_test.go

samliok and others added 30 commits February 27, 2025 18:01

add tests for notarization request

1741814

more debug

387c812

finish note replication tests

8bc37e1

add replication logic

841a5e6

deny finalizations

777df85

add num notarizations

b54a2ab

add notarization test

35b8354

finalization replication working

1aa1d9a

notarization working

b69fd95

update newEmptyNotarization

6b68f6c

update newEmptyNotarization

804ab13

standardize test suite

4539f77

add to rounds map

1626599

remove println

7c447bb

rename to messageFilter

cd22b4e

remove notarizeAndFinalize round

e0e22f1

update advance round

257b27e

comments

bfaf226

clean up replication file

fb9037d

first pass through replication

ec54a08

empty digest

05ebcd0

Merge branch 'main' into note-replication

49e37a1

Signed-off-by: Sam Liokumovich <65994425+samliok@users.noreply.github.com>

rebase many prs

0a4b142

fix failover test

edae90a

styling

f99d21d

reduce empty notarization helper

36691d5

Merge branch 'main' into note-replication

fbc0e79

Signed-off-by: Sam Liokumovich <65994425+samliok@users.noreply.github.com>

fix merge conflicts

506ed41

notarization tests passsing again

c323b81

go fmt

f12cfba

yacovm reviewed Mar 13, 2025

samliok added 5 commits March 13, 2025 19:29

yacov code review, handle errors, verification, empty notarizations

de56015

add tests for quorum round

1f5307f

update e.lastblock and quorum round helpers

cb2b05b

separate helper function for HighestQuorumRound

4631325

load from storage

549a200

yacovm reviewed Mar 14, 2025

samliok and others added 7 commits March 17, 2025 08:31

highestSeqReceived to highestSeqObserved

967b2e5

rename receivedSeq to observedSeq

15313ac

add enabled check in maybeCollect

2e4cbee

rlock and replication enabled flag

6c473a0

isMessagePermitted helper

34956b8

remove comment

2325853

Merge branch 'main' into seq-replication

270ee1d

Signed-off-by: Sam Liokumovich <65994425+samliok@users.noreply.github.com>

yacovm reviewed Mar 17, 2025

yacovm reviewed Mar 17, 2025

samliok mentioned this pull request Mar 17, 2025

Adopt replicated block instead of block in rounds map #125

Closed

samliok added 2 commits March 17, 2025 16:55

fix comment

49fa1bd

return err if digests not equal

5d3f269

yacovm reviewed Mar 18, 2025

yacovm approved these changes Mar 18, 2025

yacovm merged commit 407c708 into main Mar 18, 2025
5 checks passed

This was referenced Mar 19, 2025

Replicate Empty Notarizations #126

Merged

Request Latest Notarizations/Blocks #84

Closed

samliok deleted the seq-replication branch March 20, 2025 22:54


3 participants
