Skip to content

Draft: New EraE implementation #32157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

shazam8253
Copy link
Contributor

@shazam8253 shazam8253 commented Jul 7, 2025

Here is a draft for the New EraE implementation. The code follows along with the spec listed at this link.

Copy link
Member

@MariusVanDerWijden MariusVanDerWijden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lint is failing, you can check on your machine with make lint

@@ -0,0 +1,429 @@
package era2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header

Copy link
Member

@lightclient lightclient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of comments here - we still need to figure out how to call this module since era2 isn't very elegant. In the meantime should rename builder2.go to just builder.go and era2.go to era.go.

}
}

func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, blockhash common.Hash, blocknum uint64, td *big.Int, proof *Proof) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockhash and blocknum are already available via header, so no need to duplicate them

}
}

func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, blockhash common.Hash, blocknum uint64, td *big.Int, proof *Proof) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate this into Add and AddRLP

Comment on lines 83 to 86
headersRLP [][]byte
bodiesRLP [][]byte
receiptsRLP [][]byte
proofsRLP [][]byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would remove RLP suffix since it is pretty explanatory by the type [][]byte

"github.com/klauspost/compress/snappy"
)

type meta struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type meta struct {
type metadata struct {

type meta struct {
start uint64 // start block number
count uint64 // number of blocks in the era
compcount uint64 // number of properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compcount uint64 // number of properties
components uint64 // number of properties

start uint64 // start block number
count uint64 // number of blocks in the era
compcount uint64 // number of properties
filelen int64 // length of the file in bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filelen int64 // length of the file in bytes
length int64 // length of the file in bytes

mu *sync.Mutex
headeroff, bodyoff, receiptsoff, tdoff, proofsoff []uint64 // offsets for each entry type
indstart int64
rootheader uint64 // offset of the root header in the file if present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the use of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when reading each era file I load in the index table into cache to make lookup faster, indstart is where the byte where the index table starts, and the root header is where the accumulator root is present when reading so it can seek there quickly since it is its own e2store object. The mutex should be removed though, forgot to do so very early on when I didn't understand what the file was doing I thought I wouldn't want it to read and write at the same time so put a lock.

m meta // metadata for the era2 file
mu *sync.Mutex
headeroff, bodyoff, receiptsoff, tdoff, proofsoff []uint64 // offsets for each entry type
indstart int64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this?

@lightclient
Copy link
Member

Also, please write a description for your PRs and try to fix the CI errors so your code is all green.

Fixed all issues including extra logic regarding proof types, modularizing some functions and refactoring code for correctness and readability.
@lightclient lightclient self-assigned this Jul 10, 2025

func (*BlockProofHistoricalSummariesDeneb) Variant() proofvar { return proofDeneb }

func proofVariantOf(p Proof) proofvar {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func proofVariantOf(p Proof) proofvar {
func variantOf(p Proof) proofvar {

@@ -57,41 +57,33 @@ const (

type proofvar uint16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type proofvar uint16
type variant uint16

We know that the variant is for Proofs, should be at most a comment, not part of the variable name, otherwise you end up with the types like mev-boost :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha will make the change :)

Copy link
Member

@lightclient lightclient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a lot of style nits, but the main problems we'll need to work through are:

  • utilize proof interface more, avoid handling variant values directly
  • remove unneeded struct fields
  • avoid building entire era file in memory, write incremental work to disk

buff buffer
off offsets

prooftype variant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed. If you store the proofs as the interface object you will be able to determine the variant at anytime / alternatively if it makes better sense to store the bytes, then the bytes will already have included the variant type so it won't matter anymore.

off offsets

prooftype variant
tdsint []*big.Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed and tds in buffer should be big.Ints. Serialize them all at once during finalize.

type Builder struct {
w *e2store.Writer
buf *bytes.Buffer
sn *snappy.Writer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sn *snappy.Writer
snappy *snappy.Writer

}

// retrieves the raw body frame in bytes of a specific block
func (e *Era) GetRawBodyFrameByNumber(blockNum uint64) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by "frame" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the raw bytes that are written, without snappy decoding it and rlp decoding

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should decode the snappy here (and other methods like this) and return the RLP bytes

return io.ReadAll(r)
}

// retrieves the raw receipts frame in bytes of a specific block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// retrieves the raw receipts frame in bytes of a specific block
// GetRawReceiptsFrameByNumber retrieves the raw receipts frame in bytes of a specific block.

return io.ReadAll(r)
}

// retrieves the raw proof frame in bytes of a specific block proof
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// retrieves the raw proof frame in bytes of a specific block proof
// GetRawProofFrameByNumber retrieves the raw proof frame in bytes of a specific block proof.

return io.ReadAll(r)
}

// loads in the index table containing all offsets and caches it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// loads in the index table containing all offsets and caches it
// loadIndex loads in the index table containing all offsets and caches it.

// Getter methods to calculate offset of a specific component in the file.
func (e *Era) headerOff(num uint64) (uint64, error) { return e.indexOffset(num, compHeader) }
func (e *Era) bodyOff(num uint64) (uint64, error) { return e.indexOffset(num, compBody) }
func (e *Era) rcptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (e *Era) rcptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) }
func (e *Era) receiptOff(num uint64) (uint64, error) { return e.indexOffset(num, compReceipts) }

Really no reason to ever abbreviate by taking out vowels in golang.

cmd/utils/cmd.go Outdated
@@ -403,8 +410,8 @@ func ExportAppendChain(blockchain *core.BlockChain, fn string, first uint64, las

// ExportHistory exports blockchain history into the specified directory,
// following the Era format.
func ExportHistory(bc *core.BlockChain, dir string, first, last, step uint64) error {
log.Info("Exporting blockchain history", "dir", dir)
func ExportHistory(bc *core.BlockChain, dir string, first, last, step uint64, f ExportFormat) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use an enum to dictate the output format, this should be done via the method naming, e.g.

ExportHistoryEra1(..)
ExportHistoryEraE(..)

cmd/utils/cmd.go Outdated
Comment on lines 433 to 445
if f == Era1 {
filename = era.Filename
newBuilder = func(w io.Writer) any { return era.NewBuilder(w) }
add = func(b any, blk *types.Block, rcpt types.Receipts, td *big.Int) error {
return b.(*era.Builder).Add(blk, rcpt, td)
}
} else {
filename = era2.Filename
newBuilder = func(w io.Writer) any { return era2.NewBuilder(w) }
add = func(b any, blk *types.Block, rcpt types.Receipts, td *big.Int) error {
return b.(*era2.Builder).Add(*blk.Header(), *blk.Body(), rcpt, td, nil)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of impressive, but also is what an Interface is for :)

If you want different builders with the same methods (like Add) you can create the interface type and implement the method for both.

receipts := bc.GetReceiptsByHash(block.Hash())
if receipts == nil {
return fmt.Errorf("export failed on #%d: receipts not found", n)
rcpt := bc.GetReceiptsByHash(blk.Hash())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rcpt := bc.GetReceiptsByHash(blk.Hash())
receipts := bc.GetReceiptsByHash(blk.Hash())


for j := uint64(0); j < step && batch+j <= last; j++ {
n := batch + j
blk := bc.GetBlockByNumber(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
blk := bc.GetBlockByNumber(n)
block := bc.GetBlockByNumber(n)

cmd/utils/cmd.go Outdated
)
if block == nil {
return fmt.Errorf("export failed on #%d: not found", n)
bldr := newBuilder(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bldr := newBuilder(f)
builder := newBuilder(f)

tds []*big.Int
}

// The offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file.
// offsets holds the offsets of the different block components in the e2store file. Eventually these offsets will be used to write the index table at the end of the file.

Comment on lines 95 to 97
buf *bytes.Buffer

buff buffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still have buf and buff, I think we discussed removing buf since it isn't used until the end?

// Add writes a block entry, its reciepts, and optionally its proofs as well into the e2store file.
func (b *Builder) Add(header types.Header, body types.Body, receipts types.Receipts, td *big.Int, proof Proof) error {
if len(b.buff.headers) == 0 { // first block determines wether proofs are expected
b.expectsProofs = proof != nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you need to track this explicitly, just check if b.buff.proofs != nil and if proof == nil or vice versa. only special case is first block, which you allow for the b.buff.proofs to be nil even if proof is non-nil.

if err != nil {
return common.Hash{}, fmt.Errorf("compute accumulator: %w", err)
}
if n, err := b.w.Write(TypeAccumulatorRoot, accRoot[:]); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this still needs to be addressed

}

// retrieves the raw body frame in bytes of a specific block
func (e *Era) GetRawBodyFrameByNumber(blockNum uint64) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should decode the snappy here (and other methods like this) and return the RLP bytes

Comment on lines 15 to 21
type Iterator interface {
Next() bool
Number() uint64
Block() (*types.Block, error)
Receipts() (types.Receipts, error)
Error() error
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface looks right, but it's in the wrong place. We should define it in the era package. I will take a stab at this.

Comment on lines 48 to 57
func (era1Format) Filename(n string, e int, h common.Hash) string { return era.Filename(n, e, h) }
func (era1Format) NewBuilder(w io.Writer) Builder { return &era1Builder{era.NewBuilder(w)} }
func (era1Format) ReadDir(dir, net string) ([]string, error) { return era.ReadDir(dir, net) }
func (era1Format) NewIterator(f *os.File) (Iterator, error) {
e, err := era.From(f)
if err != nil {
return nil, err
}
return era.NewIterator(e)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need to redefine these methods to satisfy the interface - if the type implements the interface methods then it can be accepted anywhere the interface is accepted

cmd/utils/cmd.go Outdated
@@ -248,11 +250,11 @@ func readList(filename string) ([]string, error) {
// ImportHistory imports Era1 files containing historical block information,
// starting from genesis. The assumption is held that the provided chain
// segment in Era1 file should all be canonical and verified.
func ImportHistory(chain *core.BlockChain, dir string, network string) error {
func ImportHistory(chain *core.BlockChain, dir string, network string, format Format) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on this method and ExportHistory we can avoid using this Format type by just passing in the functions we need. For example, here we need a way to read all the entries and a way to create an iterator from an open file. If we just pass those two functions in, we can create an if statement in the cli handling so that we can reuse the function.

The issue with format is it is kind of a superfluous type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants