core: fix sync reset in pruned nodes #31638

Merged: 7 commits merged into ethereum:master on Apr 17, 2025

Conversation

s1na (Contributor) commented Apr 14, 2025

This is an attempt at fixing #31601. I think what happens is that the startup logic (in `bc.loadLastState`) tries to load the full head block, body included, and fails because the genesis block has been pruned from the freezer. The failure triggers the reset logic, which lands on the same pruned genesis block and fails again, so the node keeps resetting and never makes progress.

This can happen when, after an unsuccessful sync, we don't have the complete state for the head (or for any other block) and try to redo the snap sync.
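
To make the failure mode concrete, here is a minimal toy model of the loop described above. It is only a sketch: `toyChain`, `startup`, and the attempt cap are hypothetical names introduced for illustration and are not geth's `BlockChain` API. The model assumes that loading requires both header and body, and that a reset can only rewind the head to genesis without restoring a pruned body.

```go
package main

import (
	"errors"
	"fmt"
)

// toyChain is a hypothetical stand-in for a chain database; it is not geth's BlockChain.
type toyChain struct {
	headers map[uint64]bool // block numbers whose header is stored
	bodies  map[uint64]bool // block numbers whose body is stored
	head    uint64          // current head block number
}

var errHeadMissing = errors.New("head block missing")

// loadLastState succeeds only if both the header and the body of the head are
// present, modelling the "full block required" condition.
func (c *toyChain) loadLastState() error {
	if !c.headers[c.head] || !c.bodies[c.head] {
		return fmt.Errorf("block %d: %w", c.head, errHeadMissing)
	}
	return nil
}

// reset rewinds the head to genesis. It cannot bring back a pruned body.
func (c *toyChain) reset() { c.head = 0 }

// startup retries load-then-reset up to maxAttempts times. It reports how many
// attempts were used and whether loading ever succeeded. The cap exists only so
// the demo terminates.
func startup(c *toyChain, maxAttempts int) (attempts int, ok bool) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		if err := c.loadLastState(); err == nil {
			return attempts, true
		}
		c.reset()
	}
	return maxAttempts, false
}

func main() {
	// Genesis header is present but its body has been pruned.
	pruned := &toyChain{headers: map[uint64]bool{0: true}, bodies: map[uint64]bool{}}
	n, ok := startup(pruned, 5)
	fmt.Println(n, ok) // prints "5 false"
}
```

Every reset lands on the same pruned genesis block, so no number of retries helps in this model; only changing what the load step accepts, or where it reads from, breaks the cycle.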

s1na (Contributor, Author) commented Apr 14, 2025

An alternative would be to relax the condition in `loadLastState` so that only the header needs to exist when the head is the genesis block, like so:

```go
	var headBlock *types.Block
	if headBlockNum == 0 {
		// At genesis the body may have been pruned, so the header alone is enough.
		if header := bc.GetHeaderByHash(head); header != nil {
			headBlock = types.NewBlockWithHeader(header)
		}
	} else {
		// Assign to the outer variable; `:=` here would shadow it.
		headBlock = bc.GetBlockByHash(head)
	}
	if headBlock == nil {
		// Corrupt or empty database, init from scratch
		log.Warn("Head block missing, resetting chain", "hash", head)
		return bc.Reset()
	}
```
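
The relaxed condition can also be tried out in isolation. The following is a self-contained sketch under stated assumptions: `store`, `header`, `block`, and `headBlock` are hypothetical types and helpers standing in for the database and for geth's `types.Header` and `types.Block`; they are not the real API. It shows that a genesis entry with a pruned body still yields a usable head, while any other block still requires its body.

```go
package main

import "fmt"

// header and block are hypothetical minimal types, not geth's types.Header / types.Block.
type header struct{ number uint64 }

type block struct {
	header  *header
	hasBody bool
}

// store is a hypothetical database keyed by block hash (a plain string here).
type store struct {
	headers map[string]*header
	bodies  map[string]bool
}

func (s *store) headerByHash(hash string) *header { return s.headers[hash] }

// blockByHash returns a block only when both header and body are present.
func (s *store) blockByHash(hash string) *block {
	hd := s.headers[hash]
	if hd == nil || !s.bodies[hash] {
		return nil
	}
	return &block{header: hd, hasBody: true}
}

// headBlock applies the relaxed rule: at block 0 a header alone is accepted,
// for every other block the full block is still required.
func headBlock(s *store, hash string, num uint64) *block {
	if num == 0 {
		if hd := s.headerByHash(hash); hd != nil {
			return &block{header: hd}
		}
		return nil
	}
	return s.blockByHash(hash)
}

func main() {
	s := &store{
		headers: map[string]*header{"genesis": {number: 0}, "b7": {number: 7}},
		bodies:  map[string]bool{}, // all bodies pruned
	}
	fmt.Println(headBlock(s, "genesis", 0) != nil) // prints "true"
	fmt.Println(headBlock(s, "b7", 7) != nil)      // prints "false"
}
```

The nil check on the header matters in this sketch: without it, a missing genesis header would produce a block wrapping a nil header instead of falling through to the reset path.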

@fjl added this to the 1.15.9 milestone Apr 15, 2025
@s1na marked this pull request as ready for review April 16, 2025 14:03
s1na (Contributor, Author) commented Apr 16, 2025

I was trying to test this patch and hit this panic:

INFO [04-16|15:29:27.414] Loaded most recent local snap block      number=3,428,602 hash=e71201..025ec2 age=1y11mo3w
INFO [04-16|15:29:27.414] Loaded last snap-sync pivot marker       number=8,131,055
INFO [04-16|15:29:27.414] Genesis state is missing, wait state sync
WARN [04-16|15:29:27.414] Snapshot maintenance disabled (syncing)
INFO [04-16|15:29:27.414] Initialized transaction indexer          range="last 135000 blocks"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xaadcb9]

goroutine 54 [running]:
github.com/ethereum/go-ethereum/core/types.(*Block).NumberU64(...)
        github.com/ethereum/go-ethereum/core/types/block.go:392
github.com/ethereum/go-ethereum/core.(*txIndexer).loop(0xc0003b53c0, 0xc000429408)
        github.com/ethereum/go-ethereum/core/txindexer.go:208 +0x99
created by github.com/ethereum/go-ethereum/core.newTxIndexer in goroutine 1
        github.com/ethereum/go-ethereum/core/txindexer.go:70 +0x146
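
The trace shows `NumberU64` being called on a nil `*Block` inside the indexer loop. How this was fixed in the PR is not stated in this comment, so the following is only a generic sketch of the failure shape and of a nil guard. `block`, `headNumber`, and `panics` are hypothetical names, not geth code.

```go
package main

import "fmt"

// block is a hypothetical minimal type, not geth's types.Block.
type block struct{ number uint64 }

// NumberU64 reads a field through the receiver, so calling it on a nil *block
// panics with a nil pointer dereference.
func (b *block) NumberU64() uint64 { return b.number }

// headNumber guards against a missing head: it returns the head's number and
// true, or 0 and false when head is nil.
func headNumber(head *block) (uint64, bool) {
	if head == nil {
		return 0, false
	}
	return head.NumberU64(), true
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	var head *block // e.g. the head block could not be loaded

	fmt.Println(panics(func() { _ = head.NumberU64() })) // prints "true"

	n, ok := headNumber(head)
	fmt.Println(n, ok) // prints "0 false"
}
```

With a guard of this kind the caller decides what a missing head means, for example skipping the work until a block is available, instead of crashing the goroutine.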

s1na (Contributor, Author) commented Apr 16, 2025

OK, after fixing this, the following now repeats in my logs and the node doesn't proceed:

WARN [04-16|16:01:13.097] Rewinding blockchain to block            target=3,428,589
INFO [04-16|16:01:13.142] Loaded most recent local header          number=3,428,589 hash=0ff518..40de0d age=1y11mo3w
INFO [04-16|16:01:13.142] Loaded most recent local block           number=0         hash=25a5cc..3e6dd9 age=3y7mo1d
INFO [04-16|16:01:13.142] Loaded most recent local snap block      number=3,428,589 hash=0ff518..40de0d age=1y11mo3w
INFO [04-16|16:01:13.142] Loaded last snap-sync pivot marker       number=8,131,277
ERROR[04-16|16:01:13.142] Current block not found in database      block=0 hash=25a5cc..3e6dd9
ERROR[04-16|16:01:13.142] Beacon backfilling failed                err="current block missing: #0 [25a5cc10..]"

s1na (Contributor, Author) commented Apr 16, 2025

After adding the exception in SetHead:

INFO [04-16|16:12:05.492] Forkchoice requested sync to new head    number=8,131,372 hash=bc4275..c5b543 finalized=unknown
INFO [04-16|16:12:05.576] Syncing beacon headers                   downloaded=512 left=0 eta=0s
ERROR[04-16|16:12:05.576] Latest filled block is not available
INFO [04-16|16:12:05.577] Block synchronisation started
ERROR[04-16|16:12:05.577] Reject duplicated disable operation
WARN [04-16|16:12:05.590] Rewinding blockchain to block            target=3,428,563
INFO [04-16|16:12:05.626] Loaded most recent local header          number=3,428,563 hash=9513d0..81c1cb age=1y11mo3w
INFO [04-16|16:12:05.626] Loaded most recent local block           number=0         hash=25a5cc..3e6dd9 age=3y7mo1d
INFO [04-16|16:12:05.627] Loaded most recent local snap block      number=3,428,563 hash=9513d0..81c1cb age=1y11mo3w
INFO [04-16|16:12:05.627] Loaded last snap-sync pivot marker       number=8,131,308
INFO [04-16|16:12:05.627] Truncated excess ancient chain segment   oldhead=3,428,564 newhead=3,428,563
CRIT [04-16|16:12:05.627] Failed to reset txpool state             err="missing trie node 5eb6e371a698b8d68f665192350ffcecbbbf322916f4b51bd79bb6887da3f494 (path ) state 0x5eb6e371a698b8d68f665192350ffcecbbbf322916f4b51bd79bb6887da3f494 is not available"

rjl493456442 previously approved these changes Apr 17, 2025
@s1na changed the title from "core: back-up to kvdb for a pruned block" to "core: fix sync reset in pruned nodes" Apr 17, 2025
s1na (Contributor, Author) commented Apr 17, 2025

It works now 👍

@rjl493456442 merged commit e444823 into ethereum:master Apr 17, 2025
3 of 4 checks passed
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
This is an attempt at fixing ethereum#31601. I think what happens is the startup
logic will try to get the full block body (it's `bc.loadLastState`) and
fail because genesis block has been pruned from the freezer. This will
cause it to keep repeating the reset logic, causing a deadlock.

This can happen when due to an unsuccessful sync we don't have the state
for the head (or any other state) fully, and try to redo the snap sync.

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025