core/filtermaps: fix map renderer reorg issue #31642

zsfelfoldi · 2025-04-15T14:37:12Z

This PR fixes a bug in the map renderer that sometimes used an obsolete block log value pointer to initialize the iterator for rendering from a snapshot. This bug was triggered by chain reorgs and sometimes caused indexing errors and invalid search results. A few other conditions that have not been reported to cause issues, but could be unsafe in some corner cases, are also made safer. A new unit test is added that reproduces the bug on the old code and passes with the new fixes.

Fixes #31593
Might also fix #31589. That issue has not been reproduced yet, but it appears to be related to log index database corruption around a specific block, similar to the other issue.

Note that running this branch resets and regenerates the log index database. For this purpose a Version field has been added to rawdb.FilterMapsRange which will also make this easier in the future if a breaking database change is needed or the existing one is considered potentially broken due to a bug, like in this case.
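
A minimal sketch of how such a version check could trigger a reset, for illustration only. The `filterMapsRange` struct here is a simplified stand-in for `rawdb.FilterMapsRange`; only the `Version` field comes from the description above, while `databaseVersion`, `Initialized` and `loadRange` are hypothetical names.

```go
package main

import "fmt"

// databaseVersion is a hypothetical constant; it would be bumped whenever the
// stored index has to be discarded and regenerated.
const databaseVersion = 2

// filterMapsRange is a simplified stand-in for rawdb.FilterMapsRange.
// Only the Version field is taken from the PR description.
type filterMapsRange struct {
	Version     uint32
	Initialized bool
}

// loadRange keeps the stored range if its version matches; otherwise it
// returns an empty range so that the index is rebuilt from scratch.
func loadRange(stored filterMapsRange) (rng filterMapsRange, reset bool) {
	if stored.Version != databaseVersion {
		return filterMapsRange{Version: databaseVersion}, true
	}
	return stored, false
}

func main() {
	old := filterMapsRange{Version: 1, Initialized: true}
	rng, reset := loadRange(old)
	fmt.Println("version:", rng.Version, "initialized:", rng.Initialized, "reset:", reset)
}
```

With a check like this, a future breaking change only needs a version bump rather than a dedicated migration path.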

rjl493456442 · 2025-04-16T01:44:49Z

core/filtermaps/filtermaps.go

 // (which happens during map rendering) without affecting the validity of
 // copies made for snapshots during rendering.
-func (fm filterMap) copy() filterMap {
+// Appending to the rows of both the original map and the fast copy, or two fast


Modifying the fast copy will affect the original map;
but appending to the fast copy won't affect the original map, will it?

Technically true, but the point is that you should not append to both, as they share the underlying array and would corrupt each other. In my use case the original keeps being appended to, as it is the currently rendered map, and snapshots are made during the process. So I defined the semantics of fastCopy so that the original can be appended to and the copy should be used as read only. This rule guarantees safety.
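
The aliasing hazard can be reproduced with plain Go slices. The sketch below uses simplified stand-in types, and its `fastCopy` is an assumption about the semantics described in this thread (copy the outer slice, share the row arrays), not the actual implementation.

```go
package main

import "fmt"

// filterRow and filterMap are simplified stand-ins for the real types.
type filterRow []uint32
type filterMap []filterRow

// fastCopy copies only the outer slice; every row still shares its backing
// array with the original (assumed semantics, for illustration).
func (fm filterMap) fastCopy() filterMap {
	c := make(filterMap, len(fm))
	copy(c, fm)
	return c
}

// demo returns the snapshot row after the original was appended to, and the
// value the original sees after the snapshot row was (wrongly) appended to.
func demo() (snapRow filterRow, origSlot uint32) {
	orig := filterMap{make(filterRow, 2, 8)} // spare capacity, as while rendering
	orig[0][0], orig[0][1] = 1, 2
	snap := orig.fastCopy()

	// Allowed: append to the original only. The snapshot row keeps its
	// length of 2, so it does not observe the new element.
	orig[0] = append(orig[0], 3)
	snapRow = snap[0]

	// Not allowed: appending to the copy writes into the shared array and
	// overwrites the element the original just added.
	_ = append(snap[0], 99)
	return snapRow, orig[0][2]
}

func main() {
	snapRow, origSlot := demo()
	fmt.Println("snapshot row:", snapRow, "original slot after bad append:", origSlot)
}
```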

rjl493456442 · 2025-04-16T02:17:32Z

core/filtermaps/map_renderer.go

 		mapIndex:      r.currentMap.mapIndex,
-		lastBlock:     r.iterator.blockNumber,
+		lastBlock:     r.currentMap.lastBlock,
 		lastBlockId:   r.f.targetView.getBlockId(r.currentMap.lastBlock),


Honestly it's a bit weird to get the block id with r.f.targetView, r.iterator.chainView might be a better candidate.

You are right, it does look better. It is an assumption that the two should always be the same, guaranteed by iterator.updateChainView, which is always called after r.f.targetView could have been changed. But it is better to check explicitly rather than quietly assume, so I also added a check.

rjl493456442 · 2025-04-16T03:07:46Z

Generally lgtm. Could you please point out which part of code was wrong before?

zsfelfoldi · 2025-04-16T07:33:44Z

Generally lgtm. Could you please point out which part of code was wrong before?

It was newLogIteratorFromBlockDelimiter, which should initialize an iterator at a block boundary. It is used when a recent snapshot is available. In practice that is every time we add a new block to the head of the existing chain, or when there was a small reorg but a valid snapshot is still available. It needs the log value pointer to the block boundary after the snapshot block so that it can resume rendering from there. This is now available in the snapshot (renderedMap.headDelimiter), but earlier it wasn't stored there, and originally I implemented newLogIteratorFromBlockDelimiter so that it only received the block number and determined the pointer after the block itself. In the head append case this came from filterMapsRange.headDelimiter, but in the reorg case it read getBlockLvPointer(blockNumber + 1). If it's reorging to a previously unseen fork then that is fine, because the snapshot is a common ancestor and the pointer after it is correct in the database. But if it's reorging back to a previously processed fork (like A->B, then A->B2, then A->B2->C2, then A->B->C) then it will use the snapshot of B but initialize the pointer before C2, which will be incorrect if B and B2 generated a different number of log values.
Now it's simpler because headDelimiter is already available with every checkpoint and is always consistent with the right fork. I just forgot to change newLogIteratorFromBlockDelimiter so that it actually uses it.
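
The stale pointer can be illustrated with a toy model of the scenario above. Everything in this sketch is hypothetical (`toyIndex`, `render`, the log value counts); it only mimics the idea that pointers stored by block number get overwritten by other forks, while a delimiter stored with the snapshot stays tied to its own fork.

```go
package main

import "fmt"

// block is a toy block: an id, a number and how many log values it emits.
type block struct {
	id        string
	number    uint64
	logValues uint64
}

// toyIndex mimics two lookups: a pointer table keyed by block number
// (overwritten on reorg) and a per-snapshot head delimiter keyed by block id.
type toyIndex struct {
	lvPointer     map[uint64]uint64 // block number -> first log value of that block
	headDelimiter map[string]uint64 // snapshot block id -> pointer right after it
}

// render indexes a whole chain, overwriting pointers by block number and
// saving a snapshot delimiter after every block.
func (ix *toyIndex) render(chain []block) {
	var next uint64
	for _, b := range chain {
		ix.lvPointer[b.number] = next
		next += b.logValues
		ix.headDelimiter[b.id] = next
	}
}

// scenario replays A->B, A->B2, A->B2->C2 and then resumes from snapshot B.
func scenario() (stale, fromSnapshot uint64) {
	a := block{id: "A", number: 0, logValues: 3}
	b := block{id: "B", number: 1, logValues: 2}
	b2 := block{id: "B2", number: 1, logValues: 5}
	c2 := block{id: "C2", number: 2, logValues: 1}

	ix := &toyIndex{lvPointer: map[uint64]uint64{}, headDelimiter: map[string]uint64{}}
	ix.render([]block{a, b})
	ix.render([]block{a, b2})
	ix.render([]block{a, b2, c2})

	// Reorg back to A->B->C, resuming after snapshot B.
	stale = ix.lvPointer[b.number+1]      // old approach: this is C2's pointer
	fromSnapshot = ix.headDelimiter[b.id] // delimiter stored with snapshot B
	return stale, fromSnapshot
}

func main() {
	stale, good := scenario()
	fmt.Println("pointer by block number:", stale, "pointer from snapshot:", good)
}
```

In this toy run the lookup by block number yields 8 (3 log values from A plus 5 from B2), whereas the delimiter kept with snapshot B yields 5 (3 from A plus 2 from B).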

This PR makes the conditions for using a map rendering snapshot stricter so that whenever a reorg happens, only a snapshot of a common ancestor block can be used. The issue fixed in #31642 originated from using a snapshot that wasn't a common ancestor. For example in the following reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C` the last reorg triggered a render from snapshot `B` saved earlier. Now this is possible under certain conditions but extra care is needed, for example if block `B` crosses a map boundary then it should not be allowed. With the latest fix the checks are sufficient but I realized I would just feel safer if we disallowed this rare and risky scenario altogether and just render from snapshot `A` after the last reorg in the example above. The performance difference is a few milliseconds and it occurs rarely (about once a day on Holesky, probably much more rarely on Mainnet). Note that this PR only makes the snapshot conditions stricter and `TestIndexerRandomRange` does check that snapshots are still used whenever it's obviously possible (adding blocks after the current head without a reorg) so this change can be considered safe. Also I am running the unit tests and the fuzzer and everything seems to be fine.
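
A rough sketch of the stricter rule, under the assumption that a chain view can be modelled as a list of block ids indexed by block number. `chainView`, `blockID` and `canUseSnapshot` are hypothetical names used only for this illustration, not the actual filtermaps API.

```go
package main

import "fmt"

// chainView is a toy view of a canonical chain: index = block number,
// value = block id (standing in for a hash).
type chainView []string

// blockID returns the id at the given number, if the view contains it.
func (c chainView) blockID(number int) (string, bool) {
	if number < 0 || number >= len(c) {
		return "", false
	}
	return c[number], true
}

// canUseSnapshot allows a snapshot only if its block is part of both the
// previously rendered view and the target view, i.e. a common ancestor.
// With real hashes, equal ids at one height imply equal ancestors too.
func canUseSnapshot(number int, id string, rendered, target chainView) bool {
	r, okR := rendered.blockID(number)
	t, okT := target.blockID(number)
	return okR && okT && r == id && t == id
}

func main() {
	rendered := chainView{"A", "B2", "C2"} // state before the last reorg
	target := chainView{"A", "B", "C"}     // state after the last reorg

	fmt.Println("snapshot B usable:", canUseSnapshot(1, "B", rendered, target))
	fmt.Println("snapshot A usable:", canUseSnapshot(0, "A", rendered, target))
	// Plain head append without a reorg: the old head is a common ancestor.
	fmt.Println("head append usable:", canUseSnapshot(1, "B", chainView{"A", "B"}, target))
}
```

Under this rule the example reorg falls back to snapshot `A`, while the common head append case still resumes from the latest snapshot.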

This PR changes the chain view update mechanism of the log filter. Previously the head updates were all wired through the indexer, even in unindexed mode. This was both a bit weird and also unsafe, as the indexer's chain view was updated asynchronously with some delay, making some log related tests flaky. Also, the reorg safety of the indexed search was integrated with unindexed search in a weird way, relying on `syncRange.ValidBlocks` in the unindexed case too, with a special condition added to only consider the head of the valid range but not the tail in the unindexed case. In this PR the current chain view is directly accessible through the filter backend and unindexed search is also chain view based, making it inherently safe. The matcher sync mechanism is now only used for indexed search as originally intended, removing a few ugly special conditions. The PR is currently based on top of #31642. Together they fix #31518 and replace #31542.

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
