Skip to content

core/filtermaps: fix map renderer reorg issue #31642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 16, 2025

Conversation

zsfelfoldi
Copy link
Contributor

@zsfelfoldi zsfelfoldi commented Apr 15, 2025

This PR fixes a bug in the map renderer that sometimes used an obsolete block log value pointer to initialize the iterator for rendering from a snapshot. This bug was triggered by chain reorgs and sometimes caused indexing errors and invalid search results. A few other conditions are also made safer that were not reported to cause issues yet but could potentially be unsafe in some corner cases. A new unit test is also added that reproduced the bug but passes with the new fixes.

Fixes #31593
Might also fix #31589 though this issue has not been reproduced yet, but it appears to be related to a log index database corruption around a specific block, similarly to the other issue.

Note that running this branch resets and regenerates the log index database. For this purpose a Version field has been added to rawdb.FilterMapsRange which will also make this easier in the future if a breaking database change is needed or the existing one is considered potentially broken due to a bug, like in this case.

// (which happens during map rendering) without affecting the validity of
// copies made for snapshots during rendering.
func (fm filterMap) copy() filterMap {
// Appending to the rows of both the original map and the fast copy, or two fast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modification to the fast copy will affect the original map;
Appending to the fast copy won't affect the original map, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically true but the point is you should not append to both as they share the underlying array and would corrupt each other. In my use case the original keeps being appended as it is the current rendered map and snapsnots are made during the process. So I defined the semantics of fastCopy so that the original can be appended to and the copy should be used as read only. This rule guarantees safety.

mapIndex: r.currentMap.mapIndex,
lastBlock: r.iterator.blockNumber,
lastBlock: r.currentMap.lastBlock,
lastBlockId: r.f.targetView.getBlockId(r.currentMap.lastBlock),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly it's a bit weird to get the block id with r.f.targetView, r.iterator.chainView might be a better candidate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it does look better. It is an assumption that the two should always be the same, guaranteed by iterator.updateChainView that is always called after r.f.targetView could have been changed. But it is better to explicitly check rather that quietly assume so I also added a check.

@rjl493456442
Copy link
Member

Generally lgtm. Could you please point out which part of code was wrong before?

@zsfelfoldi
Copy link
Contributor Author

Generally lgtm. Could you please point out which part of code was wrong before?

It was newLogIteratorFromBlockDelimiter which should initialize an iterator at a block boundary. It is used when there is a recent snapshot available. Practically it is used every time we are adding a new block to the head of the existing chain, or if there was a small reorg but a valid snapshot is still available. It needs the log value pointer to the block boundary after the snapshot block so that it can resume rendering from there. Now this is available in the snapshot (renderedMap.headDelimiter) but eariler it wasn't stored there and originally I implemented it in newLogIteratorFromBlockDelimiter so that it only received the block number and determined the pointer after the block itself. In the head append case this came from filterMapsRange.headDelimiter but in the reorg case it did read getBlockLvPointer(blockNumber + 1). Now if it's reorging to previously unseen fork then it's fine because then the snapshot is a common ancestor and the pointer after it is correct in the database. But if it's reorging back to a previously processed fork (like A->B, then A->B2, then A->B2->C2, then A->B->C) then it will use the snapshot of B but initialize the pointer before C2 which will be incorrect if B and B2 generated a different number of log values.
Now it's simpler because headDelimiter is already available with every checkpoint and always consistent with the right fork, I just forgot to change newLogIteratorFromBlockDelimiter so that it actually uses that.

@fjl fjl merged commit ebb3eb2 into ethereum:master Apr 16, 2025
4 checks passed
@fjl fjl added this to the 1.15.9 milestone Apr 16, 2025
zsfelfoldi added a commit that referenced this pull request Apr 18, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
ucwong pushed a commit to CortexFoundation/CortexTheseus that referenced this pull request Apr 18, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum/go-ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
zsfelfoldi added a commit that referenced this pull request Apr 20, 2025
This PR changes the chain view update mechanism of the log filter.
Previously the head updates were all wired through the indexer, even in
unindexed mode. This was both a bit weird and also unsafe as the
indexer's chain view was updates asynchronously with some delay, making
some log related tests flaky. Also, the reorg safety of the indexed
search was integrated with unindexed search in a weird way, relying on
`syncRange.ValidBlocks` in the unindexed case too, with a special
condition added to only consider the head of the valid range but not the
tail in the unindexed case.

In this PR the current chain view is directly accessible through the
filter backend and unindexed search is also chain view based, making it
inherently safe. The matcher sync mechanism is now only used for indexed
search as originally intended, removing a few ugly special conditions.

The PR is currently based on top of
#31642
Together they fix #31518
and replace #31542

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
This PR fixes a bug in the map renderer that sometimes used an obsolete
block log value pointer to initialize the iterator for rendering from a
snapshot. This bug was triggered by chain reorgs and sometimes caused
indexing errors and invalid search results. A few other conditions are
also made safer that were not reported to cause issues yet but could
potentially be unsafe in some corner cases. A new unit test is also
added that reproduced the bug but passes with the new fixes.

Fixes ethereum#31593
Might also fix ethereum#31589
though this issue has not been reproduced yet, but it appears to be
related to a log index database corruption around a specific block,
similarly to the other issue.

Note that running this branch resets and regenerates the log index
database. For this purpose a `Version` field has been added to
`rawdb.FilterMapsRange` which will also make this easier in the future
if a breaking database change is needed or the existing one is
considered potentially broken due to a bug, like in this case.
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
This PR changes the chain view update mechanism of the log filter.
Previously the head updates were all wired through the indexer, even in
unindexed mode. This was both a bit weird and also unsafe as the
indexer's chain view was updates asynchronously with some delay, making
some log related tests flaky. Also, the reorg safety of the indexed
search was integrated with unindexed search in a weird way, relying on
`syncRange.ValidBlocks` in the unindexed case too, with a special
condition added to only consider the head of the valid range but not the
tail in the unindexed case.

In this PR the current chain view is directly accessible through the
filter backend and unindexed search is also chain view based, making it
inherently safe. The matcher sync mechanism is now only used for indexed
search as originally intended, removing a few ugly special conditions.

The PR is currently based on top of
ethereum#31642
Together they fix ethereum#31518
and replace ethereum#31542

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
This PR fixes a bug in the map renderer that sometimes used an obsolete
block log value pointer to initialize the iterator for rendering from a
snapshot. This bug was triggered by chain reorgs and sometimes caused
indexing errors and invalid search results. A few other conditions are
also made safer that were not reported to cause issues yet but could
potentially be unsafe in some corner cases. A new unit test is also
added that reproduced the bug but passes with the new fixes.

Fixes ethereum#31593
Might also fix ethereum#31589
though this issue has not been reproduced yet, but it appears to be
related to a log index database corruption around a specific block,
similarly to the other issue.

Note that running this branch resets and regenerates the log index
database. For this purpose a `Version` field has been added to
`rawdb.FilterMapsRange` which will also make this easier in the future
if a breaking database change is needed or the existing one is
considered potentially broken due to a bug, like in this case.
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
This PR changes the chain view update mechanism of the log filter.
Previously the head updates were all wired through the indexer, even in
unindexed mode. This was both a bit weird and also unsafe as the
indexer's chain view was updates asynchronously with some delay, making
some log related tests flaky. Also, the reorg safety of the indexed
search was integrated with unindexed search in a weird way, relying on
`syncRange.ValidBlocks` in the unindexed case too, with a special
condition added to only consider the head of the valid range but not the
tail in the unindexed case.

In this PR the current chain view is directly accessible through the
filter backend and unindexed search is also chain view based, making it
inherently safe. The matcher sync mechanism is now only used for indexed
search as originally intended, removing a few ugly special conditions.

The PR is currently based on top of
ethereum#31642
Together they fix ethereum#31518
and replace ethereum#31542

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
This PR fixes a bug in the map renderer that sometimes used an obsolete
block log value pointer to initialize the iterator for rendering from a
snapshot. This bug was triggered by chain reorgs and sometimes caused
indexing errors and invalid search results. A few other conditions are
also made safer that were not reported to cause issues yet but could
potentially be unsafe in some corner cases. A new unit test is also
added that reproduced the bug but passes with the new fixes.

Fixes ethereum#31593
Might also fix ethereum#31589
though this issue has not been reproduced yet, but it appears to be
related to a log index database corruption around a specific block,
similarly to the other issue.

Note that running this branch resets and regenerates the log index
database. For this purpose a `Version` field has been added to
`rawdb.FilterMapsRange` which will also make this easier in the future
if a breaking database change is needed or the existing one is
considered potentially broken due to a bug, like in this case.
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
This PR changes the chain view update mechanism of the log filter.
Previously the head updates were all wired through the indexer, even in
unindexed mode. This was both a bit weird and also unsafe as the
indexer's chain view was updates asynchronously with some delay, making
some log related tests flaky. Also, the reorg safety of the indexed
search was integrated with unindexed search in a weird way, relying on
`syncRange.ValidBlocks` in the unindexed case too, with a special
condition added to only consider the head of the valid range but not the
tail in the unindexed case.

In this PR the current chain view is directly accessible through the
filter backend and unindexed search is also chain view based, making it
inherently safe. The matcher sync mechanism is now only used for indexed
search as originally intended, removing a few ugly special conditions.

The PR is currently based on top of
ethereum#31642
Together they fix ethereum#31518
and replace ethereum#31542

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025
This PR fixes a bug in the map renderer that sometimes used an obsolete
block log value pointer to initialize the iterator for rendering from a
snapshot. This bug was triggered by chain reorgs and sometimes caused
indexing errors and invalid search results. A few other conditions are
also made safer that were not reported to cause issues yet but could
potentially be unsafe in some corner cases. A new unit test is also
added that reproduced the bug but passes with the new fixes.

Fixes ethereum#31593
Might also fix ethereum#31589
though this issue has not been reproduced yet, but it appears to be
related to a log index database corruption around a specific block,
similarly to the other issue.

Note that running this branch resets and regenerates the log index
database. For this purpose a `Version` field has been added to
`rawdb.FilterMapsRange` which will also make this easier in the future
if a breaking database change is needed or the existing one is
considered potentially broken due to a bug, like in this case.
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025
This PR changes the chain view update mechanism of the log filter.
Previously the head updates were all wired through the indexer, even in
unindexed mode. This was both a bit weird and also unsafe as the
indexer's chain view was updates asynchronously with some delay, making
some log related tests flaky. Also, the reorg safety of the indexed
search was integrated with unindexed search in a weird way, relying on
`syncRange.ValidBlocks` in the unindexed case too, with a special
condition added to only consider the head of the valid range but not the
tail in the unindexed case.

In this PR the current chain view is directly accessible through the
filter backend and unindexed search is also chain view based, making it
inherently safe. The matcher sync mechanism is now only used for indexed
search as originally intended, removing a few ugly special conditions.

The PR is currently based on top of
ethereum#31642
Together they fix ethereum#31518
and replace ethereum#31542

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't fetch some logs in new fork after a reorg (go-ethereum 1.15.7) Timeout Issue After Upgrading to Geth 1.15.7
3 participants