Skip to content

core/filtermaps: only use common ancestor snapshots #31668

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 18, 2025

Conversation

zsfelfoldi
Copy link
Contributor

This PR makes the conditions for using a map rendering snapshot stricter so that whenever a reorg happens, only a snapshot of a common ancestor block can be used. The issue fixed in #31642 originated from using a snapshot that wasn't a common ancestor. For example in the following reorg scenario: A->B, then A->B2, then A->B2->C2, then A->B->C the last reorg triggered a render from snapshot B saved earlier. Now this is possible under certain conditions but extra care is needed, for example if block B crosses a map boundary then it should not be allowed. With the latest fix the checks are sufficient but I realized I would just feel safer if we disallowed this rare and risky scenario altogether and just render from snapshot A after the last reorg in the example above. The performance difference if a few milliseconds and it occurs rarely (about once a day on Holesky, probably much more rare on Mainnet).
Note that this PR only makes the snapshot conditions stricter and TestIndexerRandomRange does check that snapshots are still used whenever it's obviously possible (adding blocks after the current head without a reorg) so this change can be considered safe. Also I am running the unit tests and the fuzzer and everything seems to be fine.

@zsfelfoldi zsfelfoldi added this to the 1.15.9 milestone Apr 17, 2025
@rjl493456442
Copy link
Member

Can you elaborate why this extra check can filter out the scenario you mention? (the block cross the map)

e.g., A->B, block B crosses the map (i), the snapshot of B is made with map (i+1).

If mapIndex(i+1) is regarded as canonical (all the blocks within the i+1 are canonical) and mapIndex(i+2) is required for rendering, why B can't be chosen as the starting point?

@zsfelfoldi
Copy link
Contributor Author

Can you elaborate why this extra check can filter out the scenario you mention? (the block cross the map)

e.g., A->B, block B crosses the map (i), the snapshot of B is made with map (i+1).

If mapIndex(i+1) is regarded as canonical (all the blocks within the i+1 are canonical) and mapIndex(i+2) is required for rendering, why B can't be chosen as the starting point?

If block B is rendered on maps i and i+1 then rendering from snapshot B would start from map i+1. And it is fine if B is a common ancestor of the old and new heads and therefore it is part of both the current indexed and target views. On the other hand, in the example shown in the PR desc, if the chain was previously on a fork A->B2->C2 then the problem is that even map i will not match the target view because the last part of it corresponds to the first half of B2. So then the best thing we can do is render from snapshot A which is a snapshot of map i after block A has been rendered, and then start re-rendering block B from there.
Now in this case lastCanonicalMapBoundaryBefore should also detect that map i is not canonical so we should start rendering i and then lastCanonicalSnapshotOfMap will not even consider snapshot B which was made from map i+1 and will select snapshot A if available. I am just suggesting to add this extra check so that the correctness does not solely rely on the check in the other function but would also be checked where the snapshot is actually selected.

@rjl493456442
Copy link
Member

If block B is rendered on maps i and i+1 then rendering from snapshot B would start from map i+1. And it is fine if B is a common ancestor of the old and new heads and therefore it is part of both the current indexed and target views. On the other hand, in the example shown in the PR desc, if the chain was previously on a fork A->B2->C2 then the problem is that even map i will not match the target view because the last part of it corresponds to the first half of B2. So then the best thing we can do is render from snapshot A which is a snapshot of map i after block A has been rendered, and then start re-rendering block B from there.

Yes, it's absolutely right.

I am just suggesting to add this extra check so that the correctness does not solely rely on the check in the other function but would also be checked where the snapshot is actually selected.

I am fine with adding additional check, it just can't be wrong in practice.

@rjl493456442
Copy link
Member

If we decide to restart rendering from map i+1, which means the last block in map i is canonical.

In this scenario you mentioned, If block B is rendered on map i and i+1 and we start to render
mapi+1, at least it means the last block of map i matches with the canonical chain?

Sorry i am just a bit confused with so many conditions here.

@zsfelfoldi zsfelfoldi merged commit 2e0ad2c into ethereum:master Apr 18, 2025
4 checks passed
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
0g-wh pushed a commit to 0g-wh/0g-geth that referenced this pull request May 8, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
Rampex1 pushed a commit to streamingfast/go-ethereum that referenced this pull request May 15, 2025
This PR makes the conditions for using a map rendering snapshot stricter
so that whenever a reorg happens, only a snapshot of a common ancestor
block can be used. The issue fixed in
ethereum#31642 originated from using
a snapshot that wasn't a common ancestor. For example in the following
reorg scenario: `A->B`, then `A->B2`, then `A->B2->C2`, then `A->B->C`
the last reorg triggered a render from snapshot `B` saved earlier. Now
this is possible under certain conditions but extra care is needed, for
example if block `B` crosses a map boundary then it should not be
allowed. With the latest fix the checks are sufficient but I realized I
would just feel safer if we disallowed this rare and risky scenario
altogether and just render from snapshot `A` after the last reorg in the
example above. The performance difference if a few milliseconds and it
occurs rarely (about once a day on Holesky, probably much more rare on
Mainnet).
Note that this PR only makes the snapshot conditions stricter and
`TestIndexerRandomRange` does check that snapshots are still used
whenever it's obviously possible (adding blocks after the current head
without a reorg) so this change can be considered safe. Also I am
running the unit tests and the fuzzer and everything seems to be fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants