core/filtermaps: clone cached slices, fix tempRange #31680

Merged · 4 commits into ethereum:master · Apr 21, 2025

Conversation

zsfelfoldi (Contributor)

This PR ensures that caching a slice (or a slice of slices) never affects the original version: a slice fetched from the cache is always cloned unless it is used in a guaranteed read-only way.
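
A minimal sketch of the pattern (hypothetical cache and row types; the real code uses the filtermaps row caches), cloning on any read that may be followed by a write:

```go
package main

import (
	"fmt"
	"slices"
)

// FilterRow stands in for the cached row type.
type FilterRow []uint32

// rowCache stands in for the LRU cache used by the indexer.
var rowCache = map[uint64]FilterRow{}

// getRowForWrite returns a row that is safe to modify: a cache hit is
// cloned so that later appends can never mutate the cached copy.
func getRowForWrite(idx uint64) (FilterRow, bool) {
	row, ok := rowCache[idx]
	if !ok {
		return nil, false
	}
	return slices.Clone(row), true // never hand out the cached backing array
}

func main() {
	rowCache[7] = FilterRow{1, 2, 3}
	row, _ := getRowForWrite(7)
	row = append(row, 4)
	fmt.Println(rowCache[7], row) // [1 2 3] [1 2 3 4]: the cache is unaffected
}
```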

zsfelfoldi added this to the 1.15.9 milestone Apr 20, 2025
@@ -610,7 +610,9 @@ func (f *FilterMaps) storeFilterMapRowsOfGroup(batch ethdb.Batch, mapIndices []u
 	if uint32(len(mapIndices)) != f.baseRowGroupLength { // skip base rows read if all rows are replaced
 		var ok bool
 		baseRows, ok = f.baseRowsCache.Get(baseMapRowIndex)
-		if !ok {
+		if ok {
Member

Should we also clone the slice in getFilterMapRow?

zsfelfoldi (Contributor Author)

We do clone baseRow, and if required we append extRow, which is always retrieved from the database, so the returned result is guaranteed not to share an underlying array with anything.
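
A small self-contained illustration of why that holds (hypothetical values; `slices.Clone` allocates a fresh backing array, so the appended result cannot alias the cached slice):

```go
package main

import (
	"fmt"
	"slices"
)

func main() {
	baseRow := []uint32{10, 20} // imagine this came from the cache
	extRow := []uint32{30}      // always read fresh from the database
	// Clone first, then append: the result never shares an underlying
	// array with the cached baseRow.
	merged := append(slices.Clone(baseRow), extRow...)
	merged[0] = 99               // mutate the result...
	fmt.Println(baseRow, merged) // [10 20] [99 20 30]: baseRow untouched
}
```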

rjl493456442 (Member) previously approved these changes Apr 20, 2025 and left a comment:

LGTM

zsfelfoldi changed the title from "core/filtermaps: always clone cached slices before modification" to "core/filtermaps: clone cached slices, fix tempRange" Apr 20, 2025
zsfelfoldi (Contributor Author)

The slice cloning, though definitely necessary for safety, did not solve the issue, so I added a lot of debug prints and finally traced the error conclusively back to its origin. When I add a new block and overwrite the last map with a newly rendered one, I temporarily use tempRange as the indexed range, which excludes the last map because it can be inconsistent during the write (unfortunately the write takes too long to make atomic). During this time, even if the matcher finds some potential matches, getLogByLvIndex will not return results for this temporarily invalidated map (and even if it did, the filter logic would trim them as potentially invalid). This is fine, but what I noticed in my debug prints is that the logs of the last fully rendered block in the previous, untouched map are also unavailable during this period, and the mismatches correspond exactly to these blocks.
I added a unit test that reproduces this error by trying to fetch the first and last valid logs of the temporary range from the map renderer callback. It reproduced the error easily and consistently, and the error traced back to tempRange being invalid: headDelimiter was set to 0 because the indexed view's head is not indexed in this temporary state, but headIndexed was still true. This inconsistent state caused getBlockLvPointer to return 0 for the last, partially indexed block, which in turn caused the binary search in getLogByLvIndex to find the wrong block and miss the logs at the requested position.
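
A condensed, self-contained illustration of the inconsistency (simplified stand-in types; the real fields live on `filterMapsRange`):

```go
package main

import "fmt"

// indexedRange is a simplified stand-in for the range metadata.
type indexedRange struct {
	headIndexed   bool   // is the head block fully indexed?
	headDelimiter uint64 // log-value index of the head delimiter
}

// blockLvPointer mirrors the shortcut taken for the head: with
// headIndexed set, it trusts headDelimiter and skips the database read.
func blockLvPointer(r indexedRange, storedPointer uint64) uint64 {
	if r.headIndexed {
		return r.headDelimiter
	}
	return storedPointer
}

func main() {
	// The broken temp range: the head is not really indexed and the
	// delimiter was zeroed, but headIndexed was left true.
	broken := indexedRange{headIndexed: true, headDelimiter: 0}
	fmt.Println(blockLvPointer(broken, 12345)) // 0: binary search lands on the wrong block
	// The fix clears headIndexed for the temp range, so the stored
	// pointer is used instead.
	fixed := indexedRange{headIndexed: false}
	fmt.Println(blockLvPointer(fixed, 12345)) // 12345
}
```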
I believe the cause of this issue has now been conclusively determined, and it was a different issue from the previously fixed one related to the render snapshots. That one caused indexing errors and permanently incorrect results, while this one only affected search results, and only when a search coincided with the map write.

zsfelfoldi (Contributor Author)

I also changed getBlockLvPointer to return headDelimiter + 1 instead of headDelimiter for future blocks when the head is rendered. This difference currently does not affect operation at all, but headDelimiter + 1 is the position where the next block will start once added, so it is more logically consistent.
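
A worked illustration of the arithmetic (illustrative numbers; the layout is assumed from the description above):

```go
package main

import "fmt"

func main() {
	// Assume the head block's log values occupy indices 100..119 and
	// the head delimiter entry sits at index 120.
	const headDelimiter = 120
	// The next block's first log value will be written right after the
	// delimiter, so a pointer query for a future block should report
	// headDelimiter+1 rather than the delimiter itself.
	fmt.Println(headDelimiter + 1) // 121
}
```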

@@ -656,7 +659,7 @@ func (f *FilterMaps) mapRowIndex(mapIndex, rowIndex uint32) uint64 {
 // called from outside the indexerLoop goroutine.
 func (f *FilterMaps) getBlockLvPointer(blockNumber uint64) (uint64, error) {
 	if blockNumber >= f.indexedRange.blocks.AfterLast() && f.indexedRange.headIndexed {
-		return f.indexedRange.headDelimiter, nil
+		return f.indexedRange.headDelimiter + 1, nil
Member

Should we error out if blockNumber > f.indexedRange.blocks.AfterLast()?

Theoretically, the returned log index is for the next block and it must be consecutive with the latest indexed one.

zsfelfoldi (Contributor Author)

Since the search works asynchronously (the range might change at any time during the process), the underlying functions of the matcher backend do not fail when queried out of range; they just return results such that the search will not find any additional items (the head part of the results will be invalidated anyway in this case). getLogByLvIndex also silently returns with no result and no error.
getBlockLvPointer likewise just returns the beginning of the next future block, even when queried further into the future, so the search simply runs until the end of the indexed range. The point is that for a long search the majority of the results will still be valuable, and the head part will be searched again anyway, so it is enough to do something that lets the operation finish safely.
We could have a different mechanism, but this one seems to work safely for now, and since we have to do the fork release today I would not change it fundamentally right now.
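
A sketch of that clamping behaviour (simplified stand-in types and names; the follow-up commit quoted below moves this logic into `FilterMapsMatcherBackend.GetBlockLvPointer`):

```go
package main

import "fmt"

// rng is a simplified stand-in for the indexed range.
type rng struct {
	afterLastBlock uint64 // first block number past the indexed range
	headIndexed    bool
	headDelimiter  uint64
}

// lvPointer never errors for future blocks: it clamps them to the
// position where the next block will start, so the matcher simply
// finds nothing past the indexed range instead of failing the search.
func lvPointer(r rng, block uint64, stored func(uint64) uint64) uint64 {
	if block >= r.afterLastBlock && r.headIndexed {
		return r.headDelimiter + 1
	}
	return stored(block)
}

func main() {
	r := rng{afterLastBlock: 100, headIndexed: true, headDelimiter: 5000}
	stored := func(b uint64) uint64 { return b * 50 } // fake database lookup
	fmt.Println(lvPointer(r, 99, stored))  // in range: stored pointer (4950)
	fmt.Println(lvPointer(r, 250, stored)) // far future: 5001, no error
}
```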

@@ -536,6 +536,7 @@ func (r *mapRenderer) getTempRange() (filterMapsRange, error) {
 	} else {
 		tempRange.blocks.SetAfterLast(0)
 	}
+	tempRange.headIndexed = false
Member

Is it the real root cause?

zsfelfoldi (Contributor Author)

Yes.

zsfelfoldi merged commit 14f1543 into ethereum:master Apr 21, 2025
3 of 4 checks passed
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025 (same commit message as the PR description above).
zsfelfoldi added a commit that referenced this pull request May 2, 2025
This PR fixes the out-of-range block number logic of `getBlockLvPointer`
which sometimes caused searches to fail if the head was updated at the
wrong moment. This logic ensures that querying the pointer of a future
block returns the pointer after the last fully indexed block (instead of
failing) and therefore an async range update will not cause the search
to fail. Earlier, this behaviour only worked when `headIndexed` was true
and `headDelimiter` pointed to the end of the indexed range. Now it also
works for an unfinished index.

This logic is also moved from `FilterMaps.getBlockLvPointer` to
`FilterMapsMatcherBackend.GetBlockLvPointer` because it is only required
by the search anyway. `FilterMaps.getBlockLvPointer` now only returns a
pointer for existing blocks, consistent with how it is used in the
indexer/renderer.

Note that this unhandled case has been present in the code for a long
time but went unnoticed because either of two previously fixed bugs
prevented it from being triggered. The incorrectly positive
`tempRange.headIndexed` (fixed in #31680), though it caused other
problems, prevented this one from surfacing, since with a positive
`headIndexed` no database read was triggered in `getBlockLvPointer`.
Also, the unnecessary `indexLock` in `synced()` (fixed in #31708)
usually prevented the search from seeing the temp range and therefore
avoided noticeable issues.
ucwong pushed a commit to CortexFoundation/CortexTheseus that referenced this pull request May 7, 2025, 0g-wh pushed both commits to 0g-wh/0g-geth May 8, 2025, and Rampex1 pushed a commit to streamingfast/go-ethereum May 15, 2025, all with the same commit messages as above.