-
Notifications
You must be signed in to change notification settings - Fork 20.9k
triedb/pathdb: introduce lookup structure to optimize state access #30971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
With this PR (on top of #30661), the block execution performance matches the current master branch. The state access (storage, account) is slightly faster. The triedb commit is about 2ms slower due to the lookup overhead. Bench07: PR ![]() ![]() ![]() Without this PR, #30661 vs master, the state access is significantly slower ![]() ![]() ![]() |
deb27b6
to
554727e
Compare
554727e
to
07aadaf
Compare
d88a197
to
d7b5411
Compare
triedb/pathdb/lookup.go
Outdated
if i == 0 { | ||
list = list[1:] | ||
if cap(list) > 1024 { | ||
list = append(make([]common.Hash, 0, len(list)), list...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have 1024 non-finalized diff layers floating around, we would hit this every time, right?
I get the idea of pruning away some unnecessary memory if we ever hit a lot of parallel forks. I guess its not really a big deal either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have 1024 non-finalized diff layers floating around
The number of diff layers is defined as 128, regardless of the associated state is finalized or not.
But for sure, with this PR, we can unlock the potential to pile up arbitrary number of diff layers
on top (if the memory is sufficient to host that many diff layers). In such cases, we need to adjust
this constant.
The main motivation to add this additional check is: for some storage slots frequently being updated,
it might be modified in every blocks. The underlying slice might be held forever. This mechanism just
tries to make sure the dangling slices won't be referenced forever.
Generally looks good, the only thing I stumbled upon was that we are storing all descendants in the map instead of just pointers to the next descendant. I think the map approach is probably significantly faster for the lookup compared to the tree/pointer based approach |
The advantage of maintaining the ancestor-descendant relationship is that it allows O(1) queries to determine whether The downside is that for each new layer, it must recursively traverse all ancestors up to the disk layer (up to 128 layers). However, this cost occurs once per block and is entirely acceptable. |
This pull request introduces a mechanism to improve state lookup efficiency in pathdb
by maintaining a lookup structure that eliminates unnecessary iteration over diff layers.
The core idea is to track a mutation history for each dirty state entry residing in the
diff layers. This history records the state roots of all layers in which the entry was modified,
sorted from oldest to newest.
During state lookup, this mutation history is queried to find the most recent layer whose
state root either matches the target root or is a descendant of it. This allows us to quickly
identify the layer containing the relevant data, avoiding the need to iterate through all diff
layers from top to bottom.
Besides, the overhead for state lookup is constant, no matter how many diff layers are retained
in the pathdb, which unlocks the potential to hold more diff layers.
Of course, maintaining this lookup structure introduces some overhead. For each state transition,
we need to:
(a) update the mutation records for the modified state entries, and
(b) remove stale mutation records associated with outdated layers.
On our benchmark machine, it will introduce around 1ms overhead which is acceptable.
