Skip to content

Binary tries for substrate/Towards NOMT #5614

@eskimor

Description

@eskimor

It is known that our radix-16 trie does not scale too well with regards to proof size. TPS numbers are dropping (on a high level) already when reaching millions of accounts. By migrating to a binary trie, we can push this immediately by three orders of magnitude, reaching global scale. This becomes hugely relevant now as we are gaining traction and projects start pushing numbers. We should aim to have a binary trie production ready, including a migration path by 2025.

NOMT implements a binary trie and also makes sure to resolve other bottlenecks on the collator in block building, but it is unclear whether we can port it to substrate in that time frame. What we should be doing though, is move towards NOMT: It should be evaluated whether we can implement a binary trie, that is easy enough to include into substrate, that is already binary compatible with NOMT.

Considerations for this from @rphmeier :

The main thing is just to have the node format of the trie be consistent, insofar as what actually gets hashed. The database backend doesn't matter too much. But stuff like SCALE-encoding nodes is going to break compatibility with NOMT.

We have two kinds of nodes currently. Leaves and internal. Leaves always start with MSB 1 and internal nodes always start with MSB 0.

The internal hash is computed by H(child1, child2) and the leaf hash is computed by H(key, value_hash).

Since you're dealing with long keys you'd also want to add extension nodes to that format.

An extension node might be represented as H(extension_size, extension_bits). I think that we should probably limit each extension to something like 256-8 bits so you can pack the length and extension bits into a single hash-sized slot on disk and also steal another MSB from the node format to support extensions.

256-8 bits per extension should be fine; you'd only need maximum 9 extensions to cover 2KB of shared key material. I expect the actual amount of shared key material to be more like 512 bits with the current substrate format (256 bits for pallet, 256 bits for map)

Having an extremely compact format is also going to make it simpler to do stuff like ZK prove the merkle proofs in the future.

oh, last thing: the choice of hash function will be impactful for ZK proving in the long term. In October we'll start to try proving large merkle trie reads/writes with Poseidon2

Timeline

If we plan for success, we have to assume that next year there will exist teams that would benefit a lot from binary tries. Sustaining tps numbers into the billions, instead of millions can make a lot of a difference. Achieving NOMT compatibility allows an easier future upgrade, lowering hardware requirements on collators a lot and pushing possible throughput on a single parachain to new levels.

Therefore I would aim for the following:

2025 H1: Have a binary trie implementation in Substrate that is compatible with Frame. Any new parachain can already use this.
2025 H1/H2: Provide some migration plan for existing parachains to the new trie.
2026: Optimize the database/storage -> Go full NOMT.

Related tickets:

#1498

Further Material

https://www.rob.tech/blog/nearly-optimal-polkadot-merklization/

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions