- Overview
- Merkle Tree requirements
- What does it mean - page-based?
- Why page-based?
- Hashing
- Insertion Workflow
- Proof Generation
- Merkle CLI Commands
This repository holds an efficient implementation of on-chain Merkle Tree calculation implemented using vanilla Solana.
The main optimization is making the Merkle Tree page-based.
The repo consists of 4 components:
merkle_program
- The on-chain program.merkle_common
- Code common between on-chain program and off-chain SDK, tree calculation, hashing and etc.merkle_sdk
- SDK for communicating withmerkle_program
in an easier and more ergonomic way.merkle_cli
- CLI for interaction with the program.
There is a deployed instance of the program on devnet
with address mrktCtrz8auQHeBbdwoFmJ8TbAcR55TkRqaGF5KQ5Td
, it is initialized and used by default in the CLI.
Developed with solana-ci 2.1.21
.
- Insert leaves dynamically into the tree (append-only).
- Recalculate and store the root after each insertion.
- Emit event messages with the updated root (via
msg!
logs). - Allow later generation of Merkle proofs for leaves.
Instead of simply storing all leaf hashes and recalculating the whole tree each time, this implemetation stores pages of configurable number of leaves that keep their hashes, so that only a small portion of the tree must be recalculated on each insertion.
// or 8, 16, 64, depending on expected data size,
// total leaf count and rent/realloc considerations.
const PAGE_SIZE: usize = 32;
/// Global state for the Merkle tree, storing the root and page information.
#[derive(BorshSerialize, BorshDeserialize, Debug, Clone, PartialEq)]
pub struct MerkleTreeState {
/// The current root hash of the entire Merkle tree.
pub root: [u8; 32],
/// Public keys of the accounts storing the Merkle pages.
pub pages: Vec<MerklePageState>,
/// Total number of leaves inserted into the tree across all pages.
pub total_leaf_count: u32,
}
#[derive(BorshSerialize, BorshDeserialize, Debug, Clone, PartialEq)]
pub struct MerklePageState {
pub pubkey: Pubkey,
pub page_hash: [u8; 32],
}
/// Represents a single page of the Merkle tree, storing a batch of leaf hashes
/// in a fixed-size array.
#[derive(BorshSerialize, BorshDeserialize, Debug, Clone, PartialEq)]
pub struct MerklePage {
/// Fixed-size array storing the leaf hashes in this page.
pub leaves: [[u8; 32]; PAGE_CAPACITY],
/// The number of leaves currently stored in this page.
pub leaf_count: u32,
}
There are two main benefits:
Storing all leaves in one account will soon result in hitting Solana memory limits.
Because page becomes immutable after becoming full, we can save a lot of computation effort by saving each page's hash. The efficiency increase is huge!
See this comparison in number of hashing operations for recalculating Merkle Tree root with recalculating the whole tree and using the page-based approach:
Here is comparison in number of operations using different page sizes:
In this implemetation, 32 leaves per page is used as the most balanced default.
SHA-256 is used for all hashing.
In order to protect from the second preimage attack, a 0x00 byte is prepended to a leaf when hashing, while 0x01 is prepended when computing internal (branch) hashes, as defined in Certificate Transparency.
- If the current page is full (
leaf_count == PAGE_SIZE
), create a newMerklePage
account. - Update the page by writing the new leaf at position
leaf_count
. - Increment
leaf_count
and tree'stotal_leaf_count
by 1. - Recompute the root of last page and global Merkle Tree root.
if page.leaf_count < PAGE_SIZE as u32 {
page.leaves[page.leaf_count as usize] = new_leaf;
page.leaf_count += 1;
} else {
// Create a new page
}
Here is an example of tree layout if PAGE_CAPACITY == 8
and there are 19
leaves.
Because of the page-based tree example, we construct the proof in two steps:
- Generate the proof from the leaf to the page hash.
- Generate the proof from the page hash to the main root.
This way we can both:
- Recalculate the tree root in an optimal amount of hash operations.
- Verify the tree root in an optimal amount of hash operations - the verification is dealing with one optimal proof array just like in an ordinary Merkle Tree.
Runs the Initialize
instruction of the merkle_program
.
Usage:
initialize --signer <path_to_signer_file> [--address <program_address>] [-u <URL_OR_MONIKER>]
Arguments:
--signer <path_to_signer_file>
: Path to the signer's wallet file.--address <program_address>
(optional): The address of the deployed program. If not provided, it might use a default or configured address.-u, --url <URL_OR_MONIKER>
(optional): URL for Solana's JSON RPC or moniker (e.g., mainnet-beta, testnet, devnet, localhost). Defaults todevnet
.
Generates a random 16-byte hexadecimal string (32 symbols).
Usage:
rand
Runs the InsertLeaf
instruction of the merkle_program
. It can either use a provided 32-symbol hexadecimal string or generate a random one.
Usage:
insert --signer <path_to_signer_file> [--rand | <hex_of_32_symbols>] [-u <URL_OR_MONIKER>]
Arguments:
--signer <path_to_signer_file>
: Path to the signer's wallet file.--rand
: If specified, a random 32-symbol hex string will be generated and used.<hex_of_32_symbols>
: A 32-symbol hexadecimal string to be inserted.-u, --url <URL_OR_MONIKER>
(optional): URL for Solana's JSON RPC or moniker (e.g., mainnet-beta, testnet, devnet, localhost). Defaults todevnet
.
You should either provide either the 32-symbol hexadecimal string or --rand argument.
Runs the VerifyProof
instruction of the merkle_program
using the provided hexadecimal string as data.
Usage:
verify --signer <path_to_signer_file> <hex_of_32_symbols> [-u <URL_OR_MONIKER>]
Arguments:
--signer <path_to_signer_file>
: Path to the signer's wallet file.<hex_of_32_symbols>
: A 32-symbol hexadecimal string to be used for verification.-u, --url <URL_OR_MONIKER>
(optional): URL for Solana's JSON RPC or moniker (e.g., mainnet-beta, testnet, devnet, localhost). Defaults todevnet
.