Row-wise trace generation (Issue #1763)

Winterfell currently expects traces to be provided in column-major format. This aligns well with FFT-based workflows, since columns can be processed independently and in parallel. In our current codebase, traces are built in column-major order, which forces every field element of the VM state to be written to a different memory location. That means one memory write per element, poor spatial locality, and a cache miss on almost every push. The layout also pushes complexity into the API: we need a dedicated getter for each column, and every constant shows up as an element-wise definition, adding boilerplate that makes refactors painful.

Switching to a row-major layout aligns storage with the way states are produced: once a new VM state is computed, the whole row can be copied into a contiguous buffer with a single write. Because successive states sit next to one another, the next row is usually still warm in cache, and a `State` struct can be safely re-cast to a byte slice instead of pushing fields one by one—far cleaner and faster.

The transition is non-trivial. An initial attempt began in PR #1642, but finishing it will touch large swaths of the codebase, so it needs careful coordination to avoid endless rebases. A short, dedicated milestone—implemented after the current parallel trace-generation work stabilizes—would minimize disruption. That parallel work may actually help: if the "fast processor" already records intermediate states sequentially during evaluation, each worker can write its rows straight into disjoint trace chunks in parallel, giving us row-major traces almost for free.

For Plonky3 itself, generating the trace in row-major storage is highly convenient but not an immediate blocker. We could transpose the existing column-major traces just before passing them to Plonky3 and, if Winterfell compatibility remains important, transpose them back afterwards. The cost of that extra step still has to be measured; if the latency is low enough, we may defer the full refactor until the rest of the migration pressure eases.
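To make the layout difference concrete, here is a minimal, self-contained sketch of row-major trace accumulation. The `State`, `RowMajorTrace`, and `TRACE_WIDTH` names are hypothetical stand-ins chosen for illustration, not actual Miden VM or Winterfell types, and the column counts are arbitrary.

```rust
/// Hypothetical VM state. With a `#[repr(C)]` layout of plain field elements,
/// the whole struct could also be reinterpreted as a slice of elements
/// (e.g. via `bytemuck`) instead of copying field by field.
#[repr(C)]
struct State {
    clk: u64,
    fmp: u64,
    stack: [u64; 16],
}

// Toy width: 2 system columns + 16 stack columns.
const TRACE_WIDTH: usize = 18;

struct RowMajorTrace {
    /// Contiguous storage: one row of TRACE_WIDTH elements per VM step.
    rows: Vec<u64>,
}

impl RowMajorTrace {
    fn with_capacity(num_rows: usize) -> Self {
        Self { rows: Vec::with_capacity(num_rows * TRACE_WIDTH) }
    }

    /// Appends a whole VM state as one contiguous row: a few bulk copies
    /// into adjacent memory instead of one write per column.
    fn push_state(&mut self, state: &State) {
        self.rows.push(state.clk);
        self.rows.push(state.fmp);
        self.rows.extend_from_slice(&state.stack);
    }

    /// Transposes into column-major form, e.g. for handing the trace to an
    /// FFT-oriented pipeline that expects independent columns.
    fn to_columns(&self) -> Vec<Vec<u64>> {
        let num_rows = self.rows.len() / TRACE_WIDTH;
        let mut cols = vec![Vec::with_capacity(num_rows); TRACE_WIDTH];
        for row in self.rows.chunks_exact(TRACE_WIDTH) {
            for (col, &value) in cols.iter_mut().zip(row) {
                col.push(value);
            }
        }
        cols
    }
}
```

The transpose at the end is the kind of shim mentioned above: if its cost turns out to be low, the row-major layout can land first while a Winterfell-facing conversion is kept around until the migration completes.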
Hash function

Our stack still relies on Rescue Prime Optimized (RPO), but benchmarks suggest a width-12 RPO permutation is approximately 7× slower than Poseidon and roughly 14× slower than Poseidon2. Most modern ZK-VMs have standardized on Poseidon2 precisely because of this performance gap. A wholesale switch is delicate because every on-chain commitment—account roots, contract storage—has already been defined in terms of Rescue. To keep backward compatibility while we migrate to Plonky3, the plan is pragmatic: introduce Poseidon2 only inside the Plonky3 prover first, leave the outer VM and its state commitments on RPO for now, and maintain both hash chiplets side by side. Concretely, that means three additions to the Miden VM:
The trade-off comes primarily in the arithmetization. Poseidon2 permutations can be arithmetized in either “skinny” (one round per row) or “wide” (entire permutation per row) variants. A skinny permutation doesn’t increase the chiplet trace width significantly (assuming stacked chiplets), but a single permutation would span roughly 32 rows instead of Rescue’s current 8. Plonky3 currently includes only the wide Poseidon2 AIR. This limitation is partly due to Plonky3’s lack of support for verifier-supplied periodic columns. Succinct’s SP1 VM does provide the skinny variant, which we could adapt for our use case. Note that the wide variant of Poseidon2 over Goldilocks has the following trace widths (ref):
Currently, native Poseidon2 evaluation for Goldilocks isn't fully optimized, primarily due to the lack of specialized MDS multiplication assembly code; such optimizations do exist for the popular 31-bit fields. Moreover, there is a strong incentive to opt for KoalaBear, since its S-box has degree 3 rather than 7, leading to a narrower trace when using degree-5 AIRs.

Additionally, because Poseidon2's design is simpler than Rescue's, there remains some mild skepticism among cryptographers regarding its security. Before fully committing, we should briefly consult external experts for additional input.

Long-term compatibility has no clear consensus yet. Using a universal hash (such as SHA-256, Keccak, or Blake) for some commitments could facilitate backward compatibility but would also significantly increase costs in the VM. Alternatively, we might use "hash conversion" mechanisms—proving equivalence between hashes computed with different functions for the same pre-image—to manage state transitions. We may also consider tagging hashes to identify the hash function used. Regardless, we should always give users the option to commit to state using efficient arithmetic hashes in scenarios where universal hashes aren't strictly necessary. Further investigation is needed to clarify these trade-offs.
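As a rough illustration of the tagging idea, here is a minimal sketch. The `HashFn` and `TaggedDigest` names and the digest layout are hypothetical placeholders, not existing Miden types; the point is only that the tag makes RPO- and Poseidon2-based commitments distinguishable while both coexist.

```rust
/// Identifies which permutation produced a commitment, so RPO- and
/// Poseidon2-based state roots can coexist during a migration.
#[derive(Clone, Copy, PartialEq, Eq)]
enum HashFn {
    Rpo256,
    Poseidon2,
}

/// A commitment together with the hash function that produced it.
#[derive(Clone, Copy)]
struct TaggedDigest {
    hash_fn: HashFn,
    digest: [u64; 4], // four field elements, as in typical Goldilocks-based digests
}

impl TaggedDigest {
    /// Commitments are only directly comparable under the same hash function;
    /// crossing tags would require an explicit "hash conversion" proof showing
    /// that both digests commit to the same pre-image.
    fn matches(&self, other: &Self) -> bool {
        self.hash_fn == other.hash_fn && self.digest == other.digest
    }
}
```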
The goal of this discussion is to outline what a migration to Plonky3 would look like and the steps involved in getting there.
Note: I will be posting one reply per issue/task to avoid interleaved replies. It is still a work in progress. To make it easier for others to catch up, I'll update individual topics with any clarifications that come up. Later, we will open issues for each of the topics, which can be split up into concrete action points.
For now, please respond as a reply to this initial post while I finalize individual points.