Row-wise trace generation (Issue #1763)

Winterfell currently expects traces to be provided in column-major format. This aligns well with FFT-based workflows, since columns can be processed independently and in parallel. In our current codebase, traces are built in column-major order, which forces every field element of the VM state to be written to a different memory location. That means one memory write per element, poor spatial locality, and a cache miss on almost every push. The layout also pushes complexity into the API: we need a dedicated getter for each column, and every constant shows up as an element-wise definition, adding boilerplate that makes refactors painful.

Switching to a row-major layout aligns storage with the way states are produced: once a new VM state is computed, the whole row can be copied into a contiguous buffer with a single write. Because successive states sit next to one another, the next row is usually still warm in cache, and a `State` struct can be safely re-cast to a byte slice instead of pushing fields one by one—far cleaner and faster.

The transition is non-trivial. An initial attempt began in PR #1642, but finishing it will touch large swaths of the codebase, so it needs careful coordination to avoid endless rebases. A short, dedicated milestone—implemented after the current parallel trace-generation work stabilizes—would minimize disruption. That parallel work may actually help: if the "fast processor" already records intermediate states sequentially during evaluation, each worker can write its rows straight into disjoint trace chunks in parallel, giving us row-major traces almost for free.

For Plonky3 itself, generating the trace in row-major storage is highly convenient but not an immediate blocker. We could transpose the existing column-major traces just before passing them to Plonky3 and, if Winterfell compatibility remains important, transpose them back afterwards. The cost of that extra step still has to be measured; if the latency is low enough, we may defer the full refactor until the rest of the migration pressure eases.
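To make the layout difference concrete, here is a minimal, self-contained sketch of row-major trace accumulation. The `State`, `RowMajorTrace`, and `TRACE_WIDTH` names are hypothetical stand-ins chosen for illustration, not actual Miden VM or Winterfell types, and the column counts are arbitrary.

```rust
/// Hypothetical VM state. With a `#[repr(C)]` layout of plain field elements,
/// the whole struct could also be reinterpreted as a slice of elements
/// (e.g. via `bytemuck`) instead of copying field by field.
#[repr(C)]
struct State {
    clk: u64,
    fmp: u64,
    stack: [u64; 16],
}

// Toy width: 2 system columns + 16 stack columns.
const TRACE_WIDTH: usize = 18;

struct RowMajorTrace {
    /// Contiguous storage: one row of TRACE_WIDTH elements per VM step.
    rows: Vec<u64>,
}

impl RowMajorTrace {
    fn with_capacity(num_rows: usize) -> Self {
        Self { rows: Vec::with_capacity(num_rows * TRACE_WIDTH) }
    }

    /// Appends a whole VM state as one contiguous row: a few bulk copies
    /// into adjacent memory instead of one write per column.
    fn push_state(&mut self, state: &State) {
        self.rows.push(state.clk);
        self.rows.push(state.fmp);
        self.rows.extend_from_slice(&state.stack);
    }

    /// Transposes into column-major form, e.g. for handing the trace to an
    /// FFT-oriented pipeline that expects independent columns.
    fn to_columns(&self) -> Vec<Vec<u64>> {
        let num_rows = self.rows.len() / TRACE_WIDTH;
        let mut cols = vec![Vec::with_capacity(num_rows); TRACE_WIDTH];
        for row in self.rows.chunks_exact(TRACE_WIDTH) {
            for (col, &value) in cols.iter_mut().zip(row) {
                col.push(value);
            }
        }
        cols
    }
}
```

The transpose at the end is the kind of shim mentioned above: if its cost turns out to be low, the row-major layout can land first while a Winterfell-facing conversion is kept around until the migration completes.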
Hash function

Our stack still relies on Rescue Prime Optimized (RPO), but benchmarks suggest a width-12 RPO permutation is approximately 7× slower than Poseidon and roughly 14× slower than Poseidon2. Most modern ZK-VMs have standardized on Poseidon2 precisely because of this performance gap. A wholesale switch is delicate because every on-chain commitment—account roots, contract storage—has already been defined in terms of Rescue. To keep backward compatibility while we migrate to Plonky3, the plan is pragmatic: introduce Poseidon2 only inside the Plonky3 prover first, leave the outer VM and its state commitments on RPO for now, and maintain both hash chiplets side by side. Concretely, that means three additions to the Miden VM:
The trade-off comes primarily in the arithmetization. Poseidon2 permutations can be arithmetized in either “skinny” (one round per row) or “wide” (entire permutation per row) variants. A skinny permutation doesn’t increase the chiplet trace width significantly (assuming stacked chiplets), but a single permutation would span roughly 32 rows instead of Rescue’s current 8. Plonky3 currently includes only the wide Poseidon2 AIR. This limitation is partly due to Plonky3’s lack of support for verifier-supplied periodic columns. Succinct’s SP1 VM does provide the skinny variant, which we could adapt for our use case. Note that the wide variant of Poseidon2 over Goldilocks has the following trace widths (ref):
Currently, native Poseidon2 evaluation for Goldilocks isn't fully optimized, primarily due to the lack of specialized MDS multiplication assembly code; such optimizations do exist for the popular 31-bit fields. Moreover, there is a strong incentive to opt for KoalaBear, since its S-box has degree 3 rather than 7, leading to a narrower trace when using degree-5 AIRs.

Additionally, because Poseidon2's design is simpler than Rescue's, there remains some mild skepticism among cryptographers regarding its security. Before fully committing, we should briefly consult external experts for additional input.

Long-term compatibility has no clear consensus yet. Using a universal hash (such as SHA-256, Keccak, or Blake) for some commitments could facilitate backward compatibility but would also significantly increase costs in the VM. Alternatively, we might use "hash conversion" mechanisms—proving equivalence between hashes computed with different functions for the same pre-image—to manage state transitions. We may also consider tagging hashes to identify the hash function used. Regardless, we should always give users the option to commit to state using efficient arithmetic hashes in scenarios where universal hashes aren't strictly necessary. Further investigation is needed to clarify these trade-offs.
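As a rough illustration of the tagging idea, here is a minimal sketch. The `HashFn` and `TaggedDigest` names and the digest layout are hypothetical placeholders, not existing Miden types; the point is only that the tag makes RPO- and Poseidon2-based commitments distinguishable while both coexist.

```rust
/// Identifies which permutation produced a commitment, so RPO- and
/// Poseidon2-based state roots can coexist during a migration.
#[derive(Clone, Copy, PartialEq, Eq)]
enum HashFn {
    Rpo256,
    Poseidon2,
}

/// A commitment together with the hash function that produced it.
#[derive(Clone, Copy)]
struct TaggedDigest {
    hash_fn: HashFn,
    digest: [u64; 4], // four field elements, as in typical Goldilocks-based digests
}

impl TaggedDigest {
    /// Commitments are only directly comparable under the same hash function;
    /// crossing tags would require an explicit "hash conversion" proof showing
    /// that both digests commit to the same pre-image.
    fn matches(&self, other: &Self) -> bool {
        self.hash_fn == other.hash_fn && self.digest == other.digest
    }
}
```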
The goal of this discussion is to outline what a migration to Plonky3 would look like and the steps involved in getting there.
Note: I will be posting one reply per issue/task to avoid interleaved replies. It is still a work in progress. To make it easier for others to catch up, I'll update individual topics with any clarifications that come up. Later, we will open issues for each of the topics, which can be split up into concrete action points.
For now, please respond as a reply to this initial post while I finalize individual points.