Tracking issue for naive implementation: #1851
Feature description
Many zkVMs introduce custom AIRs to efficiently handle operations that would otherwise consume many cycles in the existing general-purpose virtual machine. These are commonly referred to as precompiles, accelerators, or builtins. In this issue, we’ll use the term precompiles, as it is currently the most widely recognized.
This issue outlines the motivation and design considerations for supporting precompiles in the Miden VM.
Motivating Use Cases
The primary motivation for precompiles is to enable efficient support for cryptographic primitives, particularly hashing and signature verification, which are expensive to express directly in general-purpose VM instructions. These operations are essential for compatibility with existing chains and protocols.
Case study: SP1 precompile architecture
As an example, SP1 implements a number of precompiles, each as a dedicated chip with a fixed trace width.
Overall, the widths of these chips are a few orders of magnitude larger than the current Miden VM design. Combined with a low degree bound/blowup factor, this leads to very large proofs and an increased cost for a recursive verifier.
One important feature of the SP1 design that we should consider, and perhaps take further, is that precompiles should focus on small operations that can be called from the VM. This helps keep proofs small, since increasing the trace length adds far less verification cost than widening the trace.
In order to better model the costs of these approaches, we would need to compare the relative costs of lengthening the trace against widening it. Depending on the workload, either can dominate the overall proving and verification cost.
Signature verification
The main focus would be support for ECDSA signatures, as these are widely used across existing chains.
Elliptic curve-based signature schemes require support for group operations, which themselves operate on non-native 256/384-bit fields. We would in any case require a precompile for the latter, which would be implemented by decomposing these bignums into 8/16-bit limbs and allowing operations modulo the prime characteristic.
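To make the limb decomposition concrete, here is a minimal Rust sketch assuming 16-bit limbs for a 256-bit field element. The `Fp256` type and `add_raw` helper are hypothetical names for illustration (not an existing Miden API), and the final reduction modulo the prime is omitted.

```rust
// Hypothetical layout for a 256-bit non-native field element, decomposed
// into 16-bit limbs (16 limbs * 16 bits = 256 bits). In the AIR, each limb
// fits comfortably within the native field and carries are range-checked.
const NUM_LIMBS: usize = 16;

#[derive(Clone, Copy)]
struct Fp256 {
    limbs: [u16; NUM_LIMBS], // little-endian, base-2^16
}

/// Schoolbook addition with carry propagation. A real precompile would also
/// reduce the result modulo the prime characteristic (omitted here) and
/// constrain every limb and carry with range checks.
fn add_raw(a: &Fp256, b: &Fp256) -> (Fp256, u16) {
    let mut out = [0u16; NUM_LIMBS];
    let mut carry = 0u32;
    for i in 0..NUM_LIMBS {
        let sum = a.limbs[i] as u32 + b.limbs[i] as u32 + carry;
        out[i] = (sum & 0xffff) as u16; // keep the low 16 bits in the limb
        carry = sum >> 16;              // at most 1 for an addition
    }
    (Fp256 { limbs: out }, carry as u16)
}

fn main() {
    let a = Fp256 { limbs: [0xffff; NUM_LIMBS] };
    let (_, carry) = add_raw(&a, &a);
    assert_eq!(carry, 1); // overflow past 256 bits surfaces in the carry
}
```

Multiplication works the same way but produces cross-limb products whose carries span several limbs, which is where most of the constraint cost lies.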
There are two ways to implement the curve operations: the VM itself can compute the group operation, issuing a precompile call for each non-native field operation, or a dedicated chip can compute the entire group operation natively.
The tradeoff between the two techniques affects trace sizes. In the former, we would reduce the trace width devoted to precompiles, at the expense of a longer VM trace due to the extra cycles required for each non-native field operation. The opposite happens in the second case.
One potential reason for the huge widths of the EC chips in SP1 is that they store all intermediate field elements in the chip. If these were instead stored in memory, the add/double chips could be narrower, at the expense of a longer field-operation chip.
In the setting where the VM computes the group operation, we would still be able to apply non-deterministic tricks to efficiently evaluate curve operations. Moreover, this would avoid having to specify a chip for each curve, relying only on a generic non-native field chip. Once implemented, it would likely be easy to add support for EdDSA and BLS by specifying different moduli.
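For instance, modular inversion, which dominates the cost of computing the slope in affine point addition, never needs to be computed in-circuit: the prover supplies the inverse as a hint through the advice provider, and the VM only checks a single multiplication. A minimal sketch over a toy prime field (the field and names are illustrative, not Miden APIs):

```rust
// Toy prime (2^31 - 1), a stand-in for the real 256-bit curve field that
// the non-native field precompile would handle.
const P: u128 = 2_147_483_647;

fn mul_mod(a: u128, b: u128) -> u128 { (a * b) % P }

/// Prover side: compute the hint (here via Fermat's little theorem,
/// i.e. d^(p-2) mod p using square-and-multiply).
fn inverse_hint(d: u128) -> u128 {
    let (mut base, mut exp, mut acc) = (d % P, P - 2, 1u128);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul_mod(acc, base); }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// VM side: one multiplication and one comparison replace an in-circuit
/// inversion, e.g. when computing the slope (y2 - y1) / (x2 - x1) during
/// point addition.
fn check_inverse(d: u128, d_inv: u128) -> bool {
    mul_mod(d, d_inv) == 1
}

fn main() {
    let d = 123_456_789u128;
    let hint = inverse_hint(d);      // supplied via the advice provider
    assert!(check_inverse(d, hint)); // the only work done in-VM
}
```

The same pattern applies to square roots, divisions, and any other operation whose verification is cheaper than its computation.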
Hash functions
Implementing both Keccak and SHA2 enables compatibility with a wide range of applications.
SHA2 is used in Bitcoin, in standardized signature schemes, and by SP1 to commit to the public inputs of their proofs. Including precompiles for both `extend` and `compress` requires the VM to perform book-keeping of the hash state, reducing the precompile to a small repeated operation that would be expensive to compute manually (a sketch of the `extend` step follows below).
Keccak is used in many places in Ethereum, but it is also much more complex to arithmetize in an AIR due to the large number of bit decompositions it requires. There may however be ways to express the circuit more succinctly, by allowing a higher AIR constraint degree or finding ways to lay the computation out over multiple columns. Specializing the constraints to Goldilocks could also improve the trace width.
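To give a sense of the granularity, here is the standard SHA-256 message-schedule `extend` step (per FIPS 180-4); this is ordinary SHA-256, shown only to illustrate the kind of small repeated operation the precompile would prove while the VM keeps the surrounding state.

```rust
/// SHA-256 message schedule: expand 16 input words into 64 round words.
/// Each iteration is a handful of rotations, shifts, XORs, and additions,
/// which is cheap for a dedicated chip but costly as generic VM cycles.
fn sha256_extend(w: &mut [u32; 64]) {
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16]
            .wrapping_add(s0)
            .wrapping_add(w[i - 7])
            .wrapping_add(s1);
    }
}
```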
VM Interface
This section focuses on the interaction between the existing Miden VM and a precompile-specific VM. We assume for now that they are separate VMs and that we would create separate proofs for each of them, making use of recursive verification to obtain a single proof at the end.
Depending on the context, we may want to verify each precompile proof separately or verify a single aggregated precompile proof. For simplicity, we will focus on the latter case, as this simplifies the recursion discussion by limiting it to a single recursive verifier call.
In order to call a precompile, we would introduce an opcode for the VM which could be called from any context. The idea is that the VM produces a commitment to all the precompile calls made over the course of the program execution, and passes this commitment as a public input to the precompile VM. The precompile VM would have to unpack the commitment and verify each call using the specialized chiplets. This requires both VMs to hash the same data, though we will explore an idea that avoids this by introducing a mechanism for efficient data sharing between the two proofs.
To implement the above, the kernel would expose a procedure which would be called with the precompile ID and its inputs. We'll refer to this pair as an instruction that can be verified by the precompile VM. The kernel maintains in memory a sponge from which we squeeze a commitment to all instructions requested by the VM. At every precompile call, the kernel absorbs the instruction into the sponge. In the epilogue of a VM program execution, the squeezed hash, along with the proof loaded non-deterministically from the advice provider, is recursively verified. If the verification succeeds, it means that all precompile instructions were valid.
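A minimal sketch of this bookkeeping, with illustrative names (`PrecompileLog`, `record`, and `finalize` are not the actual Miden kernel API) and a std hasher standing in for the RPO sponge so the snippet runs standalone:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct PrecompileLog {
    sponge: DefaultHasher, // stand-in for the VM's RPO sponge
}

impl PrecompileLog {
    fn new() -> Self {
        Self { sponge: DefaultHasher::new() }
    }

    /// Called by the kernel at every precompile invocation: absorb the
    /// instruction (precompile ID + inputs) into the running sponge.
    fn record(&mut self, precompile_id: u32, inputs: &[u64]) {
        precompile_id.hash(&mut self.sponge);
        inputs.hash(&mut self.sponge);
    }

    /// Called in the program epilogue: squeeze the commitment that is
    /// passed as a public input to the precompile VM (or attached to the
    /// wrapped proof, as discussed below).
    fn finalize(&self) -> u64 {
        self.sponge.finish()
    }
}

fn main() {
    let mut log = PrecompileLog::new();
    log.record(1, &[0xdead, 0xbeef]); // e.g. a SHA2 compress call
    log.record(2, &[42; 8]);          // e.g. a non-native field operation
    println!("instructions commitment: {:#x}", log.finalize());
}
```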
The above requires that the kernel knows the verification key for the precompile VM, which describes what kinds of instructions (and corresponding IDs) it can verify. The easiest path, at least initially, would be a monolithic precompile VM which supports all possible instructions. This is practical because it simplifies the recursive verification step by avoiding having to consider precompiles individually, at the expense of a more expensive verification procedure. There are likely ways to optimize this, where we consider the set of constraints for each precompile separately, so that the cost of verification is proportional to the number of precompiles actually invoked.
As mentioned earlier, the precompile VM receives a hash of all instructions as a public input. Its trace must include the RPO digest (or the output of whatever sponge the VM used to construct the commitment), unpack the instructions one by one, and send them to the relevant chips using a bus. Since the inputs to each precompile may have different sizes, the precompile chips are also responsible for calling the hash chip to unpack the inputs before verifying their correctness.
The advantage of letting the VM generate a hash of all the precompile instructions is that it allows us to support this mechanism without having to actually implement the precompile VM. The idea is as follows: in the epilogue of the VM, the kernel simply outputs the commitment to the precompile instructions as one of the public inputs of the VM proof. This proof is now incomplete, as it can only be verified when supplied with the instructions commitment. Therefore, we can wrap the STARK proof with a list of all the precompile inputs, from which the commitment can be derived. This enhanced proof can then be sent to a verifier who will, before verifying the STARK proof, iterate over all precompile inputs, verify them, and absorb them into a sponge in order to derive the instructions commitment. While this does increase the proof size linearly in the number of instructions, it allows us to expose precompiles to users much faster, and only requires changes at the VM kernel level.
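A sketch of the verifier-side flow for such an enhanced proof; every type and function here is a placeholder for illustration (`verify_instruction`, `verify_stark`, and the stand-in hasher are not miden-verifier APIs):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Instruction { precompile_id: u32, inputs: Vec<u64> }

struct EnhancedProof {
    stark_proof: Vec<u8>,           // the (incomplete) VM proof
    instructions: Vec<Instruction>, // all deferred precompile calls
}

// Placeholder: natively re-check one precompile instruction
// (e.g. recompute the SHA2 digest, or check the ECDSA signature).
fn verify_instruction(_instr: &Instruction) -> bool { true }

// Placeholder: verify the STARK proof with the derived commitment
// supplied as the missing public input.
fn verify_stark(_proof: &[u8], _instructions_commitment: u64) -> bool { true }

fn verify_enhanced(proof: &EnhancedProof) -> bool {
    // Mirror the kernel's sponge to re-derive the instructions commitment.
    let mut sponge = DefaultHasher::new(); // stand-in for RPO
    for instr in &proof.instructions {
        if !verify_instruction(instr) {
            return false;
        }
        instr.precompile_id.hash(&mut sponge);
        instr.inputs.hash(&mut sponge);
    }
    // Only now check the STARK proof against the re-derived commitment.
    verify_stark(&proof.stark_proof, sponge.finish())
}

fn main() {
    let proof = EnhancedProof {
        stark_proof: vec![],
        instructions: vec![Instruction { precompile_id: 1, inputs: vec![42] }],
    };
    assert!(verify_enhanced(&proof));
}
```

The key point is that the verifier, not the prover, closes the gap between the incomplete STARK proof and the deferred precompile inputs.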
In an ideal world though, the VM would be able to call precompiles without having to hash the inputs, and would instead share the inputs directly with the precompile VM. For this, we consider an approach implemented at the prover level. In the same way as we can share instructions between both VMs using a commitment, we can equivalently commit to all instructions as a separate trace which is accessible by both VMs. The Miden VM would treat it as a read-only table, and a call to the precompile procedure would use a bus request to ensure the instruction is included in the table. On the precompile VM side, the constraints over this table would ensure that a bus request is made from each row to the relevant chip, so that the instruction table can be considered correct once all instructions have been verified by the relevant chiplets.
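One common way to realize such a bus is a LogUp-style argument; the sketch below shows the balance check over a toy prime field (an illustrative choice, and the concrete bus construction in Miden may differ). Giving each table row a multiplicity also covers the batching case discussed next, where some table entries go unused by an individual execution.

```rust
// Toy prime field (2^31 - 1); inputs are assumed reduced mod P.
const P: u128 = 2_147_483_647;

fn mul_mod(a: u128, b: u128) -> u128 { (a * b) % P }

fn inv_mod(x: u128) -> u128 {
    // x^(p-2) mod p via square-and-multiply (Fermat's little theorem)
    let (mut base, mut exp, mut acc) = (x % P, P - 2, 1u128);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul_mod(acc, base); }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// `requests`: one entry per precompile call in the VM trace.
/// `table`: (instruction, multiplicity) pairs, one per shared-table row;
/// unused instructions simply carry multiplicity 0. The bus is balanced
/// iff sum 1/(alpha - v) over requests equals sum m/(alpha - v) over rows,
/// for a random challenge `alpha`.
fn bus_balanced(requests: &[u128], table: &[(u128, u128)], alpha: u128) -> bool {
    let req = requests.iter()
        .fold(0u128, |acc, &v| (acc + inv_mod((alpha + P - v) % P)) % P);
    let res = table.iter()
        .fold(0u128, |acc, &(v, m)| (acc + mul_mod(m, inv_mod((alpha + P - v) % P))) % P);
    req == res
}

fn main() {
    let alpha = 987_654_321;
    let requests = [7, 7, 9];              // the VM made three calls
    let table = [(7, 2), (9, 1), (13, 0)]; // 13 is in the table but unused
    assert!(bus_balanced(&requests, &table, alpha));
}
```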
While including an additional trace would usually require an extra Merkle tree opening for the verifier when verifying FRI queries, there are ways to avoid this extra opening.
Moreover, this technique can also be used in the context of a batched precompile proof covering multiple VM executions. The instruction table may contain instructions that are unused by an individual execution, but as long as all instructions in the table have been verified, we are sure that all deferred instructions across the different VM proofs are valid.
The main use case for deferred proving is to reduce the amount of proving performed by clients, instead letting a network prover prove the expensive computations. However, there are many different scenarios that enable this, and we will need to evaluate them against each other.
Roadmap
Why is this feature needed?
Efficient verification of expensive operations (signature verification and hashes)