-
Moved from issues to discussion.
-
I don't have any numbers off the top of my head, but the way to think about it is that the initial execution, and to some degree even trace generation, is not parallelizable. In practice (you can see this in the executor), we "checkpoint" a very lightweight version of the execution, which we can then use to start execution from "arbitrary" points in parallel. Each checkpoint is then re-executed to collect trace data. Pretty much everything after this is parallelizable.
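The checkpoint-then-re-execute pattern described in this reply can be illustrated with a toy deterministic machine. This is a minimal sketch under stated assumptions, not SP1's executor: the `step` transition (a 64-bit LCG), the checkpoint interval, and all function names are hypothetical, and Python threads merely stand in for separate GPUs or machines (because of the GIL they give no real speedup here).

```python
from concurrent.futures import ThreadPoolExecutor

MASK = (1 << 64) - 1


def step(state: int) -> int:
    """One 'instruction' of a hypothetical toy VM: a deterministic 64-bit LCG transition."""
    return (state * 6364136223846793005 + 1442695040888963407) & MASK


def execute_with_checkpoints(initial: int, num_steps: int, interval: int) -> list[tuple[int, int]]:
    """Sequential pass: run the whole program once, but keep only (step index, state)
    every `interval` steps instead of the full trace."""
    checkpoints = []
    state = initial
    for i in range(num_steps):
        if i % interval == 0:
            checkpoints.append((i, state))
        state = step(state)
    return checkpoints


def trace_from(state: int, length: int) -> list[int]:
    """Re-execute `length` steps from a checkpointed state, recording every state (the trace)."""
    trace = []
    for _ in range(length):
        trace.append(state)
        state = step(state)
    return trace


def parallel_trace(initial: int, num_steps: int, interval: int, workers: int = 4) -> list[int]:
    """Checkpoint sequentially, then rebuild each chunk's trace independently."""
    checkpoints = execute_with_checkpoints(initial, num_steps, interval)
    starts = [state for _, state in checkpoints]
    lengths = [min(interval, num_steps - i) for i, _ in checkpoints]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk depends only on its own checkpoint, so chunks could run on
        # different devices; map() returns results in submission order.
        chunks = list(pool.map(trace_from, starts, lengths))
    return [s for chunk in chunks for s in chunk]
```

The sequential pass still touches every step once, which is the non-parallelizable part; it stays cheap only because it records a handful of states rather than the full trace. The concatenated chunks match a plain sequential run, e.g. `parallel_trace(7, 1000, 128) == trace_from(7, 1000)`.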
-
Thanks for raising this; I was also curious about SP1’s scalability across multiple GPUs. It would be really helpful to see any benchmarks or performance-scaling results the team might have internally. I'm especially interested in where diminishing returns start to kick in as more GPUs are added. If some components of the proof system are inherently sequential, knowing that would help set realistic expectations when designing a prover setup. Looking forward to hearing the team’s insights!
-
Exactly! I think the community could benefit greatly from access to multi-GPU benchmark results. These benchmarks would not only provide a baseline for comparing prover performance but also help people make more informed economic decisions. At the end of the day, the ultimate goal for provers is profitability. Since adding another GPU to a cluster introduces new costs, such as purchasing or renting the hardware, higher electricity consumption, and ongoing maintenance, it is critical to understand at what point adding another GPU stops being financially beneficial.
-
Straight answer: SP1's GPU acceleration is closed-source, and the zkVM itself does NOT support parallel proving.
-
Component
sp1-zkvm
Describe the feature you would like
I have recently started exploring the zkVM ecosystem, and my main focus has been experimenting with RSP to better understand the performance of SP1. If I understand correctly, the out-of-the-box proving software does not use multiple GPUs on a single machine, which means it is not possible to quantify the proving speedup from GPU parallelization without setting up a cluster.
I was wondering whether Succinct has any numbers on the parallelizability trade-off. Basically, what is the marginal benefit of increasing the number of discrete GPUs in the cluster, i.e., how much speedup does SP1 get from doubling, tripling, quadrupling, etc. the number of GPUs? I assume there will be some communication and coordination overhead from adding more GPUs to the cluster. I am also wondering whether any portions of the proving process are inherently sequential and would not benefit from additional GPUs. If you have any measurements, even rough estimates, I would love to hear them.
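The trade-off asked about here can be framed with Amdahl's law: if a fraction `p` of single-GPU proving time parallelizes perfectly, the speedup on `n` GPUs is at most 1 / ((1 - p) + p / n). The sketch below is a back-of-the-envelope model, not a measurement of SP1. The parallel fraction `p` and the per-GPU coordination overhead `c` (modeled, as an assumption, as a linear `c * (n - 1)` term) are hypothetical inputs you would have to measure on your own setup.

```python
import math


def speedup(parallel_fraction: float, gpus: int, overhead_per_gpu: float = 0.0) -> float:
    """Amdahl's law plus a hypothetical linear coordination-overhead term.

    Times are normalized so that proving on a single GPU takes 1.0.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    serial = 1.0 - parallel_fraction           # e.g. the initial sequential execution
    parallel = parallel_fraction / gpus        # work split evenly across GPUs
    overhead = overhead_per_gpu * (gpus - 1)   # assumed cost of coordinating extra GPUs
    return 1.0 / (serial + parallel + overhead)


def marginal_gains(parallel_fraction: float, max_gpus: int, overhead_per_gpu: float = 0.0) -> list[float]:
    """Extra speedup contributed by GPU number n, for n = 2..max_gpus."""
    return [
        speedup(parallel_fraction, n, overhead_per_gpu)
        - speedup(parallel_fraction, n - 1, overhead_per_gpu)
        for n in range(2, max_gpus + 1)
    ]


def peak_gpu_count(parallel_fraction: float, overhead_per_gpu: float) -> float:
    """GPU count minimizing total time under this model: d/dn [p/n + c*n] = 0 gives n = sqrt(p/c)."""
    if overhead_per_gpu <= 0.0:
        return math.inf  # without overhead, more GPUs never hurt (but gains shrink)
    return math.sqrt(parallel_fraction / overhead_per_gpu)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(speedup(0.9, n), 2), round(speedup(0.9, n, 0.005), 2))
    print("peak near", round(peak_gpu_count(0.9, 0.005), 1), "GPUs")
```

Even with zero overhead, the speedup is capped at 1 / (1 - p) and each additional GPU contributes less than the previous one; with any per-GPU overhead, the speedup eventually starts to fall, which is where an economic cutoff would sit.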