[ZK Stack] - prover-job-monitor does not recover from compressor failure #1043
Team or Project: No response
Module Affected:
Rust Version: 1.87.0
Operating System:

Issue Description: While processing a proof, the compressor failed to compress it (in my case, the GPU was 100% occupied by the prover). When this happens, the prover-job-monitor does not reset or update the state of the current compressor job. After restarting the compressor and the prover-job-monitor, the job is not picked up again.

Expected Behavior: The compressor or job-monitor marks the compressor job as failed and, in the best case, reschedules it for a later time. When the compressor restarts, it should handle the failed or in-progress job.

Current Behavior: The job-monitor still thinks the job is in progress. After a restart, the compressor neither picks up the job nor updates its state.

Repository Link (if applicable): No response
Additional Details: No response
Prior Research:
Replies: 1 comment 2 replies
@thecodingshrimp Hey, thanks for reporting this problem. Would you be able to provide details on how to reproduce this issue?
They are different issues, yes.
I think I can close this for now, since I noticed that after 5 attempts the compressor gives up on the batch. The monitor will probably stop reporting it by then too. Let me double-check and reopen the issue if necessary.
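
For anyone following along, here is a minimal sketch of the behavior discussed above: a job stuck "in progress" past a timeout is put back in the queue, and after 5 attempts it is marked as failed. This is not the actual prover-job-monitor code. The names `CompressorJob`, `JobStatus`, `pick_next`, `requeue_stuck_jobs`, the in-memory job list, and the timeout parameter are all hypothetical and only illustrate the idea; the limit of 5 comes from the observation in the reply above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical job states; not the actual ZK Stack schema.
#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    Queued,
    InProgress { started_at: Instant },
    Failed,
}

/// Hypothetical in-memory record for one compression job.
#[derive(Debug, Clone)]
struct CompressorJob {
    batch_number: u64,
    attempts: u32,
    status: JobStatus,
}

/// Attempt limit taken from the behavior observed in this thread.
const MAX_ATTEMPTS: u32 = 5;

/// Worker side: claim the first queued job and count the attempt.
fn pick_next(jobs: &mut [CompressorJob], now: Instant) -> Option<u64> {
    let job = jobs.iter_mut().find(|j| j.status == JobStatus::Queued)?;
    job.attempts += 1;
    job.status = JobStatus::InProgress { started_at: now };
    Some(job.batch_number)
}

/// Monitor side: any job in progress for at least `timeout` is either
/// requeued or, once it has used all attempts, marked as failed.
/// Returns the batch numbers that were requeued.
fn requeue_stuck_jobs(
    jobs: &mut [CompressorJob],
    now: Instant,
    timeout: Duration,
) -> Vec<u64> {
    let mut requeued = Vec::new();
    for job in jobs.iter_mut() {
        if let JobStatus::InProgress { started_at } = job.status {
            if now.duration_since(started_at) >= timeout {
                if job.attempts >= MAX_ATTEMPTS {
                    job.status = JobStatus::Failed;
                } else {
                    job.status = JobStatus::Queued;
                    requeued.push(job.batch_number);
                }
            }
        }
    }
    requeued
}
```

With this shape, a compressor that crashes mid-job leaves the job in `InProgress`, and the next monitor pass after the timeout makes it available again, so a restart does not need any special recovery path.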