Feature Request/Question: Retry Logic for S3 Artifacts in Cross-Region Scenarios #1370
-
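Hi Hera Developer Team,

We have a workflow involving a computationally intensive pod that performs the following steps:

1. Download input artifacts from S3.
2. Run the heavy computation.
3. Upload output artifacts to S3.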
Previously, the compute instance (running the pod) and the S3 bucket were co-located in the same region. However, due to new business requirements, the S3 bucket is now located in Europe, while the compute instance remains in the US. This geographical separation has introduced network instability, leading to frequent S3 endpoint connection errors during the artifact download and upload phases.

We are aware of Hera's pod-level retry strategy. However, restarting the entire pod (including the heavy computation) solely because of a transient S3 connection error during artifact transfer is highly inefficient and costly for us.

Our question is: does Hera offer, or is there a plan to introduce, built-in retry logic (ideally with configurable backoff periods or circuit breaker patterns) specifically for the S3 artifact download and upload operations? We are looking for a way to make the artifact handling itself more resilient to network issues without triggering a full pod restart.

Thank you for considering this scenario and for any insights you can provide on handling S3 artifact transfers more robustly in cross-region setups.
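To make "retry with configurable backoff" concrete, here is a minimal Python sketch of exponential backoff with jitter wrapped around a single transfer call. It is not a Hera or Argo Workflows API: `retry_with_backoff`, its parameters, and `flaky_download` are hypothetical names for illustration only. It assumes the transfer is a plain Python callable that raises `ConnectionError` or `TimeoutError` on transient failures; a real S3 client would need its own transient exception types passed via `retry_on`. Circuit breaking is not covered.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    jitter: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` calls have failed.

    Hypothetical helper. Only exception types listed in `retry_on` are
    retried; any other error propagates immediately so that permanent
    failures (e.g. a missing key) are not masked.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: re-raise the last transient error.
            # Spread the wait over [delay*(1-jitter), delay*(1+jitter)] so that
            # many pods failing together do not retry in lockstep.
            wait = delay * (1 + random.uniform(-jitter, jitter))
            sleep(min(wait, max_delay))
            delay = min(delay * factor, max_delay)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Hypothetical stand-in for an artifact download: fails twice, then succeeds.
calls = {"n": 0}


def flaky_download() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("could not reach S3 endpoint")
    return "artifact.tar.gz"


# `sleep` is stubbed out here only so the demo runs instantly.
result = retry_with_backoff(flaky_download, sleep=lambda seconds: None)
print(result, calls["n"])  # artifact.tar.gz 3
```

Bounding `max_attempts` keeps a permanently unreachable endpoint from stalling the step indefinitely; once attempts run out the error still surfaces, so a pod-level retry strategy would remain the last line of defence.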
-
Hey @xiki-tempula, this is a question to raise on https://github.com/argoproj/argo-workflows itself, as Hera doesn't control the Artifact download/upload logic. The relevant part of the code is https://github.com/argoproj/argo-workflows/blob/2e4ca9369b38629ebdff619520df20cd1b745bd5/workflow/artifacts/s3/s3.go#L302-L305