Replies: 1 comment
-
This behavior comes from the ECS Deployment Circuit Breaker. It is always enabled in Copilot as a compromise between the default CFN behavior (spin for 90 minutes, repeatedly trying to start failing tasks until the resource times out and the stack rolls back) and a "fail fast" behavior for customers with larger task counts. The Circuit Breaker sets a minimum failure threshold of 10, which unfortunately takes quite a while to reach if your desired count is 1. For higher desired counts, however, the threshold can be reached quite quickly, because it scales with the desired count: it is half the desired count, but never less than 10 and never more than 200.
That means that if your desired count is 10, the failure threshold is 10. If it's 100, the failure threshold is 50. If it's 500, the failure threshold is 200.
If you want "fast failure", you could try increasing the desired count of your service while you need a rapid dev cycle, for example setting it to 5, then scaling it back down to 1 when you've finished making container runtime changes. When ECS attempts to launch 5 copies of a task that fails every time, you reach the required failure threshold in 1/5 the time. If you do this and rely on Copilot to keep your costs low (e.g. your service is low traffic and only needs 1 copy), it's especially important to scale back down at the end. Otherwise, the unused compute will stick around in your service and could incur unexpected charges.
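As a rough illustration of the numbers above, here is a minimal Python sketch. The function names are hypothetical and are not part of Copilot or the ECS API. The minimum of 10 comes from this reply; the factor of 0.5 and the maximum of 200 are assumptions that match the three examples given. Rounding up for odd desired counts is also an assumption, as is the simplification that every task in a launch round fails at the same time.

```python
def failure_threshold(desired_count: int, minimum: int = 10, maximum: int = 200) -> int:
    """Estimate the circuit breaker failure threshold for a desired count.

    Assumption: threshold = half the desired count (rounded up),
    clamped to the range [minimum, maximum].
    """
    half = (desired_count + 1) // 2  # integer ceiling of desired_count / 2
    return max(minimum, min(maximum, half))


def rounds_to_trip(desired_count: int) -> int:
    """Estimate how many launch rounds are needed to reach the threshold.

    Simplification: each round launches `desired_count` tasks and all of
    them fail, so each round adds `desired_count` failures.
    """
    threshold = failure_threshold(desired_count)
    return -(-threshold // desired_count)  # integer ceiling division


if __name__ == "__main__":
    for count in (1, 5, 10, 100, 500):
        print(count, failure_threshold(count), rounds_to_trip(count))
```

Under these assumptions, a desired count of 1 needs 10 failing rounds to trip the breaker, while a desired count of 5 needs only 2, which is where the "1/5 the time" estimate comes from.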
-
Every time I make a mistake in the config, my load-balanced service retries more than 8 or 9 times before cancelling the deployment.
Is there a parameter to set the number of retries?