Have sideband tasks in app container attempt indefinite reconnection to RabbitMQ #1201
+36
−3
If the RabbitMQ container dies for some reason, Celery by default retries the connection 100 times and then stops. This means a sufficiently long outage will leave the `app` in a state where jobs can be submitted, but the file list never gets dispatched to the transformers because that worker has exited.

The update here is only for the worker spun up in the `app` container. The DID finders in principle have the same problem, but since the Celery workers are the primary tasks in those containers, k8s restarts them when they exit, so we don't see the issue there. I guess something similar happens with the transformer sidecars, although that is certainly going to be ugly.
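
For illustration, here is a minimal sketch of how indefinite reconnection could be configured. It is not the actual diff in this PR. It assumes the worker is set up through Celery's standard `broker_connection_retry` and `broker_connection_max_retries` settings, where a value of `None` for the latter means retry forever. The names `SIDEBAND_WORKER_SETTINGS` and `apply_settings` are hypothetical.

```python
# Hypothetical settings for the sideband Celery worker in the app container.
# Assumption: the worker reads its broker options from the Celery app config.
SIDEBAND_WORKER_SETTINGS = {
    # Keep trying to re-establish the broker connection after it is lost.
    "broker_connection_retry": True,
    # Celery's default is 100 attempts; None removes the limit entirely.
    "broker_connection_max_retries": None,
}


def apply_settings(celery_app, settings=SIDEBAND_WORKER_SETTINGS):
    """Copy the retry settings onto a Celery app's configuration.

    `celery_app` is expected to be a `celery.Celery` instance, or any
    object whose `conf` attribute supports `update(**kwargs)`.
    """
    celery_app.conf.update(**settings)
    return celery_app
```

With something like this in place, the worker would keep waiting for RabbitMQ to come back instead of exiting after a long outage, so it does not depend on anything restarting it.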