Description
See #494.
As it stands today, when the broker gets a stop command, it just exits.
A nicer approach would be for it to immediately send messages to all of the workers telling them to stop. The scheduler would probably also stop all work. Then, once all of the workers had disconnected, it would actually exit. If new workers connected while it was stopping, it would send messages to them immediately telling them to stop.
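To make the proposed shutdown sequence concrete, here is a minimal sketch of it as a state machine. It assumes the broker can be modeled as reacting to three events (a stop command, a worker connecting, a worker disconnecting) and returning actions to perform. `WorkerId`, `ToWorker::Stop`, `Event`, `Action`, and `Broker::handle` are all hypothetical names for illustration, not the broker's real types.

```rust
use std::collections::HashSet;

/// Hypothetical worker identifier; the real broker's ID type may differ.
type WorkerId = u32;

/// Hypothetical message the broker sends to a worker.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToWorker {
    Stop,
}

/// Events the broker reacts to in this sketch (assumed, not the real API).
#[derive(Debug)]
enum Event {
    StopCommand,
    WorkerConnected(WorkerId),
    WorkerDisconnected(WorkerId),
}

/// What the broker should do after handling an event.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Send(WorkerId, ToWorker),
    Exit,
}

#[derive(Default)]
struct Broker {
    workers: HashSet<WorkerId>,
    stopping: bool,
}

impl Broker {
    fn handle(&mut self, event: Event) -> Vec<Action> {
        let mut actions = Vec::new();
        match event {
            Event::StopCommand => {
                if !self.stopping {
                    self.stopping = true;
                    // Immediately tell every connected worker to stop
                    // (sorted only so the output order is deterministic).
                    let mut ids: Vec<WorkerId> = self.workers.iter().copied().collect();
                    ids.sort();
                    for id in ids {
                        actions.push(Action::Send(id, ToWorker::Stop));
                    }
                }
            }
            Event::WorkerConnected(id) => {
                self.workers.insert(id);
                if self.stopping {
                    // A worker that connects while we're stopping is told to stop right away.
                    actions.push(Action::Send(id, ToWorker::Stop));
                }
            }
            Event::WorkerDisconnected(id) => {
                self.workers.remove(&id);
            }
        }
        // Only actually exit once stopping and every worker has disconnected.
        if self.stopping && self.workers.is_empty() {
            actions.push(Action::Exit);
        }
        actions
    }
}
```

With workers 1 and 2 connected, a stop command yields a `Stop` to each; a worker 3 connecting afterward gets an immediate `Stop`; and `Exit` is only returned once the last of the three disconnects. Returning actions instead of performing I/O keeps the logic easy to test in isolation.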
It's unclear whether we want to do the same thing for the clients. I think the long-term answer is yes, because we're going to want them to reconnect on error, but not on stop.
It's also unclear what to do about outstanding artifact transfers. It's probably okay to just bail, especially on the worker's side. I imagine the worker will implement this by just canceling all outstanding jobs, like it does now.
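One way the worker side could look, as a rough sketch only: each outstanding job is assumed to have a shared cancellation flag that its executor polls, and a stop message from the broker sets every flag. `Worker`, `JobId`, `start_job`, and `handle_stop` are hypothetical, and the worker's actual cancellation mechanism may be quite different.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical job identifier.
type JobId = u64;

#[derive(Default)]
struct Worker {
    /// Cancellation flag for each outstanding job (assumed mechanism).
    outstanding: HashMap<JobId, Arc<AtomicBool>>,
}

impl Worker {
    /// Registers a job and returns the flag its executor should poll.
    fn start_job(&mut self, id: JobId) -> Arc<AtomicBool> {
        let flag = Arc::new(AtomicBool::new(false));
        self.outstanding.insert(id, Arc::clone(&flag));
        flag
    }

    /// On a stop message from the broker, cancel every outstanding job.
    /// Returns how many jobs were canceled.
    fn handle_stop(&mut self) -> usize {
        let count = self.outstanding.len();
        for (_, flag) in self.outstanding.drain() {
            flag.store(true, Ordering::SeqCst);
        }
        count
    }
}
```

Any in-flight artifact transfer tied to a canceled job would simply be abandoned, which matches the "just bail" approach above.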
We should probably make the broker's scheduler smart enough in this situation to just drop all messages on the floor.
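A minimal sketch of that "drop everything" behavior, assuming the scheduler gets a stop message of its own and tracks outstanding jobs in a simple list. `SchedulerMessage`, `Scheduler`, and `receive` are hypothetical names, and the real scheduler handles many more message types.

```rust
/// Hypothetical messages the scheduler might receive; the real set is larger.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SchedulerMessage {
    EnqueueJob(u64),
    JobCompleted(u64),
    Stop,
}

#[derive(Default)]
struct Scheduler {
    outstanding: Vec<u64>,
    stopping: bool,
    /// Count of messages ignored after stopping (useful for logging).
    dropped: usize,
}

impl Scheduler {
    fn receive(&mut self, msg: SchedulerMessage) {
        if self.stopping {
            // Once stopping, every message is dropped on the floor.
            self.dropped += 1;
            return;
        }
        match msg {
            SchedulerMessage::EnqueueJob(id) => self.outstanding.push(id),
            SchedulerMessage::JobCompleted(id) => self.outstanding.retain(|&j| j != id),
            SchedulerMessage::Stop => {
                // Stop all work: forget outstanding jobs and ignore everything after this.
                self.stopping = true;
                self.outstanding.clear();
            }
        }
    }
}
```

Checking the flag once at the top of `receive` means no individual message handler needs to know about shutdown.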
When this is all said and done, we can change the github-action to actually pay attention to how the worker exits.