Change broker to wait for workers to exit before stopping. #518

@nfachan

See #494.

As it stands today, when the broker gets a stop command, it just exits.

A nicer approach would be for it to immediately send messages to all of the workers telling them to stop. The scheduler would probably also stop all work. Then, once all of the workers had disconnected, it would actually exit. If new workers connected while it was stopping, it would send messages to them immediately telling them to stop.
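A rough sketch of the stop sequence described above, written as a small state machine. Everything here is hypothetical and for illustration only: `Broker`, `State`, `Action`, `ToWorker`, and the method names are assumptions, not the broker's actual types or API. The handlers return a list of actions instead of doing I/O so the logic can be followed on its own.

```rust
use std::collections::BTreeSet;

/// Hypothetical worker identifier (an assumption, not the real type).
type WorkerId = u32;

/// Hypothetical message the broker sends to a worker.
#[derive(Debug, PartialEq)]
enum ToWorker {
    Stop,
}

/// What the caller should do after the broker handles an event.
#[derive(Debug, PartialEq)]
enum Action {
    Send(WorkerId, ToWorker),
    Exit,
}

#[derive(Debug, PartialEq)]
enum State {
    Running,
    Stopping,
}

struct Broker {
    state: State,
    workers: BTreeSet<WorkerId>,
}

impl Broker {
    fn new() -> Self {
        Broker { state: State::Running, workers: BTreeSet::new() }
    }

    /// On a stop command: tell every connected worker to stop, and exit
    /// right away only if there are no workers to wait for.
    fn stop(&mut self) -> Vec<Action> {
        self.state = State::Stopping;
        let mut actions: Vec<Action> = self
            .workers
            .iter()
            .map(|&id| Action::Send(id, ToWorker::Stop))
            .collect();
        actions.extend(self.exit_if_drained());
        actions
    }

    /// A worker that connects while the broker is stopping is told to stop immediately.
    fn worker_connected(&mut self, id: WorkerId) -> Vec<Action> {
        self.workers.insert(id);
        match self.state {
            State::Running => vec![],
            State::Stopping => vec![Action::Send(id, ToWorker::Stop)],
        }
    }

    /// The broker exits once the last worker has disconnected.
    fn worker_disconnected(&mut self, id: WorkerId) -> Vec<Action> {
        self.workers.remove(&id);
        self.exit_if_drained()
    }

    fn exit_if_drained(&self) -> Vec<Action> {
        if self.state == State::Stopping && self.workers.is_empty() {
            vec![Action::Exit]
        } else {
            vec![]
        }
    }
}
```

In this sketch, `Action::Exit` is only produced while stopping and with no workers connected, so a worker that connects late delays the exit until it has disconnected too.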

It's unclear whether we want to do the same thing for the clients. I think the answer in the long term is yes, because we're going to want them to reconnect on error, but not on stop.

It's also unclear what to do about outstanding artifact transfers. It's probably okay to just bail, especially on the worker's side. I imagine the worker will implement this by just canceling all outstanding jobs, like it does now.

We should probably make the broker's scheduler smart enough in this situation to just drop all messages on the floor.
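One way the "drop everything" behavior might look, as a minimal sketch. The `Scheduler` and `Message` types below are made up for illustration and are not the real scheduler's types. The sketch also assumes that stopping discards any queued work, and the `dropped` counter exists only to make the behavior observable.

```rust
/// Hypothetical scheduler messages (assumptions for illustration).
#[derive(Debug)]
enum Message {
    JobRequest(u32),
    JobResponse(u32),
}

struct Scheduler {
    stopping: bool,
    queued: Vec<u32>,
    dropped: usize,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { stopping: false, queued: Vec::new(), dropped: 0 }
    }

    /// Enter the stopping state and discard any queued work.
    fn begin_stop(&mut self) {
        self.stopping = true;
        self.queued.clear();
    }

    /// While stopping, every message is ignored; otherwise it is handled normally.
    fn receive(&mut self, msg: Message) {
        if self.stopping {
            self.dropped += 1;
            return;
        }
        match msg {
            Message::JobRequest(id) => self.queued.push(id),
            Message::JobResponse(id) => self.queued.retain(|&q| q != id),
        }
    }
}
```

Putting the check at the top of `receive` keeps the stopping logic in one place, so no per-message handler has to know about shutdown.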

When this is all said and done, we can change the github-action to actually pay attention to how the worker exits.

Assignees: No one assigned

Labels: all-clients, maelstrom-broker, worker
