Implement exponential backoff agent health checks  #42

@darkodraskovic

Description

To make the manager aware of the virtual machine's operational status, we can use health checks with exponential backoff. This requires a /health or /heartbeat endpoint in the agent. The manager polls this endpoint, increasing the delay after each failed attempt, until the agent responds. A response confirms that the agent is alive and ready for the computation process.
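As a rough illustration of the polling idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the delay parameters (0.5 s base, doubling, 30 s cap), the attempt limit, and the rule that an HTTP 200 from the health URL means "alive". None of it is taken from the existing manager or agent code.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterator


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... with each value capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def probe_health(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on the (hypothetical) health URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: the agent is not ready yet.
        return False


def wait_until_alive(
    probe: Callable[[], bool],
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `max_attempts` times, sleeping with exponential backoff between failures."""
    delays = backoff_delays()
    for attempt in range(max_attempts):
        if probe():
            return True
        if attempt < max_attempts - 1:  # no point sleeping after the final attempt
            sleep(next(delays))
    return False
```

The manager could call something like `wait_until_alive(lambda: probe_health("http://<agent-address>/health"))`, where the address is a placeholder. Passing `probe` and `sleep` as parameters keeps the retry logic testable without a running agent.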

Alternatively, we can achieve a similar outcome by attempting to initiate the computation process via the /run endpoint, employing exponential backoff for retries.

Both the /health and /run approaches need an overall timeout. If the agent has not responded by that deadline, we report a failure of the virtual machine boot process.
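One possible way to combine the retries with an overall deadline is sketched below in Python. The `attempt` callable stands for either a /health probe or a /run request. `BootTimeoutError`, `retry_until_deadline`, and all default values are hypothetical names and numbers chosen for illustration, not part of the current code.

```python
import time
from typing import Callable


class BootTimeoutError(Exception):
    """Raised when the agent does not respond before the deadline (hypothetical name)."""


def retry_until_deadline(
    attempt: Callable[[], bool],
    deadline_s: float = 60.0,
    base_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `attempt` with exponential backoff; return the number of tries on success.

    Raises BootTimeoutError once `deadline_s` seconds have elapsed without success.
    """
    start = clock()
    delay = base_delay_s
    tries = 0
    while True:
        tries += 1
        if attempt():
            return tries
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            raise BootTimeoutError(
                f"agent did not respond within {deadline_s}s after {tries} attempts"
            )
        # Never sleep past the deadline, so the last attempt happens right at it.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

The manager would catch `BootTimeoutError` and mark the virtual machine boot as failed. A monotonic clock is used so that wall-clock adjustments on the host do not shorten or extend the deadline.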
