Description
/kind feature
Describe the solution you'd like
Currently the OpenStack load balancer monitor for the API server created by CAPO has hardcoded settings, as seen in cluster-api-provider-openstack/pkg/cloud/services/loadbalancer/loadbalancer.go (lines 223 to 230 at 944b265):
```go
monitorCreateOpts := monitors.CreateOpts{
	Name:       monitorName,
	PoolID:     poolID,
	Type:       "TCP",
	Delay:      30,
	Timeout:    5,
	MaxRetries: 3,
}
```
In other words, Delay: 30, Timeout: 5, MaxRetries: 3 means that after 105 (= 3*(30+5)) seconds of downtime, API server pool members will be marked as down, and once a member is back up again, another MaxRetries (3) successful checks are needed before it is added back into the pool, i.e. 105 more seconds.

This in turn means that if all API server members become unavailable at the same time for 1.5 minutes, the total downtime will be at least 3.5 minutes (2*3*(30+5) = 210 seconds).
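A quick back-of-the-envelope check of that arithmetic (plain Go, nothing CAPO-specific):

```go
package main

import "fmt"

func main() {
	delay, timeout, maxRetries := 30, 5, 3 // the hardcoded monitor settings

	// Worst-case seconds for the monitor to flip a member's state;
	// max-retries applies to both the down and the up transition.
	perTransition := maxRetries * (delay + timeout)
	fmt.Println("mark down (or up): ", perTransition, "s")   // 105
	fmt.Println("full down+up cycle:", 2*perTransition, "s") // 210 = 3.5 min
}
```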
This may sound weird, but it is the official behavior of max-retries to apply both to taking members down and to bringing them back up, as per https://docs.openstack.org/octavia/queens/user/guides/basic-cookbook.html#heath-monitor-options (quoted below):

> max-retries: Number of subsequent health checks a given back-end server must fail before it is considered down, or that a failed back-end server must pass to be considered up again.
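For illustration, one possible shape for making these settings configurable (a sketch only; the MonitorSpec type, its field names, and the defaulting behavior are my assumptions, not an agreed-upon API):

```go
package loadbalancer

import (
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/monitors"
)

// MonitorSpec is a hypothetical API addition that would let users
// override the monitor settings per cluster.
type MonitorSpec struct {
	Delay      int `json:"delay,omitempty"`      // seconds between health checks
	Timeout    int `json:"timeout,omitempty"`    // seconds before a single check times out
	MaxRetries int `json:"maxRetries,omitempty"` // checks needed to flip a member down or up
}

// monitorCreateOpts keeps today's hardcoded values as defaults and
// applies any user-supplied overrides.
func monitorCreateOpts(monitorName, poolID string, spec *MonitorSpec) monitors.CreateOpts {
	opts := monitors.CreateOpts{
		Name:       monitorName,
		PoolID:     poolID,
		Type:       "TCP",
		Delay:      30,
		Timeout:    5,
		MaxRetries: 3,
	}
	if spec != nil {
		if spec.Delay > 0 {
			opts.Delay = spec.Delay
		}
		if spec.Timeout > 0 {
			opts.Timeout = spec.Timeout
		}
		if spec.MaxRetries > 0 {
			opts.MaxRetries = spec.MaxRetries
		}
	}
	return opts
}
```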
Anything else you would like to add:
- We had a brief outage related to this when all our control-plane nodes were accidentally running on the same OpenStack Nova/hypervisor host, which had network issues/downtime.
- We'll soon try out the hard/soft anti-affinity policies (see the sketch below), which will decrease the risk of this kind of failure, but faster recovery for API server LB pool members might still help.
- I haven't yet looked at whether/how other CAPI providers solve this.
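For reference, creating such an anti-affinity server group directly with gophercloud might look roughly like this (a sketch under the assumption of gophercloud's compute v2 servergroups extension; the group name is made up, and soft policies need Nova API microversion 2.15+):

```go
package main

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/servergroups"
)

func main() {
	// Assumes the usual OS_* environment variables for authentication.
	ao, err := openstack.AuthOptionsFromEnv()
	if err != nil {
		panic(err)
	}
	provider, err := openstack.AuthenticatedClient(ao)
	if err != nil {
		panic(err)
	}
	compute, err := openstack.NewComputeV2(provider, gophercloud.EndpointOpts{})
	if err != nil {
		panic(err)
	}
	// soft-anti-affinity requires Nova API microversion 2.15 or later.
	compute.Microversion = "2.15"

	sg, err := servergroups.Create(compute, servergroups.CreateOpts{
		Name:     "k8s-control-plane", // hypothetical name
		Policies: []string{"soft-anti-affinity"},
	}).Extract()
	if err != nil {
		panic(err)
	}
	fmt.Println("created server group:", sg.ID)
}
```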