reconciling of multiple trino clusters results in clusterwide coordinator downtime #618

@maxgruber19

We're dealing with concurrent reconciliations when TrinoCluster resources change. This happens, for example, when a catalog is applied whose labels match the catalog match labels of more than one cluster, or when all Trino cluster resources are changed at the same time because they are configured in custom Helm wrappers.

Since we use Argo for continuous deployment, we cannot change clusters or upsert catalogs one after another by hand.

We have not made progress with trino-lb (#490) yet, but I'm sure that even with trino-lb running this would cause outages every time the TrinoCluster resources are (re-)configured or catalogs are upserted. Unfortunately, running Trino in a highly available way is mission critical for our production scenario.

possible solution: sequential reconciliation

Introduce a flag for the operator (other product operators might be affected as well) that enables sequential, queue-style reconciliation instead of parallel reconciliations, which lead to all clusters going offline at the same time.

A disadvantage might be that a faulty cluster blocks the whole reconciliation process until the resource is fixed manually.
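To make the queue idea concrete, here is a minimal sketch in Python. It is not how the operator is implemented; `reconcile_sequentially`, `ReconcileError`, `fake_reconcile` and the cluster names are hypothetical and only illustrate the behaviour. It also assumes one possible answer to the disadvantage above: a cluster that fails to reconcile is skipped and reported instead of blocking the queue.

```python
from collections import deque


class ReconcileError(Exception):
    """Hypothetical error raised when a single cluster cannot be reconciled."""


def reconcile_sequentially(cluster_names, reconcile):
    """Reconcile clusters one at a time, in order.

    `reconcile` is assumed to block until the cluster's coordinator is
    ready again, so at most one cluster is restarting at any moment.
    Returns (done, skipped) lists of cluster names.
    """
    pending = deque(cluster_names)
    done, skipped = [], []
    while pending:
        name = pending.popleft()
        try:
            reconcile(name)
            done.append(name)
        except ReconcileError:
            # Assumption: skip a broken cluster instead of blocking the queue.
            skipped.append(name)
    return done, skipped


def fake_reconcile(name):
    """Stand-in for the real reconcile step; fails for one cluster."""
    if name == "trino-broken":
        raise ReconcileError(name)


done, skipped = reconcile_sequentially(
    ["trino-a", "trino-broken", "trino-b"], fake_reconcile
)
print("reconciled:", done)
print("skipped:", skipped)
```

Whether a failed cluster should be skipped, retried later, or stop the queue would be a design decision for the flag.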

possible solution: pdb

We already defined the following PDB to make sure at least one coordinator per Kubernetes cluster stays available. Unfortunately the PDB is ignored and all coordinators get killed concurrently. @maltesander and @sbernauer already mentioned that the pods are removed with delete operations instead of evictions, and only evictions would respect the PDB. Feel free to edit / add further details.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: trino-highavailiability-coordinator
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: coordinator

Seems like somebody is feeling similar pain with Elasticsearch: kubernetes/kubernetes#91808 (comment)
