Conversation

@laiminhtrung1997 (Contributor) commented Feb 4, 2024

Dear all,

I think we should add topologySpreadConstraints to the pod spec so the pods can be spread across zones for high availability.

Could someone review it, please? Thank you very much.

Best regards.

@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch 2 times, most recently from b948c94 to 44fcb1f on February 4, 2024 16:16
@FxKu added this to the 2.0 milestone Feb 13, 2024
@monotek commented May 16, 2024

We need that feature too.

@FxKu modified the milestones: 2.0, 1.13.0 May 24, 2024
}
if !reflect.DeepEqual(c.Statefulset.Spec.Template.Spec.TopologySpreadConstraints, statefulSet.Spec.Template.Spec.TopologySpreadConstraints) {
needsReplace = true
needsRollUpdate = true
Member:

Does this really need to trigger a rolling update of pods executed by the operator? Won't K8s take care of it once the statefulset is replaced?

Contributor Author:

Member:

Hm, good point. Maybe we can leave it as is for now. With a rolling update we make sure pods immediately adhere to the new constraints.

InitContainersOld []v1.Container `json:"init_containers,omitempty"`
PodPriorityClassNameOld string `json:"pod_priority_class_name,omitempty"`

TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
Member:

When introducing new fields to the API you have to update the CRDs, too. Unfortunately, we don't use a modern framework where this can be generated, so you have to update both the manifests and the Go code.

Plus, you need to run code generation -> ./hack/update-codegen.sh
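For context, a rough sketch of the kind of entry a hand-maintained CRD validation needs for the new field; the property names mirror the Kubernetes API, but the surrounding structure and placement are assumptions rather than the PR's actual crds.go code:

```go
package main

import (
	"fmt"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

func main() {
	// Hypothetical schema fragment: a new spec field must also be declared in the
	// hand-written OpenAPI v3 validation, otherwise the API server prunes it.
	topologySpreadConstraints := apiextv1.JSONSchemaProps{
		Type:     "array",
		Nullable: true,
		Items: &apiextv1.JSONSchemaPropsOrArray{
			Schema: &apiextv1.JSONSchemaProps{
				Type: "object",
				Properties: map[string]apiextv1.JSONSchemaProps{
					"maxSkew":           {Type: "integer"},
					"topologyKey":       {Type: "string"},
					"whenUnsatisfiable": {Type: "string"},
				},
			},
		},
	}
	fmt.Printf("%+v\n", topologySpreadConstraints)
}
```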

Contributor Author:

I updated it.

false,
"",
false,
[]v1.TopologySpreadConstraint{},
Member:

atm we reuse configured tolerations also for logical backup so I guess we can do the same with constraints

Contributor Author:

Ok I got it.

return podAntiAffinity
}

func generateTopologySpreadConstraints(labels labels.Set, topologySpreadConstraintObjs []v1.TopologySpreadConstraint) []v1.TopologySpreadConstraint {
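Only the signature is shown in this excerpt. A minimal sketch of what such a helper might do, assuming (per the later review note about the range loop) that it copies each configured constraint and attaches the cluster's pod labels as the label selector; this is illustrative, not the PR's actual implementation:

```go
package cluster

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// Illustrative sketch, not the PR's actual code: copy each configured constraint
// and attach the cluster's pod labels so the scheduler only counts pods of this
// Postgres cluster when computing the skew.
func generateTopologySpreadConstraints(labels labels.Set, topologySpreadConstraintObjs []v1.TopologySpreadConstraint) []v1.TopologySpreadConstraint {
	constraints := make([]v1.TopologySpreadConstraint, 0, len(topologySpreadConstraintObjs))
	for _, obj := range topologySpreadConstraintObjs {
		constraints = append(constraints, v1.TopologySpreadConstraint{
			MaxSkew:           obj.MaxSkew,
			TopologyKey:       obj.TopologyKey,
			WhenUnsatisfiable: obj.WhenUnsatisfiable,
			LabelSelector:     &metav1.LabelSelector{MatchLabels: labels},
		})
	}
	return constraints
}
```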
Member:

would like to see a unit test for this function :)

Contributor Author:

I added a unit test for this function.
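The test itself is not part of this excerpt; a hypothetical shape for such a test, assuming the helper behaves as sketched above, could be:

```go
package cluster

import (
	"reflect"
	"testing"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// Hypothetical test shape, not the PR's actual test: run one constraint through
// the helper and assert that the cluster labels end up in the label selector.
func TestGenerateTopologySpreadConstraints(t *testing.T) {
	clusterLabels := labels.Set{"cluster-name": "acid-test-cluster"}
	input := []v1.TopologySpreadConstraint{{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: v1.DoNotSchedule,
	}}

	result := generateTopologySpreadConstraints(clusterLabels, input)

	if len(result) != 1 {
		t.Fatalf("expected 1 constraint, got %d", len(result))
	}
	got := result[0]
	if got.TopologyKey != "topology.kubernetes.io/zone" || got.WhenUnsatisfiable != v1.DoNotSchedule {
		t.Errorf("constraint fields were not passed through: %+v", got)
	}
	if !reflect.DeepEqual(got.LabelSelector.MatchLabels, map[string]string(clusterLabels)) {
		t.Errorf("expected label selector %v, got %v", clusterLabels, got.LabelSelector.MatchLabels)
	}
}
```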

@FxKu (Member) commented Jun 26, 2024

Can you also write an e2e test that tests that the constraints work as expected, please?

@FxKu modified the milestones: 1.13.0, 1.14.0 Jun 26, 2024
@laiminhtrung1997 (Contributor Author):

> Can you also write an e2e test that tests that the constraints work as expected, please?

Dear @FxKu
Thanks so much for your comments. I haven't written many unit or e2e tests before, but I'll try my best. I'll mark the PR ready and let you know when I have addressed the comments.
Best regards.

@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch 8 times, most recently from 530f847 to 18023cb on July 7, 2024 06:56
@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch 2 times, most recently from 3e99f92 to 794f6db on November 14, 2024 10:31
@FxKu (Member) commented Nov 14, 2024

Advice for the future: don't force-push and squash your commits in the middle of a review. Now it's super hard for me to see which feedback you've addressed, and I have to review everything again 😞

@laiminhtrung1997 (Contributor Author):

@FxKu

Noted. I am truly sorry for this. It will not happen again.

@laiminhtrung1997 (Contributor Author):

@FxKu

Currently, I have configured the operator with topologySpreadConstraints and Affinity.PodAntiAffinity together. My expectation is that the pods are always scheduled on different nodes and spread across availability zones. This is my manifest for them:

# OperatorConfiguration
configuration:
  kubernetes:
    enable_pod_antiaffinity: true
    pod_antiaffinity_preferred_during_scheduling: false
    pod_antiaffinity_topology_key: kubernetes.io/hostname
    topology_spread_constraints:
    - max_skew: 1
      topology_key: topology.kubernetes.io/zone
      when_unsatisfiable: DoNotSchedule

Do you want me to put that scenario in the e2e test?
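Assuming a straightforward mapping, the snake_case settings above would end up on the generated pod spec as a standard core/v1 constraint along these lines (the cluster labels are illustrative):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Illustrative only: the constraint the operator configuration above should
	// translate into. Combined with the required pod anti-affinity on
	// kubernetes.io/hostname, pods land on different nodes and spread across zones.
	constraint := v1.TopologySpreadConstraint{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: v1.DoNotSchedule,
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"cluster-name": "acid-test-cluster"}, // hypothetical labels
		},
	}
	fmt.Printf("%+v\n", constraint)
}
```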

@laiminhtrung1997 (Contributor Author):

Excuse me @FxKu

Is there any news on this?

@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch from 27f69f5 to b592ce8 on February 25, 2025 14:30
@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch 2 times, most recently from adfafef to 16caf11 on March 8, 2025 07:28
@laiminhtrung1997 (Contributor Author):

Hi @FxKu

I have addressed all of your comments. Please review the PR again. I look forward to hearing from you.

Best regards.

@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch from 16caf11 to dbb6aaa on March 21, 2025 04:40
@FxKu modified the milestones: 1.15.0, 1.16.0 Jul 15, 2025
@laiminhtrung1997 force-pushed the add-topologySpreadConstraints branch from dbb6aaa to a40ba6e on September 7, 2025 03:35
@zalando-robot:

Cannot start a pipeline due to:

No accountable user for this pipeline: no Zalando employee associated to this GitHub username

Click on pipeline status check Details link below for more information.

@FxKu modified the milestones: 1.16.0, 1.15.0 Sep 17, 2025
Comment on lines +888 to +891
if len(topologySpreadConstraintsSpec) > 0 {
podSpec.TopologySpreadConstraints = generateTopologySpreadConstraints(labels, topologySpreadConstraintsSpec)
}

Member:

Suggested change:
-	if len(topologySpreadConstraintsSpec) > 0 {
-		podSpec.TopologySpreadConstraints = generateTopologySpreadConstraints(labels, topologySpreadConstraintsSpec)
-	}
+	podSpec.TopologySpreadConstraints = generateTopologySpreadConstraints(labels, topologySpreadConstraintsSpec)

you only do a range loop inside the function so this len check is not necessary

Contributor Author:

I updated it.

@FxKu (Member) commented Oct 15, 2025

Can you please provide

  1. an example in the complete manifest (somewhere here)
  2. support in the Helm chart
  3. documentation behind this paragraph

}
}
k8s.api.core_v1.patch_node(master_nodes[0], patch_node_label)
k8s.api.core_v1.patch_node(replica_nodes[0], patch_node_label)
Member:

I would expect the e2e test to patch the Postgresql manifest, add topologySpreadConstraints, and then check whether the pods spread evenly. But you're only patching the nodes here?

@FxKu FxKu modified the milestones: 1.15.0, 1.16.0 Oct 15, 2025