
Commit 5dac874

Merge pull request #10630 from fabriziopandini/document-kcp-limitation
📖 Document KCP limitation
2 parents e8d4784 + d7a38f5 commit 5dac874

1 file changed: +3 -0 lines changed

docs/book/src/tasks/automated-machine-management/healthchecking.md

Lines changed: 3 additions & 0 deletions
@@ -235,6 +235,9 @@ Before deploying a MachineHealthCheck, please familiarise yourself with the foll
 - If the Node for a Machine is removed from the cluster, a MachineHealthCheck will consider this Machine unhealthy and remediate it immediately
 - If no Node joins the cluster for a Machine after the `NodeStartupTimeout`, the Machine will be remediated
 - If a Machine fails for any reason (if the FailureReason is set), the Machine will be remediated immediately
+- Important: if the kubelet on the node hosting the etcd leader member is not working, this prevents KCP from doing some checks it is expecting to do on the leader - and specifically on the leader -.
+  This prevents remediation to happen. There are ongoing discussions about how to overcome this limitation in https://github.com/kubernetes-sigs/cluster-api/issues/8465; as of today users facing this situation
+  are recommended to manually forward leadership to another etcd member and manually delete the corresponding machine.
 
 <!-- links -->
 [management cluster]: ../../reference/glossary.md#management-cluster
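
The manual workaround described in the added lines (forward etcd leadership to another member, then delete the corresponding Machine) could be planned roughly as in the sketch below. It is only an illustration and not part of the commit: the `EtcdMember` record, the `plan_manual_remediation` helper, and the example names, IDs and endpoints are all hypothetical. It assumes that `etcdctl move-leader` is sent to the current leader's endpoint with the target's hex member ID, and that `kubectl delete machine` is run against the management cluster. TLS flags for `etcdctl` are left out and would be needed on most clusters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EtcdMember:
    """Hypothetical summary of one etcd member, filled in by the operator."""
    name: str        # etcd member name
    member_id: str   # hex member ID as printed by etcdctl
    endpoint: str    # client URL of the member
    machine: str     # name of the Cluster API Machine backing this member
    is_leader: bool
    healthy: bool    # operator's judgement: is this node fit to become leader?


def plan_manual_remediation(members: List[EtcdMember], namespace: str) -> List[str]:
    """Return the two workaround commands, in the order they should be run."""
    leaders = [m for m in members if m.is_leader]
    if len(leaders) != 1:
        raise ValueError("expected exactly one etcd leader")
    leader = leaders[0]

    # Any healthy non-leader member can receive leadership; take the first one.
    candidates = [m for m in members if not m.is_leader and m.healthy]
    if not candidates:
        raise ValueError("no healthy member available to take over leadership")
    target = candidates[0]

    return [
        # Step 1: ask the current leader to hand leadership to the target.
        # TLS flags are omitted here (assumption: add them as your cluster requires).
        f"etcdctl --endpoints={leader.endpoint} move-leader {target.member_id}",
        # Step 2: delete the Machine that hosted the old leader (management cluster).
        f"kubectl delete machine {leader.machine} --namespace {namespace}",
    ]
```

The helper only builds command strings and runs nothing, so the operator can review both steps before executing them by hand.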
