Merge pull request #13605 from kalexand-rh/machine-health-check

kalexand-rh · web-flow · commit de5a699bdc91 · 2019-02-19T18:35:40.000-05:00
machine health check draft
diff --git a/_topic_map.yml b/_topic_map.yml
@@ -215,6 +215,8 @@ Name: Control Plane management
 Dir: control-plane-management
 Distros: openshift-origin, openshift-enterprise
 Topics:
+- Name: Deploying machine health checks
+  File: deploying-machine-health-checks
 - Name: Applying autoscaling to a cluster
   File: applying-autoscaling
 ---
diff --git a/control-plane-management/deploying-machine-health-checks.adoc b/control-plane-management/deploying-machine-health-checks.adoc
@@ -0,0 +1,14 @@
+[id='deploying-machine-health-checks']
+= Deploying machine health checks
+include::modules/common-attributes.adoc[]
+:context: deploying-machine-health-checks
+toc::[]
+
+You can configure and deploy a machine health check to automatically repair
+damaged machines in a machine pool.
+
+include::modules/machine-health-checks-about.adoc[leveloffset=+1]
+
+include::modules/machine-health-checks-resource.adoc[leveloffset=+1]
+
+include::modules/machine-health-checks-creating.adoc[leveloffset=+1]
diff --git a/modules/machine-health-checks-about.adoc b/modules/machine-health-checks-about.adoc
@@ -0,0 +1,28 @@
+// Module included in the following assemblies:
+//
+// * control-plane-management/deploying-machine-health-checks.adoc
+
+[id='machine-health-checks-about-{context}']
+= About MachineHealthChecks
+
+MachineHealthChecks automatically repair unhealthy Machines in a particular
+MachinePool.
+
+To monitor machine health, you create a MachineHealthCheck resource that defines
+the configuration for a controller. You set a condition to check for, such as a
+machine staying in the `NotReady` status for 15 minutes or reporting a permanent
+condition in the node-problem-detector, and a label that selects the machines to monitor.
+
+[NOTE]
+====
+You cannot apply a MachineHealthCheck to a machine with the master role.
+====
+
+The controller that observes a MachineHealthCheck resource checks for the status
+that you defined. If a machine fails the health check, it is automatically deleted
+and a new one is created to take its place. When a machine is deleted, you
+see a `machine deleted` event. To limit the disruptive impact of machine
+deletion, the controller drains and deletes only one node at a time.
+
+
+To stop the check, you remove the resource.
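Note on the module above: the last two points (watching for deletion events and stopping the check) can be sketched in shell as follows. The name `example` and namespace `example` come from the sample resource in this PR. The `machinehealthcheck` resource type name is an assumption that depends on how the CRD is registered in the cluster, and the `RUN` variable is a hypothetical switch for this sketch only. Without `RUN=1` the script only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed values, taken from the sample resource in this PR.
MHC_NAME="example"
MHC_NAMESPACE="example"

# Build the commands as arrays so they can be printed or executed.
list_events=(kubectl get events --namespace "$MHC_NAMESPACE")
stop_check=(kubectl delete machinehealthcheck "$MHC_NAME" --namespace "$MHC_NAMESPACE")

if command -v kubectl >/dev/null 2>&1 && [ "${RUN:-0}" = "1" ]; then
  # List events to look for machine deletions, then remove the
  # resource, which stops the health check.
  "${list_events[@]}"
  "${stop_check[@]}"
else
  # Dry run: only show what would be executed.
  printf '%s\n' "${list_events[*]}" "${stop_check[*]}"
fi
```

Deleting the resource does not touch the machines themselves; it only removes the configuration that the controller acts on.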
diff --git a/modules/machine-health-checks-creating.adoc b/modules/machine-health-checks-creating.adoc
@@ -0,0 +1,24 @@
+// Module included in the following assemblies:
+//
+// * control-plane-management/deploying-machine-health-checks.adoc
+
+[id='machine-health-checks-creating-{context}']
+= Creating a MachineHealthCheck resource
+
+You can create a MachineHealthCheck resource for all `MachinePools` in your
+cluster except the `master` pool.
+
+.Prerequisites
+
+* Install the `oc` command-line interface (CLI) and `kubectl`.
+
+.Procedure
+
+. Create a *_healthcheck.yml_* file that contains the definition of your
+MachineHealthCheck.
+
+. Apply the *_healthcheck.yml_* file to your cluster:
+[source,bash]
+----
+$ kubectl apply -f healthcheck.yml
+----
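Note on the procedure above: the two steps can be sketched as one shell script. The manifest values (`example`, `prod`, `node`, `us-east-1a`) are illustrative placeholders based on the sample resource, not values from a real cluster. The lowercase `spec` and `selector` keys follow the usual Kubernetes convention and are an assumption about this alpha API. `RUN` is a hypothetical switch for this sketch; without `RUN=1` the script writes the file and prints the apply command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

MANIFEST="healthcheck.yml"

# Step 1: write the MachineHealthCheck definition (placeholder values).
cat > "$MANIFEST" <<'EOF'
apiVersion: healthchecking.openshift.io/v1alpha1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: example
spec:
  selector:
    matchLabels:
      sigs.k8s.io/cluster-api-cluster: prod
      sigs.k8s.io/cluster-api-machine-role: node
      sigs.k8s.io/cluster-api-machine-type: node
      sigs.k8s.io/cluster-api-machineset: prod-node-us-east-1a
EOF

# Step 2: apply the same file that was written in step 1.
if command -v kubectl >/dev/null 2>&1 && [ "${RUN:-0}" = "1" ]; then
  kubectl apply -f "$MANIFEST"
else
  echo "kubectl apply -f $MANIFEST"
fi
```

Keeping the file name in a single variable avoids a mismatch between the file that is created and the file that is applied.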
diff --git a/modules/machine-health-checks-resource.adoc b/modules/machine-health-checks-resource.adoc
@@ -0,0 +1,67 @@
+// Module included in the following assemblies:
+//
+// * control-plane-management/deploying-machine-health-checks.adoc
+
+[id='machine-health-checks-resource-{context}']
+= Sample MachineHealthCheck resource
+
+The MachineHealthCheck resource resembles the following YAML file:
+
+.MachineHealthCheck
+[source,yaml]
+----
+apiVersion: healthchecking.openshift.io/v1alpha1
+kind: MachineHealthCheck
+metadata:
+  name: example <1>
+  namespace: example <2>
+spec:
+  selector:
+    matchLabels:
+      sigs.k8s.io/cluster-api-cluster: <cluster_name> <3>
+      sigs.k8s.io/cluster-api-machine-role: <label> <4>
+      sigs.k8s.io/cluster-api-machine-type: <label> <4>
+      sigs.k8s.io/cluster-api-machineset: <cluster_name>-<label>-<AWS-zone> <5>
+----
+<1> Specify the name of the MachineHealthCheck to deploy. Include the name of the
+MachinePool to track.
+<2> Specify the namespace to deploy the MachineHealthCheck to.
+<3> Specify the name of your cluster.
+<4> Specify a label for the MachinePool that you want to check.
+<5> Specify the MachineSet to track in `<cluster_name>-<label>-<AWS-zone>`
+format. For example, `prod-node-us-east-1a`.
+
+
+
+////
+
+.MachinePoolHealthCheck
+[source,yaml]
+----
+apiVersion: healthchecking.machineapi.openshift.io/v1alpha1
+kind: MachinePoolHealthCheck
+metadata:
+ name: worker-pool-healthcheck
+ namespace: openshift-cluster-api
+ annotations:
+Spec:
+  MachineSelector:  metav1.LabelSelector
+----
+
+.MachineRemediation
+[source,yaml]
+----
+apiVersion: healthchecking.machineapi.openshift.io/v1alpha1
+kind: MachineRemediation
+metadata:
+ name: worker-pool-healthcheck-machineName
+ namespace: openshift-cluster-api
+ annotations:
+Spec:
+  machineName: "machineName"
+  remediationStrategy: "default"
+Status:
+  Phase:     "healthy"
+  Reason:    "no unhealthy conditions detected"
+  StartTime: "metav1.now()"
+////
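Note on callout 5 of the sample resource: the `<cluster_name>-<label>-<AWS-zone>` format can be illustrated with a small shell helper. `machineset_name` is a hypothetical function written for this note, and the arguments are the example values from the callout text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: join the three parts named in callout 5
# into the value used for the cluster-api-machineset label.
machineset_name() {
  local cluster_name="$1" label="$2" aws_zone="$3"
  printf '%s-%s-%s\n' "$cluster_name" "$label" "$aws_zone"
}

machineset_name prod node us-east-1a   # prints prod-node-us-east-1a
```

The result is the value to use for the `sigs.k8s.io/cluster-api-machineset` label in the selector.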