Chapter 9. Deploying machine health checks
You can configure and deploy a machine health check to automatically repair damaged machines in a machine pool.
Machine health checks are a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
This process is not applicable to clusters where you manually provisioned the machines yourself. You can use the advanced machine management and scaling capabilities only in clusters where the machine API is operational.
Prerequisites
Enable a FeatureGate so you can access Technology Preview features.
Note: Turning on Technology Preview features cannot be undone and prevents upgrades.
9.1. About MachineHealthChecks
A MachineHealthCheck automatically repairs unhealthy Machines in a particular MachinePool.
To monitor machine health, you create a resource that defines the configuration for a controller. In that resource, you set a condition to check for, such as staying in the NotReady status for 15 minutes or displaying a permanent condition in the node-problem-detector, and a label that selects the set of machines to monitor.
You cannot apply a MachineHealthCheck to a machine with the master role.
The controller that observes a MachineHealthCheck resource checks for the status that you defined. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a machine deleted event. To limit the disruptive impact of machine deletion, the controller drains and deletes only one node at a time.
To stop the check, you remove the resource.
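The behavior described above can be sketched as a small simulation. This is an illustrative model only, not the controller's actual implementation: the Machine class, the function names, and the exact comparison against the 15-minute threshold are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed threshold, taken from the "NotReady for 15 minutes" example in the text.
NOT_READY_TIMEOUT = timedelta(minutes=15)


@dataclass
class Machine:
    """Hypothetical, simplified view of a machine and its node status."""
    name: str
    role: str                             # for example "worker" or "master"
    not_ready_since: Optional[datetime]   # None means the node is Ready


def is_unhealthy(machine: Machine, now: datetime) -> bool:
    """A machine fails the check once it has been NotReady for the full timeout."""
    if machine.not_ready_since is None:
        return False
    return now - machine.not_ready_since >= NOT_READY_TIMEOUT


def pick_machine_to_remediate(
    machines: List[Machine], role: str, now: datetime
) -> Optional[Machine]:
    """Return at most one machine to drain and delete in this pass."""
    for machine in machines:
        if machine.role == "master":
            continue  # machines with the master role are never checked
        if machine.role != role:
            continue  # not selected by the label
        if is_unhealthy(machine, now):
            return machine  # only one at a time, to limit disruption
    return None


now = datetime(2020, 1, 1, 12, 0)
pool = [
    Machine("worker-a", "worker", None),
    Machine("worker-b", "worker", now - timedelta(minutes=5)),
    Machine("worker-c", "worker", now - timedelta(minutes=20)),
    Machine("master-0", "master", now - timedelta(minutes=30)),
]
target = pick_machine_to_remediate(pool, "worker", now)
print(target.name if target else "nothing to remediate")  # prints "worker-c"
```

In this model, worker-b is left alone because it has been NotReady for only 5 minutes, and master-0 is skipped regardless of its status.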
9.2. Sample MachineHealthCheck resource
The MachineHealthCheck resource resembles the following YAML file:
MachineHealthCheck
    apiVersion: healthchecking.openshift.io/v1alpha1
    kind: MachineHealthCheck
    metadata:
      name: example
      namespace: openshift-machine-api
    spec:
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-machine-role: <label>
          machine.openshift.io/cluster-api-machine-type: <label>
          machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone>
9.3. Creating a MachineHealthCheck resource
You can create a MachineHealthCheck resource for all MachinePools in your cluster except the master pool.
Prerequisites
- Install the oc command line interface.
Procedure
- Create a healthcheck.yml file that contains the definition of your MachineHealthCheck.
- Apply the healthcheck.yml file to your cluster:
    $ oc apply -f healthcheck.yml
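The two steps above could also be scripted. The following is a minimal sketch, assuming the manifest layout from the sample resource. The helper names write_healthcheck and apply_healthcheck are hypothetical, and the same label value is used for the machine role and machine type, as in the sample.

```python
import shutil
import subprocess
from pathlib import Path

# Template mirroring the sample resource, using the conventional
# lowercase field names (spec, selector).
MANIFEST_TEMPLATE = """\
apiVersion: healthchecking.openshift.io/v1alpha1
kind: MachineHealthCheck
metadata:
  name: {name}
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: {label}
      machine.openshift.io/cluster-api-machine-type: {label}
      machine.openshift.io/cluster-api-machineset: {cluster_name}-{label}-{zone}
"""


def write_healthcheck(
    path: Path, name: str, cluster_name: str, label: str, zone: str
) -> str:
    """Fill in the placeholders and write the manifest to `path`."""
    manifest = MANIFEST_TEMPLATE.format(
        name=name, cluster_name=cluster_name, label=label, zone=zone
    )
    path.write_text(manifest)
    return manifest


def apply_healthcheck(path: Path) -> bool:
    """Run `oc apply -f <path>` if oc is on PATH; return whether it was run."""
    if shutil.which("oc") is None:
        return False  # oc is not installed, so nothing is applied
    subprocess.run(["oc", "apply", "-f", str(path)], check=True)
    return True
```

For example, calling write_healthcheck(Path("healthcheck.yml"), "example", "mycluster", "worker", "us-east-1a") and then apply_healthcheck(Path("healthcheck.yml")) would write the file and apply it. The cluster name, label, and zone shown here are placeholder values, not values from a real cluster.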