OpenShift Container Platform 4.5, Machine management

Chapter 10. Deploying machine health checks

You can configure and deploy a machine health check to automatically repair damaged machines in a machine pool.

Important

This process does not apply to clusters in which you provisioned the machines manually. You can use the advanced machine management and scaling capabilities only in clusters where the machine API is operational.

10.1. About machine health checks

You can define conditions under which machines in a cluster are considered unhealthy by using a MachineHealthCheck resource. Machines matching the conditions are automatically remediated.

To monitor machine health, create a MachineHealthCheck custom resource (CR) that includes a label for the set of machines to monitor and a condition to check, such as staying in the NotReady status for 15 minutes or displaying a permanent condition in the node-problem-detector.
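A minimal sketch of such a CR follows, assuming you want to monitor worker machines from a single machine set. The here-document only writes the manifest to a local file; the file name `machinehealthcheck.yaml`, the CR name `example`, and the machine set label value `example-worker-us-east-1a` are hypothetical placeholders. The two `Ready` conditions (status `False` or `Unknown`) express the "NotReady for 15 minutes" example, and the `maxUnhealthy` and `nodeStartupTimeout` values are illustrative. The field names follow the `machine.openshift.io/v1beta1` MachineHealthCheck API; check them against the API version in your cluster.

```shell
# Write a sample MachineHealthCheck manifest to a local file.
# "example-worker-us-east-1a" is a hypothetical machine set name;
# replace it with the name of a machine set in your cluster.
cat > machinehealthcheck.yaml <<'EOF'
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example
  namespace: openshift-machine-api
spec:
  # Only machines that carry all of these labels are monitored.
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: example-worker-us-east-1a
  # A node reporting Ready=False or Ready=Unknown for 15 minutes is unhealthy.
  unhealthyConditions:
  - type: "Ready"
    status: "False"
    timeout: "15m"
  - type: "Ready"
    status: "Unknown"
    timeout: "15m"
  # Stop remediating when more than 40% of the selected machines are unhealthy.
  maxUnhealthy: "40%"
  # Remediate a machine whose node has not joined the cluster within 10 minutes.
  nodeStartupTimeout: "10m"
EOF
```

On a cluster where you are logged in with `oc`, you can then create the resource with `oc apply -f machinehealthcheck.yaml`.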

The controller that observes a MachineHealthCheck CR checks for the condition that you defined. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a machine deleted event.

Note

For machines with the master role, the machine health check reports the number of unhealthy nodes, but the machine is not deleted. For example:

$ oc get machinehealthcheck example -n openshift-machine-api

Example output

NAME      MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
example   40%            3                  1

To limit the disruptive impact of machine deletions, the controller drains and deletes only one node at a time. If the number of unhealthy machines in the targeted pool exceeds what the maxUnhealthy threshold allows, the controller stops deleting machines and you must intervene manually.
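The following shell sketch works through the threshold arithmetic, reusing the figures from the example output (MAXUNHEALTHY 40%, EXPECTEDMACHINES 3, CURRENTHEALTHY 1) purely as input numbers. It assumes that a percentage is converted to a whole number of machines by rounding down, which is what shell integer division does, and that remediation stops when the unhealthy count is strictly greater than that number; treat both as assumptions rather than documented guarantees. The variable names are hypothetical.

```shell
# Input figures copied from the example output.
max_unhealthy_percent=40
expected_machines=3
current_healthy=1

# Assumption: the percentage is rounded down to whole machines.
# Shell integer division truncates, so 3 * 40 / 100 = 1.
max_unhealthy=$(( expected_machines * max_unhealthy_percent / 100 ))
unhealthy=$(( expected_machines - current_healthy ))

echo "allowed unhealthy: $max_unhealthy"
echo "currently unhealthy: $unhealthy"

if [ "$unhealthy" -gt "$max_unhealthy" ]; then
  # More unhealthy machines than the threshold allows: remediation stops.
  echo "remediation: short-circuited"
else
  echo "remediation: allowed"
fi
```

With these inputs, the script prints `allowed unhealthy: 1`, `currently unhealthy: 2`, and `remediation: short-circuited`, because two unhealthy machines exceed the one machine that 40% of three allows.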

To stop the check, delete the MachineHealthCheck custom resource.

10.1.1. MachineHealthChecks on Bare Metal

Deleting a machine on a bare metal cluster triggers reprovisioning of the bare metal host. Bare metal reprovisioning is usually a lengthy process, during which the cluster is missing compute resources and applications might be interrupted. To change the default remediation process from machine deletion to a host power-cycle, annotate the MachineHealthCheck resource with the machine.openshift.io/remediation-strategy: external-baremetal annotation.

After you set the annotation, unhealthy machines are power-cycled by using BMC (baseboard management controller) credentials.
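A minimal sketch of a bare metal MachineHealthCheck that carries the annotation in its metadata follows. The here-document only writes the manifest to a local file; the file name `machinehealthcheck-baremetal.yaml`, the CR name `example-baremetal`, the machine set label value `example-baremetal-worker`, and the condition and threshold values are hypothetical placeholders.

```shell
# Write a MachineHealthCheck manifest that asks for host power-cycle
# remediation instead of machine deletion on a bare metal cluster.
# All names below are hypothetical placeholders.
cat > machinehealthcheck-baremetal.yaml <<'EOF'
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example-baremetal
  namespace: openshift-machine-api
  annotations:
    # Switch remediation from machine deletion to a BMC power-cycle.
    machine.openshift.io/remediation-strategy: external-baremetal
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: example-baremetal-worker
  unhealthyConditions:
  - type: "Ready"
    status: "Unknown"
    timeout: "300s"
  maxUnhealthy: "40%"
EOF
```

To add the annotation to an existing resource instead, a command such as `oc annotate machinehealthcheck <name> -n openshift-machine-api machine.openshift.io/remediation-strategy=external-baremetal` should have the same effect, where `<name>` is the name of your MachineHealthCheck.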

10.1.2. Limitations when deploying machine health checks

There are limitations to consider before deploying a machine health check:

Only machines owned by a machine set are remediated by a machine health check.
Control plane machines are not currently supported and are not remediated if they are unhealthy.
If the node for a machine is removed from the cluster, a machine health check considers the machine to be unhealthy and remediates it immediately.
If the corresponding node for a machine does not join the cluster after the nodeStartupTimeout, the machine is remediated.
A machine is remediated immediately if the Machine resource phase is Failed.
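The limitations above can be read as an ordered set of rules. The following shell function is a simplified, hypothetical model of those rules for reasoning about which machines a machine health check acts on. It is not the controller's implementation, and the function name `remediation_decision` and its string arguments are invented for illustration.

```shell
# Simplified, hypothetical model of the documented limitations.
# This is NOT the machine health check controller's code.
#
# Arguments (all hypothetical):
#   $1 owner            "machineset" if a machine set owns the machine
#   $2 role             "master" or "worker"
#   $3 node_state       "present", "removed", or "not-joined"
#   $4 startup_expired  "yes" if nodeStartupTimeout has elapsed, else "no"
#   $5 phase            Machine resource phase, for example "Running" or "Failed"
remediation_decision() {
  owner=$1 role=$2 node_state=$3 startup_expired=$4 phase=$5

  # Only machines owned by a machine set are remediated.
  if [ "$owner" != "machineset" ]; then echo "skip"; return; fi
  # Control plane machines are not supported.
  if [ "$role" = "master" ]; then echo "skip"; return; fi
  # A Failed phase or a removed node triggers immediate remediation.
  if [ "$phase" = "Failed" ] || [ "$node_state" = "removed" ]; then
    echo "remediate-now"; return
  fi
  # A node that has not joined is remediated once nodeStartupTimeout expires.
  if [ "$node_state" = "not-joined" ]; then
    if [ "$startup_expired" = "yes" ]; then echo "remediate"; else echo "wait"; fi
    return
  fi
  # Otherwise, the unhealthyConditions in the CR decide.
  echo "check-conditions"
}
```

For example, `remediation_decision machineset worker removed no Running` prints `remediate-now`. The model deliberately ignores the maxUnhealthy short-circuit, which can still stop any of these remediations.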

Additional resources

For more information about the node conditions you can define in a MachineHealthCheck CR, see About listing all the nodes in a cluster.
For more information about short-circuiting, see Short-circuiting machine health check remediation.