4.2.8. Cluster high availability options
Configure high availability (HA) for your cluster using installer-provisioned infrastructure (IPI), Node Health Check Operator, or manual monitoring.
- Methods of configuring HA
Automatic high availability for installer-provisioned infrastructure is available by deploying machine health checks. For more information, see "Installer-provisioned infrastructure installation overview" and "About machine health checks" in the Additional resources section.
Note: In OpenShift Container Platform clusters installed using installer-provisioned infrastructure and with a properly configured MachineHealthCheck resource, if a node fails the machine health check and becomes unavailable to the cluster, it is recycled. What happens next with VMs that ran on the failed node depends on a series of conditions. See the "Run strategies" documentation for more detailed information about the potential outcomes and how run strategies affect those outcomes.
Note: Currently, installer-provisioned infrastructure is not supported on IBM Z®.
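As an illustration of what a "properly configured" MachineHealthCheck resource might look like, the following sketch prints a sample manifest from a small shell function. The resource name, label selector, timeouts, and maxUnhealthy value are placeholder assumptions, not recommended settings; check them against "About machine health checks" before applying anything to a cluster.

```shell
# Print a sample MachineHealthCheck manifest.
# The name, labels, and timeout values are illustrative assumptions only.
machine_health_check_manifest() {
  cat <<'EOF'
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example-worker-health-check
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
  # A machine is treated as unhealthy when its node reports one of
  # these conditions for longer than the timeout.
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 300s
  # Remediation pauses if more than this share of machines is unhealthy.
  maxUnhealthy: 40%
  nodeStartupTimeout: 10m
EOF
}

# Usage (requires cluster-admin access):
#   machine_health_check_manifest | oc apply -f -
```

Keeping the manifest in a function makes it easy to review the output before piping it to oc apply.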
Automatic high availability for both IPI and non-IPI clusters is available by using the Node Health Check Operator on the OpenShift Container Platform cluster to deploy the NodeHealthCheck controller. The controller identifies unhealthy nodes and uses a remediation provider, such as the Self Node Remediation Operator or Fence Agents Remediation Operator, to remediate the unhealthy nodes. For more information on remediation, fencing, and maintaining nodes, see the Workload Availability for Red Hat OpenShift documentation.
Note: Fence Agents Remediation uses supported fencing agents to reset failed nodes faster than the Self Node Remediation Operator. This improves overall virtual machine high availability. For more information, see the OpenShift Virtualization - Fencing and VM High Availability Guide knowledgebase article.
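To make the relationship between the NodeHealthCheck controller and a remediation provider more concrete, the sketch below prints a sample NodeHealthCheck manifest that points at a Self Node Remediation template. The resource name, the template name and namespace, the selector, and the durations are all assumptions that vary between operator versions; confirm them against the Workload Availability for Red Hat OpenShift documentation for your installed release.

```shell
# Print a sample NodeHealthCheck manifest.
# Template name, namespace, selector, and durations are assumptions and
# may differ depending on the installed operator version.
node_health_check_manifest() {
  cat <<'EOF'
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: example-nodehealthcheck
spec:
  # Remediate only while at least this share of selected nodes is healthy.
  minHealthy: 51%
  # Remediation provider used for unhealthy nodes (assumed template).
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: openshift-workload-availability
  selector:
    matchExpressions:
    - key: node-role.kubernetes.io/worker
      operator: Exists
  unhealthyConditions:
  - type: Ready
    status: "False"
    duration: 300s
  - type: Ready
    status: Unknown
    duration: 300s
EOF
}

# Usage (requires the Node Health Check Operator to be installed):
#   node_health_check_manifest | oc apply -f -
```

Switching to Fence Agents Remediation would mean replacing the remediationTemplate reference with a template provided by that operator.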
High availability for any platform is available by using either a monitoring system or a qualified human to monitor node availability. When a node is lost, shut it down and run oc delete node <lost_node>.
Note: Without an external monitoring system or a qualified human monitoring node health, virtual machines lose high availability.
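The manual approach can be sketched as a small helper that a person or an external monitoring job might use to spot lost nodes. The function name not_ready_nodes is hypothetical, and the filter assumes the default column layout of oc get nodes, where the second column is the node status. Deleting the node remains a deliberate manual step after the node has been shut down.

```shell
# Read `oc get nodes --no-headers` output on stdin and print the names of
# nodes whose STATUS column does not begin with "Ready".
# Nodes shown as "Ready,SchedulingDisabled" are treated as available.
not_ready_nodes() {
  awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# Usage:
#   oc get nodes --no-headers | not_ready_nodes
#
# After confirming that a listed node is lost and shutting it down:
#   oc delete node <lost_node>
```

The helper only reports candidates; it does not shut down or delete anything, so a false alarm cannot remove a healthy node.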