3.4. Sample MachineHealthCheck resource

The MachineHealthCheck resource resembles one of the following YAML files:

MachineHealthCheck for bare metal

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example 
  namespace: openshift-machine-api
  annotations:
    machine.openshift.io/remediation-strategy: external-baremetal 
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: <role> 
      machine.openshift.io/cluster-api-machine-type: <role> 
      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone> 
  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s" 
    status: "False"
  - type:    "Ready"
    timeout: "300s" 
    status: "Unknown"
  maxUnhealthy: "40%" 
  nodeStartupTimeout: "10m"

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example


  namespace: openshift-machine-api
  annotations:
    machine.openshift.io/remediation-strategy: external-baremetal


spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: <role>


      machine.openshift.io/cluster-api-machine-type: <role>


      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone>


  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s"


    status: "False"
  - type:    "Ready"
    timeout: "300s"


    status: "Unknown"
  maxUnhealthy: "40%"


  nodeStartupTimeout: "10m"

Copy to Clipboard

Toggle word wrap

1: Specify the name of the machine health check to deploy.
2: For bare metal clusters, you must include the machine.openshift.io/remediation-strategy: external-baremetal annotation in the annotations section to enable power-cycle remediation. With this remediation strategy, unhealthy hosts are rebooted instead of removed from the cluster.
3 4: Specify a label for the machine pool that you want to check.
5: Specify the machine set to track in <cluster_name>-<label>-<zone> format. For example, prod-node-us-east-1a.
6 7: Specify the timeout duration for a node condition. If a condition is met for the duration of the timeout, the machine will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy machine.
8: Specify the amount of unhealthy machines allowed in the targeted pool. This can be set as a percentage or an integer.
9: Specify the timeout duration that a machine health check must wait for a node to join the cluster before a machine is determined to be unhealthy.

注意

The matchLabels are examples only; you must map your machine groups based on your specific needs.

MachineHealthCheck for all other installation types

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example 
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: <role> 
      machine.openshift.io/cluster-api-machine-type: <role> 
      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone> 
  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s" 
    status: "False"
  - type:    "Ready"
    timeout: "300s" 
    status: "Unknown"
  maxUnhealthy: "40%" 
  nodeStartupTimeout: "10m"

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example


  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: <role>


      machine.openshift.io/cluster-api-machine-type: <role>


      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone>


  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s"


    status: "False"
  - type:    "Ready"
    timeout: "300s"


    status: "Unknown"
  maxUnhealthy: "40%"


  nodeStartupTimeout: "10m"

Copy to Clipboard

Toggle word wrap

1: Specify the name of the machine health check to deploy.
2 3: Specify a label for the machine pool that you want to check.
4: Specify the machine set to track in <cluster_name>-<label>-<zone> format. For example, prod-node-us-east-1a.
5 6: Specify the timeout duration for a node condition. If a condition is met for the duration of the timeout, the machine will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy machine.
7: Specify the amount of unhealthy machines allowed in the targeted pool. This can be set as a percentage or an integer.
8: Specify the timeout duration that a machine health check must wait for a node to join the cluster before a machine is determined to be unhealthy.

注意

The matchLabels are examples only; you must map your machine groups based on your specific needs.

3.4.1. Short-circuiting machine health check remediation
复制链接

Short circuiting ensures that machine health checks remediate machines only when the cluster is healthy. Short-circuiting is configured through the maxUnhealthy field in the MachineHealthCheck resource.

If the user defines a value for the maxUnhealthy field, before remediating any machines, the MachineHealthCheck compares the value of maxUnhealthy with the number of machines within its target pool that it has determined to be unhealthy. Remediation is not performed if the number of unhealthy machines exceeds the maxUnhealthy limit.

重要

If maxUnhealthy is not set, the value defaults to 100% and the machines are remediated regardless of the state of the cluster.

The maxUnhealthy field can be set as either an integer or percentage. There are different remediation implementations depending on the maxUnhealthy value.

3.4.1.1. Setting maxUnhealthy by using an absolute value
复制链接

If maxUnhealthy is set to 2:

Remediation will be performed if 2 or fewer nodes are unhealthy
Remediation will not be performed if 3 or more nodes are unhealthy

These values are independent of how many machines are being checked by the machine health check.

3.4.1.2. Setting maxUnhealthy by using percentages
复制链接

If maxUnhealthy is set to 40% and there are 25 machines being checked:

Remediation will be performed if 10 or fewer nodes are unhealthy
Remediation will not be performed if 11 or more nodes are unhealthy

If maxUnhealthy is set to 40% and there are 6 machines being checked:

Remediation will be performed if 2 or fewer nodes are unhealthy
Remediation will not be performed if 3 or more nodes are unhealthy

注意

The allowed number of machines is rounded down when the percentage of maxUnhealthy machines that are checked is not a whole number.

3.4. Sample MachineHealthCheck resource

3.4.1. Short-circuiting machine health check remediation
复制链接

3.4.1.1. Setting maxUnhealthy by using an absolute value
复制链接

3.4.1.2. Setting maxUnhealthy by using percentages
复制链接

学习

尝试、购买和销售

社区

关于红帽文档

让开源更具包容性

關於紅帽

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

3.4. Sample MachineHealthCheck resource

3.4.1. Short-circuiting machine health check remediation复制链接链接已复制到粘贴板!

3.4.1.1. Setting maxUnhealthy by using an absolute value复制链接链接已复制到粘贴板!

3.4.1.2. Setting maxUnhealthy by using percentages复制链接链接已复制到粘贴板!

学习

尝试、购买和销售

社区

关于红帽文档

让开源更具包容性

關於紅帽

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

3.4.1. Short-circuiting machine health check remediation
复制链接

3.4.1.1. Setting maxUnhealthy by using an absolute value
复制链接

3.4.1.2. Setting maxUnhealthy by using percentages
复制链接