Remediation, fencing, and maintenance


Workload Availability for Red Hat OpenShift 24.3

Workload Availability remediation, fencing, and maintenance

Red Hat Customer Content Services

Abstract

Information about workload availability operators and their usage

Preface

Providing feedback on Workload Availability for Red Hat OpenShift documentation

We appreciate your feedback on our documentation. Let us know how we can improve it. To do so:

  1. Go to the JIRA website.
  2. Enter a descriptive title in the Summary field.
  3. Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
  4. Enter your username in the Reporter field.
  5. Enter the affected versions in the Affects Version/s field.
  6. Click Create at the bottom of the dialog.

Chapter 1. About node remediation, fencing, and maintenance

Hardware is imperfect and software contains bugs. When node-level failures occur, such as kernel hangs or network interface controller (NIC) failures, the work required from the cluster does not decrease, and workloads from affected nodes need to be restarted somewhere. However, some workloads, such as ReadWriteOnce (RWO) volumes and StatefulSets, might require at-most-one semantics.

Failures affecting these workloads risk data loss, corruption, or both. It is important to ensure that the node reaches a safe state, known as fencing, before initiating recovery of the workload, known as remediation, and ideally, recovery of the node as well.

It is not always practical to depend on administrator intervention to confirm the true status of the nodes and workloads. To reduce the need for such intervention, Red Hat OpenShift provides multiple components for the automation of failure detection, fencing, and remediation.

1.1. Self Node Remediation

The Self Node Remediation Operator is a Red Hat OpenShift add-on Operator that implements an external system of fencing and remediation that reboots unhealthy nodes and deletes resources, such as Pods and VolumeAttachments. The reboot ensures that the workloads are fenced, and the resource deletion accelerates the rescheduling of affected workloads. Unlike other external systems, Self Node Remediation does not require any management interface, such as Intelligent Platform Management Interface (IPMI) or an API for node provisioning.

Self Node Remediation can be used by failure detection systems, like Machine Health Check or Node Health Check.

1.2. Fence Agents Remediation

The Fence Agents Remediation (FAR) Operator is a Red Hat OpenShift add-on operator that automatically remediates unhealthy nodes, similar to the Self Node Remediation Operator. You can use well-known agents to fence and remediate unhealthy nodes. The remediation includes rebooting the unhealthy node using a fence agent, and then evicting workloads from the unhealthy node, depending on the remediation strategy.

1.3. Machine Deletion Remediation

The Machine Deletion Remediation (MDR) Operator is a Red Hat OpenShift add-on Operator that uses the Machine API to reprovision unhealthy nodes. MDR works with NodeHealthCheck (NHC) to create a Custom Resource (CR) for MDR with information about the unhealthy node.

MDR follows the annotation on the node to the associated machine object and confirms that it has an owning controller. MDR proceeds to delete the machine, and then the owning controller recreates a replacement machine.

1.4. Machine Health Check

Machine Health Check is a Red Hat OpenShift built-in failure detection, fencing, and remediation system that monitors the status of machines and the conditions of nodes. Machine Health Checks can be configured to trigger external fencing and remediation systems, like Self Node Remediation.

1.5. Node Health Check

The Node Health Check Operator is a Red Hat OpenShift add-on Operator that implements a failure detection system that monitors node conditions. It does not have a built-in fencing or remediation system and so must be configured with an external system that provides these features. By default, it is configured to utilize the Self Node Remediation system.
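
For illustration, a minimal NodeHealthCheck CR that uses the default Self Node Remediation provider might resemble the following sketch. The selector, durations, and the template reference are assumptions based on examples elsewhere in this document; verify the fields against the NodeHealthCheck CRD installed in your cluster.

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehealthcheck-sample
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
    - key: node-role.kubernetes.io/worker
      operator: Exists
  unhealthyConditions:
  - type: Ready
    status: Unknown
    duration: 300s
  - type: Ready
    status: "False"
    duration: 300s
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    namespace: openshift-workload-availability
    name: self-node-remediation-automatic-strategy-template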

1.6. Node Maintenance

Administrators face situations where they need to interrupt cluster operation, for example, to replace a drive, RAM, or a NIC.

In advance of this maintenance, affected nodes should be cordoned and drained. When a node is cordoned, new workloads cannot be scheduled on that node. When a node is drained, to avoid or minimize downtime, workloads on the affected node are transferred to other nodes.

While this maintenance can be achieved using command line tools, the Node Maintenance Operator offers a declarative approach to achieve this by using a custom resource. When such a resource exists for a node, the Operator cordons and drains the node until the resource is deleted.
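
For illustration, a NodeMaintenance CR that places a node into maintenance might resemble the following sketch; the API group, version, and field names are assumptions based on the medik8s Node Maintenance Operator and should be verified against the CRD in your cluster.

apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-sample
spec:
  nodeName: node-1.example.com
  reason: "NIC replacement"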

1.7. Flow of events during fencing and remediation

When a node becomes unhealthy, multiple events occur to detect, fence, and remediate the node, in order to restore workloads, and ideally the node, to health. Some events are triggered by the OpenShift cluster, and some events are reactions by the Workload Availability operators. Understanding this flow of events, and the duration between these events, is important for making informed decisions, including which remediation provider to use and how to configure the Node Health Check Operator and the chosen remediation provider.

Note

The following example describes a common use case and the phased flow of events. The phases apply only when the Node Health Check Operator is configured with the Ready=Unknown unhealthy condition.

1.7.1. Phase 1 - Kubernetes Health Check (Core OpenShift)

The unhealthy node stops communicating with the API server. After approximately 50 seconds, the API server sets the "Ready" condition of the node to "Unknown", that is, Ready=Unknown.

1.7.2. Phase 2 - Node Health Check (NHC)

If the Ready=Unknown condition is present longer than the configured duration, the Node Health Check Operator starts remediation.

The user-configured duration in this phase represents the Operator's tolerance for how long the unhealthy condition persists. It takes into account that while the workload is restarting as requested, the resource is expected to be "Unready".

For example:

  • If you have a workload that takes a long time to restart, you need a longer timeout.
  • Likewise, if the workload restarts quickly, the timeout can also be short. A configuration sketch follows this list.
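
For example, a NodeHealthCheck snippet that tolerates the Ready=Unknown condition for five minutes before remediating might look like the following sketch; the field names follow the NodeHealthCheck example shown earlier, and the duration should match your workload restart time.

spec:
  unhealthyConditions:
  - type: Ready
    status: Unknown
    duration: 300s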

1.7.3. Phase 3 - Remediate Host / Remediate API (depending on the configured remediation operator)

Using Machine Deletion Remediation (MDR), Self Node Remediation (SNR) or Fence Agents Remediation (FAR), the remediator fences and isolates the node by rebooting it in order to reach a safe state.

The details of this phase are configured by the user and depend on their workload requirements.

For example:

  • Machine Deletion Remediation - The choice of platform influences the time it takes to reprovision the machine, and therefore the duration of the remediation. MDR is only applicable to clusters that use the Machine API.
  • Self Node Remediation - The remediation time depends on many parameters, including the safe time it takes to automatically reboot unhealthy nodes, and the watchdog devices used to ensure that the machine enters a safe state when an error condition is detected.
  • Fence Agents Remediation - The fencing agent time depends on many parameters, including the cluster nodes, the management interface, and the agent parameters.

1.7.4. Phase 4 - Workload starting

When MDR is used, the remediator deletes the resources. When FAR or SNR is used, varying remediation strategies are available. One strategy is OutOfServiceTaint, which uses the out-of-service taint to permit the deletion of resources in the cluster. In both cases, deleting the resources enables faster rescheduling of the affected workload. The workload is then rescheduled and restarted.

This phase is initiated automatically by the remediators when fencing is complete. If fencing does not complete, and an escalating remediation is required, the user must configure the timeout in seconds for the entire remediation process. If the timeout elapses and the node is still unhealthy, NHC tries the next remediator in line to remediate the unhealthy node.
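
As a sketch of such an escalation, a NodeHealthCheck spec might chain remediators with per-remediator timeouts as follows. The escalatingRemediations field, its order, and its timeout values are assumptions based on recent Node Health Check releases and the template names used elsewhere in this document; verify them against your installed CRD.

spec:
  escalatingRemediations:
  - remediationTemplate:
      apiVersion: self-node-remediation.medik8s.io/v1alpha1
      kind: SelfNodeRemediationTemplate
      namespace: openshift-workload-availability
      name: self-node-remediation-automatic-strategy-template
    order: 1
    timeout: 5m0s
  - remediationTemplate:
      apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
      kind: FenceAgentsRemediationTemplate
      namespace: openshift-workload-availability
      name: fence-agents-remediation-template-fence-ipmilan
    order: 2
    timeout: 10m0s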

1.8. About metrics for workload availability operators

The addition of data analysis enhances observability for the workload availability operators. The data provides metrics about the activity of the operators, and the effect on the cluster. These metrics improve decision-making capabilities, enable data-driven optimization, and enhance overall system performance.

You can use metrics to do these tasks:

  • Access comprehensive tracking data for operators, to monitor overall system efficiency.
  • Access actionable insights derived from tracking data, such as identifying frequently failing nodes, or downtime due to the Operator’s remediations.
  • Visualize how the Operator’s remediations enhance system efficiency.

1.8.1. Configuring metrics for workload availability operators

You can configure metrics for the workload availability operators by creating a Prometheus token secret and a ServiceMonitor resource in the namespace where the Node Health Check Operator is installed.

Prerequisites

  • User Workload Prometheus scraping is enabled, so that the prometheus-user-workload-token secret exists in the openshift-user-workload-monitoring namespace.

Procedure

  1. Create the prometheus-user-workload-token secret in the namespace where the NHC Operator is installed, by copying the existing token secret from the openshift-user-workload-monitoring namespace as follows:

    existingPrometheusTokenSecret=$(kubectl get secret --namespace openshift-user-workload-monitoring | grep prometheus-user-workload-token | awk '{print $1}') 1
    
    kubectl get secret ${existingPrometheusTokenSecret} --namespace=openshift-user-workload-monitoring -o yaml | \
        sed '/namespace: .*==/d;/ca.crt:/d;/serviceCa.crt/d;/creationTimestamp:/d;/resourceVersion:/d;/uid:/d;/annotations/d;/kubernetes.io/d;' | \
        sed 's/namespace: .*/namespace: openshift-workload-availability/' | \ 2
        sed 's/name: .*/name: prometheus-user-workload-token/' | \ 3
        sed 's/type: .*/type: Opaque/' \
        > prom-token.yaml
    
    kubectl apply -f prom-token.yaml
    1
    The prometheus-user-workload-token secret is required by the metrics ServiceMonitor that is created in the next step.
    2
    Ensure that the new Secret’s namespace is the one where the NHC Operator is installed, for example, openshift-workload-availability.
    3
    The prometheus-user-workload-token secret only exists if User Workload Prometheus scraping is enabled.
  2. Create the ServiceMonitor as follows:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: node-healthcheck-metrics-monitor
      namespace: openshift-workload-availability 1
      labels:
        app.kubernetes.io/component: controller-manager
    spec:
      endpoints:
      - interval: 30s
        port: https
        scheme: https
        authorization:
          type: Bearer
          credentials:
            name: prometheus-user-workload-token
            key: token
        tlsConfig:
          ca:
            configMap:
              name: nhc-serving-certs-ca-bundle
              key: service-ca.crt
          serverName: node-healthcheck-controller-manager-metrics-service.openshift-workload-availability.svc 2
      selector:
        matchLabels:
          app.kubernetes.io/component: controller-manager
          app.kubernetes.io/name: node-healthcheck-operator
          app.kubernetes.io/instance: metrics
    1
    Specify the namespace where you want to configure the metrics, for example, openshift-workload-availability.
    2
    The serverName must contain the same namespace where the Operator is installed. In the example, openshift-workload-availability appears between the metrics service name and the svc suffix.

Verification

To confirm that the configuration is successful, verify that the Observe > Targets tab in the Red Hat OpenShift web console shows the endpoint as Up.
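
Alternatively, you can check the created resources from the command line. The following commands are a sketch and assume the resource names used in the previous steps:

$ oc get secret prometheus-user-workload-token -n openshift-workload-availability
$ oc get servicemonitor node-healthcheck-metrics-monitor -n openshift-workload-availability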

1.8.2. Example metrics for workload availability operators

The following are example metrics from the various workload availability operators.

The metrics include information on the following indicators:

  • Operator availability: Showing if and when each Operator is up and running.
  • Node remediation count: Showing the number of remediations across the same node, and across all nodes.
  • Node remediation duration: Showing the remediation downtime or recovery time.
  • Node remediation gauge: Showing the number of ongoing remediations.

Chapter 2. Using Self Node Remediation

You can use the Self Node Remediation Operator to automatically reboot unhealthy nodes. This remediation strategy minimizes downtime for stateful applications and ReadWriteOnce (RWO) volumes, and restores compute capacity if transient failures occur.

2.1. About the Self Node Remediation Operator

The Self Node Remediation Operator runs on the cluster nodes and reboots nodes that are identified as unhealthy. The Operator uses the MachineHealthCheck or NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the MachineHealthCheck or the NodeHealthCheck resource creates the SelfNodeRemediation custom resource (CR), which triggers the Self Node Remediation Operator.

The SelfNodeRemediation CR resembles the following YAML file:

apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediation
metadata:
  name: selfnoderemediation-sample
  namespace: openshift-workload-availability
spec:
  remediationStrategy: <remediation_strategy> 1
status:
  lastError: <last_error_message> 2
1
Specifies the remediation strategy for the nodes.
2
Displays the last error that occurred during remediation. When remediation succeeds or if no errors occur, the field is left empty.

The Self Node Remediation Operator minimizes downtime for stateful applications and restores compute capacity if transient failures occur. You can use this Operator regardless of the management interface, such as IPMI or an API to provision a node, and regardless of the cluster installation type, such as installer-provisioned infrastructure or user-provisioned infrastructure.

2.1.1. About watchdog devices

Watchdog devices can be any of the following:

  • Independently powered hardware devices
  • Hardware devices that share power with the hosts they control
  • Virtual devices implemented in software, or softdog

Hardware watchdog and softdog devices have electronic or software timers, respectively. These watchdog devices are used to ensure that the machine enters a safe state when an error condition is detected. The cluster is required to repeatedly reset the watchdog timer to prove that it is in a healthy state. This timer might elapse due to fault conditions, such as deadlocks, CPU starvation, and loss of network or disk access. If the timer expires, the watchdog device assumes that a fault has occurred and the device triggers a forced reset of the node.

Hardware watchdog devices are more reliable than softdog devices.
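
To check whether a node exposes a hardware watchdog or softdog device, you can inspect the /dev directory on the node. The following command is a sketch that uses a debug pod; the exact device path can differ depending on your hardware.

$ oc debug node/<node_name> -- chroot /host sh -c 'ls -l /dev/watchdog*'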

2.1.1.1. Understanding Self Node Remediation Operator behavior with watchdog devices

The Self Node Remediation Operator determines the remediation strategy based on the watchdog devices that are present.

If a hardware watchdog device is configured and available, the Operator uses it for remediation. If a hardware watchdog device is not configured, the Operator enables and uses a softdog device for remediation.

If neither watchdog device is supported, either by the system or by the configuration, the Operator remediates nodes by using a software reboot.

2.2. Control plane fencing

In earlier releases, you could enable Self Node Remediation and Node Health Check only on worker nodes. You can now also follow remediation strategies on control plane nodes in the event of node failure.

Self Node Remediation occurs in two primary scenarios.

  • API Server Connectivity

    • In this scenario, the control plane node to be remediated is not isolated. It can be directly connected to the API Server, or it can be indirectly connected to the API Server through worker nodes or control plane nodes that are directly connected to the API Server.
    • When there is API Server Connectivity, the control plane node is remediated only if the Node Health Check Operator has created a SelfNodeRemediation custom resource (CR) for the node.
  • No API Server Connectivity

    • In this scenario, the control plane node to be remediated is isolated from the API Server. The node cannot connect directly or indirectly to the API Server.
    • When there is no API Server Connectivity, the control plane node is remediated as outlined in the following steps:

      • Check the status of the control plane node with the majority of the peer worker nodes. If the majority of the peer worker nodes cannot be reached, the node is analyzed further.

        • Self-diagnose the status of the control plane node:

          • If the self diagnostics pass, no action is taken.
          • If the self diagnostics fail, the node is fenced and remediated.
          • The self diagnostics currently supported are checking the kubelet service status, and checking endpoint availability by using an opt-in configuration.
      • If the node cannot communicate with most of its worker peers, check the connectivity of the control plane node with the other control plane nodes. If the node can communicate with any other control plane peer, no action is taken. Otherwise, the node is fenced and remediated.

2.3. Installing the Self Node Remediation Operator by using the web console

You can use the Red Hat OpenShift web console to install the Self Node Remediation Operator.

Note

The Node Health Check Operator also installs the Self Node Remediation Operator as a default remediation provider.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the Red Hat OpenShift web console, navigate to Operators > OperatorHub.
  2. Select the Self Node Remediation Operator from the list of available Operators, and then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators > Installed Operators page.
  2. Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators > Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Workloads > Pods page and check the logs of the self-node-remediation-controller-manager pod and the self-node-remediation-ds pods in the openshift-workload-availability project for any reported issues.
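
You can also inspect the same pods from the command line, as in the following sketch. Pod names include generated suffixes, so list the pods first and then view the logs of the relevant pod:

$ oc get pods -n openshift-workload-availability
$ oc logs <self-node-remediation-controller-manager_pod_name> -n openshift-workload-availability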

2.4. Installing the Self Node Remediation Operator by using the CLI

You can use the OpenShift CLI (oc) to install the Self Node Remediation Operator.

You can install the Self Node Remediation Operator in your own namespace or in the openshift-workload-availability namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a Namespace custom resource (CR) for the Self Node Remediation Operator:

    1. Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
    2. To create the Namespace CR, run the following command:

      $ oc create -f workload-availability-namespace.yaml
  2. Create an OperatorGroup CR:

    1. Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
    2. To create the OperatorGroup CR, run the following command:

      $ oc create -f workload-availability-operator-group.yaml
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, self-node-remediation-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
          name: self-node-remediation-operator
          namespace: openshift-workload-availability 1
      spec:
          channel: stable
          installPlanApproval: Manual 2
          name: self-node-remediation-operator
          source: redhat-operators
          sourceNamespace: openshift-marketplace
          package: self-node-remediation
      1
      Specify the Namespace where you want to install the Self Node Remediation Operator. To install the Self Node Remediation Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.
      2
      Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This setting prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation. For an example of approving the pending install plan, see the sketch after this procedure.
    2. To create the Subscription CR, run the following command:

      $ oc create -f self-node-remediation-subscription.yaml
  4. Verify that the Self Node Remediation Operator created the SelfNodeRemediationTemplate CR:

    $ oc get selfnoderemediationtemplate -n openshift-workload-availability

    Example output

    self-node-remediation-automatic-strategy-template
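
If you set installPlanApproval to Manual, the installation pauses until you approve the pending install plan. The following commands are a minimal sketch of how you might locate and approve it; the install plan name is cluster-generated and shown here as a placeholder.

$ oc get installplan -n openshift-workload-availability
$ oc patch installplan <install_plan_name> -n openshift-workload-availability --type merge --patch '{"spec":{"approved":true}}'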

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-workload-availability

    Example output

    NAME                               DISPLAY                          VERSION   REPLACES   PHASE
    self-node-remediation.v0.8.0       Self Node Remediation Operator   0.8.0     self-node-remediation.v0.7.1           Succeeded

  2. Verify that the Self Node Remediation Operator is up and running:

    $ oc get deployment -n openshift-workload-availability

    Example output

    NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
    self-node-remediation-controller-manager    1/1     1            1           28h

  3. Verify that the Self Node Remediation Operator created the SelfNodeRemediationConfig CR:

    $ oc get selfnoderemediationconfig -n openshift-workload-availability

    Example output

    NAME                           AGE
    self-node-remediation-config   28h

  4. Verify that each self node remediation pod is scheduled and running on each worker node and control plane node:

    $ oc get daemonset -n openshift-workload-availability

    Example output

    NAME                      DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
    self-node-remediation-ds  6        6        6      6           6          <none>         28h

2.5. Configuring the Self Node Remediation Operator

The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR and the SelfNodeRemediationTemplate Custom Resource Definition (CRD).

Note

To avoid unexpected reboots of a specific node, the Node Maintenance Operator places the node in maintenance mode and automatically adds a node selector that prevents the SNR daemonset from running on the specific node.

2.5.1. Understanding the Self Node Remediation Operator configuration

The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR with the name self-node-remediation-config. The CR is created in the namespace of the Self Node Remediation Operator.

A change in the SelfNodeRemediationConfig CR re-creates the Self Node Remediation daemon set.

The SelfNodeRemediationConfig CR resembles the following YAML file:

apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: self-node-remediation-config
  namespace: openshift-workload-availability
spec:
  safeTimeToAssumeNodeRebootedSeconds: 180 1
  watchdogFilePath: /dev/watchdog 2
  isSoftwareRebootEnabled: true 3
  apiServerTimeout: 15s 4
  apiCheckInterval: 5s 5
  maxApiErrorThreshold: 3 6
  peerApiServerTimeout: 5s 7
  peerDialTimeout: 5s 8
  peerRequestTimeout: 5s 9
  peerUpdateInterval: 15m 10
  hostPort: 30001 11
  customDsTolerations: 12
      - effect: NoSchedule
        key: node-role.kubernetes.io.infra
        operator: Equal
        value: "value1"
        tolerationSeconds: 3600
1
Specify an optional time duration that the Operator waits before recovering affected workloads running on an unhealthy node. Starting replacement pods while they are still running on the failed node can lead to data corruption and a violation of run-once semantics. The Operator calculates a minimum duration using the values in the ApiServerTimeout, ApiCheckInterval, MaxApiErrorThreshold, PeerDialTimeout, and PeerRequestTimeout fields, as well as the watchdog timeout and the cluster size at the time of remediation. To check the minimum duration calculation, view the manager pod logs and find references to the calculated minimum time in seconds.

If you specify a value that is lower than the minimum duration, the Operator uses the minimum duration. However, if you want to increase the duration to a value higher than this minimum value, you can set safeTimeToAssumeNodeRebootedSeconds to a value higher than the minimum duration.

2
Specify the file path of the watchdog device in the nodes. If you enter an incorrect path to the watchdog device, the Self Node Remediation Operator automatically detects the softdog device path.

If a watchdog device is unavailable, the SelfNodeRemediationConfig CR uses a software reboot.

3
Specify if you want to enable software reboot of the unhealthy nodes. By default, the value of isSoftwareRebootEnabled is set to true. To disable the software reboot, set the parameter value to false.
4
Specify the timeout duration to check connectivity with each API server. When this duration elapses, the Operator starts remediation. The timeout duration must be greater than or equal to 10 milliseconds.
5
Specify the frequency to check connectivity with each API server. The interval must be greater than or equal to 1 second.
6
Specify a threshold value. After reaching this threshold, the node starts contacting its peers. The threshold value must be greater than or equal to 1.
7
Specify the duration of the timeout for the peer to connect the API server. The timeout duration must be greater than or equal to 10 milliseconds.
8
Specify the duration of the timeout for establishing connection with the peer. The timeout duration must be greater than or equal to 10 milliseconds.
9
Specify the duration of the timeout to get a response from the peer. The timeout duration must be greater than or equal to 10 milliseconds.
10
Specify the frequency to update peer information, such as IP addresses. The interval must be greater than or equal to 10 seconds.
11
Specify an optional value to change the port that Self Node Remediation agents use for internal communication. The value must be greater than 0. The default value is port 30001.
12
Specify custom tolerations for the Self Node Remediation agents that run on the DaemonSet, to support remediation for different types of nodes. You can configure the following fields:
  • effect: The effect indicates the taint effect to match. If this field is empty, all taint effects are matched. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
  • key: The key is the taint key that the toleration applies to. If this field is empty, all taint keys are matched. If the key is empty, the operator field must be Exists. This combination means to match all values and all keys.
  • operator: The operator represents a key’s relationship to the value. Valid operators are Exists and Equal. The default is Equal. Exists is equivalent to a wildcard for a value, so that a pod can tolerate all taints of a particular category.
  • value: The taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise it is just a regular string.
  • tolerationSeconds: The period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (that is, do not evict). Zero and negative values will be treated as 0 (that is evict immediately) by the system.
  • Custom toleration allows you to add a toleration to the Self Node Remediation agent pod. For more information, see Using tolerations to control OpenShift Logging pod placement.
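
For example, the following command is a sketch of updating the existing self-node-remediation-config CR with a merge patch; the field shown, isSoftwareRebootEnabled, is one of the fields described above, and you can adjust the field and value to your needs.

$ oc patch selfnoderemediationconfig self-node-remediation-config -n openshift-workload-availability --type merge --patch '{"spec":{"isSoftwareRebootEnabled":false}}'
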
Note

You can edit the self-node-remediation-config CR that is created by the Self Node Remediation Operator. However, when you try to create a new CR for the Self Node Remediation Operator, the following message is displayed in the logs:

controllers.SelfNodeRemediationConfig
ignoring selfnoderemediationconfig CRs that are not named 'self-node-remediation-config'
or not in the namespace of the operator:
'openshift-workload-availability' {"selfnoderemediationconfig":
"openshift-workload-availability/selfnoderemediationconfig-copy"}

2.5.2. Understanding the Self Node Remediation Template configuration

The Self Node Remediation Operator also creates the SelfNodeRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes. The following remediation strategies are available:

Automatic
This remediation strategy simplifies the remediation process by letting the Self Node Remediation Operator decide on the most suitable remediation strategy for the cluster. This strategy checks if the OutOfServiceTaint strategy is available on the cluster. If the OutOfServiceTaint strategy is available, the Operator selects the OutOfServiceTaint strategy. If the OutOfServiceTaint strategy is not available, the Operator selects the ResourceDeletion strategy. Automatic is the default remediation strategy.
ResourceDeletion
This remediation strategy removes the pods on the node, rather than removing the node object. This strategy recovers workloads faster.
OutOfServiceTaint
This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node, rather than the removal of the node object. It achieves this by placing the OutOfServiceTaint taint on the node. This strategy recovers workloads faster. This strategy has been supported as Technology Preview since OpenShift Container Platform version 4.13, and as General Availability since OpenShift Container Platform version 4.15.

The Self Node Remediation Operator creates the SelfNodeRemediationTemplate CR for the strategy self-node-remediation-automatic-strategy-template, which the Automatic remediation strategy uses.

The SelfNodeRemediationTemplate CR resembles the following YAML file:

apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
  creationTimestamp: "2022-03-02T08:02:40Z"
  name: self-node-remediation-<remediation_object>-deletion-template 1
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: <remediation_strategy>  2
1
Specifies the type of remediation template based on the remediation strategy. Replace <remediation_object> with either resource or node; for example, self-node-remediation-resource-deletion-template.
2
Specifies the remediation strategy. The default remediation strategy is Automatic.

2.5.3. Troubleshooting the Self Node Remediation Operator

2.5.3.1. General troubleshooting
Issue
You want to troubleshoot issues with the Self Node Remediation Operator.
Resolution
Check the Operator logs.
2.5.3.2. Checking the daemon set
Issue
The Self Node Remediation Operator is installed but the daemon set is not available.
Resolution
Check the Operator logs for errors or warnings.
2.5.3.3. Unsuccessful remediation
Issue
An unhealthy node was not remediated.
Resolution

Verify that the SelfNodeRemediation CR was created by running the following command:

$ oc get snr -A

If the MachineHealthCheck controller did not create the SelfNodeRemediation CR when the node turned unhealthy, check the logs of the MachineHealthCheck controller. Additionally, ensure that the MachineHealthCheck CR includes the required specification to use the remediation template.

If the SelfNodeRemediation CR was created, ensure that its name matches the unhealthy node or the machine object.

2.5.3.4. Daemon set and other Self Node Remediation Operator resources exist even after uninstalling the Operator
Issue
The Self Node Remediation Operator resources, such as the daemon set, configuration CR, and the remediation template CR, exist even after uninstalling the Operator.
Resolution

To remove the Self Node Remediation Operator resources, delete the resources by running the following commands for each resource type:

$ oc delete ds <self-node-remediation-ds> -n <namespace>
$ oc delete snrc <self-node-remediation-config> -n <namespace>
$ oc delete snrt <self-node-remediation-template> -n <namespace>

2.5.4. Gathering data about the Self Node Remediation Operator

To collect debugging information about the Self Node Remediation Operator, use the must-gather tool. For information about the must-gather image for the Self Node Remediation Operator, see Gathering data about specific features.

2.5.5. Additional resources

Chapter 3. Using Fence Agents Remediation

You can use the Fence Agents Remediation Operator to automatically remediate unhealthy nodes, similar to the Self Node Remediation Operator. FAR is designed to run an existing set of upstream fencing agents in environments with a traditional API endpoint, for example, IPMI, for power cycling cluster nodes, while their pods are quickly evicted based on the remediation strategy.

3.1. About the Fence Agents Remediation Operator

The Fence Agents Remediation (FAR) Operator uses external tools to fence unhealthy nodes. These tools are a set of fence agents, where each fence agent can be used in a different environment to fence a node by using a traditional Application Programming Interface (API) call that reboots the node. By doing so, FAR minimizes downtime for stateful applications, restores compute capacity if transient failures occur, and increases the availability of workloads.

FAR not only fences a node when it becomes unhealthy, it also tries to remediate the node from being unhealthy to healthy. It adds a taint to evict stateless pods, fences the node with a fence agent, and after a reboot, it completes the remediation with resource deletion to remove any remaining workloads (mostly stateful workloads). Adding the taint and deleting the workloads accelerates the workload rescheduling.

The Operator watches for new or deleted custom resources (CRs) called FenceAgentsRemediation which trigger a fence agent to remediate a node, based on the CR’s name. FAR uses the NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the NodeHealthCheck resource creates the FenceAgentsRemediation CR, based on the FenceAgentsRemediationTemplate CR, which then triggers the Fence Agents Remediation Operator.

FAR uses a fence agent to fence a Kubernetes node. Generally, fencing is the process of bringing an unresponsive or unhealthy computer into a safe state and isolating it. A fence agent is software that uses a management interface to perform fencing, mostly power-based fencing, which enables power-cycling, resetting, or turning off the computer. An example fence agent is fence_ipmilan, which is used for Intelligent Platform Management Interface (IPMI) environments.

apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediation
metadata:
  name: node-name 1
  namespace: openshift-workload-availability
spec:
  remediationStrategy: <remediation_strategy> 2
1
The node-name should match the name of the unhealthy cluster node.
2
Specifies the remediation strategy for the nodes. For more information on the remediation strategies available, see the Understanding the Fence Agents Remediation Template configuration topic.

The Operator includes a set of fence agents that are also available in the Red Hat High Availability Add-On, which use a management interface, such as IPMI or an API, to provision/reboot a node for bare metal servers, virtual machines, and cloud platforms.

3.2. Installing the Fence Agents Remediation Operator by using the web console

You can use the Red Hat OpenShift web console to install the Fence Agents Remediation Operator.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the Red Hat OpenShift web console, navigate to Operators > OperatorHub.
  2. Select the Fence Agents Remediation Operator, or FAR, from the list of available Operators, and then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators > Installed Operators page.
  2. Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators > Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Workloads > Pods page and check the log of the fence-agents-remediation-controller-manager pod for any reported issues.

3.3. Installing the Fence Agents Remediation Operator by using the CLI

You can use the OpenShift CLI (oc) to install the Fence Agents Remediation Operator.

You can install the Fence Agents Remediation Operator in your own namespace or in the openshift-workload-availability namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a Namespace custom resource (CR) for the Fence Agents Remediation Operator:

    1. Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
    2. To create the Namespace CR, run the following command:

      $ oc create -f workload-availability-namespace.yaml
  2. Create an OperatorGroup CR:

    1. Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
    2. To create the OperatorGroup CR, run the following command:

      $ oc create -f workload-availability-operator-group.yaml
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, fence-agents-remediation-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
          name: fence-agents-remediation-subscription
          namespace: openshift-workload-availability 1
      spec:
          channel: stable
          name: fence-agents-remediation
          source: redhat-operators
          sourceNamespace: openshift-marketplace
          package: fence-agents-remediation
      1
      Specify the Namespace where you want to install the Fence Agents Remediation Operator, for example, the openshift-workload-availability namespace outlined earlier in this procedure. You can install the Subscription CR for the Fence Agents Remediation Operator in the openshift-workload-availability namespace where there is already a matching OperatorGroup CR.
    2. To create the Subscription CR, run the following command:

      $ oc create -f fence-agents-remediation-subscription.yaml

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-workload-availability

    Example output

    NAME                               DISPLAY                          VERSION   REPLACES   PHASE
    fence-agents-remediation.v0.3.0      Fence Agents Remediation Operator   0.3.0   fence-agents-remediation.v0.2.1           Succeeded

  2. Verify that the Fence Agents Remediation Operator is up and running:

    $ oc get deployment -n openshift-workload-availability

    Example output

    NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
    fence-agents-remediation-controller-manager    2/2     2            2           110m

3.4. Configuring the Fence Agents Remediation Operator

You can use the Fence Agents Remediation Operator to create the FenceAgentsRemediationTemplate Custom Resource (CR), which is used by the Node Health Check Operator (NHC). This CR defines the fence agent to be used in the cluster with all the required parameters for remediating the nodes. There can be multiple FenceAgentsRemediationTemplate CRs, at most one for each fence agent. When NHC is used, it can select a FenceAgentsRemediationTemplate as the remediationTemplate to use for power-cycling the node.
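
For illustration, a NodeHealthCheck spec that selects this template as its remediator might resemble the following sketch; the template name matches the example below, and the remediationTemplate reference follows the NodeHealthCheck examples earlier in this document.

spec:
  remediationTemplate:
    apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
    kind: FenceAgentsRemediationTemplate
    namespace: openshift-workload-availability
    name: fence-agents-remediation-template-fence-ipmilan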

The FenceAgentsRemediationTemplate CR resembles the following YAML file:

apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fence-agents-remediation-template-fence-ipmilan
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      agent: fence_ipmilan 1
      nodeparameters: 2
        --ipport:
          master-0-0: '6230'
          master-0-1: '6231'
          master-0-2: '6232'
          worker-0-0: '6233'
          worker-0-1: '6234'
          worker-0-2: '6235'
      sharedparameters: 3
        '--action': reboot
        '--ip': 192.168.123.1
        '--lanplus': ''
        '--password': password
        '--username': admin
      retryCount: '5' 4
      retryInterval: '5s' 5
      timeout: '60s' 6
1
Displays the name of the fence agent to be executed, for example, fence_ipmilan.
2
Displays the node-specific parameters for executing the fence agent, for example, ipport.
3
Displays the cluster-wide parameters for executing the fence agent, for example, username.
4
Displays the number of times to retry the fence agent command in case of failure. The default number of attempts is 5.
5
Displays the interval between retries in seconds. The default is 5 seconds.
6
Displays the timeout for the fence agent command. The default is 60 seconds. For values of 60 seconds or greater, the timeout value is expressed in both minutes and seconds in the YAML file.

3.4.1. Understanding the Fence Agents Remediation Template configuration

The Fence Agents Remediation Operator also creates the FenceAgentsRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes that is aimed to recover workloads faster. The following remediation strategies are available:

ResourceDeletion
This remediation strategy removes the pods on the node. This strategy recovers workloads faster.
OutOfServiceTaint
This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node. It achieves this by placing the OutOfServiceTaint taint on the node. The OutOfServiceTaint strategy also represents a non-graceful node shutdown. A non-graceful node shutdown occurs when a node is shut down and not detected, instead of triggering an in-operating system shutdown. This strategy has been supported on technology preview since OpenShift Container Platform version 4.13, and on general availability since OpenShift Container Platform version 4.15.
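
For reference, the out-of-service taint that this strategy places on the node typically resembles the following sketch on the Node spec; the taint key and value follow the upstream Kubernetes non-graceful node shutdown convention and should be verified against your cluster version.

spec:
  taints:
  - key: node.kubernetes.io/out-of-service
    value: nodeshutdown
    effect: NoExecute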

The FenceAgentsRemediationTemplate CR resembles the following YAML file:

apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fence-agents-remediation-<remediation_object>-deletion-template 1
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: <remediation_strategy>  2
1
Specifies the type of remediation template based on the remediation strategy. Replace <remediation_object> with either resource or taint; for example, fence-agents-remediation-resource-deletion-template.
2
Specifies the remediation strategy. The remediation strategy can either be ResourceDeletion or OutOfServiceTaint.

3.5. Troubleshooting the Fence Agents Remediation Operator

3.5.1. General troubleshooting

Issue
You want to troubleshoot issues with the Fence Agents Remediation Operator.
Resolution

Check the Operator logs.

$ oc logs <fence-agents-remediation-controller-manager-name> -c manager -n <namespace-name>

3.5.2. Unsuccessful remediation

Issue
An unhealthy node was not remediated.
Resolution

Verify that the FenceAgentsRemediation CR was created by running the following command:

$ oc get far -A

If the NodeHealthCheck controller did not create the FenceAgentsRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR includes the required specification to use the remediation template.

If the FenceAgentsRemediation CR was created, ensure that its name matches the unhealthy node object.

3.5.3. Fence Agents Remediation Operator resources exist after uninstalling the Operator

Issue
The Fence Agents Remediation Operator resources, such as the remediation CR and the remediation template CR, exist after uninstalling the Operator.
Resolution

To remove the Fence Agents Remediation Operator resources, you can delete the resources by selecting the "Delete all operand instances for this operator" checkbox before uninstalling. This checkbox feature is only available in Red Hat OpenShift since version 4.13. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:

$ oc delete far <fence-agents-remediation> -n <namespace>
$ oc delete fartemplate <fence-agents-remediation-template> -n <namespace>

The remediation CR far must be created and deleted by the same entity, for example, NHC. If the remediation CR far is still present, it is deleted, together with the FAR operator.

The remediation template CR fartemplate only exists if you use FAR with NHC. When the FAR operator is deleted using the web console, the remediation template CR fartemplate is also deleted.

3.6. Gathering data about the Fence Agents Remediation Operator

To collect debugging information about the Fence Agents Remediation Operator, use the must-gather tool. For information about the must-gather image for the Fence Agents Remediation Operator, see Gathering data about specific features.

3.7. Agents supported by the Fence Agents Remediation Operator

This section describes the agents currently supported by the Fence Agents Remediation Operator.

Most of the supported agents can be grouped by the node’s hardware vendor and usage, as follows:

  1. BareMetal
  2. Virtualization
  3. Intel
  4. HP
  5. IBM
  6. VMware
  7. Cisco
  8. APC
  9. Dell
  10. Other
Table 3.1. BareMetal - Using the Redfish management interface is recommended, unless it is not supported.
Agent | Description

fence_redfish

An I/O Fencing agent that can be used with Out-of-Band controllers that support Redfish APIs.

fence_ipmilan [a]

An I/O Fencing agent that can be used with machines controlled by IPMI.

[a] This description also applies for the agents fence_ilo3, fence_ilo4, fence_ilo5, fence_imm, fence_idrac, and fence_ipmilanplus.
Table 3.2. Virtualization
Agent | Description

fence_rhevm

An I/O Fencing agent that can be used with RHEV-M REST API to fence virtual machines.

fence_virt [a]

An I/O Fencing agent that can be used with virtual machines.

[a] This description also applies for the agent fence_xvm.
Table 3.3. Intel
Agent | Description

fence_amt_ws

An I/O Fencing agent that can be used with Intel AMT (WS).

fence_intelmodular

An I/O Fencing agent that can be used with Intel Modular device (tested on Intel MFSYS25, should also work with MFSYS35).

Table 3.4. HP - agents for the iLO management interface or BladeSystem.
Agent | Description

fence_ilo [a]

An I/O Fencing agent that can be used for HP servers with the Integrated Light Out (iLO) PCI card.

fence_ilo_ssh [b]

A fencing agent that can be used to connect to an iLO device. It logs into the device via SSH and reboots a specified outlet.

fence_ilo_moonshot

An I/O Fencing agent that can be used with HP Moonshot iLO.

fence_ilo_mp

An I/O Fencing agent that can be used with HP iLO MP.

fence_hpblade

An I/O Fencing agent that can be used with HP BladeSystem and HP Integrity Superdome X.

[a] This description also applies for the agent fence_ilo2.
[b] This description also applies for the agents fence_ilo3_ssh, fence_ilo4_ssh, and fence_ilo5_ssh.
Table 3.5. IBM
Agent | Description

fence_bladecenter

An I/O Fencing agent that can be used with IBM Bladecenters with recent enough firmware that includes telnet support.

fence_ibmblade

An I/O Fencing agent that can be used with IBM BladeCenter chassis.

fence_ipdu

An I/O Fencing agent that can be used with the IBM iPDU network power switch.

fence_rsa

An I/O Fencing agent that can be used with the IBM RSA II management interface.

Table 3.6. VMware
Agent | Description

fence_vmware_rest

An I/O Fencing agent that can be used with VMware API to fence virtual machines.

fence_vmware_soap

An I/O Fencing agent that can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+.

Table 3.7. Cisco
Agent | Description

fence_cisco_mds

An I/O Fencing agent that can be used with any Cisco MDS 9000 series device with SNMP enabled.

fence_cisco_ucs

An I/O Fencing agent that can be used with Cisco UCS to fence machines.

Table 3.8. APC
Agent | Description

fence_apc

An I/O Fencing agent that can be used with the APC network power switch.

fence_apc_snmp

An I/O Fencing agent that can be used with the APC network power switch or Tripplite PDU devices.

Table 3.9. Dell
Agent | Description

fence_drac5

An I/O Fencing agent that can be used with the Dell Remote Access Card v5 or CMC (DRAC).

Table 3.10. Other - agents for usage not listed in the previous tables.
Agent | Description

fence_brocade

An I/O Fencing agent that can be used with Brocade FC switches.

fence_compute

A resource that can be used to tell Nova that compute nodes are down and to reschedule flagged instances.

fence_eaton_snmp

An I/O Fencing agent that can be used with the Eaton network power switch.

fence_emerson

An I/O Fencing agent that can be used with MPX and MPH2 managed rack PDU.

fence_eps

An I/O Fencing agent that can be used with the ePowerSwitch 8M+ power switch to fence connected machines.

fence_evacuate

A resource that can be used to reschedule flagged instances.

fence_heuristics_ping

A resource that can be used with ping-heuristics to control execution of another fence agent on the same fencing level.

fence_ifmib

An I/O Fencing agent that can be used with any SNMP IF-MIB capable device.

fence_kdump

An I/O Fencing agent that can be used with the kdump crash recovery service.

fence_mpath

An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access to multipath devices.

fence_rsb

An I/O Fencing agent that can be used with the Fujitsu-Siemens RSB management interface.

fence_sbd

An I/O Fencing agent that can be used in environments where sbd can be used (shared storage).

fence_scsi

An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access to shared storage devices.

fence_wti

An I/O Fencing agent that can be used with the WTI Network Power Switch (NPS).

3.8. Additional resources

Chapter 4. Using Machine Deletion Remediation

You can use the Machine Deletion Remediation Operator to reprovision unhealthy nodes using the Machine API. You can use the Machine Deletion Remediation Operator in conjunction with the Node Health Check Operator.

4.1. About the Machine Deletion Remediation Operator

The Machine Deletion Remediation (MDR) operator works with the NodeHealthCheck controller, to reprovision unhealthy nodes using the Machine API. MDR follows the annotation on the node to the associated machine object, confirms that it has an owning controller (for example, MachineSetController), and deletes it. Once the machine CR is deleted, the owning controller creates a replacement.

The prerequisites for MDR include:

  • a Machine API-based cluster that is able to programmatically destroy and create cluster nodes,
  • nodes that are associated with machines, and
  • declaratively managed machines.

You can then modify the NodeHealthCheck CR to use MDR as its remediator. An example MDR template object and NodeHealthCheck configuration are provided in the documentation.
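
For illustration, a NodeHealthCheck spec that selects MDR as its remediator might resemble the following sketch; the template name matches the MachineDeletionRemediationTemplate example later in this chapter, and the remediationTemplate reference follows the NodeHealthCheck examples earlier in this document.

spec:
  remediationTemplate:
    apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
    kind: MachineDeletionRemediationTemplate
    namespace: openshift-workload-availability
    name: machinedeletionremediationtemplate-sample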

The MDR process works as follows:

  • The Node Health Check Operator detects an unhealthy node and creates an MDR CR.
  • The MDR Operator watches for the MDR CR associated with the unhealthy node and deletes the machine associated with that node, if the machine has an owning controller.
  • When the node is healthy again, the MDR CR is deleted by the NodeHealthCheck controller.

4.2. Installing the Machine Deletion Remediation Operator by using the web console

You can use the Red Hat OpenShift web console to install the Machine Deletion Remediation Operator.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the Red Hat OpenShift web console, navigate to Operators > OperatorHub.
  2. Select the Machine Deletion Remediation Operator, or MDR, from the list of available Operators, and then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators > Installed Operators page.
  2. Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators > Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Workloads > Pods page and check the log of the pod in the openshift-workload-availability project for any reported issues.

4.3. Installing the Machine Deletion Remediation Operator by using the CLI

You can use the OpenShift CLI (oc) to install the Machine Deletion Remediation Operator.

You can install the Machine Deletion Remediation Operator in your own namespace or in the openshift-workload-availability namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a Namespace custom resource (CR) for the Machine Deletion Remediation Operator:

    1. Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
    2. To create the Namespace CR, run the following command:

      $ oc create -f workload-availability-namespace.yaml
  2. Create an OperatorGroup CR:

    1. Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
    2. To create the OperatorGroup CR, run the following command:

      $ oc create -f workload-availability-operator-group.yaml
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, machine-deletion-remediation-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
          name: machine-deletion-remediation-operator
          namespace: openshift-workload-availability 1
      spec:
          channel: stable
          name: machine-deletion-remediation-operator
          source: redhat-operators
          sourceNamespace: openshift-marketplace
          package: machine-deletion-remediation
      1
      Specify the Namespace where you want to install the Machine Deletion Remediation Operator. When you install the Machine Deletion Remediation Operator in the openshift-workload-availability namespace, the Namespace and OperatorGroup CRs from the earlier steps in this procedure already exist.
    2. To create the Subscription CR, run the following command:

      $ oc create -f machine-deletion-remediation-subscription.yaml

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-workload-availability

    Example output

    NAME                                   DISPLAY                                 VERSION   REPLACES                              PHASE
    machine-deletion-remediation.v0.3.0    Machine Deletion Remediation Operator   0.3.0     machine-deletion-remediation.v0.2.1   Succeeded
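
  2. Optionally, verify that the Machine Deletion Remediation Operator controller is up and running. The following is a sketch that assumes the deployment is named after the controller manager pod referenced in the troubleshooting section; confirm the exact name in your cluster:

    $ oc get deployment -n openshift-workload-availability

    Example output (illustrative)

    NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
    machine-deletion-remediation-controller-manager    1/1     1            1           10d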

4.4. Configuring the Machine Deletion Remediation Operator

You can use the Machine Deletion Remediation Operator with the Node Health Check Operator by creating the MachineDeletionRemediationTemplate Custom Resource (CR). This CR defines the remediation strategy for the nodes.

The MachineDeletionRemediationTemplate CR resembles the following YAML file:

apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediationTemplate
metadata:
  name: machinedeletionremediationtemplate-sample
  namespace: openshift-workload-availability
spec:
  template:
    spec: {}
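
To use this template with the Node Health Check Operator, reference it from the remediationTemplate field of a NodeHealthCheck CR. The following is a minimal sketch that follows the examples in the "Remediating nodes with Node Health Checks" chapter; the CR name nhc-worker-mdr and the selector values are illustrative:

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker-mdr
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  remediationTemplate:
    apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
    kind: MachineDeletionRemediationTemplate
    name: machinedeletionremediationtemplate-sample
    namespace: openshift-workload-availability
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s
    - type: Ready
      status: Unknown
      duration: 300s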

4.5. Troubleshooting the Machine Deletion Remediation Operator

4.5.1. General troubleshooting

Issue
You want to troubleshoot issues with the Machine Deletion Remediation Operator.
Resolution

Check the Operator logs.

$ oc logs <machine-deletion-remediation-controller-manager-name> -c manager -n <namespace-name>

4.5.2. Unsuccessful remediation

Issue
An unhealthy node was not remediated.
Resolution

Verify that the MachineDeletionRemediation CR was created by running the following command:

$ oc get mdr -A

If the NodeHealthCheck controller did not create the MachineDeletionRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR includes the required specification to use the remediation template.

If the MachineDeletionRemediation CR was created, ensure that its name matches the unhealthy node object.
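
For example, you can compare the names of the remediation CRs with the node names, and inspect the NodeHealthCheck controller logs. This is a sketch; the deployment name is taken from the verification section of the "Remediating nodes with Node Health Checks" chapter, and the manager container name is an assumption:

$ oc get mdr -n openshift-workload-availability -o jsonpath='{.items[*].metadata.name}{"\n"}'
$ oc get nodes -o name
$ oc logs deployment/node-healthcheck-controller-manager -c manager -n openshift-workload-availability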

4.5.3. Machine Deletion Remediation Operator resources exist even after uninstalling the Operator

Issue
The Machine Deletion Remediation Operator resources, such as the remediation CR and the remediation template CR, exist even after uninstalling the Operator.
Resolution

To remove the Machine Deletion Remediation Operator resources, you can delete the resources by selecting the Delete all operand instances for this operator checkbox before uninstalling the Operator. This checkbox is available in Red Hat OpenShift version 4.13 and later. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:

$ oc delete mdr <machine-deletion-remediation> -n <namespace>
$ oc delete mdrt <machine-deletion-remediation-template> -n <namespace>

The remediation CR mdr must be created and deleted by the same entity, for example, NHC. If the remediation CR mdr is still present when the Operator is uninstalled, it is deleted together with the MDR Operator.

The remediation template CR mdrt only exists if you use MDR with NHC. When the MDR operator is deleted using the web console, the remediation template CR mdrt is also deleted.

4.6. Gathering data about the Machine Deletion Remediation Operator

To collect debugging information about the Machine Deletion Remediation Operator, use the must-gather tool. For information about the must-gather image for the Machine Deletion Remediation Operator, see Gathering data about specific features.

4.7. Additional resources

Chapter 5. Remediating nodes with Machine Health Checks

Machine health checks automatically repair unhealthy machines in a particular machine pool.

5.1. About machine health checks

Note

You can only apply a machine health check to control plane machines on clusters that use control plane machine sets.

To monitor machine health, create a resource to define the configuration for a controller. Set a condition to check, such as staying in the NotReady status for five minutes or displaying a permanent condition in the node-problem-detector, and a label for the set of machines to monitor.

The controller that observes a MachineHealthCheck resource checks for the defined condition. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a machine deleted event.

To limit the disruptive impact of machine deletion, the controller drains and deletes only one node at a time. If there are more unhealthy machines than the maxUnhealthy threshold allows for in the targeted pool of machines, remediation stops, which enables manual intervention.

Note

Consider the timeouts carefully, accounting for workloads and requirements.

  • Long timeouts can result in long periods of downtime for the workload on the unhealthy machine.
  • Too short timeouts can result in a remediation loop. For example, the timeout for checking the NotReady status must be long enough to allow the machine to complete the startup process.

To stop the check, remove the resource.
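
For example, a minimal MachineHealthCheck resource that flags worker machines whose nodes remain in the NotReady state for five minutes might resemble the following sketch. The name, label selector, and thresholds are illustrative; a complete example that adds a remediation template is shown later in this chapter:

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example-worker-health-check
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: "worker"
  unhealthyConditions:
  - type:    "Ready"
    timeout: "300s"
    status: "False"
  maxUnhealthy: "40%"
  nodeStartupTimeout: "10m"

To stop the check, delete the resource:

$ oc delete machinehealthcheck example-worker-health-check -n openshift-machine-api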

5.1.1. Limitations when deploying machine health checks

There are limitations to consider before deploying a machine health check:

  • Only machines owned by a machine set are remediated by a machine health check.
  • If the node for a machine is removed from the cluster, a machine health check considers the machine to be unhealthy and remediates it immediately.
  • If the corresponding node for a machine does not join the cluster after the nodeStartupTimeout, the machine is remediated.
  • A machine is remediated immediately if the Machine resource phase is Failed.

5.2. Configuring machine health checks to use the Self Node Remediation Operator

Use the following procedure to configure the worker or control-plane machine health checks to use the Self Node Remediation Operator as a remediation provider.

Note

To use the Self Node Remediation Operator as a remediation provider for machine health checks, a machine must have an associated node in the cluster.
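
To confirm that a machine has an associated node, you can list the machines in the openshift-machine-api namespace; in the wide output, the NODE column shows the association:

$ oc get machines -n openshift-machine-api -o wide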

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a SelfNodeRemediationTemplate CR:

    1. Define the SelfNodeRemediationTemplate CR:

      apiVersion: self-node-remediation.medik8s.io/v1alpha1
      kind: SelfNodeRemediationTemplate
      metadata:
        namespace: openshift-machine-api
        name: selfnoderemediationtemplate-sample
      spec:
        template:
          spec:
            remediationStrategy: Automatic 1
      1
      Specifies the remediation strategy. The default remediation strategy is Automatic.
    2. To create the SelfNodeRemediationTemplate CR, run the following command:

      $ oc create -f <snrt-name>.yaml
  2. Create or update the MachineHealthCheck CR to point to the SelfNodeRemediationTemplate CR:

    1. Define or update the MachineHealthCheck CR:

      apiVersion: machine.openshift.io/v1beta1
      kind: MachineHealthCheck
      metadata:
        name: machine-health-check
        namespace: openshift-machine-api
      spec:
        selector:
          matchLabels: 1
            machine.openshift.io/cluster-api-machine-role: "worker"
            machine.openshift.io/cluster-api-machine-type: "worker"
        unhealthyConditions:
        - type:    "Ready"
          timeout: "300s"
          status: "False"
        - type:    "Ready"
          timeout: "300s"
          status: "Unknown"
        maxUnhealthy: "40%"
        nodeStartupTimeout: "10m"
        remediationTemplate: 2
          kind: SelfNodeRemediationTemplate
          apiVersion: self-node-remediation.medik8s.io/v1alpha1
          name: selfnoderemediationtemplate-sample
      1
      Selects whether the machine health check is for worker or control-plane nodes. The label can also be user-defined.
      2
      Specifies the details for the remediation template.
    2. To create a MachineHealthCheck CR, run the following command:

      $ oc create -f <mhc-name>.yaml
    3. To update a MachineHealthCheck CR, run the following command:

      $ oc apply -f <mhc-name>.yaml

Chapter 6. Remediating nodes with Node Health Checks

You can use the Node Health Check Operator to identify unhealthy nodes. The Operator then uses other operators to remediate the unhealthy nodes. The Node Health Check Operator installs the Self Node Remediation Operator as a default remediation provider.

For more information on the Self Node Remediation Operator, see the Using Self Node Remediation chapter.

The Node Health Check Operator can also be used with other remediation providers, including:

  • Fence Agents Remediation
  • Machine Deletion Remediation

Note

Due to the existence of preinstalled machine health checks on Red Hat OpenShift Service on AWS (ROSA) clusters, the Node Health Check Operator is unable to function in such an environment.

6.1. About the Node Health Check Operator

The Node Health Check Operator detects the health of the nodes in a cluster. The NodeHealthCheck controller creates the NodeHealthCheck custom resource (CR), which defines a set of criteria and thresholds to determine the health of a node.

When the Node Health Check Operator detects an unhealthy node, it creates a remediation CR that triggers the remediation provider. For example, the controller creates the SelfNodeRemediation CR, which triggers the Self Node Remediation Operator to remediate the unhealthy node.

The NodeHealthCheck CR resembles the following YAML file, with self-node-remediation as the remediation provider:

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehealthcheck-sample
spec:
  minHealthy: 51% 1
  pauseRequests: 2
    - <pause-test-cluster>
  remediationTemplate: 3
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    name: self-node-remediation-resource-deletion-template
    namespace: openshift-workload-availability
    kind: SelfNodeRemediationTemplate
  escalatingRemediations: 4
    - remediationTemplate:
        apiVersion: self-node-remediation.medik8s.io/v1alpha1
        name: self-node-remediation-resource-deletion-template
        namespace: openshift-workload-availability
        kind: SelfNodeRemediationTemplate
    order: 1
    timeout: 300s
  selector: 5
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  unhealthyConditions: 6
    - type: Ready
      status: "False"
      duration: 300s 7
    - type: Ready
      status: Unknown
      duration: 300s 8
1
Specifies the number of healthy nodes (as a percentage or a number) required for a remediation provider to concurrently remediate nodes in the targeted pool. If the number of healthy nodes equals or exceeds the limit set by minHealthy, remediation occurs. The default value is 51%.
2
Prevents any new remediation from starting, while allowing any ongoing remediations to persist. The default value is empty. However, you can enter an array of strings that identify the cause of pausing the remediation. For example, pause-test-cluster.
Note

During the upgrade process, nodes in the cluster might become temporarily unavailable and get identified as unhealthy. In the case of worker nodes, when the Operator detects that the cluster is upgrading, it stops remediating new unhealthy nodes to prevent such nodes from rebooting.

3
Specifies a remediation template from the remediation provider, for example, from the Self Node Remediation Operator. The remediationTemplate field is mutually exclusive with escalatingRemediations.
4
Specifies a list of RemediationTemplates with order and timeout fields. To obtain a healthy node, use this field to sequence and configure multiple remediations. This strategy increases the likelihood of obtaining a healthy node, instead of depending on a single remediation that might not be successful. The order field determines the order in which the remediations are invoked; a lower order value means earlier invocation. The timeout field determines when the next remediation is invoked. The escalatingRemediations field is mutually exclusive with remediationTemplate.
Note

When escalatingRemediations is used, the remediation providers Self Node Remediation Operator and Fence Agents Remediation Operator can be used multiple times with different remediationTemplate configurations. However, you cannot use the same Machine Deletion Remediation configuration with different remediationTemplate configurations.

5
Specifies a selector that matches labels or expressions that you want to check. Avoid selecting both control-plane and worker nodes in one CR.
6
Specifies a list of the conditions that determine whether a node is considered unhealthy.
7 8
Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.

The NodeHealthCheck CR resembles the following YAML file, with metal3 as the remediation provider:

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker-metal3
spec:
  minHealthy: 30%
  remediationTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: Metal3RemediationTemplate
    name: metal3-remediation
    namespace: openshift-machine-api
  selector:
    matchExpressions:
    - key: node-role.kubernetes.io/worker
      operator: Exists
  unhealthyConditions:
  - duration: 300s
    status: 'False'
    type: Ready
  - duration: 300s
    status: 'Unknown'
    type: Ready
Note

The matchExpressions are examples only; you must map your machine groups based on your specific needs.

The Metal3RemediationTemplate resembles the following YAML file, with metal3 as the remediation provider:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
  name: metal3-remediation
  namespace: openshift-machine-api
spec:
  template:
    spec:
      strategy:
        retryLimit: 1
        timeout: 5m0s
        type: Reboot
Note

In addition to creating a NodeHealthCheck CR, you must also create the Metal3RemediationTemplate.

6.1.1. Understanding the Node Health Check Operator workflow

When a node is identified as unhealthy, the Node Health Check Operator checks how many other nodes are unhealthy. If the number of healthy nodes equals or exceeds the number that is specified in the minHealthy field of the NodeHealthCheck CR, the controller creates a remediation CR from the details that are provided in the external remediation template by the remediation provider. After remediation, the kubelet updates the node’s health status.

When the node turns healthy, the controller deletes the remediation CR.
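
To observe this workflow, you can watch the remediation CRs that the controller creates and deletes. For example, with Self Node Remediation as the remediation provider, the following sketch watches for SelfNodeRemediation CRs; when Machine Deletion Remediation is the provider, use oc get mdr -A instead:

$ oc get selfnoderemediations -A -w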

6.1.2. About how node health checks prevent conflicts with machine health checks

When both node health checks and machine health checks are deployed, the node health check avoids conflict with the machine health check.

Note

Red Hat OpenShift deploys machine-api-termination-handler as the default MachineHealthCheck resource.

The following list summarizes the system behavior when node health checks and machine health checks are deployed:

  • If only the default machine health check exists, the node health check continues to identify unhealthy nodes. However, the node health check ignores unhealthy nodes in a Terminating state. The default machine health check handles the unhealthy nodes with a Terminating state.

    Example log message

    INFO MHCChecker	ignoring unhealthy Node, it is terminating and will be handled by MHC	{"NodeName": "node-1.example.com"}

  • If the default machine health check is modified (for example, if the unhealthyConditions checks the Ready condition), or if additional machine health checks are created, the node health check is disabled.

    Example log message

    INFO controllers.NodeHealthCheck disabling NHC in order to avoid conflict with custom MHCs configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}

  • When only the default machine health check exists again, the node health check is re-enabled.

    Example log message

    INFO controllers.NodeHealthCheck re-enabling NHC, no conflicting MHC configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}
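
To check which machine health checks exist in the cluster, and therefore whether the node health check is currently disabled, you can list the MachineHealthCheck resources. By default, only the machine-api-termination-handler resource is expected:

$ oc get machinehealthcheck -n openshift-machine-api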

6.2. Control plane fencing

In earlier releases, you could enable Self Node Remediation and Node Health Check on worker nodes. In the event of node failure, you can now also apply remediation strategies to control plane nodes.

Do not use the same NodeHealthCheck CR for worker nodes and control plane nodes. Grouping worker nodes and control plane nodes together can result in incorrect evaluation of the minimum healthy node count, and cause unexpected or missing remediations. This is because of the way the Node Health Check Operator handles control plane nodes. Group the control plane nodes in their own group and the worker nodes in their own group, as shown in the sketch after the following considerations. If required, you can also create multiple groups of worker nodes.

Considerations for remediation strategies:

  • Avoid Node Health Check configurations that involve multiple configurations overlapping the same nodes because they can result in unexpected behavior. This suggestion applies to both worker and control plane nodes.
  • The Node Health Check Operator enforces a hardcoded limitation of remediating a maximum of one control plane node at a time; multiple control plane nodes are not remediated at the same time.
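
For example, you can define one NodeHealthCheck CR per node group. The following sketch shows separate CRs for control plane nodes and worker nodes; the CR names and minHealthy values are illustrative, and the remediation template is the Self Node Remediation template used in the earlier example:

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-control-plane
spec:
  minHealthy: 66%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-resource-deletion-template
    namespace: openshift-workload-availability
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s
---
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-resource-deletion-template
    namespace: openshift-workload-availability
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s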

6.3. Installing the Node Health Check Operator by using the web console

You can use the Red Hat OpenShift web console to install the Node Health Check Operator.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
  2. Select the Node Health Check Operator, then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the openshift-workload-availability namespace.
  4. Ensure that the Console plug-in is set to Enable.
  5. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators → Installed Operators page.
  2. Check that the Operator is installed in the openshift-workload-availability namespace and that its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-workload-availability project that are reporting issues.

6.4. Installing the Node Health Check Operator by using the CLI

You can use the OpenShift CLI (oc) to install the Node Health Check Operator.

You can install the Node Health Check Operator in your own namespace or in the openshift-workload-availability namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a Namespace custom resource (CR) for the Node Health Check Operator:

    1. Define the Namespace CR and save the YAML file, for example, node-health-check-namespace.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
    2. To create the Namespace CR, run the following command:

      $ oc create -f node-health-check-namespace.yaml
  2. Create an OperatorGroup CR:

    1. Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
    2. To create the OperatorGroup CR, run the following command:

      $ oc create -f workload-availability-operator-group.yaml
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, node-health-check-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: node-health-check-operator
        namespace: openshift-workload-availability 1
      spec:
        channel: stable 2
        installPlanApproval: Manual 3
        name: node-healthcheck-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        package: node-healthcheck-operator
      1
      Specify the Namespace where you want to install the Node Health Check Operator. To install the Node Health Check Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.
      2
      Specify the channel name for your subscription. To upgrade to the latest version of the Node Health Check Operator, you must manually change the channel name for your subscription from candidate to stable.
      3
      Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This plan prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation.
    2. To create the Subscription CR, run the following command:

      $ oc create -f node-health-check-subscription.yaml

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-workload-availability

    Example output

    NAME                               DISPLAY                      VERSION   REPLACES                           PHASE
    node-healthcheck-operator.v0.7.0   Node Health Check Operator   0.7.0     node-healthcheck-operator.v0.6.1   Succeeded

  2. Verify that the Node Health Check Operator is up and running:

    $ oc get deployment -n openshift-workload-availability

    Example output

    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    node-healthcheck-controller-manager            2/2     2            2           10d

6.5. Creating a node health check

Using the web console, you can create a node health check to identify unhealthy nodes and specify the remediation type and strategy to fix them.

Procedure

  1. From the Administrator perspective of the Red Hat OpenShift web console, click Compute → NodeHealthChecks → Create NodeHealthCheck.
  2. Specify whether to configure the node health check using the Form view or the YAML view.
  3. Enter a Name for the node health check. The name must consist of lower case, alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
  4. Specify the Remediator type, either Self node remediation or Other. The Self node remediation option is part of the Self Node Remediation Operator that is installed with the Node Health Check Operator. Selecting Other requires an API version, Kind, Name, and Namespace to be entered, which then point to the remediation template resource of a remediator.
  5. Make a Nodes selection by specifying the labels of the nodes you want to remediate. The selection matches labels that you want to check. If more than one label is specified, the nodes must contain each label. The default value is empty, which selects both worker and control-plane nodes.

    Note

    When creating a node health check with the Self Node Remediation Operator, you must select either node-role.kubernetes.io/worker or node-role.kubernetes.io/control-plane as the value.

  6. Specify the minimum number of healthy nodes, using either a percentage or a number, required for a NodeHealthCheck to remediate nodes in the targeted pool. If the number of healthy nodes equals or exceeds the limit set by Min healthy, remediation occurs. The default value is 51%.
  7. Specify a list of Unhealthy conditions that determine whether a node is considered unhealthy and requires remediation. You can specify the Type, Status, and Duration of each condition. You can also create your own custom type.
  8. Click Create to create the node health check.

Verification

  • Navigate to the Compute → NodeHealthCheck page and verify that the corresponding node health check is listed and its status is displayed. Once created, node health checks can be paused, modified, and deleted.

6.6. Gathering data about the Node Health Check Operator

To collect debugging information about the Node Health Check Operator, use the must-gather tool. For information about the must-gather image for the Node Health Check Operator, see Gathering data about specific features.

6.7. Additional resources

Chapter 7. Placing nodes in maintenance mode with Node Maintenance Operator

You can use the Node Maintenance Operator to place nodes in maintenance mode by using the oc adm utility or NodeMaintenance custom resources (CRs).

7.1. About the Node Maintenance Operator

The Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When a new NodeMaintenance CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance CR is deleted, the node that is referenced in the CR is made available for new workloads.

Note

Using a NodeMaintenance CR for node maintenance tasks achieves the same results as the oc adm cordon and oc adm drain commands using standard Red Hat OpenShift CR processing.
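
For comparison, the manual equivalent of applying and then deleting a NodeMaintenance CR is roughly the following sequence of commands; the drain flags shown here are commonly used options rather than values mandated by the Operator:

$ oc adm cordon node-1.example.com
$ oc adm drain node-1.example.com --ignore-daemonsets --delete-emptydir-data --force
$ oc adm uncordon node-1.example.com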

7.2. Installing the Node Maintenance Operator

You can install the Node Maintenance Operator using the web console or the OpenShift CLI (oc).

Note

If OpenShift Virtualization version 4.10 or earlier is installed in your cluster, it includes an outdated version of the Node Maintenance Operator.

7.2.1. Installing the Node Maintenance Operator by using the web console

You can use the Red Hat OpenShift web console to install the Node Maintenance Operator.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
  2. Select the Node Maintenance Operator, then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the openshift-workload-availability namespace.
  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators → Installed Operators page.
  2. Check that the Operator is installed in the openshift-workload-availability namespace and that its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Operators → Installed Operators → Node Maintenance Operator → Details page, and inspect the Conditions section for errors before pod creation.
  3. Navigate to the Workloads → Pods page, search for the Node Maintenance Operator pod in the installed namespace, and check the logs in the Logs tab.

7.2.2. Installing the Node Maintenance Operator by using the CLI

You can use the OpenShift CLI (oc) to install the Node Maintenance Operator.

You can install the Node Maintenance Operator in your own namespace or in the openshift-workload-availability namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a Namespace CR for the Node Maintenance Operator:

    1. Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
    2. To create the Namespace CR, run the following command:

      $ oc create -f workload-availability-namespace.yaml
  2. Create an OperatorGroup CR:

    1. Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
    2. To create the OperatorGroup CR, run the following command:

      $ oc create -f workload-availability-operator-group.yaml
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: node-maintenance-operator
        namespace: openshift-workload-availability 1
      spec:
        channel: stable
        installPlanApproval: Automatic
        name: node-maintenance-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        package: node-maintenance-operator
      1
      Specify the Namespace where you want to install the Node Maintenance Operator.
      Important

      To install the Node Maintenance Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.

    2. To create the Subscription CR, run the following command:

      $ oc create -f node-maintenance-subscription.yaml

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-workload-availability

    Example output

    NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
    node-maintenance-operator.v5.3.0   Node Maintenance Operator   5.3.0     node-maintenance-operator.v5.2.1   Succeeded

  2. Verify that the Node Maintenance Operator is running:

    $ oc get deployment -n openshift-workload-availability

    Example output

    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    node-maintenance-operator-controller-manager   1/1     1            1           10d

The Node Maintenance Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.

7.3. Setting a node to maintenance mode

You can place a node into maintenance mode from the web console or from the CLI by using a NodeMaintenance CR.

7.3.1. Setting a node to maintenance mode by using the web console

To set a node to maintenance mode, you can create a NodeMaintenance custom resource (CR) by using the web console.

Prerequisites

  • Log in as a user with cluster-admin privileges.
  • Install the Node Maintenance Operator from the OperatorHub.

Procedure

  1. From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
  2. Select the Node Maintenance Operator from the list of Operators.
  3. In the Node Maintenance tab, click Create NodeMaintenance.
  4. In the Create NodeMaintenance page, select the Form view or the YAML view to configure the NodeMaintenance CR.
  5. To apply the NodeMaintenance CR that you have configured, click Create.

Verification

In the Node Maintenance tab, inspect the Status column and verify that its status is Succeeded.

7.3.2. Setting a node to maintenance mode by using the CLI

You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, all allowed pods are evicted and the node is rendered unschedulable. Evicted pods are queued to be moved to another node in the cluster.

Prerequisites

  • Install the Red Hat OpenShift CLI (oc).
  • Log in to the cluster as a user with cluster-admin privileges.

Procedure

  1. Create the following NodeMaintenance CR, and save the file as nodemaintenance-cr.yaml:

    apiVersion: nodemaintenance.medik8s.io/v1beta1
    kind: NodeMaintenance
    metadata:
      name: nodemaintenance-cr  1
    spec:
      nodeName: node-1.example.com 2
      reason: "NIC replacement" 3
    1
    The name of the node maintenance CR.
    2
    The name of the node to be put into maintenance mode.
    3
    A plain text description of the reason for maintenance.
  2. Apply the node maintenance CR by running the following command:

    $ oc apply -f nodemaintenance-cr.yaml

Verification

  1. Check the progress of the maintenance task by running the following command:

    $ oc describe node <node-name>

    where <node-name> is the name of your node; for example, node-1.example.com

  2. Check the example output:

    Events:
      Type     Reason                     Age                   From     Message
      ----     ------                     ----                  ----     -------
      Normal   NodeNotSchedulable         61m                   kubelet  Node node-1.example.com status is now: NodeNotSchedulable

7.3.3. Checking status of current NodeMaintenance CR tasks

You can check the status of current NodeMaintenance CR tasks.

Prerequisites

  • Install the Red Hat OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  • Check the status of current node maintenance tasks, for example the NodeMaintenance CR or nm object, by running the following command:

    $ oc get nm -o yaml

    Example output

    apiVersion: v1
    items:
    - apiVersion: nodemaintenance.medik8s.io/v1beta1
      kind: NodeMaintenance
      metadata:
    ...
      spec:
        nodeName: node-1.example.com
        reason: Node maintenance
      status:
        drainProgress: 100   1
        evictionPods: 3   2
        lastError: "Last failure message" 3
        lastUpdate: "2022-06-23T11:43:18Z" 4
        phase: Succeeded
        totalpods: 5 5
    ...

    1
    The percentage completion of draining the node.
    2
    The number of pods scheduled for eviction.
    3
    The latest eviction error, if any.
    4
    The last time the status was updated.
    5
    The total number of pods before the node entered maintenance mode.
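
To check only the current phase or drain progress of a node maintenance task, you can use a JSONPath query against the NodeMaintenance CR, for example, the nodemaintenance-cr object created earlier:

$ oc get nm nodemaintenance-cr -o jsonpath='{.status.phase}{"\n"}'
$ oc get nm nodemaintenance-cr -o jsonpath='{.status.drainProgress}{"\n"}'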

7.4. Resuming a node from maintenance mode

You can resume a node from maintenance mode from the web console or from the CLI by using a NodeMaintenance CR. Resuming a node brings it out of maintenance mode and makes it schedulable again.

7.4.1. Resuming a node from maintenance mode by using the web console

To resume a node from maintenance mode, you can delete a NodeMaintenance custom resource (CR) by using the web console.

Prerequisites

  • Log in as a user with cluster-admin privileges.
  • Install the Node Maintenance Operator from the OperatorHub.

Procedure

  1. From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
  2. Select the Node Maintenance Operator from the list of Operators.
  3. In the Node Maintenance tab, select the NodeMaintenance CR that you want to delete.
  4. Click the Options menu at the end of the node and select Delete NodeMaintenance.

Verification

  1. In the Red Hat OpenShift console, click Compute → Nodes.
  2. Inspect the Status column of the node for which you deleted the NodeMaintenance CR and verify that its status is Ready.

7.4.2. Resuming a node from maintenance mode by using the CLI

You can resume a node from maintenance mode that was initiated with a NodeMaintenance CR by deleting the NodeMaintenance CR.

Prerequisites

  • Install the Red Hat OpenShift CLI (oc).
  • Log in to the cluster as a user with cluster-admin privileges.

Procedure

  • When your node maintenance task is complete, delete the active NodeMaintenance CR:

    $ oc delete -f nodemaintenance-cr.yaml

    Example output

    nodemaintenance.nodemaintenance.medik8s.io "nodemaintenance-cr" deleted

Verification

  1. Check the progress of the maintenance task by running the following command:

    $ oc describe node <node-name>

    where <node-name> is the name of your node; for example, node-1.example.com

  2. Check the example output:

    Events:
      Type     Reason                  Age                   From     Message
      ----     ------                  ----                  ----     -------
      Normal   NodeSchedulable         2m                    kubelet  Node node-1.example.com status is now: NodeSchedulable

7.5. Working with bare-metal nodes

For clusters with bare-metal nodes, you can place a node into maintenance mode, and resume a node from maintenance mode, by using the web console Actions control.

Note

On clusters with bare-metal nodes, you can also place a node into maintenance mode, and resume a node from maintenance mode, by using the web console and the CLI, as outlined in the previous sections. The methods that use the web console Actions control are applicable to bare-metal clusters only.

7.5.1. Maintaining bare-metal nodes

When you deploy Red Hat OpenShift on bare-metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where the cluster nodes are considered ephemeral, reprovisioning a bare-metal node requires significantly more time and effort for maintenance tasks.

When a bare-metal node fails due to a kernel error or a NIC hardware failure, workloads on the failed node need to be restarted on another node in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully turn off nodes, move workloads to other parts of the cluster, and ensure that workloads are not interrupted. Detailed progress and node status details are provided during maintenance.

7.5.2. Setting a bare-metal node to maintenance mode

Set a bare-metal node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. From the Administrator perspective of the web console, click Compute → Nodes.
  2. You can set the node to maintenance from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:

    • Click the Options menu at the end of the node and select Start Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Start Maintenance.
  3. Click Start Maintenance in the confirmation window.

The node is no longer schedulable. If it had virtual machines with the LiveMigration eviction strategy, the node live migrates them. All other pods and virtual machines on the node are deleted and recreated on another node.

Verification

  • Navigate to the Compute → Nodes page and verify that the corresponding node has a status of Under maintenance.

7.5.3. Resuming a bare-metal node from maintenance mode

Resume a bare-metal node from maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. From the Administrator perspective of the web console, click Compute → Nodes.
  2. You can resume the node from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:

    • Click the Options menu at the end of the node and select Stop Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
  3. Click Stop Maintenance in the confirmation window.

The node becomes schedulable. Virtual machine instances that were running on the node prior to maintenance do not automatically migrate back to this node.

Verification

  • Navigate to the Compute → Nodes page and verify that the corresponding node has a status of Ready.

7.6. Gathering data about the Node Maintenance Operator

To collect debugging information about the Node Maintenance Operator, use the must-gather tool. For information about the must-gather image for the Node Maintenance Operator, see Gathering data about specific features.

7.7. Additional resources

Legal Notice

Copyright © 2024 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.