Remediation, fencing, and maintenance
Workload Availability remediation, fencing, and maintenance
Abstract
Preface
Providing feedback on Workload Availability for Red Hat OpenShift documentation
We appreciate your feedback on our documentation. Let us know how we can improve it. To do so:
- Go to the JIRA website.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Enter your username in the Reporter field.
- Enter the affected versions in the Affects Version/s field.
- Click Create at the bottom of the dialog.
Chapter 1. About node remediation, fencing, and maintenance
Hardware is imperfect and software contains bugs. When node-level failures occur, such as kernel hangs or network interface controller (NIC) failures, the work required from the cluster does not decrease, and workloads from affected nodes need to be restarted somewhere. However, some workloads, such as ReadWriteOnce (RWO) volumes and StatefulSets, might require at-most-one semantics.
Failures affecting these workloads risk data loss, corruption, or both. It is important to ensure that the node reaches a safe state, known as fencing, before initiating recovery of the workload, known as remediation, and ideally, recovery of the node as well.
It is not always practical to depend on administrator intervention to confirm the true status of the nodes and workloads. To reduce the need for such intervention, Red Hat OpenShift provides multiple components that automate failure detection, fencing, and remediation.
1.1. Self Node Remediation
The Self Node Remediation Operator is a Red Hat OpenShift add-on Operator that implements an external system of fencing and remediation that reboots unhealthy nodes and deletes resources, such as Pods and VolumeAttachments. The reboot ensures that the workloads are fenced, and the resource deletion accelerates the rescheduling of affected workloads. Unlike other external systems, Self Node Remediation does not require any management interface, such as Intelligent Platform Management Interface (IPMI) or a node-provisioning API.
Self Node Remediation can be used by failure detection systems, like Machine Health Check or Node Health Check.
1.2. Fence Agents Remediation
The Fence Agents Remediation (FAR) Operator is a Red Hat OpenShift add-on operator that automatically remediates unhealthy nodes, similar to the Self Node Remediation Operator. You can use well-known agents to fence and remediate unhealthy nodes. The remediation includes rebooting the unhealthy node using a fence agent, and then evicting workloads from the unhealthy node, depending on the remediation strategy.
1.3. Machine Deletion Remediation
The Machine Deletion Remediation (MDR) Operator is a Red Hat OpenShift add-on Operator that uses the Machine API to reprovision unhealthy nodes. MDR works with NodeHealthCheck (NHC) to create a Custom Resource (CR) for MDR with information about the unhealthy node.
MDR follows the annotation on the node to the associated machine object and confirms that it has an owning controller. MDR proceeds to delete the machine, and then the owning controller recreates a replacement machine.
1.4. Machine Health Check
Machine Health Check utilizes a Red Hat OpenShift built-in failure detection, fencing and remediation system, which monitors the status of machines and the conditions of nodes. Machine Health Checks can be configured to trigger external fencing and remediation systems, like Self Node Remediation.
1.5. Node Health Check
The Node Health Check Operator is a Red Hat OpenShift add-on Operator that implements a failure detection system that monitors node conditions. It does not have a built-in fencing or remediation system and so must be configured with an external system that provides these features. By default, it is configured to utilize the Self Node Remediation system.
1.6. Node Maintenance
Administrators face situations where they need to interrupt the cluster, for example, to replace a drive, RAM, or a NIC.
In advance of this maintenance, affected nodes should be cordoned and drained. When a node is cordoned, new workloads cannot be scheduled on that node. When a node is drained, to avoid or minimize downtime, workloads on the affected node are transferred to other nodes.
While this maintenance can be achieved using command line tools, the Node Maintenance Operator offers a declarative approach to achieve this by using a custom resource. When such a resource exists for a node, the Operator cordons and drains the node until the resource is deleted.
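As a sketch of the declarative approach, a NodeMaintenance custom resource resembles the following; the resource name and reason are illustrative placeholders, and the exact API version should be verified against the installed Node Maintenance Operator:
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-worker-0   # illustrative name
spec:
  nodeName: <node_name>        # node to cordon and drain
  reason: "Replacing a faulty NIC"
Creating this CR cordons and drains the node; deleting the CR returns the node to service.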
1.7. Flow of events during fencing and remediation
When a node becomes unhealthy, multiple events occur to detect, fence, and remediate the node, in order to restore workloads, and ideally the node, to health. Some events are triggered by the OpenShift cluster, and some events are reactions by the Workload Availability operators. Understanding this flow of events, and the duration between them, is important for making informed decisions, including which remediation provider to use and how to configure the Node Health Check Operator and the chosen remediation provider.
The following example outlines the phased flow of events for a common use case. The phases proceed as described below only when the Node Health Check Operator is configured with the Ready=Unknown unhealthy condition.
1.7.1. Phase 1 - Kubernetes Health Check (Core OpenShift)
The unhealthy node stops communicating with the API server. After approximately 50 seconds, the API server sets the "Ready" condition of the node to "Unknown", that is, Ready=Unknown.
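You can observe this condition directly. For example, the following command (the node name is a placeholder) prints the Ready condition status that the API server maintains for the node:
$ oc get node <node_name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'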
1.7.2. Phase 2 - Node Health Check (NHC)
If the Ready=Unknown condition is present for longer than the configured duration, the Node Health Check Operator starts remediation.
The user-configured duration in this phase represents the tolerance that the Operator has towards the duration of the unhealthy condition. It takes into account that, while the workload is restarting as requested, the resource is expected to be "Unready". A minimal configuration sketch follows the examples below.
For example:
- If you have a workload that takes a long time to restart, then you need to configure a longer timeout.
- Likewise, if the workload restarts quickly, the timeout can be shorter.
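The following is a minimal, hedged sketch of how this tolerance maps to the duration field of an unhealthy condition in a NodeHealthCheck resource; field names and values should be verified against the NodeHealthCheck CRD of the installed release:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-sample   # illustrative name
spec:
  minHealthy: 51%
  unhealthyConditions:
  - type: Ready
    status: Unknown
    duration: 300s   # tolerance before remediation starts; size this to your workload restart time
  - type: Ready
    status: "False"
    duration: 300s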
1.7.3. Phase 3 - Remediate Host / Remediate API (depending on the configured remediation operator)
Using Machine Deletion Remediation (MDR), Self Node Remediation (SNR), or Fence Agents Remediation (FAR), the remediator fences and isolates the node by rebooting it in order to reach a safe state.
The details of this phase are configured by the user and depend on their workload requirements.
For example:
- Machine Deletion Remediation - The choice of platform influences the time it takes to reprovision the machine, and therefore the duration of the remediation. MDR is only applicable to clusters that use the Machine API.
- Self Node Remediation - The remediation time depends on many parameters, including the safe time it takes to automatically reboot unhealthy nodes, and the watchdog devices used to ensure that the machine enters a safe state when an error condition is detected.
- Fence Agents Remediation - The fencing agent time depends on many parameters, including the cluster nodes, the management interface, and the agent parameters.
1.7.4. Phase 4 - Workload starting
When MDR is used, the remediator deletes the resources. When FAR or SNR is used, varying remediation strategies are available. One strategy is OutOfServiceTaint, which uses the out-of-service taint to permit the deletion of resources in the cluster. In both cases, deleting the resources enables faster rescheduling of the affected workload. The workload is then rescheduled and restarted.
This phase is initiated automatically by the remediators when fencing is complete. If fencing does not complete, and an escalation remediation is required, the user must configure the timeout in seconds for the entire remediation process. If the timeout passes, and the node is still unhealthy, NHC will try the next remediator in line to remediate the unhealthy node.
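The OutOfServiceTaint strategy mentioned above relies on the standard Kubernetes out-of-service taint. As a hedged check, you can inspect whether the taint was placed on the fenced node; the node name is a placeholder:
$ oc get node <node_name> -o jsonpath='{.spec.taints}'
# While this strategy is active, the output is expected to include the
# node.kubernetes.io/out-of-service taint with the NoExecute effect.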
1.8. About metrics for workload availability operators
The addition of data analysis enhances observability for the workload availability operators. The data provides metrics about the activity of the operators, and the effect on the cluster. These metrics improve decision-making capabilities, enable data-driven optimization, and enhance overall system performance.
You can use metrics to do these tasks:
- Access comprehensive tracking data for operators, to monitor overall system efficiency.
- Access actionable insights derived from tracking data, such as identifying frequently failing nodes, or downtime caused by Operator remediations.
- Visualize how the Operator remediations enhance overall system efficiency.
1.8.1. Configuring metrics for workload availability operators
You can configure metrics for the workload availability operators by creating a token secret and a ServiceMonitor resource, as described in the following procedure.
Prerequisites
- You must first configure the monitoring stack. For more information, see Configuring the monitoring stack.
- You must enable monitoring for user-defined projects. For more information, see Enabling monitoring for user-defined projects.
Procedure
Create the prometheus-user-workload-token secret in the openshift-workload-availability namespace from the existing prometheus-user-workload-token secret in the openshift-user-workload-monitoring namespace as follows:
existingPrometheusTokenSecret=$(kubectl get secret --namespace openshift-user-workload-monitoring | grep prometheus-user-workload-token | awk '{print $1}') 1
kubectl get secret ${existingPrometheusTokenSecret} --namespace=openshift-user-workload-monitoring -o yaml | \
  sed '/namespace: .*==/d;/ca.crt:/d;/serviceCa.crt/d;/creationTimestamp:/d;/resourceVersion:/d;/uid:/d;/annotations/d;/kubernetes.io/d;' | \
  sed 's/namespace: .*/namespace: openshift-workload-availability/' | \ 2
  sed 's/name: .*/name: prometheus-user-workload-token/' | \ 3
  sed 's/type: .*/type: Opaque/' \
  > prom-token.yaml
kubectl apply -f prom-token.yaml
- 1
- The prometheus-user-workload-token secret is required by the metrics ServiceMonitor, created in the next step.
- 2
- Ensure the new Secret’s namespace is the one where NHC Operator is installed, for example, openshift-workload-availability .
- 3
- The prometheus-user-workload-token secret only exists if User Workload Prometheus scraping is enabled.
Create the ServiceMonitor as follows:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-healthcheck-metrics-monitor
  namespace: openshift-workload-availability 1
  labels:
    app.kubernetes.io/component: controller-manager
spec:
  endpoints:
  - interval: 30s
    port: https
    scheme: https
    authorization:
      type: Bearer
      credentials:
        name: prometheus-user-workload-token
        key: token
    tlsConfig:
      ca:
        configMap:
          name: nhc-serving-certs-ca-bundle
          key: service-ca.crt
      serverName: node-healthcheck-controller-manager-metrics-service.openshift-workload-availability.svc 2
  selector:
    matchLabels:
      app.kubernetes.io/component: controller-manager
      app.kubernetes.io/name: node-healthcheck-operator
      app.kubernetes.io/instance: metrics
- 1
- Specify the namespace where you want to configure the metrics, for example, openshift-workload-availability.
- 2
- The serverName must contain the same namespace where the Operator is installed. In the example, openshift-workload-availability is placed after the metrics service name and before the filetype extension.
Verification
To confirm that the configuration is successful, verify that the Observe → Targets tab in the OCP web console shows Endpoint Up.
1.8.2. Example metrics for workload availability operators
The following are example metrics from the various workload availability operators.
The metrics include information on the following indicators:
- Operator availability: Showing if and when each Operator is up and running.
- Node remediation count: Showing the number of remediations across the same node, and across all nodes.
- Node remediation duration: Showing the remediation downtime or recovery time.
- Node remediation gauge: Showing the number of ongoing remediations.
Chapter 2. Using Self Node Remediation
You can use the Self Node Remediation Operator to automatically reboot unhealthy nodes. This remediation strategy minimizes downtime for stateful applications and ReadWriteOnce (RWO) volumes, and restores compute capacity if transient failures occur.
2.1. About the Self Node Remediation Operator
The Self Node Remediation Operator runs on the cluster nodes and reboots nodes that are identified as unhealthy. The Operator uses the MachineHealthCheck or NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the MachineHealthCheck or the NodeHealthCheck resource creates the SelfNodeRemediation custom resource (CR), which triggers the Self Node Remediation Operator.
The SelfNodeRemediation CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediation
metadata:
  name: selfnoderemediation-sample
  namespace: openshift-workload-availability
spec:
  remediationStrategy: <remediation_strategy> 1
status:
  lastError: <last_error_message> 2
The Self Node Remediation Operator minimizes downtime for stateful applications and restores compute capacity if transient failures occur. You can use this Operator regardless of the management interface, such as IPMI or an API to provision a node, and regardless of the cluster installation type, such as installer-provisioned infrastructure or user-provisioned infrastructure.
2.1.1. About watchdog devices
Watchdog devices can be any of the following:
- Independently powered hardware devices
- Hardware devices that share power with the hosts they control
- Virtual devices implemented in software, or softdog
Hardware watchdog and softdog devices have electronic or software timers, respectively. These watchdog devices are used to ensure that the machine enters a safe state when an error condition is detected. The cluster is required to repeatedly reset the watchdog timer to prove that it is in a healthy state. This timer might elapse due to fault conditions, such as deadlocks, CPU starvation, and loss of network or disk access. If the timer expires, the watchdog device assumes that a fault has occurred and the device triggers a forced reset of the node.
Hardware watchdog devices are more reliable than softdog devices.
2.1.1.1. Understanding Self Node Remediation Operator behavior with watchdog devices
The Self Node Remediation Operator determines the remediation strategy based on the watchdog devices that are present.
If a hardware watchdog device is configured and available, the Operator uses it for remediation. If a hardware watchdog device is not configured, the Operator enables and uses a softdog device for remediation.
If neither watchdog device is supported, either by the system or by the configuration, the Operator remediates nodes by using a software reboot.
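To see which watchdog device, if any, is present on a node, you can run a check similar to the following sketch from a node debug pod; the node name is a placeholder and the device path may differ:
$ oc debug node/<node_name> -- chroot /host ls -l /dev/watchdog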
Additional resources
2.2. Control plane fencing
In earlier releases, you could enable Self Node Remediation and Node Health Check on worker nodes. In the event of node failure, you can now also follow remediation strategies on control plane nodes.
Self Node Remediation occurs in two primary scenarios.
API Server Connectivity
- In this scenario, the control plane node to be remediated is not isolated. It can be directly connected to the API Server, or it can be indirectly connected to the API Server through worker nodes or control plane nodes that are directly connected to the API Server.
- When there is API Server Connectivity, the control plane node is remediated only if the Node Health Check Operator has created a SelfNodeRemediation custom resource (CR) for the node.
No API Server Connectivity
- In this scenario, the control plane node to be remediated is isolated from the API Server. The node cannot connect directly or indirectly to the API Server.
When there is no API Server Connectivity, the control plane node is remediated as outlined in the following steps:
Check the status of the control plane node with the majority of the peer worker nodes. If the majority of the peer worker nodes cannot be reached, the node will be analyzed further.
Self-diagnose the status of the control plane node
- If the self diagnostics pass, no action is taken.
- If the self diagnostics fail, the node is fenced and remediated.
- The self diagnostics currently supported are checking the kubelet service status, and checking endpoint availability using an opt-in configuration.
- If the node cannot communicate with most of its worker peers, check the connectivity of the control plane node with other control plane nodes. If the node can communicate with any other control plane peer, no action is taken. Otherwise, the node is fenced and remediated.
2.3. Installing the Self Node Remediation Operator by using the web console
You can use the Red Hat OpenShift web console to install the Self Node Remediation Operator.
The Node Health Check Operator also installs the Self Node Remediation Operator as a default remediation provider.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Self Node Remediation Operator from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs of the self-node-remediation-controller-manager pod and self-node-remediation-ds pods in the openshift-workload-availability project for any reported issues.
2.4. Installing the Self Node Remediation Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Self Node Remediation Operator.
You can install the Self Node Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a Namespace custom resource (CR) for the Self Node Remediation Operator:
Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
To create the Namespace CR, run the following command:
$ oc create -f workload-availability-namespace.yaml
Create an OperatorGroup CR:
Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
To create the OperatorGroup CR, run the following command:
$ oc create -f workload-availability-operator-group.yaml
Create a Subscription CR:
Define the Subscription CR and save the YAML file, for example, self-node-remediation-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: self-node-remediation-operator
  namespace: openshift-workload-availability 1
spec:
  channel: stable
  installPlanApproval: Manual 2
  name: self-node-remediation-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  package: self-node-remediation
- 1
- Specify the Namespace where you want to install the Self Node Remediation Operator. To install the Self Node Remediation Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.
CR. - 2
- Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This plan prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation.
To create the Subscription CR, run the following command:
$ oc create -f self-node-remediation-subscription.yaml
Verify that the Self Node Remediation Operator created the SelfNodeRemediationTemplate CR:
$ oc get selfnoderemediationtemplate -n openshift-workload-availability
Example output
self-node-remediation-automatic-strategy-template
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availability
Example output
NAME                           DISPLAY                          VERSION   REPLACES                       PHASE
self-node-remediation.v0.8.0   Self Node Remediation Operator   v.0.8.0   self-node-remediation.v0.7.1   Succeeded
Verify that the Self Node Remediation Operator is up and running:
$ oc get deployment -n openshift-workload-availability
Example output
NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
self-node-remediation-controller-manager   1/1     1            1           28h
Verify that the Self Node Remediation Operator created the SelfNodeRemediationConfig CR:
$ oc get selfnoderemediationconfig -n openshift-workload-availability
Example output
NAME                           AGE
self-node-remediation-config   28h
Verify that each self node remediation pod is scheduled and running on each worker node and control plane node:
$ oc get daemonset -n openshift-workload-availability
Example output
NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
self-node-remediation-ds   6         6         6       6            6           <none>          28h
2.5. Configuring the Self Node Remediation Operator
The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR and the SelfNodeRemediationTemplate Custom Resource Definition (CRD).
To avoid unexpected reboots of a specific node, the Node Maintenance Operator places the node in maintenance mode and automatically adds a node selector that prevents the SNR daemonset from running on the specific node.
2.5.1. Understanding the Self Node Remediation Operator configuration
The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR with the name self-node-remediation-config. The CR is created in the namespace of the Self Node Remediation Operator.
A change in the SelfNodeRemediationConfig CR re-creates the Self Node Remediation daemon set.
The SelfNodeRemediationConfig CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: self-node-remediation-config
  namespace: openshift-workload-availability
spec:
  safeTimeToAssumeNodeRebootedSeconds: 180 1
  watchdogFilePath: /dev/watchdog 2
  isSoftwareRebootEnabled: true 3
  apiServerTimeout: 15s 4
  apiCheckInterval: 5s 5
  maxApiErrorThreshold: 3 6
  peerApiServerTimeout: 5s 7
  peerDialTimeout: 5s 8
  peerRequestTimeout: 5s 9
  peerUpdateInterval: 15m 10
  hostPort: 30001 11
  customDsTolerations: 12
  - effect: NoSchedule
    key: node-role.kubernetes.io.infra
    operator: Equal
    value: "value1"
    tolerationSeconds: 3600
- 1
- Specify an optional time duration that the Operator waits before recovering affected workloads running on an unhealthy node. Starting replacement pods while they are still running on the failed node can lead to data corruption and a violation of run-once semantics. The Operator calculates a minimum duration using the values in the ApiServerTimeout, ApiCheckInterval, MaxApiErrorThreshold, PeerDialTimeout, and PeerRequestTimeout fields, as well as the watchdog timeout and the cluster size at the time of remediation. To check the minimum duration calculation, view the manager pod logs and find references to the calculated minimum time in seconds. If you specify a value that is lower than the minimum duration, the Operator uses the minimum duration. However, if you want to increase the duration to a value higher than this minimum value, you can set safeTimeToAssumeNodeRebootedSeconds to a value higher than the minimum duration.
- 2
- Specify the file path of the watchdog device in the nodes. If you enter an incorrect path to the watchdog device, the Self Node Remediation Operator automatically detects the softdog device path.
If a watchdog device is unavailable, the SelfNodeRemediationConfig CR uses a software reboot.
- 3
- Specify if you want to enable software reboot of the unhealthy nodes. By default, the value of isSoftwareRebootEnabled is set to true. To disable the software reboot, set the parameter value to false.
- 4
- Specify the timeout duration to check connectivity with each API server. When this duration elapses, the Operator starts remediation. The timeout duration must be greater than or equal to 10 milliseconds.
- 5
- Specify the frequency to check connectivity with each API server. The timeout duration must be greater than or equal to 1 second.
- 6
- Specify a threshold value. After reaching this threshold, the node starts contacting its peers. The threshold value must be greater than or equal to 1.
- 7
- Specify the duration of the timeout for the peer to connect the API server. The timeout duration must be greater than or equal to 10 milliseconds.
- 8
- Specify the duration of the timeout for establishing connection with the peer. The timeout duration must be greater than or equal to 10 milliseconds.
- 9
- Specify the duration of the timeout to get a response from the peer. The timeout duration must be greater than or equal to 10 milliseconds.
- 10
- Specify the frequency to update peer information such as IP address. The timeout duration must be greater than or equal to 10 seconds.
- 11
- Specify an optional value to change the port that Self Node Remediation agents use for internal communication. The value must be greater than 0. The default value is port 30001.
- 12
- Specify custom tolerations for the Self Node Remediation agents that run on the DaemonSet, to support remediation for different types of nodes. You can configure the following fields:
- effect: The effect indicates the taint effect to match. If this field is empty, all taint effects are matched. When specified, allowed values are NoSchedule, PreferNoSchedule, and NoExecute.
- key: The key is the taint key that the toleration applies to. If this field is empty, all taint keys are matched. If the key is empty, the operator field must be Exists. This combination means to match all values and all keys.
- operator: The operator represents a key’s relationship to the value. Valid operators are Exists and Equal. The default is Equal. Exists is equivalent to a wildcard for a value, so that a pod can tolerate all taints of a particular category.
- value: The taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise it is just a regular string.
- tolerationSeconds: The period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values are treated as 0 (evict immediately) by the system.
- Custom tolerations allow you to add a toleration to the Self Node Remediation agent pod. For more information, see Using tolerations to control OpenShift Logging pod placement.
You can edit the self-node-remediation-config CR that is created by the Self Node Remediation Operator. However, when you try to create a new CR for the Self Node Remediation Operator, the following message is displayed in the logs:
controllers.SelfNodeRemediationConfig ignoring selfnoderemediationconfig CRs that are not named 'self-node-remediation-config' or not in the namespace of the operator: 'openshift-workload-availability' {"selfnoderemediationconfig": "openshift-workload-availability/selfnoderemediationconfig-copy"}
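For example, to edit the existing configuration instead of creating a new CR, you can run:
$ oc edit selfnoderemediationconfig self-node-remediation-config -n openshift-workload-availability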
2.5.2. Understanding the Self Node Remediation Template configuration
The Self Node Remediation Operator also creates the SelfNodeRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes. The following remediation strategies are available:
Automatic
- This remediation strategy simplifies the remediation process by letting the Self Node Remediation Operator decide on the most suitable remediation strategy for the cluster. This strategy checks if the OutOfServiceTaint strategy is available on the cluster. If the OutOfServiceTaint strategy is available, the Operator selects the OutOfServiceTaint strategy. If the OutOfServiceTaint strategy is not available, the Operator selects the ResourceDeletion strategy. Automatic is the default remediation strategy.
ResourceDeletion
- This remediation strategy removes the pods on the node, rather than the removal of the node object. This strategy recovers workloads faster.
OutOfServiceTaint
- This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node, rather than the removal of the node object. It achieves this by placing the OutOfServiceTaint taint on the node. This strategy recovers workloads faster. This strategy has been supported as Technology Preview since OpenShift Container Platform version 4.13, and as General Availability since OpenShift Container Platform version 4.15.
The Self Node Remediation Operator creates the SelfNodeRemediationTemplate CR for the strategy self-node-remediation-automatic-strategy-template, which the Automatic remediation strategy uses.
The SelfNodeRemediationTemplate CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
  creationTimestamp: "2022-03-02T08:02:40Z"
  name: self-node-remediation-<remediation_object>-deletion-template 1
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: <remediation_strategy> 2
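For example, a template that pins the OutOfServiceTaint strategy rather than Automatic could resemble the following sketch; the template name is illustrative:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
  name: self-node-remediation-taint-deletion-template   # illustrative name
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: OutOfServiceTaint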
2.5.3. Troubleshooting the Self Node Remediation Operator
2.5.3.1. General troubleshooting
- Issue
- You want to troubleshoot issues with the Self Node Remediation Operator.
- Resolution
- Check the Operator logs.
2.5.3.2. Checking the daemon set
- Issue
- The Self Node Remediation Operator is installed but the daemon set is not available.
- Resolution
- Check the Operator logs for errors or warnings.
2.5.3.3. Unsuccessful remediation
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the SelfNodeRemediation CR was created by running the following command:
$ oc get snr -A
If the MachineHealthCheck controller did not create the SelfNodeRemediation CR when the node turned unhealthy, check the logs of the MachineHealthCheck controller. Additionally, ensure that the MachineHealthCheck CR includes the required specification to use the remediation template.
If the SelfNodeRemediation CR was created, ensure that its name matches the unhealthy node or the machine object.
2.5.3.4. Daemon set and other Self Node Remediation Operator resources exist even after uninstalling the Operator
- Issue
- The Self Node Remediation Operator resources, such as the daemon set, configuration CR, and the remediation template CR, exist even after uninstalling the Operator.
- Resolution
To remove the Self Node Remediation Operator resources, delete the resources by running the following commands for each resource type:
$ oc delete ds <self-node-remediation-ds> -n <namespace>
$ oc delete snrc <self-node-remediation-config> -n <namespace>
$ oc delete snrt <self-node-remediation-template> -n <namespace>
2.5.4. Gathering data about the Self Node Remediation Operator
To collect debugging information about the Self Node Remediation Operator, use the must-gather tool. For information about the must-gather image for the Self Node Remediation Operator, see Gathering data about specific features.
2.5.5. Additional resources
Chapter 3. Using Fence Agents Remediation
You can use the Fence Agents Remediation Operator to automatically remediate unhealthy nodes, similar to the Self Node Remediation Operator. FAR is designed to run an existing set of upstream fencing agents in environments with a traditional API endpoint, for example, IPMI, for power-cycling cluster nodes, while their pods are quickly evicted based on the remediation strategy.
3.1. About the Fence Agents Remediation Operator
The Fence Agents Remediation (FAR) Operator uses external tools to fence unhealthy nodes. These tools are a set of fence agents, where each fence agent can be used in a different environment to fence a node by using a traditional Application Programming Interface (API) call that reboots the node. By doing so, FAR can minimize downtime for stateful applications, restore compute capacity if transient failures occur, and increase the availability of workloads.
FAR not only fences a node when it becomes unhealthy, it also tries to remediate the node from being unhealthy to healthy. It adds a taint to evict stateless pods, fences the node with a fence agent, and after a reboot, it completes the remediation with resource deletion to remove any remaining workloads (mostly stateful workloads). Adding the taint and deleting the workloads accelerates the workload rescheduling.
The Operator watches for new or deleted custom resources (CRs) called FenceAgentsRemediation which trigger a fence agent to remediate a node, based on the CR’s name. FAR uses the NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the NodeHealthCheck resource creates the FenceAgentsRemediation CR, based on the FenceAgentsRemediationTemplate CR, which then triggers the Fence Agents Remediation Operator.
FAR uses a fence agent to fence a Kubernetes node. Generally, fencing is the process of taking unresponsive or unhealthy computers into a safe state and isolating them. A fence agent is software that uses a management interface to perform fencing, mostly power-based fencing that enables power-cycling, resetting, or powering off the computer. An example fence agent is fence_ipmilan, which is used for Intelligent Platform Management Interface (IPMI) environments.
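For illustration, a manual fence_ipmilan invocation resembles the following sketch; the values mirror the shared and node parameters shown in the template example later in this chapter and are not real credentials:
$ fence_ipmilan --ip=192.168.123.1 --ipport=6230 --username=admin --password=password --lanplus --action=reboot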
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediation
metadata:
  name: node-name 1
  namespace: openshift-workload-availability
spec:
  remediationStrategy: <remediation_strategy> 2
- 1
- The node-name should match the name of the unhealthy cluster node.
- 2
- Specifies the remediation strategy for the nodes. For more information on the remediation strategies available, see the Understanding the Fence Agents Remediation Template configuration topic.
The Operator includes a set of fence agents, also available in the Red Hat High Availability Add-On, which use a management interface, such as IPMI or an API, to provision or reboot a node for bare metal servers, virtual machines, and cloud platforms.
3.2. Installing the Fence Agents Remediation Operator by using the web console
You can use the Red Hat OpenShift web console to install the Fence Agents Remediation Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Fence Agents Remediation Operator, or FAR, from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the log of the fence-agents-remediation-controller-manager pod for any reported issues.
3.3. Installing the Fence Agents Remediation Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Fence Agents Remediation Operator.
You can install the Fence Agents Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a Namespace custom resource (CR) for the Fence Agents Remediation Operator:
Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
To create the Namespace CR, run the following command:
$ oc create -f workload-availability-namespace.yaml
Create an OperatorGroup CR:
Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
To create the OperatorGroup CR, run the following command:
$ oc create -f workload-availability-operator-group.yaml
Create a Subscription CR:
Define the Subscription CR and save the YAML file, for example, fence-agents-remediation-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: fence-agents-remediation-subscription
  namespace: openshift-workload-availability 1
spec:
  channel: stable
  name: fence-agents-remediation
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  package: fence-agents-remediation
- 1
- Specify the Namespace where you want to install the Fence Agents Remediation Operator, for example, the openshift-workload-availability namespace outlined earlier in this procedure. You can install the Subscription CR for the Fence Agents Remediation Operator in the openshift-workload-availability namespace where there is already a matching OperatorGroup CR.
To create the Subscription CR, run the following command:
$ oc create -f fence-agents-remediation-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availability
Example output
NAME                              DISPLAY                             VERSION   REPLACES                          PHASE
fence-agents-remediation.v0.3.0   Fence Agents Remediation Operator   0.3.0     fence-agents-remediation.v0.2.1   Succeeded
Verify that the Fence Agents Remediation Operator is up and running:
$ oc get deployment -n openshift-workload-availability
Example output
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
fence-agents-remediation-controller-manager   2/2     2            2           110m
3.4. Configuring the Fence Agents Remediation Operator
You can use the Fence Agents Remediation Operator to create the FenceAgentsRemediationTemplate Custom Resource (CR), which is used by the Node Health Check Operator (NHC). This CR defines the fence agent to be used in the cluster with all the required parameters for remediating the nodes. There may be many FenceAgentsRemediationTemplate CRs, at most one for each fence agent, and when NHC is being used it can choose the FenceAgentsRemediationTemplate as the remediationTemplate to be used for power-cycling the node.
The FenceAgentsRemediationTemplate CR resembles the following YAML file:
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fence-agents-remediation-template-fence-ipmilan
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      agent: fence_ipmilan 1
      nodeparameters: 2
        --ipport:
          master-0-0: '6230'
          master-0-1: '6231'
          master-0-2: '6232'
          worker-0-0: '6233'
          worker-0-1: '6234'
          worker-0-2: '6235'
      sharedparameters: 3
        '--action': reboot
        '--ip': 192.168.123.1
        '--lanplus': ''
        '--password': password
        '--username': admin
      retryCount: '5' 4
      retryInterval: '5s' 5
      timeout: '60s' 6
- 1
- Displays the name of the fence agent to be executed, for example, fence_ipmilan.
- 2
- Displays the node-specific parameters for executing the fence agent, for example, ipport.
- 3
- Displays the cluster-wide parameters for executing the fence agent, for example, username.
- 4
- Displays the number of times to retry the fence agent command in case of failure. The default number of attempts is 5.
- 5
- Displays the interval between retries in seconds. The default is 5 seconds.
- 6
- Displays the timeout for the fence agent command. The default is 60 seconds. For values of 60 seconds or greater, the timeout value is expressed in both minutes and seconds in the YAML file.
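When the Node Health Check Operator uses this template, its NodeHealthCheck CR points at it through the remediationTemplate field. The following is a hedged sketch; the NodeHealthCheck field layout should be verified against the CRD of the installed release, and the CR name is illustrative:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-fence-agents   # illustrative name
spec:
  minHealthy: 51%
  remediationTemplate:
    apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
    kind: FenceAgentsRemediationTemplate
    name: fence-agents-remediation-template-fence-ipmilan
    namespace: openshift-workload-availability
  unhealthyConditions:
  - type: Ready
    status: Unknown
    duration: 300s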
3.4.1. Understanding the Fence Agents Remediation Template configuration
The Fence Agents Remediation Operator also creates the FenceAgentsRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes, which aims to recover workloads faster. The following remediation strategies are available:
ResourceDeletion
- This remediation strategy removes the pods on the node. This strategy recovers workloads faster.
OutOfServiceTaint
- This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node. It achieves this by placing the OutOfServiceTaint taint on the node. The OutOfServiceTaint strategy also represents a non-graceful node shutdown. A non-graceful node shutdown occurs when a node is shut down and not detected, instead of triggering an in-operating system shutdown. This strategy has been supported as Technology Preview since OpenShift Container Platform version 4.13, and as General Availability since OpenShift Container Platform version 4.15.
The FenceAgentsRemediationTemplate CR resembles the following YAML file:
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fence-agents-remediation-<remediation_object>-deletion-template 1
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: <remediation_strategy> 2
- 1
- Specifies the type of remediation template based on the remediation strategy. Replace <remediation_object> with either resource or taint; for example, fence-agents-remediation-resource-deletion-template.
- 2
- Specifies the remediation strategy. The remediation strategy can either be ResourceDeletion or OutOfServiceTaint.
3.5. Troubleshooting the Fence Agents Remediation Operator
3.5.1. General troubleshooting
- Issue
- You want to troubleshoot issues with the Fence Agents Remediation Operator.
- Resolution
Check the Operator logs.
$ oc logs <fence-agents-remediation-controller-manager-name> -c manager -n <namespace-name>
3.5.2. Unsuccessful remediation
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the FenceAgentsRemediation CR was created by running the following command:
$ oc get far -A
If the NodeHealthCheck controller did not create the FenceAgentsRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR includes the required specification to use the remediation template.
If the FenceAgentsRemediation CR was created, ensure that its name matches the unhealthy node object.
3.5.3. Fence Agents Remediation Operator resources exist after uninstalling the Operator
- Issue
- The Fence Agents Remediation Operator resources, such as the remediation CR and the remediation template CR, exist after uninstalling the Operator.
- Resolution
To remove the Fence Agents Remediation Operator resources, you can delete the resources by selecting the "Delete all operand instances for this operator" checkbox before uninstalling. This checkbox feature is only available in Red Hat OpenShift since version 4.13. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:
$ oc delete far <fence-agents-remediation> -n <namespace>
$ oc delete fartemplate <fence-agents-remediation-template> -n <namespace>
The remediation CR far must be created and deleted by the same entity, for example, NHC. If the remediation CR far is still present, it is deleted, together with the FAR operator.
The remediation template CR fartemplate only exists if you use FAR with NHC. When the FAR operator is deleted using the web console, the remediation template CR fartemplate is also deleted.
3.6. Gathering data about the Fence Agents Remediation Operator
To collect debugging information about the Fence Agents Remediation Operator, use the must-gather tool. For information about the must-gather image for the Fence Agents Remediation Operator, see Gathering data about specific features.
3.7. Agents supported by the Fence Agents Remediation Operator
This section describes the agents currently supported by the Fence Agents Remediation Operator.
Most of the supported agents can be grouped by the node’s hardware vendor and usage, as follows:
- BareMetal
- Virtualization
- Intel
- HP
- IBM
- VMware
- Cisco
- APC
- Dell
- Other
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with Out-of-Band controllers that support Redfish APIs. |
| An I/O Fencing agent that can be used with machines controlled by IPMI. |
[a]
This description also applies for the agents fence_ilo3 , fence_ilo4 , fence_ilo5 , fence_imm , fence_idrac , and fence_ipmilanplus .
|
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with RHEV-M REST API to fence virtual machines. |
| An I/O Fencing agent that can be used with virtual machines. |
[a]
This description also applies for the agent fence_xvm .
|
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with Intel AMT (WS). |
| An I/O Fencing agent that can be used with Intel Modular device (tested on Intel MFSYS25, should also work with MFSYS35). |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used for HP servers with the Integrated Light Out (iLO) PCI card. |
| A fencing agent that can be used to connect to an iLO device. It logs into the device via SSH and reboots a specified outlet.
| An I/O Fencing agent that can be used with HP Moonshot iLO. |
| An I/O Fencing agent that can be used with HP iLO MP. |
| An I/O Fencing agent that can be used with HP BladeSystem and HP Integrity Superdome X. |
[a]
This description also applies for the agent fence_ilo2 .
[b]
This description also applies for the agents fence_ilo3_ssh , fence_ilo4_ssh , and fence_ilo5_ssh .
|
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with IBM Bladecenters with recent enough firmware that includes telnet support. |
| An I/O Fencing agent that can be used with IBM BladeCenter chassis. |
| An I/O Fencing agent that can be used with the IBM iPDU network power switch. |
| An I/O Fencing agent that can be used with the IBM RSA II management interface. |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with VMware API to fence virtual machines. |
| An I/O Fencing agent that can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+. |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with any Cisco MDS 9000 series with SNMP enabled device. |
| An I/O Fencing agent that can be used with Cisco UCS to fence machines. |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with the APC network power switch. |
| An I/O Fencing agent that can be used with the APC network power switch or Tripplite PDU devices. |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with the Dell Remote Access Card v5 or CMC (DRAC). |
Agent | Description |
---|---|
| An I/O Fencing agent that can be used with Brocade FC switches. |
| A resource that can be used to tell Nova that compute nodes are down and to reschedule flagged instances. |
| An I/O Fencing agent that can be used with the Eaton network power switch. |
| An I/O Fencing agent that can be used with MPX and MPH2 managed rack PDU. |
| An I/O Fencing agent that can be used with the ePowerSwitch 8M+ power switch to fence connected machines. |
| A resource that can be used to reschedule flagged instances. |
| A resource that can be used with ping-heuristics to control execution of another fence agent on the same fencing level. |
| An I/O Fencing agent that can be used with any SNMP IF-MIB capable device. |
| An I/O Fencing agent that can be used with the kdump crash recovery service. |
| An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access to multipath devices.
| An I/O Fencing agent that can be used with the Fujitsu-Siemens RSB management interface. |
| An I/O Fencing agent that can be used in environments where sbd can be used (shared storage). |
| An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access to shared storage devices. |
| An I/O Fencing agent that can be used with the WTI Network Power Switch (NPS). |
3.8. Additional resources
Chapter 4. Using Machine Deletion Remediation
You can use the Machine Deletion Remediation Operator to reprovision unhealthy nodes using the Machine API. You can use the Machine Deletion Remediation Operator in conjunction with the Node Health Check Operator.
4.1. About the Machine Deletion Remediation Operator
The Machine Deletion Remediation (MDR) operator works with the NodeHealthCheck controller to reprovision unhealthy nodes using the Machine API. MDR follows the annotation on the node to the associated machine object, confirms that it has an owning controller (for example, MachineSetController), and deletes it. Once the machine CR is deleted, the owning controller creates a replacement.
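To see the annotation that MDR follows, you can inspect the node; the annotation key below is the standard Machine API node annotation, and the node name is a placeholder:
$ oc get node <node_name> -o jsonpath='{.metadata.annotations.machine\.openshift\.io/machine}'
# Expected output resembles: openshift-machine-api/<machine_name>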
The prerequisites for MDR include:
- a Machine API-based cluster that is able to programmatically destroy and create cluster nodes,
- nodes that are associated with machines, and
- declaratively managed machines.
You can then modify the NodeHealthCheck CR to use MDR as its remediator. An example MDR template object and NodeHealthCheck configuration are provided in the documentation.
The MDR process works as follows:
- the Node Health Check Operator detects an unhealthy node and creates a MDR CR.
- the MDR Operator watches for the MDR CR associated with the unhealthy node and deletes it, if the machine has an owning controller.
- when the node is healthy again, the MDR CR is deleted by the NodeHealthCheck controller.
4.2. Installing the Machine Deletion Remediation Operator by using the web console
You can use the Red Hat OpenShift web console to install the Machine Deletion Remediation Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Machine Deletion Remediation Operator, or MDR, from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the log of the pod in the openshift-workload-availability project for any reported issues.
4.3. Installing the Machine Deletion Remediation Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Machine Deletion Remediation Operator.
You can install the Machine Deletion Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a Namespace custom resource (CR) for the Machine Deletion Remediation Operator:
Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
To create the Namespace CR, run the following command:
$ oc create -f workload-availability-namespace.yaml
Create an OperatorGroup CR:
Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
To create the OperatorGroup CR, run the following command:
$ oc create -f workload-availability-operator-group.yaml
Create a Subscription CR:
Define the Subscription CR and save the YAML file, for example, machine-deletion-remediation-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: machine-deletion-remediation-operator
  namespace: openshift-workload-availability 1
spec:
  channel: stable
  name: machine-deletion-remediation-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  package: machine-deletion-remediation
- 1
- Specify the Namespace where you want to install the Machine Deletion Remediation Operator. When you install the Machine Deletion Remediation Operator in the openshift-workload-availability namespace, the Namespace and OperatorGroup CRs already exist.
To create the Subscription CR, run the following command:
$ oc create -f machine-deletion-remediation-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availability
Example output
NAME                                  DISPLAY                                 VERSION   REPLACES                              PHASE
machine-deletion-remediation.v0.3.0   Machine Deletion Remediation Operator   0.3.0     machine-deletion-remediation.v0.2.1   Succeeded
4.4. Configuring the Machine Deletion Remediation Operator
You can use the Machine Deletion Remediation Operator, with the Node Health Check Operator, to create the MachineDeletionRemediationTemplate Custom Resource (CR). This CR defines the remediation strategy for the nodes.
The MachineDeletionRemediationTemplate CR resembles the following YAML file:
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediationTemplate
metadata:
  name: machinedeletionremediationtemplate-sample
  namespace: openshift-workload-availability
spec:
  template:
    spec: {}
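To make the template available for the Node Health Check Operator to reference, save it to a file and apply it; the file name is illustrative:
$ oc create -f machine-deletion-remediation-template.yaml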
4.5. Troubleshooting the Machine Deletion Remediation Operator
4.5.1. General troubleshooting
- Issue
- You want to troubleshoot issues with the Machine Deletion Remediation Operator.
- Resolution
Check the Operator logs.
$ oc logs <machine-deletion-remediation-controller-manager-name> -c manager -n <namespace-name>
4.5.2. Unsuccessful remediation
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the MachineDeletionRemediation CR was created by running the following command:
$ oc get mdr -A
If the NodeHealthCheck controller did not create the MachineDeletionRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR includes the required specification to use the remediation template.
If the MachineDeletionRemediation CR was created, ensure that its name matches the unhealthy node object.
4.5.3. Machine Deletion Remediation Operator resources exist even after uninstalling the Operator
- Issue
- The Machine Deletion Remediation Operator resources, such as the remediation CR and the remediation template CR, exist even after uninstalling the Operator.
- Resolution
To remove the Machine Deletion Remediation Operator resources, you can delete the resources by selecting the Delete all operand instances for this operator checkbox before uninstalling. This checkbox feature is only available in Red Hat OpenShift since version 4.13. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:
$ oc delete mdr <machine-deletion-remediation> -n <namespace>
$ oc delete mdrt <machine-deletion-remediation-template> -n <namespace>
The remediation CR mdr must be created and deleted by the same entity, for example, NHC. If the remediation CR mdr is still present, it is deleted, together with the MDR operator.
The remediation template CR mdrt only exists if you use MDR with NHC. When the MDR operator is deleted using the web console, the remediation template CR mdrt is also deleted.
4.6. Gathering data about the Machine Deletion Remediation Operator
To collect debugging information about the Machine Deletion Remediation Operator, use the must-gather tool. For information about the must-gather image for the Machine Deletion Remediation Operator, see Gathering data about specific features.
4.7. Additional resources
Chapter 5. Remediating nodes with Machine Health Checks
Machine health checks automatically repair unhealthy machines in a particular machine pool.
5.1. About machine health checks
You can only apply a machine health check to control plane machines on clusters that use control plane machine sets.
To monitor machine health, create a resource to define the configuration for a controller. Set a condition to check, such as the node staying in the NotReady
status for five minutes or reporting a permanent condition in the node-problem-detector, and set a label for the set of machines to monitor.
The controller that observes a MachineHealthCheck
resource checks for the defined condition. If a machine fails the health check, the machine is automatically deleted and a new one is created to take its place. When a machine is deleted, you see a machine deleted
event.
To limit the disruptive impact of machine deletion, the controller drains and deletes only one node at a time. If the targeted pool of machines contains more unhealthy machines than the maxUnhealthy threshold allows, remediation stops so that manual intervention can occur.
Consider the timeouts carefully, accounting for workloads and requirements.
- Long timeouts can result in long periods of downtime for the workload on the unhealthy machine.
- Timeouts that are too short can result in a remediation loop. For example, the timeout for checking the NotReady status must be long enough to allow the machine to complete the startup process.
To stop the check, remove the resource.
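For example, assuming a MachineHealthCheck resource named machine-health-check in the openshift-machine-api namespace, as used later in this chapter, you could remove it as follows:
$ oc delete machinehealthcheck machine-health-check -n openshift-machine-api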
5.1.1. Limitations when deploying machine health checks
There are limitations to consider before deploying a machine health check:
- Only machines owned by a machine set are remediated by a machine health check.
- If the node for a machine is removed from the cluster, a machine health check considers the machine to be unhealthy and remediates it immediately.
- If the corresponding node for a machine does not join the cluster after the nodeStartupTimeout, the machine is remediated.
- A machine is remediated immediately if the Machine resource phase is Failed.
5.2. Configuring machine health checks to use the Self Node Remediation Operator
Use the following procedure to configure the worker or control-plane machine health checks to use the Self Node Remediation Operator as a remediation provider.
To use the Self Node Remediation Operator as a remediation provider for machine health checks, a machine must have an associated node in the cluster.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a
SelfNodeRemediationTemplate
CR:
Define the SelfNodeRemediationTemplate CR:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
  namespace: openshift-machine-api
  name: selfnoderemediationtemplate-sample
spec:
  template:
    spec:
      remediationStrategy: Automatic 1
- 1
- Specifies the remediation strategy. The default remediation strategy is
Automatic
.
To create the
SelfNodeRemediationTemplate
CR, run the following command:
$ oc create -f <snrt-name>.yaml
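Optionally, you can confirm that the template was created. The command below assumes that the singular resource name matches the lowercased kind:
$ oc get selfnoderemediationtemplate -n openshift-machine-api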
Create or update the
MachineHealthCheck
CR to point to the SelfNodeRemediationTemplate CR:
Define or update the MachineHealthCheck CR:
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: machine-health-check
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels: 1
      machine.openshift.io/cluster-api-machine-role: "worker"
      machine.openshift.io/cluster-api-machine-type: "worker"
  unhealthyConditions:
    - type: "Ready"
      timeout: "300s"
      status: "False"
    - type: "Ready"
      timeout: "300s"
      status: "Unknown"
  maxUnhealthy: "40%"
  nodeStartupTimeout: "10m"
  remediationTemplate: 2
    kind: SelfNodeRemediationTemplate
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    name: selfnoderemediationtemplate-sample
To create a
MachineHealthCheck
CR, run the following command:
$ oc create -f <mhc-name>.yaml
To update a
MachineHealthCheck
CR, run the following command:
$ oc apply -f <mhc-name>.yaml
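To review the applied configuration, including the remediationTemplate reference, you can inspect the resource, for example:
$ oc get machinehealthcheck machine-health-check -n openshift-machine-api -o yaml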
Chapter 6. Remediating nodes with Node Health Checks
You can use the Node Health Check Operator to identify unhealthy nodes. The Operator then uses other operators to remediate the unhealthy nodes. The Node Health Check Operator installs the Self Node Remediation Operator as a default remediation provider.
For more information on the Self Node Remediation Operator, see the Using Self Node Remediation chapter.
The Node Health Check Operator can also be used with other remediation providers, including:
- The Fence Agents Remediation Operator.
- The Machine Deletion Remediation Operator.
Because preinstalled machine health checks exist on Red Hat OpenShift Service on AWS (ROSA) clusters, the Node Health Check Operator cannot function in such an environment.
6.1. About the Node Health Check Operator
The Node Health Check Operator detects the health of the nodes in a cluster. The NodeHealthCheck
controller creates the NodeHealthCheck
custom resource (CR), which defines a set of criteria and thresholds to determine the health of a node.
When the Node Health Check Operator detects an unhealthy node, it creates a remediation CR that triggers the remediation provider. For example, the controller creates the SelfNodeRemediation
CR, which triggers the Self Node Remediation Operator to remediate the unhealthy node.
The NodeHealthCheck
CR resembles the following YAML file, with self-node-remediation as the remediation provider:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehealthcheck-sample
spec:
  minHealthy: 51% 1
  pauseRequests: 2
    - <pause-test-cluster>
  remediationTemplate: 3
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    name: self-node-remediation-resource-deletion-template
    namespace: openshift-workload-availability
    kind: SelfNodeRemediationTemplate
  escalatingRemediations: 4
    - remediationTemplate:
        apiVersion: self-node-remediation.medik8s.io/v1alpha1
        name: self-node-remediation-resource-deletion-template
        namespace: openshift-workload-availability
        kind: SelfNodeRemediationTemplate
      order: 1
      timeout: 300s
  selector: 5
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  unhealthyConditions: 6
    - type: Ready
      status: "False"
      duration: 300s 7
    - type: Ready
      status: Unknown
      duration: 300s 8
- 1
- Specifies the minimum amount of healthy nodes (as a percentage or a number) required for a remediation provider to concurrently remediate nodes in the targeted pool. If the number of healthy nodes equals or exceeds the limit set by
minHealthy
, remediation occurs. The default value is 51%. - 2
- Prevents any new remediation from starting, while allowing any ongoing remediations to persist. The default value is empty. However, you can enter an array of strings that identify the cause of pausing the remediation, for example, pause-test-cluster.
Note
During the upgrade process, nodes in the cluster might become temporarily unavailable and get identified as unhealthy. In the case of worker nodes, when the Operator detects that the cluster is upgrading, it stops remediating new unhealthy nodes to prevent such nodes from rebooting.
- 3
- Specifies a remediation template from the remediation provider. For example, from the Self Node Remediation Operator.
remediationTemplate
is mutually exclusive with escalatingRemediations
. - 4
- Specifies a list of RemediationTemplates with order and timeout fields. To obtain a healthy node, use this field to sequence and configure multiple remediations. This strategy increases the likelihood of obtaining a healthy node, instead of depending on a single remediation that might not be successful. The order field determines the order in which the remediations are invoked (a lower order means earlier invocation). The timeout field determines when the next remediation is invoked. escalatingRemediations is mutually exclusive with remediationTemplate.
Note
When escalatingRemediations is used, the remediation providers Self Node Remediation Operator and Fence Agents Remediation Operator can be used multiple times with different remediationTemplate configurations. However, you cannot use the same Machine Deletion Remediation configuration with different remediationTemplate
configurations. - 5
- Specifies a selector that matches labels or expressions that you want to check. Avoid selecting both control-plane and worker nodes in one CR.
- 6
- Specifies a list of the conditions that determine whether a node is considered unhealthy.
- 7 8
- Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
The NodeHealthCheck
CR resembles the following YAML file, with metal3 as the remediation provider:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker-metal3
spec:
  minHealthy: 30%
  remediationTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: Metal3RemediationTemplate
    name: metal3-remediation
    namespace: openshift-machine-api
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  unhealthyConditions:
    - duration: 300s
      status: 'False'
      type: Ready
    - duration: 300s
      status: 'Unknown'
      type: Ready
The matchExpressions
are examples only; you must map your machine groups based on your specific needs.
The Metal3RemediationTemplate
resembles the following YAML file, with metal3 as the remediation provider:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
  name: metal3-remediation
  namespace: openshift-machine-api
spec:
  template:
    spec:
      strategy:
        retryLimit: 1
        timeout: 5m0s
        type: Reboot
In addition to creating a NodeHealthCheck
CR, you must also create the Metal3RemediationTemplate
.
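For example, if you saved the two resources shown above to separate files, you can create them together. The file names are illustrative assumptions:
$ oc create -f metal3-remediation-template.yaml -f nhc-worker-metal3.yaml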
6.1.1. Understanding the Node Health Check Operator workflow
When a node is identified as unhealthy, the Node Health Check Operator checks how many other nodes are unhealthy. If the number of healthy nodes equals or exceeds the amount that is specified in the minHealthy
field of the NodeHealthCheck
CR, the controller creates a remediation CR from the details that are provided in the external remediation template by the remediation provider. After remediation, the kubelet updates the node’s health status.
When the node turns healthy, the controller deletes the external remediation template.
6.1.2. About how node health checks prevent conflicts with machine health checks
When both node health checks and machine health checks are deployed, the node health check avoids conflict with the machine health check.
Red Hat OpenShift deploys machine-api-termination-handler
as the default MachineHealthCheck
resource.
The following list summarizes the system behavior when node health checks and machine health checks are deployed:
If only the default machine health check exists, the node health check continues to identify unhealthy nodes. However, the node health check ignores unhealthy nodes in a Terminating state. The default machine health check handles the unhealthy nodes with a Terminating state.
Example log message
INFO MHCChecker ignoring unhealthy Node, it is terminating and will be handled by MHC {"NodeName": "node-1.example.com"}
If the default machine health check is modified (for example, the unhealthyConditions is Ready), or if additional machine health checks are created, the node health check is disabled.
Example log message
INFO controllers.NodeHealthCheck disabling NHC in order to avoid conflict with custom MHCs configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}
When only the default machine health check exists again, the node health check is re-enabled.
Example log message
INFO controllers.NodeHealthCheck re-enabling NHC, no conflicting MHC configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}
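To see which machine health checks currently exist in the cluster, and therefore whether the Node Health Check Operator is likely to disable itself, you can list them, for example:
$ oc get machinehealthcheck -n openshift-machine-api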
6.2. Control plane fencing
In earlier releases, you could enable Self Node Remediation and Node Health Check on worker nodes. In the event of node failure, you can now also follow remediation strategies on control plane nodes.
Do not use the same NodeHealthCheck
CR for worker nodes and control plane nodes. Grouping worker nodes and control plane nodes together can result in incorrect evaluation of the minimum healthy node count, and cause unexpected or missing remediations. This is because of the way the Node Health Check Operator handles control plane nodes. You should group the control plane nodes in their own group and the worker nodes in their own group. If required, you can also create multiple groups of worker nodes.
Considerations for remediation strategies:
- Avoid creating multiple Node Health Check configurations that overlap the same nodes, because overlapping configurations can result in unexpected behavior. This suggestion applies to both worker and control plane nodes.
- The Node Health Check Operator implements a hardcoded limitation of remediating a maximum of one control plane node at a time. Multiple control plane nodes should not be remediated at the same time.
6.3. Installing the Node Health Check Operator by using the web console
You can use the Red Hat OpenShift web console to install the Node Health Check Operator.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Node Health Check Operator, then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the
openshift-workload-availability
namespace. -
Ensure that the Console plug-in is set to
Enable
. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availability
namespace and that its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Workloads → Pods page and check the logs in any pods in the
openshift-workload-availability
project that are reporting issues.
6.4. Installing the Node Health Check Operator by using the CLI
You can use the OpenShift CLI (oc
) to install the Node Health Check Operator.
You can install the Node Health Check Operator in your own namespace or in the openshift-workload-availability
namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a
Namespace
custom resource (CR) for the Node Health Check Operator:
Define the Namespace CR and save the YAML file, for example, node-health-check-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
To create the
Namespace
CR, run the following command:
$ oc create -f node-health-check-namespace.yaml
Create an
OperatorGroup
CR:
Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
To create the
OperatorGroup
CR, run the following command:
$ oc create -f workload-availability-operator-group.yaml
Create a
Subscription
CR:
Define the Subscription CR and save the YAML file, for example, node-health-check-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-health-check-operator
  namespace: openshift-workload-availability 1
spec:
  channel: stable 2
  installPlanApproval: Manual 3
  name: node-healthcheck-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  package: node-healthcheck-operator
- 1
- Specify the
Namespace
where you want to install the Node Health Check Operator. To install the Node Health Check Operator in the openshift-workload-availability
namespace, specify openshift-workload-availability
in the Subscription
CR. - 2
- Specify the channel name for your subscription. To upgrade to the latest version of the Node Health Check Operator, you must manually change the channel name for your subscription from
candidate
to stable
. - 3
- Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This plan prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation.
To create the
Subscription
CR, run the following command:
$ oc create -f node-health-check-subscription.yaml
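Because installPlanApproval is set to Manual in this example, the installation waits until you approve the pending install plan. A minimal sketch, assuming a single pending install plan in the namespace; the placeholder name must be replaced with the value returned by the first command:
$ oc get installplan -n openshift-workload-availability
$ oc patch installplan <install-plan-name> -n openshift-workload-availability --type merge --patch '{"spec":{"approved":true}}'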
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availability
Example output
NAME                               DISPLAY                      VERSION   REPLACES                           PHASE
node-healthcheck-operator.v0.7.0   Node Health Check Operator   0.7.0     node-healthcheck-operator.v0.6.1   Succeeded
Verify that the Node Health Check Operator is up and running:
$ oc get deployment -n openshift-workload-availability
Example output
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
node-healthcheck-controller-manager   2/2     2            2           10d
6.5. Creating a node health check
Using the web console, you can create a node health check to identify unhealthy nodes and specify the remediation type and strategy to fix them.
Procedure
- From the Administrator perspective of the Red Hat OpenShift web console, click Compute → NodeHealthChecks → Create NodeHealthCheck.
- Specify whether to configure the node health check using the Form view or the YAML view.
- Enter a Name for the node health check. The name must consist of lowercase alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
- Specify the Remediator type, either Self node remediation or Other. The Self node remediation option is part of the Self Node Remediation Operator that is installed with the Node Health Check Operator. Selecting Other requires an API version, Kind, Name, and Namespace to be entered, which then points to the remediation template resource of a remediator.
Make a Nodes selection by specifying the labels of the nodes you want to remediate. The selection matches labels that you want to check. If more than one label is specified, the nodes must contain each label. The default value is empty, which selects both worker and control-plane nodes.
Note
When creating a node health check with the Self Node Remediation Operator, you must select either node-role.kubernetes.io/worker or node-role.kubernetes.io/control-plane as the value.
- Specify the minimum number of healthy nodes, using either a percentage or a number, required for a NodeHealthCheck to remediate nodes in the targeted pool. If the number of healthy nodes equals or exceeds the limit set by Min healthy, remediation occurs. The default value is 51%.
- Specify a list of Unhealthy conditions that determine whether a node is considered unhealthy and requires remediation. You can specify the Type, Status and Duration. You can also create your own custom type.
- Click Create to create the node health check.
Verification
- Navigate to the Compute → NodeHealthCheck page and verify that the corresponding node health check is listed and its status is displayed. After a node health check is created, it can be paused, modified, and deleted.
6.6. Gathering data about the Node Health Check Operator
To collect debugging information about the Node Health Check Operator, use the must-gather
tool. For information about the must-gather
image for the Node Health Check Operator, see Gathering data about specific features.
6.7. Additional resources
Chapter 7. Placing nodes in maintenance mode with Node Maintenance Operator
You can use the Node Maintenance Operator to place nodes in maintenance mode by using the oc adm
utility or NodeMaintenance
custom resources (CRs).
7.1. About the Node Maintenance Operator
The Node Maintenance Operator watches for new or deleted NodeMaintenance
CRs. When a new NodeMaintenance
CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance
CR is deleted, the node that is referenced in the CR is made available for new workloads.
Using a NodeMaintenance
CR for node maintenance tasks achieves the same results as the oc adm cordon
and oc adm drain
commands using standard Red Hat OpenShift CR processing.
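For reference, the equivalent manual workflow without the Operator resembles the following sketch; the drain flags shown are commonly used options and are assumptions, not requirements of the Operator:
$ oc adm cordon <node-name>
$ oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data --force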
7.2. Installing the Node Maintenance Operator
You can install the Node Maintenance Operator using the web console or the OpenShift CLI (oc
).
If OpenShift Virtualization version 4.10 or earlier is installed in your cluster, it includes an outdated version of the Node Maintenance Operator.
7.2.1. Installing the Node Maintenance Operator by using the web console
You can use the Red Hat OpenShift web console to install the Node Maintenance Operator.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Node Maintenance Operator, then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the
openshift-workload-availability
namespace. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availability
namespace and that its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Operators → Installed Operators → Node Maintenance Operator → Details page, and inspect the
Conditions
section for errors before pod creation.
-
Navigate to the Workloads → Pods page, search for the
Node Maintenance Operator
pod in the installed namespace, and check the logs in the Logs tab.
7.2.2. Installing the Node Maintenance Operator by using the CLI
You can use the OpenShift CLI (oc
) to install the Node Maintenance Operator.
You can install the Node Maintenance Operator in your own namespace or in the openshift-workload-availability
namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a
Namespace
CR for the Node Maintenance Operator:
Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
To create the
Namespace
CR, run the following command:
$ oc create -f workload-availability-namespace.yaml
Create an
OperatorGroup
CR:
Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
To create the
OperatorGroup
CR, run the following command:
$ oc create -f workload-availability-operator-group.yaml
Create a
Subscription
CR:
Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-maintenance-operator
  namespace: openshift-workload-availability 1
spec:
  channel: stable
  installPlanApproval: Automatic
  name: node-maintenance-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  package: node-maintenance-operator
- 1
- Specify the
Namespace
where you want to install the Node Maintenance Operator.
Important
To install the Node Maintenance Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.
To create the Subscription CR, run the following command:
$ oc create -f node-maintenance-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availability
Example output
NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
node-maintenance-operator.v5.3.0   Node Maintenance Operator   5.3.0     node-maintenance-operator.v5.2.1   Succeeded
Verify that the Node Maintenance Operator is running:
$ oc get deployment -n openshift-workload-availability
Example output
NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
node-maintenance-operator-controller-manager   1/1     1            1           10d
The Node Maintenance Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.
7.3. Setting a node to maintenance mode
You can place a node into maintenance mode from the web console or from the CLI by using a NodeMaintenance
CR.
7.3.1. Setting a node to maintenance mode by using the web console
To set a node to maintenance mode, you can create a NodeMaintenance
custom resource (CR) by using the web console.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
- In the Node Maintenance tab, click Create NodeMaintenance.
-
In the Create NodeMaintenance page, select the Form view or the YAML view to configure the
NodeMaintenance
CR. -
To apply the
NodeMaintenance
CR that you have configured, click Create.
Verification
In the Node Maintenance tab, inspect the Status
column and verify that its status is Succeeded
.
7.3.2. Setting a node to maintenance mode by using the CLI
You can put a node into maintenance mode with a NodeMaintenance
custom resource (CR). When you apply a NodeMaintenance
CR, all allowed pods are evicted and the node is rendered unschedulable. Evicted pods are queued to be moved to another node in the cluster.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
Create the following
NodeMaintenance
CR, and save the file as nodemaintenance-cr.yaml:
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-cr 1
spec:
  nodeName: node-1.example.com 2
  reason: "NIC replacement" 3
Apply the node maintenance CR by running the following command:
$ oc apply -f nodemaintenance-cr.yaml
Verification
Check the progress of the maintenance task by running the following command:
$ oc describe node <node-name>
where
<node-name>
is the name of your node; for example, node-1.example.com.
Check the example output:
Events:
  Type    Reason              Age   From     Message
  ----    ------              ----  ----     -------
  Normal  NodeNotSchedulable  61m   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
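You can also inspect the NodeMaintenance CR directly. The jsonpath query below is an illustrative assumption that reads the phase field shown in the status example later in this chapter:
$ oc get nm nodemaintenance-cr -o jsonpath='{.status.phase}'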
7.3.3. Checking status of current NodeMaintenance CR tasks
You can check the status of current NodeMaintenance
CR tasks.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Check the status of current node maintenance tasks, for example the
NodeMaintenance
CR or nm
object, by running the following command:
$ oc get nm -o yaml
Example output
apiVersion: v1
items:
- apiVersion: nodemaintenance.medik8s.io/v1beta1
  kind: NodeMaintenance
  metadata:
    ...
  spec:
    nodeName: node-1.example.com
    reason: Node maintenance
  status:
    drainProgress: 100 1
    evictionPods: 3 2
    lastError: "Last failure message" 3
    lastUpdate: "2022-06-23T11:43:18Z" 4
    phase: Succeeded
    totalpods: 5 5
...
7.4. Resuming a node from maintenance mode
You can resume a node from maintenance mode from the web console or from the CLI by using a NodeMaintenance
CR. Resuming a node brings it out of maintenance mode and makes it schedulable again.
7.4.1. Resuming a node from maintenance mode by using the web console
To resume a node from maintenance mode, you can delete a NodeMaintenance
custom resource (CR) by using the web console.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
-
In the Node Maintenance tab, select the
NodeMaintenance
CR that you want to delete.
-
Click the Options menu
at the end of the node and select Delete NodeMaintenance.
Verification
- In the Red Hat OpenShift console, click Compute → Nodes.
-
Inspect the
Status
column of the node for which you deleted the NodeMaintenance
CR and verify that its status is Ready.
7.4.2. Resuming a node from maintenance mode by using the CLI
You can resume a node from maintenance mode that was initiated with a NodeMaintenance
CR by deleting the NodeMaintenance
CR.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
When your node maintenance task is complete, delete the active
NodeMaintenance
CR:
$ oc delete -f nodemaintenance-cr.yaml
Example output
nodemaintenance.nodemaintenance.medik8s.io "nodemaintenance-cr" deleted
Verification
Check the progress of the maintenance task by running the following command:
$ oc describe node <node-name>
where
<node-name>
is the name of your node; for example, node-1.example.com.
Check the example output:
Events:
  Type    Reason           Age   From     Message
  ----    ------           ----  ----     -------
  Normal  NodeSchedulable  2m    kubelet  Node node-1.example.com status is now: NodeSchedulable
7.5. Working with bare-metal nodes
For clusters with bare-metal nodes, you can place a node into maintenance mode, and resume a node from maintenance mode, by using the web console Actions control.
Clusters with bare-metal nodes can also place a node into maintenance mode, and resume a node from maintenance mode, by using the web console and CLI, as outlined in the previous sections. The Actions control method is applicable to bare-metal clusters only.
7.5.1. Maintaining bare-metal nodes
When you deploy Red Hat OpenShift on bare-metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where the cluster nodes are considered ephemeral, reprovisioning a bare-metal node requires significantly more time and effort for maintenance tasks.
When a bare-metal node fails due to a kernel error or a NIC card hardware failure, workloads on the failed node need to be restarted on another node in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully turn off nodes, move workloads to other parts of the cluster, and ensure that workloads do not get interrupted. Detailed progress and node status details are provided during maintenance.
7.5.2. Setting a bare-metal node to maintenance mode
Set a bare-metal node to maintenance mode using the Options menu
found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
You can set the node to maintenance from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
-
Click the Options menu
at the end of the node and select Start Maintenance.
- Click the node name to open the Node Details screen and click Actions → Start Maintenance.
- Click Start Maintenance in the confirmation window.
The node is no longer schedulable. If the node has virtual machines with the LiveMigration
eviction strategy, they are live migrated. All other pods and virtual machines on the node are deleted and recreated on another node.
Verification
-
Navigate to the Compute → Nodes page and verify that the corresponding node has a status of
Under maintenance
.
7.5.3. Resuming a bare-metal node from maintenance mode
Resume a bare-metal node from maintenance mode using the Options menu
found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
You can resume the node from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
-
Click the Options menu
at the end of the node and select Stop Maintenance.
- Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
- Click Stop Maintenance in the confirmation window.
The node becomes schedulable. If virtual machine instances were running on the node prior to maintenance, they do not automatically migrate back to this node.
Verification
-
Navigate to the Compute → Nodes page and verify that the corresponding node has a status of
Ready
.
7.6. Gathering data about the Node Maintenance Operator
To collect debugging information about the Node Maintenance Operator, use the must-gather
tool. For information about the must-gather
image for the Node Maintenance Operator, see Gathering data about specific features.