Remediation, fencing, and maintenance
Workload Availability remediation, fencing, and maintenance
Abstract
Preface Copy linkLink copied to clipboard!
Providing feedback on Workload Availability for Red Hat OpenShift documentation Copy linkLink copied to clipboard!
We appreciate your feedback on our documentation. Let us know how we can improve it. To do so:
- Go to the JIRA website.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Enter your username in the Reporter field.
- Enter the affected versions in the Affects Version/s field.
- Click Create at the bottom of the dialog.
Chapter 1. About node remediation, fencing, and maintenance Copy linkLink copied to clipboard!
Hardware is imperfect and software contains bugs. When node-level failures, such as the kernel hangs or network interface controllers (NICs) fail, the work required from the cluster does not decrease, and workloads from affected nodes need to be restarted somewhere. However, some workloads, such as ReadWriteOnce (RWO) volumes and StatefulSets, might require at-most-one semantics.
Failures affecting these workloads risk data loss, corruption, or both. It is important to ensure that the node reaches a safe state, known as fencing before initiating recovery of the workload, known as remediation and ideally, recovery of the node also.
It is not always practical to depend on administrator intervention to confirm the true status of the nodes and workloads. To facilitate such intervention, Red Hat OpenShift provides multiple components for the automation of failure detection, fencing and remediation.
1.1. Self Node Remediation Copy linkLink copied to clipboard!
The Self Node Remediation Operator is a Red Hat OpenShift add-on Operator that implements an external system of fencing and remediation that reboots unhealthy nodes and deletes resources, such as Pods and VolumeAttachments. The reboot ensures that the workloads are fenced, and the resource deletion accelerates the rescheduling of affected workloads. Unlike other external systems, Self Node Remediation does not require any management interface, like, for example, Intelligent Platform Management Interface (IPMI) or an API for node provisioning.
Self Node Remediation can be used by failure detection systems, like Machine Health Check or Node Health Check.
1.2. Fence Agents Remediation Copy linkLink copied to clipboard!
The Fence Agents Remediation (FAR) Operator is a Red Hat OpenShift add-on operator that automatically remediates unhealthy nodes, similar to the Self Node Remediation Operator. You can use well-known agents to fence and remediate unhealthy nodes. The remediation includes rebooting the unhealthy node using a fence agent, and then evicting workloads from the unhealthy node, depending on the remediation strategy.
1.3. Machine Deletion Remediation Copy linkLink copied to clipboard!
The Machine Deletion Remediation (MDR) Operator is a Red Hat OpenShift add-on Operator that uses the Machine API to reprovision unhealthy nodes. MDR works with NodeHealthCheck (NHC) to create a Custom Resource (CR) for MDR with information about the unhealthy node.
MDR follows the annotation on the node to the associated machine object and confirms that it has an owning controller. MDR proceeds to delete the machine, and then the owning controller recreates a replacement machine.
1.4. Machine Health Check Copy linkLink copied to clipboard!
Machine Health Check utilizes a Red Hat OpenShift built-in failure detection, fencing and remediation system, which monitors the status of machines and the conditions of nodes. Machine Health Checks can be configured to trigger external fencing and remediation systems, like Self Node Remediation.
1.5. Node Health Check Copy linkLink copied to clipboard!
The Node Health Check Operator is a Red Hat OpenShift add-on Operator that implements a failure detection system that monitors node conditions. It does not have a built-in fencing or remediation system and so must be configured with an external system that provides these features. By default, it is configured to utilize the Self Node Remediation system.
1.6. Node Maintenance Copy linkLink copied to clipboard!
Administrators face situations where they need to interrupt the cluster, for example, replace a drive, RAM, or a NIC.
In advance of this maintenance, affected nodes should be cordoned and drained. When a node is cordoned, new workloads cannot be scheduled on that node. When a node is drained, to avoid or minimize downtime, workloads on the affected node are transferred to other nodes.
While this maintenance can be achieved using command line tools, the Node Maintenance Operator offers a declarative approach to achieve this by using a custom resource. When such a resource exists for a node, the Operator cordons and drains the node until the resource is deleted.
1.7. Flow of events during fencing and remediation Copy linkLink copied to clipboard!
When a node becomes unhealthy multiple events occur to detect, fence, and remediate the node, in order to restore workloads, and ideally the node, to health. Some events are triggered by the OpenShift cluster, and some events are reactions by the Workload Availability operators. Understanding this flow of events, and the duration between these events, is important to make informed decisions. Decisions including, which remediation provider to use, and how to configure Node Health Check Operator and the chosen remediaton provider.
The example outlined next is a common use case that outlines the phased flow of events. Only when the Node Health Check Operator has the Ready=Unknown unhealthy conditions do the phases act as follows.
1.7.1. Phase 1 - Kubernetes Health Check (Core OpenShift) Copy linkLink copied to clipboard!
The unhealthy node stops communicating with the API server. After approximately 50 seconds the API server sets the "Ready" condition of the node to "Unknown", that is, Ready=Unknown.
1.7.2. Phase 2 - Node Health Check (NHC) Copy linkLink copied to clipboard!
If the Ready=Unknown condition is present longer than the configured duration, it starts a remediation.
The user-configured duration in this phase represents the tolerance that the Operator has towards the duration of the unhealthy condition. It takes into account that while the workload is restarting as requested, the resource is expected to be "Unready".
For example:
- If you have a workload that takes a long time to restart, then you need to have a longer timeout.
- Likewise, when the workload restart is short, then the timeout needed is also short.
1.7.3. Phase 3 - Remediate Host / Remediate API (depending on the configured remediation operator) Copy linkLink copied to clipboard!
Using Machine Deletion Remediation (MDR), Self Node Remediation (SNR) or Fence Agents Remediation (FAR), the remediator fences and isolates the node by rebooting it in order to reach a safe state.
The details of this phase are configured by the user and depends on their workload requirements.
For example:
- Machine Deletion Remediation - The choice of platform influences the time it takes to reprovision the machine, and then the duration of the remediation. MDR is only applicable to clusters that use Machine API.
- Self Node Remediation - The remediation time depends on many parameters, including the safe time is takes to automatically reboot unhealthy nodes, and the watchdog devices used to ensure that the machine enters a safe state when an error condition is detected.
- Fence Agents Remediation - The fencing agent time depends on many parameters including, the cluster nodes, the management interface, and the agent parameters.
1.7.4. Phase 4 - Workload starting Copy linkLink copied to clipboard!
When MDR is used, the remediator deletes the resources. When FAR and SNR are used, varying remediation strategies are available for them to use. One strategy is OutOfServiceTaint which uses out-of-service taint to permit the deletion of resources in the cluster. In both cases, deleting the resources enables faster rescheduling of the affected workload. The workload is then rescheduled and restarted.
This phase is initiated automatically by the remediators when fencing is complete. If fencing does not complete, and an escalation remediation is required, the user must configure the timeout in seconds for the entire remediation process. If the timeout passes, and the node is still unhealthy, NHC will try the next remediator in line to remediate the unhealthy node.
1.8. About metrics for workload availability operators Copy linkLink copied to clipboard!
The addition of data analysis enhances observability for the workload availability operators. The data provides metrics about the activity of the operators, and the effect on the cluster. These metrics improve decision-making capabilities, enable data-driven optimization, and enhance overall system performance.
You can use metrics to do these tasks:
- Access comprehensive tracking data for operators, to monitor overall system efficiency.
- Access actionable insights derived from tracking data, such as identifying frequently failing nodes, or downtime due to operator’s remediations.
- Visualize how the operator’s remediations are actually enhancing the system efficiency.
1.8.1. Configuring metrics for workload availability operators Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Node Health Check Operator.
Prerequisites
- You must first configure the monitoring stack. For more information, see Configuring the monitoring stack.
- You must enable monitoring for used-defined projects. For more information, see Enabling monitoring for used-defined projects.
Procedure
Create the
prometheus-user-tokensecret from the existingprometheus-user-workload-tokensecret as follows:existingPrometheusTokenSecret=$(kubectl get secret --namespace openshift-user-workload-monitoring | grep prometheus-user-workload-token | awk '{print $1}')1 kubectl get secret ${existingPrometheusTokenSecret} --namespace=openshift-user-workload-monitoring -o yaml | \ sed '/namespace: .*==/d;/ca.crt:/d;/serviceCa.crt/d;/creationTimestamp:/d;/resourceVersion:/d;/uid:/d;/annotations/d;/kubernetes.io/d;' | \ sed 's/namespace: .*/namespace: openshift-workload-availability/' | \2 sed 's/name: .*/name: prometheus-user-workload-token/' | \3 sed 's/type: .*/type: Opaque/' | \ > prom-token.yaml kubectl apply -f prom-token.yaml- 1
- The prometheus-user-token is required by the Metric ServiceMonitor, created in the next step.
- 2
- Ensure the new Secret’s namespace is the one where NHC Operator is installed, for example, openshift-workload-availability .
- 3
- The prometheus-user-workload-token only exists if User Worload Prometheus scraping is enabled.
Create the ServiceMonitor as follows:
apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: name: node-healthcheck-metrics-monitor namespace: openshift-workload-availability1 labels: app.kubernetes.io/component: controller-manager spec: endpoints: - interval: 30s port: https scheme: https authorization: type: Bearer credentials: name: prometheus-user-workload-token key: token tlsConfig: ca: configMap: name: nhc-serving-certs-ca-bundle key: service-ca.crt serverName: node-healthcheck-controller-manager-metrics-service.openshift-workload-availability.svc2 selector: matchLabels: app.kubernetes.io/component: controller-manager app.kubernetes.io/name: node-healthcheck-operator app.kubernetes.io/instance: metrics- 1
- Specify the namespace where you want to configure the metrics, for example,
openshift-workload-availability. - 2
- The serverName must contain the same namespace where the Operator is installed. In the example,
openshift-workload-availabilityis placed after the metrics service name and before the filetype extension.
Verification
To confirm that the configuration is successful the Observe > Targets tab in OCP Web UI shows Endpoint Up.
1.8.2. Example metrics for workload availability operators Copy linkLink copied to clipboard!
The following are example metrics from the various workload availability operators.
The metrics include information on the following indicators:
- Operator availability: Showing if and when each Operator is up and running.
- Node remediation count: Showing the number of remediations across the same node, and across all nodes.
- Node remediation duration: Showing the remediation downtime or recovery time.
- Node remediation gauge: Showing the number of ongoing remediations.
Chapter 2. Using Self Node Remediation Copy linkLink copied to clipboard!
You can use the Self Node Remediation Operator to automatically reboot unhealthy nodes. This remediation strategy minimizes downtime for stateful applications and ReadWriteOnce (RWO) volumes, and restores compute capacity if transient failures occur.
The Self Node Remediation Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
2.1. About the Self Node Remediation Operator Copy linkLink copied to clipboard!
The Self Node Remediation Operator runs on the cluster nodes and reboots nodes that are identified as unhealthy. The Operator uses the MachineHealthCheck or NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the MachineHealthCheck or the NodeHealthCheck resource creates the SelfNodeRemediation custom resource (CR), which triggers the Self Node Remediation Operator.
The SelfNodeRemediation CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediation
metadata:
name: selfnoderemediation-sample
namespace: openshift-workload-availability
spec:
remediationStrategy: <remediation_strategy>
status:
lastError: <last_error_message>
The Self Node Remediation Operator minimizes downtime for stateful applications and restores compute capacity if transient failures occur. You can use this Operator regardless of the management interface, such as IPMI or an API to provision a node, and regardless of the cluster installation type, such as installer-provisioned infrastructure or user-provisioned infrastructure.
2.1.1. About watchdog devices Copy linkLink copied to clipboard!
Watchdog devices can be any of the following:
- Independently powered hardware devices
- Hardware devices that share power with the hosts they control
-
Virtual devices implemented in software, or
softdog
Hardware watchdog and softdog devices have electronic or software timers, respectively. These watchdog devices are used to ensure that the machine enters a safe state when an error condition is detected. The cluster is required to repeatedly reset the watchdog timer to prove that it is in a healthy state. This timer might elapse due to fault conditions, such as deadlocks, CPU starvation, and loss of network or disk access. If the timer expires, the watchdog device assumes that a fault has occurred and the device triggers a forced reset of the node.
Hardware watchdog devices are more reliable than softdog devices.
2.1.1.1. Understanding Self Node Remediation Operator behavior with watchdog devices Copy linkLink copied to clipboard!
The Self Node Remediation Operator determines the remediation strategy based on the watchdog devices that are present.
If a hardware watchdog device is configured and available, the Operator uses it for remediation. If a hardware watchdog device is not configured, the Operator enables and uses a softdog device for remediation.
If neither watchdog devices are supported, either by the system or by the configuration, the Operator remediates nodes by using software reboot.
2.2. Control plane fencing Copy linkLink copied to clipboard!
In earlier releases, you could enable Self Node Remediation and Node Health Check on worker nodes. In the event of node failure, you can now also follow remediation strategies on control plane nodes.
Self Node Remediation occurs in two primary scenarios.
API Server Connectivity
- In this scenario, the control plane node to be remediated is not isolated. It can be directly connected to the API Server, or it can be indirectly connected to the API Server through worker nodes or control-plane nodes, that are directly connected to the API Server.
-
When there is API Server Connectivity, the control plane node is remediated only if the Node Health Check Operator has created a
SelfNodeRemediationcustom resource (CR) for the node.
No API Server Connectivity
- In this scenario, the control plane node to be remediated is isolated from the API Server. The node cannot connect directly or indirectly to the API Server.
When there is no API Server Connectivity, the control plane node will be remediated as outlined with these steps:
Check the status of the control plane node with the majority of the peer worker nodes. If the majority of the peer worker nodes cannot be reached, the node will be analyzed further.
Self-diagnose the status of the control plane node
- If self diagnostics passed, no action will be taken.
- If self diagnostics failed, the node will be fenced and remediated.
-
The self diagnostics currently supported are checking the
kubeletservice status, and checking endpoint availability usingopt inconfiguration.
- If the node did not manage to communicate to most of its worker peers, check the connectivity of the control plane node with other control plane nodes. If the node can communicate with any other control plane peer, no action will be taken. Otherwise, the node will be fenced and remediated.
2.3. Installing the Self Node Remediation Operator by using the web console Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Self Node Remediation Operator.
Prerequisites
-
Log in as a user with
cluster-adminprivileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Self Node Remediation Operator from the list of available Operators, and then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the
openshift-workload-availabilitynamespace. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availabilitynamespace and its status isSucceeded.
If the Operator is not installed successfully:
-
Navigate to the Operators → Installed Operators page and inspect the
Statuscolumn for any errors or failures. -
Navigate to the Workloads → Pods page and check the logs of the
self-node-remediation-controller-managerpod andself-node-remediation-dspods in theopenshift-workload-availabilityproject for any reported issues.
2.4. Installing the Self Node Remediation Operator by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc) to install the Self Node Remediation Operator.
You can install the Self Node Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
Namespacecustom resource (CR) for the Self Node Remediation Operator:Define the
NamespaceCR and save the YAML file, for example,workload-availability-namespace.yaml:apiVersion: v1 kind: Namespace metadata: name: openshift-workload-availabilityTo create the
NamespaceCR, run the following command:$ oc create -f workload-availability-namespace.yaml
Create an
OperatorGroupCR:Define the
OperatorGroupCR and save the YAML file, for example,workload-availability-operator-group.yaml:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: workload-availability-operator-group namespace: openshift-workload-availabilityTo create the
OperatorGroupCR, run the following command:$ oc create -f workload-availability-operator-group.yaml
Create a
SubscriptionCR:Define the
SubscriptionCR and save the YAML file, for example,self-node-remediation-subscription.yaml:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: self-node-remediation-operator namespace: openshift-workload-availability1 spec: channel: stable installPlanApproval: Manual2 name: self-node-remediation-operator source: redhat-operators sourceNamespace: openshift-marketplace package: self-node-remediation- 1
- Specify the
Namespacewhere you want to install the Self Node Remediation Operator. To install the Self Node Remediation Operator in theopenshift-workload-availabilitynamespace, specifyopenshift-workload-availabilityin theSubscriptionCR. - 2
- Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This plan prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation.
To create the
SubscriptionCR, run the following command:$ oc create -f self-node-remediation-subscription.yaml
Verify that the Self Node Remediation Operator created the
SelfNodeRemediationTemplateCR:$ oc get selfnoderemediationtemplate -n openshift-workload-availabilityExample output
self-node-remediation-automatic-strategy-template
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availabilityExample output
NAME DISPLAY VERSION REPLACES PHASE self-node-remediation.v0.8.0 Self Node Remediation Operator v.0.8.0 self-node-remediation.v0.7.1 SucceededVerify that the Self Node Remediation Operator is up and running:
$ oc get deployment -n openshift-workload-availabilityExample output
NAME READY UP-TO-DATE AVAILABLE AGE self-node-remediation-controller-manager 1/1 1 1 28hVerify that the Self Node Remediation Operator created the
SelfNodeRemediationConfigCR:$ oc get selfnoderemediationconfig -n openshift-workload-availabilityExample output
NAME AGE self-node-remediation-config 28hVerify that each self node remediation pod is scheduled and running on each worker node and control plane node:
$ oc get daemonset -n openshift-workload-availabilityExample output
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE self-node-remediation-ds 6 6 6 6 6 <none> 28h
2.5. Configuring the Self Node Remediation Operator Copy linkLink copied to clipboard!
The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR and the SelfNodeRemediationTemplate Custom Resource Definition (CRD).
To avoid unexpected reboots of a specific node, the Node Maintenance Operator places the node in maintenance mode and automatically adds a node selector that prevents the SNR daemonset from running on the specific node.
2.5.1. Understanding the Self Node Remediation Operator configuration Copy linkLink copied to clipboard!
The Self Node Remediation Operator creates the SelfNodeRemediationConfig CR with the name self-node-remediation-config. The CR is created in the namespace of the Self Node Remediation Operator.
A change in the SelfNodeRemediationConfig CR re-creates the Self Node Remediation daemon set.
The SelfNodeRemediationConfig CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
name: self-node-remediation-config
namespace: openshift-workload-availability
spec:
safeTimeToAssumeNodeRebootedSeconds: 180
watchdogFilePath: /dev/watchdog
isSoftwareRebootEnabled: true
apiServerTimeout: 15s
apiCheckInterval: 5s
maxApiErrorThreshold: 3
peerApiServerTimeout: 5s
peerDialTimeout: 5s
peerRequestTimeout: 5s
peerUpdateInterval: 15m
hostPort: 30001
customDsTolerations:
- effect: NoSchedule
key: node-role.kubernetes.io.infra
operator: Equal
value: "value1"
tolerationSeconds: 3600
- 1
- Specify an optional time duration that the Operator waits before recovering affected workloads running on an unhealthy node. Starting replacement pods while they are still running on the failed node can lead to data corruption and a violation of run-once semantics. The Operator calculates a minimum duration using the values in the
ApiServerTimeout,ApiCheckInterval,MaxApiErrorThreshold,PeerDialTimeout, andPeerRequestTimeoutfields, as well as the watchdog timeout and the cluster size at the time of remediation. To check the minimum duration calculation, view the manager pod logs and find references to thecalculated minimum time in seconds.If you specify a value that is lower than the minimum duration, the Operator uses the minimum duration. However, if you want to increase the duration to a value higher than this minimum value, you can set
safeTimeToAssumeNodeRebootedSecondsto a value higher than the minimum duration. - 2
- Specify the file path of the watchdog device in the nodes. If you enter an incorrect path to the watchdog device, the Self Node Remediation Operator automatically detects the softdog device path.
If a watchdog device is unavailable, the
SelfNodeRemediationConfigCR uses a software reboot. - 3
- Specify if you want to enable software reboot of the unhealthy nodes. By default, the value of
isSoftwareRebootEnabledis set totrue. To disable the software reboot, set the parameter value tofalse. - 4
- Specify the timeout duration to check connectivity with each API server. When this duration elapses, the Operator starts remediation. The timeout duration must be greater than or equal to 10 milliseconds.
- 5
- Specify the frequency to check connectivity with each API server. The timeout duration must be greater than or equal to 1 second.
- 6
- Specify a threshold value. After reaching this threshold, the node starts contacting its peers. The threshold value must be greater than or equal to 1 second.
- 7
- Specify the duration of the timeout for the peer to connect the API server. The timeout duration must be greater than or equal to 10 milliseconds.
- 8
- Specify the duration of the timeout for establishing connection with the peer. The timeout duration must be greater than or equal to 10 milliseconds.
- 9
- Specify the duration of the timeout to get a response from the peer. The timeout duration must be greater than or equal to 10 milliseconds.
- 10
- Specify the frequency to update peer information such as IP address. The timeout duration must be greater than or equal to 10 seconds.
- 11
- Specify an optional value to change the port that Self Node Remediation agents use for internal communication. The value must be greater than 0. The default value is port 30001.
- 12
- Specify custom toleration Self Node Remediation agents that are running on the DaemonSets to support remediation for different types of nodes. You can configure the following fields:
-
effect: The effect indicates the taint effect to match. If this field is empty, all taint effects are matched. When specified, allowed values areNoSchedule,PreferNoScheduleandNoExecute. -
key: The key is the taint key that the toleration applies to. If this field is empty, all taint keys are matched. If the key is empty, theoperatorfield must beExists. This combination means to match all values and all keys. -
operator: The operator represents a key’s relationship to the value. Valid operators areExistsandEqual. The default isEqual.Existsis equivalent to a wildcard for a value, so that a pod can tolerate all taints of a particular category. -
value: The taint value the toleration matches to. If the operator isExists, the value should be empty, otherwise it is just a regular string. -
tolerationSeconds: The period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (that is, do not evict). Zero and negative values will be treated as 0 (that is evict immediately) by the system. - Custom toleration allows you to add a toleration to the Self Node Remediation agent pod. For more information, see Using tolerations to control OpenShift Logging pod placement.
-
- The Self Node Remediation Operator creates the CR by default in the deployment namespace.
-
The name for the CR must be
self-node-remediation-config. -
You can only have one
SelfNodeRemediationConfigCR. -
Deleting the
SelfNodeRemediationConfigCR disables Self Node Remediation. -
You can edit the
self-node-remediation-configCR that is created by the Self Node Remediation Operator. However, when you try to create a new CR for the Self Node Remediation Operator, the following message is displayed in the logs:
controllers.SelfNodeRemediationConfig
ignoring selfnoderemediationconfig CRs that are not named 'self-node-remediation-config'
or not in the namespace of the operator:
'openshift-workload-availability' {"selfnoderemediationconfig":
"openshift-workload-availability/selfnoderemediationconfig-copy"}
2.5.2. Understanding the Self Node Remediation Template configuration Copy linkLink copied to clipboard!
The Self Node Remediation Operator also creates the SelfNodeRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes that is aimed to recover workloads faster. The following remediation strategies are available:
Automatic-
This remediation strategy simplifies the remediation process by letting the Self Node Remediation Operator decide on the most suitable remediation strategy for the cluster. This strategy checks if the
OutOfServiceTaintstrategy is available on the cluster. If theOutOfServiceTaintstrategy is available, the Operator selects theOutOfServiceTaintstrategy. If theOutOfServiceTaintstrategy is not available, the Operator selects theResourceDeletionstrategy.Automaticis the default remediation strategy. ResourceDeletion- This remediation strategy removes the pods on the node, rather than the removal of the node object.
OutOfServiceTaint-
This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node, rather than the removal of the node object. It achieves this by placing the
OutOfServiceTaintstrategy on the node. This strategy has been supported on technology preview since OpenShift Container Platform version 4.13, and on general availability since OpenShift Container Platform version 4.15.
The Self Node Remediation Operator creates the SelfNodeRemediationTemplate CR for the strategy self-node-remediation-automatic-strategy-template, which the Automatic remediation strategy uses.
The SelfNodeRemediationTemplate CR resembles the following YAML file:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
creationTimestamp: "2022-03-02T08:02:40Z"
name: self-node-remediation-<remediation_object>-deletion-template
namespace: openshift-workload-availability
spec:
template:
spec:
remediationStrategy: <remediation_strategy>
2.5.3. Troubleshooting the Self Node Remediation Operator Copy linkLink copied to clipboard!
2.5.3.1. General troubleshooting Copy linkLink copied to clipboard!
- Issue
- You want to troubleshoot issues with the Self Node Remediation Operator.
- Resolution
- Check the Operator logs.
2.5.3.2. Checking the daemon set Copy linkLink copied to clipboard!
- Issue
- The Self Node Remediation Operator is installed but the daemon set is not available.
- Resolution
- Check the Operator logs for errors or warnings.
2.5.3.3. Unsuccessful remediation Copy linkLink copied to clipboard!
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the
SelfNodeRemediationCR was created by running the following command:$ oc get snr -AIf the
MachineHealthCheckcontroller did not create theSelfNodeRemediationCR when the node turned unhealthy, check the logs of theMachineHealthCheckcontroller. Additionally, ensure that theMachineHealthCheckCR includes the required specification to use the remediation template.If the
SelfNodeRemediationCR was created, ensure that its name matches the unhealthy node or the machine object.
2.5.3.4. Daemon set and other Self Node Remediation Operator resources exist even after uninstalling the Operator Copy linkLink copied to clipboard!
- Issue
- The Self Node Remediation Operator resources, such as the daemon set, configuration CR, and the remediation template CR, exist even after after uninstalling the Operator.
- Resolution
To remove the Self Node Remediation Operator resources, delete the resources by running the following commands for each resource type:
$ oc delete ds <self-node-remediation-ds> -n <namespace>$ oc delete snrc <self-node-remediation-config> -n <namespace>$ oc delete snrt <self-node-remediation-template> -n <namespace>
2.5.4. Gathering data about the Self Node Remediation Operator Copy linkLink copied to clipboard!
To collect debugging information about the Self Node Remediation Operator, use the must-gather tool. For information about the must-gather image for the Self Node Remediation Operator, see Gathering data about specific features.
2.5.5. Additional resources Copy linkLink copied to clipboard!
Chapter 3. Using Fence Agents Remediation Copy linkLink copied to clipboard!
You can use the Fence Agents Remediation Operator to automatically remediate unhealthy nodes, similar to the Self Node Remediation Operator. FAR is designed to run an existing set of upstream fencing agents on environments with a traditional API end-point, for example, IPMI, for power cycling cluster nodes, while their pods are quickly evicted based on the remediation strategy.
The Fence Agents Remediation Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
3.1. About the Fence Agents Remediation Operator Copy linkLink copied to clipboard!
The Fence Agents Remediation (FAR) Operator uses external tools to fence unhealthy nodes. These tools are a set of fence agents, where each fence agent can be used for different environments to fence a node, and using a traditional Application Programming Interface (API) call that reboots a node. By doing so, FAR can minimize downtime for stateful applications, restores compute capacity if transient failures occur, and increases the availability of workloads.
FAR not only fences a node when it becomes unhealthy, it also tries to remediate the node from being unhealthy to healthy. It adds a taint to evict stateless pods, fences the node with a fence agent, and after a reboot, it completes the remediation with resource deletion to remove any remaining workloads (mostly stateful workloads). Adding the taint and deleting the workloads accelerates the workload rescheduling.
The Operator watches for new or deleted custom resources (CRs) called FenceAgentsRemediation which trigger a fence agent to remediate a node, based on the CR’s name. FAR uses the NodeHealthCheck controller to detect the health of a node in the cluster. When a node is identified as unhealthy, the NodeHealthCheck resource creates the FenceAgentsRemediation CR, based on the FenceAgentsRemediationTemplate CR, which then triggers the Fence Agents Remediation Operator.
FAR uses a fence agent to fence a Kubernetes node. Generally, fencing is the process of taking unresponsive/unhealthy computers into a safe state, and isolating the computer. Fence agent is a software code that uses a management interface to perform fencing, mostly power-based fencing which enables power-cycling, reset, or turning off the computer. An example fence agent is fence_ipmilan which is used for Intelligent Platform Management Interface (IPMI) environments.
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediation
metadata:
name: node-name
namespace: openshift-workload-availability
spec:
remediationStrategy: <remediation_strategy>
- 1
- The node-name should match the name of the unhealthy cluster node.
- 2
- Specifies the remediation strategy for the nodes. For more information on the remediation strategies available, see the Understanding the Fence Agents Remediation Template configuration topic.
The Operator includes a set of fence agents, that are also available in the Red Hat High Availability Add-On, which use a management interface, such as IPMI or an API, to provision/reboot a node for bare metal servers, virtual machines, and cloud platforms.
3.2. Installing the Fence Agents Remediation Operator by using the web console Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Fence Agents Remediation Operator.
Prerequisites
-
Log in as a user with
cluster-adminprivileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Fence Agents Remediation Operator, or FAR, from the list of available Operators, and then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the
openshift-workload-availabilitynamespace. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availabilitynamespace and its status isSucceeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Workloads → Pods page and check the log of the
fence-agents-remediation-controller-managerpod for any reported issues.
3.3. Installing the Fence Agents Remediation Operator by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc) to install the Fence Agents Remediation Operator.
You can install the Fence Agents Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
Namespacecustom resource (CR) for the Fence Agents Remediation Operator:Define the
NamespaceCR and save the YAML file, for example,workload-availability-namespace.yaml:apiVersion: v1 kind: Namespace metadata: name: openshift-workload-availabilityTo create the
NamespaceCR, run the following command:$ oc create -f workload-availability-namespace.yaml
Create an
OperatorGroupCR:Define the
OperatorGroupCR and save the YAML file, for example,workload-availability-operator-group.yaml:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: workload-availability-operator-group namespace: openshift-workload-availabilityTo create the
OperatorGroupCR, run the following command:$ oc create -f workload-availability-operator-group.yaml
Create a
SubscriptionCR:Define the
SubscriptionCR and save the YAML file, for example,fence-agents-remediation-subscription.yaml:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: fence-agents-remediation-subscription namespace: openshift-workload-availability1 spec: channel: stable name: fence-agents-remediation source: redhat-operators sourceNamespace: openshift-marketplace package: fence-agents-remediation- 1
- Specify the
Namespacewhere you want to install the Fence Agents Remediation Operator, for example, theopenshift-workload-availabilityoutlined earlier in this procedure. You can install theSubscriptionCR for the Fence Agents Remediation Operator in theopenshift-workload-availabilitynamespace where there is already a matchingOperatorGroupCR.
To create the
SubscriptionCR, run the following command:$ oc create -f fence-agents-remediation-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availabilityExample output
NAME DISPLAY VERSION REPLACES PHASE fence-agents-remediation.v0.3.0 Fence Agents Remediation Operator 0.3.0 fence-agents-remediation.v0.2.1 SucceededVerify that the Fence Agents Remediation Operator is up and running:
$ oc get deployment -n openshift-workload-availabilityExample output
NAME READY UP-TO-DATE AVAILABLE AGE fence-agents-remediation-controller-manager 2/2 2 2 110m
3.4. Configuring the Fence Agents Remediation Operator Copy linkLink copied to clipboard!
You can use the Fence Agents Remediation Operator to create the FenceAgentsRemediationTemplate Custom Resource (CR), which is used by the Node Health Check Operator (NHC). This CR defines the fence agent to be used in the cluster with all the required parameters for remediating the nodes. There may be many FenceAgentsRemediationTemplate CRs, at most one for each fence agent, and when NHC is being used it can choose the FenceAgentsRemediationTemplate as the remediationTemplate to be used for power-cycling the node.
The FenceAgentsRemediationTemplate CR resembles the following YAML file:
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
name: fence-agents-remediation-template-fence-ipmilan
namespace: openshift-workload-availability
spec:
template:
spec:
agent: fence_ipmilan
nodeparameters:
--ipport:
master-0-0: '6230'
master-0-1: '6231'
master-0-2: '6232'
worker-0-0: '6233'
worker-0-1: '6234'
worker-0-2: '6235'
sharedparameters:
'--action': reboot
'--ip': 192.168.123.1
'--lanplus': ''
'--password': password
'--username': admin
retryCount: '5'
retryInterval: '5s'
timeout: '60s'
- 1
- Displays the name of the fence agent to be executed, for example,
fence_ipmilan. - 2
- Displays the node-specific parameters for executing the fence agent, for example,
ipport. - 3
- Displays the cluster-wide parameters for executing the fence agent, for example,
username. - 4
- Displays the number of times to retry the fence agent command in case of failure. The default number of attempts is 5.
- 5
- Displays the interval between retries in seconds. The default is 5 seconds.
- 6
- Displays the timeout for the fence agent command. The default is 60 seconds. For values of 60 seconds or greater, the timeout value is expressed in both minutes and seconds in the YAML file.
3.4.1. Understanding the Fence Agents Remediation Template configuration Copy linkLink copied to clipboard!
The Fence Agents Remediation Operator also creates the FenceAgentsRemediationTemplate Custom Resource Definition (CRD). This CRD defines the remediation strategy for the nodes that is aimed to recover workloads faster. The following remediation strategies are available:
ResourceDeletion- This remediation strategy removes the pods on the node.
OutOfServiceTaint-
This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node. It achieves this by placing the
OutOfServiceTainttaint on the node. TheOutOfServiceTaintstrategy also represents a non-graceful node shutdown. A non-graceful node shutdown occurs when a node is shut down and not detected, instead of triggering an in-operating system shutdown. This strategy has been supported on technology preview since OpenShift Container Platform version 4.13, and on general availability since OpenShift Container Platform version 4.15.
The FenceAgentsRemediationTemplate CR resembles the following YAML file:
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
name: fence-agents-remediation-<remediation_object>-deletion-template
namespace: openshift-workload-availability
spec:
template:
spec:
remediationStrategy: <remediation_strategy>
- 1
- Specifies the type of remediation template based on the remediation strategy. Replace
<remediation_object>with eitherresourceortaint; for example,fence-agents-remediation-resource-deletion-template. - 2
- Specifies the remediation strategy. The remediation strategy can either be
ResourceDeletionorOutOfServiceTaint.
3.5. Troubleshooting the Fence Agents Remediation Operator Copy linkLink copied to clipboard!
3.5.1. General troubleshooting Copy linkLink copied to clipboard!
- Issue
- You want to troubleshoot issues with the Fence Agents Remediation Operator.
- Resolution
Check the Operator logs.
$ oc logs <fence-agents-remediation-controller-manager-name> -c manager -n <namespace-name>
3.5.2. Unsuccessful remediation Copy linkLink copied to clipboard!
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the
FenceAgentsRemediationCR was created by running the following command:$ oc get far -AIf the
NodeHealthCheckcontroller did not create theFenceAgentsRemediationCR when the node turned unhealthy, check the logs of theNodeHealthCheckcontroller. Additionally, ensure that theNodeHealthCheckCR includes the required specification to use the remediation template.If the
FenceAgentsRemediationCR was created, ensure that its name matches the unhealthy node object.
3.5.3. Fence Agents Remediation Operator resources exist after uninstalling the Operator Copy linkLink copied to clipboard!
- Issue
- The Fence Agents Remediation Operator resources, such as the remediation CR and the remediation template CR, exist after uninstalling the Operator.
- Resolution
To remove the Fence Agents Remediation Operator resources, you can delete the resources by selecting the "Delete all operand instances for this operator" checkbox before uninstalling. This checkbox feature is only available in Red Hat OpenShift since version 4.13. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:
$ oc delete far <fence-agents-remediation> -n <namespace>$ oc delete fartemplate <fence-agents-remediation-template> -n <namespace>The remediation CR
farmust be created and deleted by the same entity, for example, NHC. If the remediation CRfaris still present, it is deleted, together with the FAR operator.The remediation template CR
fartemplateonly exists if you use FAR with NHC. When the FAR operator is deleted using the web console, the remediation template CRfartemplateis also deleted.
3.6. Gathering data about the Fence Agents Remediation Operator Copy linkLink copied to clipboard!
To collect debugging information about the Fence Agents Remediation Operator, use the must-gather tool. For information about the must-gather image for the Fence Agents Remediation Operator, see Gathering data about specific features.
3.7. Agents supported by the Fence Agents Remediation Operator Copy linkLink copied to clipboard!
This section describes the agents currently supported by the Fence Agents Remediation Operator.
Most of the supported agents can be grouped by the node’s hardware proprietary and usage, as follows:
- BareMetal
- Virtualization
- Intel
- HP
- IBM
- VMware
- Cisco
- APC
- Dell
- Other
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with Out-of-Band controllers that support Redfish APIs. |
|
| An I/O Fencing agent that can be used with machines controlled by IPMI. |
[a]
This description also applies for the agents fence_ilo3, fence_ilo4, fence_ilo5, fence_imm, fence_idrac, and fence_ipmilanplus.
| |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with RHEV-M REST API to fence virtual machines. |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with Intel AMT (WS). |
|
| An I/O Fencing agent that can be used with Intel Modular device (tested on Intel MFSYS25, should also work with MFSYS35). |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used for HP servers with the Integrated Light Out (iLO) PCI card. |
[a]
This description also applies for the agent fence_ilo2.
| |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with IBM BladeCenter chassis. |
|
| An I/O Fencing agent that can be used with the IBM iPDU network power switch. |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with VMware API to fence virtual machines. |
|
| An I/O Fencing agent that can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+. |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with any Cisco MDS 9000 series with SNMP enabled device. |
|
| An I/O Fencing agent that can be used with Cisco UCS to fence machines. |
| Agent | Description |
|---|---|
|
| An I/O Fencing agent that can be used with the APC network power switch or Tripplite PDU devices. |
[a]
This description also applies for the fence_tripplite_snmp agent.
| |
| Agent | Description |
|---|---|
|
| A resource that can be used to tell Nova that compute nodes are down and to reschedule flagged instances. |
|
| An I/O Fencing agent that can be used with the Eaton network power switch. |
|
| An I/O Fencing agent that can be used with MPX and MPH2 managed rack PDU. |
|
| An I/O Fencing agent that can be used with the ePowerSwitch 8M+ power switch to fence connected machines. |
|
| A resource that can be used to reschedule flagged instances. |
|
| A resource that can be used with ping-heuristics to control execution of another fence agent on the same fencing level. |
|
| An I/O Fencing agent that can be used with any SNMP IF-MIB capable device. |
|
| An I/O Fencing agent that can be used with the kdump crash recovery service. |
|
| An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access multipath devices. |
|
| An I/O Fencing agent that can be used in environments where sbd can be used (shared storage). |
|
| An I/O Fencing agent that can be used with SCSI-3 persistent reservations to control access to shared storage devices. |
[a]
This description also applies for the fence_eps agent.
| |
3.8. Additional resources Copy linkLink copied to clipboard!
Chapter 4. Using Machine Deletion Remediation Copy linkLink copied to clipboard!
You can use the Machine Deletion Remediation Operator to reprovision unhealthy nodes using the Machine API. You can use the Machine Deletion Remediation Operator in conjunction with the Node Health Check Operator.
The Machine Deletion Remediation Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
4.1. About the Machine Deletion Remediation Operator Copy linkLink copied to clipboard!
The Machine Deletion Remediation (MDR) operator works with the NodeHealthCheck controller, to reprovision unhealthy nodes using the Machine API. MDR follows the annotation on the node to the associated machine object, confirms that it has an owning controller (for example, MachineSetController), and deletes it. Once the machine CR is deleted, the owning controller creates a replacement.
The prerequisites for MDR include:
- a Machine API-based cluster that is able to programmatically destroy and create cluster nodes,
- nodes that are associated with machines, and
- declaratively managed machines.
You can then modify the NodeHealthCheck CR to use MDR as its remediator. An example MDR template object and NodeHealthCheck configuration are provided in the documentation.
The MDR process works as follows:
- the Node Health Check Operator detects an unhealthy node and creates a MDR CR.
- the MDR Operator watches for the MDR CR associated with the unhealthy node and deletes it, if the machine has an owning controller.
-
when the node is healthy again, the MDR CR is deleted by the
NodeHealthCheckcontroller.
4.2. Installing the Machine Deletion Remediation Operator by using the web console Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Machine Deletion Remediation Operator.
Prerequisites
-
Log in as a user with
cluster-adminprivileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Machine Deletion Remediation Operator, or MDR, from the list of available Operators, and then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the
openshift-workload-availabilitynamespace. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availabilitynamespace and its status isSucceeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Workloads → Pods page and check the log of the pod in the
openshift-workload-availabilityproject for any reported issues.
4.3. Installing the Machine Deletion Remediation Operator by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc) to install the Machine Deletion Remediation Operator.
You can install the Machine Deletion Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
Namespacecustom resource (CR) for the Machine Deletion Remediation Operator:Define the
NamespaceCR and save the YAML file, for example,workload-availability-namespace.yaml:apiVersion: v1 kind: Namespace metadata: name: openshift-workload-availabilityTo create the
NamespaceCR, run the following command:$ oc create -f workload-availability-namespace.yaml
Create an
OperatorGroupCR:Define the
OperatorGroupCR and save the YAML file, for example,workload-availability-operator-group.yaml:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: workload-availability-operator-group namespace: openshift-workload-availabilityTo create the
OperatorGroupCR, run the following command:$ oc create -f workload-availability-operator-group.yaml
Create a
SubscriptionCR:Define the
SubscriptionCR and save the YAML file, for example,machine-deletion-remediation-subscription.yaml:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: machine-deletion-remediation-operator namespace: openshift-workload-availability1 spec: channel: stable name: machine-deletion-remediation-operator source: redhat-operators sourceNamespace: openshift-marketplace package: machine-deletion-remediation- 1
- Specify the
Namespacewhere you want to install the Machine Deletion Remediation Operator. When installing the Machine Deletion Remediation Operator in theopenshift-workload-availabilitySubscriptionCR, theNamespaceandOperatorGroupCRs will already exist.
To create the
SubscriptionCR, run the following command:$ oc create -f machine-deletion-remediation-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availabilityExample output
NAME DISPLAY VERSION REPLACES PHASE machine-deletion-remediation.v0.3.0 Machine Deletion Remediation Operator 0.3.0 machine-deletion-remediation.v0.2.1 Succeeded
4.4. Configuring the Machine Deletion Remediation Operator Copy linkLink copied to clipboard!
You can use the Machine Deletion Remediation Operator, with the Node Health Check Operator, to create the MachineDeletionRemediationTemplate Custom Resource (CR). This CR defines the remediation strategy for the nodes.
The MachineDeletionRemediationTemplate CR resembles the following YAML file:
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediationTemplate
metadata:
name: machinedeletionremediationtemplate-sample
namespace: openshift-workload-availability
spec:
template:
spec: {}
4.5. Troubleshooting the Machine Deletion Remediation Operator Copy linkLink copied to clipboard!
4.5.1. General troubleshooting Copy linkLink copied to clipboard!
- Issue
- You want to troubleshoot issues with the Machine Deletion Remediation Operator.
- Resolution
Check the Operator logs.
$ oc logs <machine-deletion-remediation-controller-manager-name> -c manager -n <namespace-name>
4.5.2. Unsuccessful remediation Copy linkLink copied to clipboard!
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the
MachineDeletionRemediationCR was created by running the following command:$ oc get mdr -AIf the
NodeHealthCheckcontroller did not create theMachineDeletionRemediationCR when the node turned unhealthy, check the logs of theNodeHealthCheckcontroller. Additionally, ensure that theNodeHealthCheckCR includes the required specification to use the remediation template.If the
MachineDeletionRemediationCR was created, ensure that its name matches the unhealthy node object.
4.5.3. Machine Deletion Remediation Operator resources exist even after uninstalling the Operator Copy linkLink copied to clipboard!
- Issue
- The Machine Deletion Remediation Operator resources, such as the remediation CR and the remediation template CR, exist even after uninstalling the Operator.
- Resolution
To remove the Machine Deletion Remediation Operator resources, you can delete the resources by selecting the Delete all operand instances for this operator checkbox before uninstalling. This checkbox feature is only available in Red Hat OpenShift since version 4.13. For all versions of Red Hat OpenShift, you can delete the resources by running the following relevant command for each resource type:
$ oc delete mdr <machine-deletion-remediation> -n <namespace>$ oc delete mdrt <machine-deletion-remediation-template> -n <namespace>The remediation CR
mdrmust be created and deleted by the same entity, for example, NHC. If the remediation CRmdris still present, it is deleted, together with the MDR operator.The remediation template CR
mdrtonly exists if you use MDR with NHC. When the MDR operator is deleted using the web console, the remediation template CRmdrtis also deleted.
4.6. Gathering data about the Machine Deletion Remediation Operator Copy linkLink copied to clipboard!
To collect debugging information about the Machine Deletion Remediation Operator, use the must-gather tool. For information about the must-gather image for the Machine Deletion Remediation Operator, see Gathering data about specific features.
4.7. Additional resources Copy linkLink copied to clipboard!
Chapter 5. Remediating nodes with Machine Health Checks Copy linkLink copied to clipboard!
Machine health checks automatically repair unhealthy machines in a particular machine pool.
5.1. About machine health checks Copy linkLink copied to clipboard!
You can only apply a machine health check to control plane machines on clusters that use control plane machine sets.
To monitor machine health, create a resource to define the configuration for a controller. Set a condition to check, such as staying in the NotReady status for five minutes or displaying a permanent condition in the node-problem-detector, and a label for the set of machines to monitor.
The controller that observes a MachineHealthCheck resource checks for the defined condition. If a machine fails the health check, the machine is automatically deleted and one is created to take its place. When a machine is deleted, you see a machine deleted event.
To limit disruptive impact of the machine deletion, the controller drains and deletes only one node at a time. If there are more unhealthy machines than the maxUnhealthy threshold allows for in the targeted pool of machines, remediation stops and therefore enables manual intervention.
Consider the timeouts carefully, accounting for workloads and requirements.
- Long timeouts can result in long periods of downtime for the workload on the unhealthy machine.
-
Too short timeouts can result in a remediation loop. For example, the timeout for checking the
NotReadystatus must be long enough to allow the machine to complete the startup process.
To stop the check, remove the resource.
5.1.1. Limitations when deploying machine health checks Copy linkLink copied to clipboard!
There are limitations to consider before deploying a machine health check:
- Only machines owned by a machine set are remediated by a machine health check.
- If the node for a machine is removed from the cluster, a machine health check considers the machine to be unhealthy and remediates it immediately.
-
If the corresponding node for a machine does not join the cluster after the
nodeStartupTimeout, the machine is remediated. -
A machine is remediated immediately if the
Machineresource phase isFailed.
5.2. Configuring machine health checks to use the Self Node Remediation Operator Copy linkLink copied to clipboard!
Use the following procedure to configure the worker or control-plane machine health checks to use the Self Node Remediation Operator as a remediation provider.
To use the Self Node Remediation Operator as a remediation provider for machine health checks, a machine must have an associated node in the cluster.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
SelfNodeRemediationTemplateCR:Define the
SelfNodeRemediationTemplateCR:apiVersion: self-node-remediation.medik8s.io/v1alpha1 kind: SelfNodeRemediationTemplate metadata: namespace: openshift-machine-api name: selfnoderemediationtemplate-sample spec: template: spec: remediationStrategy: Automatic1 - 1
- Specifies the remediation strategy. The default remediation strategy is
Automatic.
To create the
SelfNodeRemediationTemplateCR, run the following command:$ oc create -f <snrt-name>.yaml
Create or update the
MachineHealthCheckCR to point to theSelfNodeRemediationTemplateCR:Define or update the
MachineHealthCheckCR:apiVersion: machine.openshift.io/v1beta1 kind: MachineHealthCheck metadata: name: machine-health-check namespace: openshift-machine-api spec: selector: matchLabels:1 machine.openshift.io/cluster-api-machine-role: "worker" machine.openshift.io/cluster-api-machine-type: "worker" unhealthyConditions: - type: "Ready" timeout: "300s" status: "False" - type: "Ready" timeout: "300s" status: "Unknown" maxUnhealthy: "40%" nodeStartupTimeout: "10m" remediationTemplate:2 kind: SelfNodeRemediationTemplate apiVersion: self-node-remediation.medik8s.io/v1alpha1 name: selfnoderemediationtemplate-sampleTo create a
MachineHealthCheckCR, run the following command:$ oc create -f <mhc-name>.yamlTo update a
MachineHealthCheckCR, run the following command:$ oc apply -f <mhc-name>.yaml
Chapter 6. Remediating nodes with Node Health Checks Copy linkLink copied to clipboard!
You can use the Node Health Check Operator to identify unhealthy nodes. The Operator then uses other remediation providers to remediate the unhealthy nodes.
The Node Health Check Operator can be used with the other remediation providers, including:
- The Self Node Remediation Operator.
- The Fence Agents Remediation Operator.
- The Machine Deletion Remediation Operator.
Due to the existence of preinstalled machine health checks on Red Hat OpenShift Service on AWS (ROSA) clusters, the Node Health Check Operator is unable to function in such an environment.
The Node Health Check Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
6.1. About the Node Health Check Operator Copy linkLink copied to clipboard!
The Node Health Check Operator detects the health of the nodes in a cluster. The NodeHealthCheck controller creates the NodeHealthCheck custom resource (CR), which defines a set of criteria and thresholds to determine the health of a node.
When the Node Health Check Operator detects an unhealthy node, it creates a remediation CR that triggers the remediation provider. For example, the controller creates the SelfNodeRemediation CR, which triggers the Self Node Remediation Operator to remediate the unhealthy node.
The NodeHealthCheck CR resembles the following YAML file, with self-node-remediation as the remediation provider:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
name: nodehealthcheck-sample
spec:
minHealthy: 51%
pauseRequests:
- <pause-test-cluster>
remediationTemplate:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
name: self-node-remediation-resource-deletion-template
namespace: openshift-workload-availability
kind: SelfNodeRemediationTemplate
escalatingRemediations:
- remediationTemplate:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
name: self-node-remediation-resource-deletion-template
namespace: openshift-workload-availability
kind: SelfNodeRemediationTemplate
order: 1
timeout: 300s
selector:
matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
unhealthyConditions:
- type: Ready
status: "False"
duration: 300s
- type: Ready
status: Unknown
duration: 300s
- 1
- Specifies the amount of healthy nodes(in percentage or number) required for a remediation provider to concurrently remediate nodes in the targeted pool. If the number of healthy nodes equals to or exceeds the limit set by
minHealthy, remediation occurs. The default value is 51%. - 2
- Prevents any new remediation from starting, while allowing any ongoing remediations to persist. The default value is empty. However, you can enter an array of strings that identify the cause of pausing the remediation. For example,
pause-test-cluster.NoteDuring the upgrade process, nodes in the cluster might become temporarily unavailable and get identified as unhealthy. In the case of worker nodes, when the Operator detects that the cluster is upgrading, it stops remediating new unhealthy nodes to prevent such nodes from rebooting.
- 3
- Specifies a remediation template from the remediation provider. For example, from the Self Node Remediation Operator.
remediationTemplateis mutually exclusive withescalatingRemediations. - 4
- Specifies a list of
RemediationTemplateswith order and timeout fields. To obtain a healthy node, use this field to sequence and configure multiple remediations. This strategy increases the likelihood of obtaining a healthy node, instead of depending on a single remediation that might not be successful. Theorderfield determines the order in which the remediations are invoked (lower order = earlier invocation). Thetimeoutfield determines when the next remediation is invoked.escalatingRemediationsis mutually exclusive withremediationTemplate.NoteWhen
escalatingRemediationsis used the remediation providers, Self Node Remediation Operator and Fence Agents Remediation Operator, can be used multiple times with differentremediationTemplateconfigurations. However, you can not use the same Machine Deletion Remediation configuration with differentremediationTemplateconfigurations. - 5
- Specifies a selector that matches labels or expressions that you want to check. Avoid selecting both control-plane and worker nodes in one CR.
- 6
- Specifies a list of the conditions that determine whether a node is considered unhealthy.
- 7 8
- Specifies the timeout duration for a node condition. If a condition is met for the duration of the timeout, the node will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy node.
The NodeHealthCheck CR resembles the following YAML file, with metal3 as the remediation provider:
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
name: nhc-worker-metal3
spec:
minHealthy: 30%
remediationTemplate:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
name: metal3-remediation
namespace: openshift-machine-api
selector:
matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
unhealthyConditions:
- duration: 300s
status: 'False'
type: Ready
- duration: 300s
status: 'Unknown'
type: Ready
The matchExpressions are examples only; you must map your machine groups based on your specific needs.
The Metal3RemediationTemplate resembles the following YAML file, with metal3 as the remediation provider:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
name: metal3-remediation
namespace: openshift-machine-api
spec:
template:
spec:
strategy:
retryLimit: 1
timeout: 5m0s
type: Reboot
In addition to creating a NodeHealthCheck CR, you must also create the Metal3RemediationTemplate.
6.1.1. Understanding the Node Health Check Operator workflow Copy linkLink copied to clipboard!
When a node is identified as unhealthy, the Node Health Check Operator checks how many other nodes are unhealthy. If the number of healthy nodes exceeds the amount that is specified in the minHealthy field of the NodeHealthCheck CR, the controller creates a remediation CR from the details that are provided in the external remediation template by the remediation provider. After remediation, the kubelet updates the node’s health status.
When the node turns healthy, the controller deletes the external remediation template.
6.1.2. About how node health checks prevent conflicts with machine health checks Copy linkLink copied to clipboard!
When both, node health checks and machine health checks are deployed, the node health check avoids conflict with the machine health check.
Red Hat OpenShift deploys machine-api-termination-handler as the default MachineHealthCheck resource.
The following list summarizes the system behavior when node health checks and machine health checks are deployed:
If only the default machine health check exists, the node health check continues to identify unhealthy nodes. However, the node health check ignores unhealthy nodes in a Terminating state. The default machine health check handles the unhealthy nodes with a Terminating state.
Example log message
INFO MHCChecker ignoring unhealthy Node, it is terminating and will be handled by MHC {"NodeName": "node-1.example.com"}If the default machine health check is modified (for example, the
unhealthyConditionsisReady), or if additional machine health checks are created, the node health check is disabled.Example log message
INFO controllers.NodeHealthCheck disabling NHC in order to avoid conflict with custom MHCs configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}When, again, only the default machine health check exists, the node health check is re-enabled.
Example log message
INFO controllers.NodeHealthCheck re-enabling NHC, no conflicting MHC configured in the cluster {"NodeHealthCheck": "/nhc-worker-default"}
6.2. Control plane fencing Copy linkLink copied to clipboard!
In earlier releases, you could enable Self Node Remediation and Node Health Check on worker nodes. In the event of node failure, you can now also follow remediation strategies on control plane nodes.
Do not use the same NodeHealthCheck CR for worker nodes and control plane nodes. Grouping worker nodes and control plane nodes together can result in incorrect evaluation of the minimum healthy node count, and cause unexpected or missing remediations. This is because of the way the Node Health Check Operator handles control plane nodes. You should group the control plane nodes in their own group and the worker nodes in their own group. If required, you can also create multiple groups of worker nodes.
Considerations for remediation strategies:
- Avoid Node Health Check configurations that involve multiple configurations overlapping the same nodes because they can result in unexpected behavior. This suggestion applies to both worker and control plane nodes.
- The Node Health Check Operator implements a hardcoded limitation of remediating a maximum of one control plane node at a time. Multiple control plane nodes should not be remediated at the same time.
6.3. Installing the Node Health Check Operator by using the web console Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Node Health Check Operator.
Prerequisites
-
Log in as a user with
cluster-adminprivileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Node Health Check Operator, then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the
openshift-workload-availabilitynamespace. -
Ensure that the Console plug-in is set to
Enable. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availabilitynamespace and that its status isSucceeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Workloads → Pods page and check the logs in any pods in the
openshift-workload-availabilityproject that are reporting issues.
6.4. Installing the Node Health Check Operator by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc) to install the Node Health Check Operator.
You can install the Node Health Check Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
Namespacecustom resource (CR) for the Node Health Check Operator:Define the
NamespaceCR and save the YAML file, for example,node-health-check-namespace.yaml:apiVersion: v1 kind: Namespace metadata: name: openshift-workload-availabilityTo create the
NamespaceCR, run the following command:$ oc create -f node-health-check-namespace.yaml
Create an
OperatorGroupCR:Define the
OperatorGroupCR and save the YAML file, for example,workload-availability-operator-group.yaml:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: workload-availability-operator-group namespace: openshift-workload-availabilityTo create the
OperatorGroupCR, run the following command:$ oc create -f workload-availability-operator-group.yaml
Create a
SubscriptionCR:Define the
SubscriptionCR and save the YAML file, for example,node-health-check-subscription.yaml:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: node-health-check-operator namespace: openshift-workload-availability1 spec: channel: stable2 installPlanApproval: Manual3 name: node-healthcheck-operator source: redhat-operators sourceNamespace: openshift-marketplace package: node-healthcheck-operator- 1
- Specify the
Namespacewhere you want to install the Node Health Check Operator. To install the Node Health Check Operator in theopenshift-workload-availabilitynamespace, specifyopenshift-workload-availabilityin theSubscriptionCR. - 2
- Specify the channel name for your subscription. To upgrade to the latest version of the Node Health Check Operator, you must manually change the channel name for your subscription from
candidatetostable. - 3
- Set the approval strategy to Manual in case your specified version is superseded by a later version in the catalog. This plan prevents an automatic upgrade to a later version and requires manual approval before the starting CSV can complete the installation.
To create the
SubscriptionCR, run the following command:$ oc create -f node-health-check-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availabilityExample output
NAME DISPLAY VERSION REPLACES PHASE node-healthcheck-operator.v0.7.0 Node Health Check Operator 0.7.0 node-healthcheck-operator.v0.6.1 SucceededVerify that the Node Health Check Operator is up and running:
$ oc get deployment -n openshift-workload-availabilityExample output
NAME READY UP-TO-DATE AVAILABLE AGE node-healthcheck-controller-manager 2/2 2 2 10d
6.5. Creating a node health check Copy linkLink copied to clipboard!
Using the web console, you can create a node health check to identify unhealthy nodes and specify the remediation type and strategy to fix them.
Procedure
- From the Administrator perspective of the Red Hat OpenShift web console, click Compute → NodeHealthChecks → CreateNodeHealthCheck.
- Specify whether to configure the node health check using the Form view or the YAML view.
- Enter a Name for the node health check. The name must consist of lower case, alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
- Specify the Remediator type, as Self node remediation or Other. You must install the Self Node Remediation Operator manually. Selecting Other requires an API version, Kind, Name, and Namespace to be entered, which then points to the remediation template resource of a remediator.
Make a Nodes selection by specifying the labels of the nodes you want to remediate. The selection matches labels that you want to check. If more than one label is specified, the nodes must contain each label. The default value is empty, which selects both worker and control-plane nodes.
NoteWhen creating a node health check with the Self Node Remediation Operator, you must select either
node-role.kubernetes.io/workerornode-role.kubernetes.io/control-planeas the value.- Specify the minimum number of healthy nodes, using either a percentage or a number, required for a NodeHealthCheck to remediate nodes in the targeted pool. If the number of healthy nodes equals to or exceeds the limit set by Min healthy, remediation occurs. The default value is 51%.
- Specify a list of Unhealthy conditions that if a node meets determines whether the node is considered unhealthy, and requires remediation. You can specify the Type, Status and Duration. You can also create your own custom type.
- Click Create to create the node health check.
Verification
- Navigate to the Compute → NodeHealthCheck page and verify that the corresponding node health check is listed, and their status displayed. Once created, node health checks can be paused, modified, and deleted.
6.6. Gathering data about the Node Health Check Operator Copy linkLink copied to clipboard!
To collect debugging information about the Node Health Check Operator, use the must-gather tool. For information about the must-gather image for the Node Health Check Operator, see Gathering data about specific features.
6.7. Additional resources Copy linkLink copied to clipboard!
Chapter 7. Placing nodes in maintenance mode with Node Maintenance Operator Copy linkLink copied to clipboard!
You can use the Node Maintenance Operator to place nodes in maintenance mode by using the oc adm utility or NodeMaintenance custom resources (CRs).
The Node Maintenance Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
7.1. About the Node Maintenance Operator Copy linkLink copied to clipboard!
The Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When a new NodeMaintenance CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance CR is deleted, the node that is referenced in the CR is made available for new workloads.
Using a NodeMaintenance CR for node maintenance tasks achieves the same results as the oc adm cordon and oc adm drain commands using standard Red Hat OpenShift CR processing.
7.2. Installing the Node Maintenance Operator Copy linkLink copied to clipboard!
You can install the Node Maintenance Operator using the web console or the OpenShift CLI (oc).
If OpenShift Virtualization version 4.10 or less is installed in your cluster, it includes an outdated version of the Node Maintenance Operator.
7.2.1. Installing the Node Maintenance Operator by using the web console Copy linkLink copied to clipboard!
You can use the Red Hat OpenShift web console to install the Node Maintenance Operator.
Prerequisites
-
Log in as a user with
cluster-adminprivileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Node Maintenance Operator, then click Install.
-
Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the
openshift-workload-availabilitynamespace. - Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
-
Check that the Operator is installed in the
openshift-workload-availabilitynamespace and that its status isSucceeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
-
Navigate to the Operators → Installed Operators → Node Maintenance Operator → Details page, and inspect the
Conditionssection for errors before pod creation. -
Navigate to the Workloads → Pods page, search for the
Node Maintenance Operatorpod in the installed namespace, and check the logs in theLogstab.
7.2.2. Installing the Node Maintenance Operator by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc) to install the Node Maintenance Operator.
You can install the Node Maintenance Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
-
Install the OpenShift CLI (
oc). -
Log in as a user with
cluster-adminprivileges.
Procedure
Create a
NamespaceCR for the Node Maintenance Operator:Define the
NamespaceCR and save the YAML file, for example,workload-availability-namespace.yaml:apiVersion: v1 kind: Namespace metadata: name: openshift-workload-availabilityTo create the
NamespaceCR, run the following command:$ oc create -f workload-availability-namespace.yaml
Create an
OperatorGroupCR:Define the
OperatorGroupCR and save the YAML file, for example,workload-availability-operator-group.yaml:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: workload-availability-operator-group namespace: openshift-workload-availabilityTo create the
OperatorGroupCR, run the following command:$ oc create -f workload-availability-operator-group.yaml
Create a
SubscriptionCR:Define the
SubscriptionCR and save the YAML file, for example,node-maintenance-subscription.yaml:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: node-maintenance-operator namespace: openshift-workload-availability1 spec: channel: stable installPlanApproval: Automatic name: node-maintenance-operator source: redhat-operators sourceNamespace: openshift-marketplace package: node-maintenance-operator- 1
- Specify the
Namespacewhere you want to install the Node Maintenance Operator.
ImportantTo install the Node Maintenance Operator in the
openshift-workload-availabilitynamespace, specifyopenshift-workload-availabilityin theSubscriptionCR.To create the
SubscriptionCR, run the following command:$ oc create -f node-maintenance-subscription.yaml
Verification
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-workload-availabilityExample output
NAME DISPLAY VERSION REPLACES PHASE node-maintenance-operator.v5.3.0 Node Maintenance Operator 5.3.0 node-maintenance-operator.v5.2.1 SucceededVerify that the Node Maintenance Operator is running:
$ oc get deployment -n openshift-workload-availabilityExample output
NAME READY UP-TO-DATE AVAILABLE AGE node-maintenance-operator-controller-manager 1/1 1 1 10d
The Node Maintenance Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.
7.3. Setting a node to maintenance mode Copy linkLink copied to clipboard!
You can place a node into maintenance mode from the web console or from the CLI by using a NodeMaintenance CR.
7.3.1. Setting a node to maintenance mode by using the web console Copy linkLink copied to clipboard!
To set a node to maintenance mode, you can create a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
-
Log in as a user with
cluster-adminprivileges. - Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
- In the Node Maintenance tab, click Create NodeMaintenance.
-
In the Create NodeMaintenance page, select the Form view or the YAML view to configure the
NodeMaintenanceCR. -
To apply the
NodeMaintenanceCR that you have configured, click Create.
Verification
In the Node Maintenance tab, inspect the Status column and verify that its status is Succeeded.
7.3.2. Setting a node to maintenance mode by using the CLI Copy linkLink copied to clipboard!
You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, all allowed pods are evicted and the node is rendered unschedulable. Evicted pods are queued to be moved to another node in the cluster.
Prerequisites
-
Install the Red Hat OpenShift CLI
oc. -
Log in to the cluster as a user with
cluster-adminprivileges.
Procedure
Create the following
NodeMaintenanceCR, and save the file asnodemaintenance-cr.yaml:apiVersion: nodemaintenance.medik8s.io/v1beta1 kind: NodeMaintenance metadata: name: nodemaintenance-cr1 spec: nodeName: node-1.example.com2 reason: "NIC replacement"3 Apply the node maintenance CR by running the following command:
$ oc apply -f nodemaintenance-cr.yaml
Verification
Check the progress of the maintenance task by running the following command:
$ oc describe node <node-name>where
<node-name>is the name of your node; for example,node-1.example.comCheck the example output:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal NodeNotSchedulable 61m kubelet Node node-1.example.com status is now: NodeNotSchedulable
7.3.3. Checking status of current NodeMaintenance CR tasks Copy linkLink copied to clipboard!
You can check the status of current NodeMaintenance CR tasks.
Prerequisites
-
Install the Red Hat OpenShift CLI
oc. -
Log in as a user with
cluster-adminprivileges.
Procedure
Check the status of current node maintenance tasks, for example the
NodeMaintenanceCR ornmobject, by running the following command:$ oc get nm -o yamlExample output
apiVersion: v1 items: - apiVersion: nodemaintenance.medik8s.io/v1beta1 kind: NodeMaintenance metadata: ... spec: nodeName: node-1.example.com reason: Node maintenance status: drainProgress: 1001 evictionPods: 32 lastError: "Last failure message"3 lastUpdate: "2022-06-23T11:43:18Z"4 phase: Succeeded totalpods: 55 ...
7.4. Resuming a node from maintenance mode Copy linkLink copied to clipboard!
You can resume a node from maintenance mode from the web console or from the CLI by using a NodeMaintenance CR. Resuming a node brings it out of maintenance mode and makes it schedulable again.
7.4.1. Resuming a node from maintenance mode by using the web console Copy linkLink copied to clipboard!
To resume a node from maintenance mode, you can delete a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
-
Log in as a user with
cluster-adminprivileges. - Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
-
In the Node Maintenance tab, select the
NodeMaintenanceCR that you want to delete. -
Click the Options menu
at the end of the node and select Delete NodeMaintenance.
Verification
- In the Red Hat OpenShift console, click Compute → Nodes.
-
Inspect the
Statuscolumn of the node for which you deleted theNodeMaintenanceCR and verify that its status isReady.
7.4.2. Resuming a node from maintenance mode by using the CLI Copy linkLink copied to clipboard!
You can resume a node from maintenance mode that was initiated with a NodeMaintenance CR by deleting the NodeMaintenance CR.
Prerequisites
-
Install the Red Hat OpenShift CLI
oc. -
Log in to the cluster as a user with
cluster-adminprivileges.
Procedure
When your node maintenance task is complete, delete the active
NodeMaintenanceCR:$ oc delete -f nodemaintenance-cr.yamlExample output
nodemaintenance.nodemaintenance.medik8s.io "maintenance-example" deleted
Verification
Check the progress of the maintenance task by running the following command:
$ oc describe node <node-name>where
<node-name>is the name of your node; for example,node-1.example.comCheck the example output:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal NodeSchedulable 2m kubelet Node node-1.example.com status is now: NodeSchedulable
7.5. Working with bare-metal nodes Copy linkLink copied to clipboard!
For clusters with bare-metal nodes, you can place a node into maintenance mode, and resume a node from maintenance mode, by using the web console Actions control.
Clusters with bare-metal nodes can also place a node into maintenance mode, and resume a node from maintenance mode, by using the web console and CLI, as outlined. These methods, by using the web console Actions control, are applicable to bare-metal clusters only.
7.5.1. Maintaining bare-metal nodes Copy linkLink copied to clipboard!
When you deploy Red Hat OpenShift on bare-metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where the cluster nodes are considered ephemeral, reprovisioning a bare-metal node requires significantly more time and effort for maintenance tasks.
When a bare-metal node fails due to a kernel error or a NIC card hardware failure, workloads on the failed node need to be restarted on another node in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully turn-off nodes, move workloads to other parts of the cluster, and ensure that workloads do not get interrupted. Detailed progress and node status details are provided during maintenance.
7.5.2. Setting a bare-metal node to maintenance mode Copy linkLink copied to clipboard!
Set a bare-metal node to maintenance mode using the Options menu
found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
You can set the node to maintenance from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
-
Click the Options menu
at the end of the node and select Start Maintenance.
- Click the node name to open the Node Details screen and click Actions → Start Maintenance.
-
Click the Options menu
- Click Start Maintenance in the confirmation window.
The node is no longer schedulable. If it had virtual machines with the LiveMigration eviction strategy, then it will live migrate them. All other pods and virtual machines on the node are deleted and recreated on another node.
Verification
-
Navigate to the Compute → Nodes page and verify that the corresponding node has a status of
Under maintenance.
7.5.3. Resuming a bare-metal node from maintenance mode Copy linkLink copied to clipboard!
Resume a bare-metal node from maintenance mode using the Options menu
found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
You can resume the node from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
-
Click the Options menu
at the end of the node and select Stop Maintenance.
- Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
-
Click the Options menu
- Click Stop Maintenance in the confirmation window.
The node becomes schedulable. If it had virtual machine instances that were running on the node prior to maintenance, then they will not automatically migrate back to this node.
Verification
-
Navigate to the Compute → Nodes page and verify that the corresponding node has a status of
Ready.
7.6. Gathering data about the Node Maintenance Operator Copy linkLink copied to clipboard!
To collect debugging information about the Node Maintenance Operator, use the must-gather tool. For information about the must-gather image for the Node Maintenance Operator, see Gathering data about specific features.