Chapter 7. Placing nodes in maintenance mode with Node Maintenance Operator
You can use the Node Maintenance Operator to place nodes in maintenance mode by using the oc adm utility or NodeMaintenance custom resources (CRs).
7.1. About the Node Maintenance Operator
The Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When a new NodeMaintenance CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance CR is deleted, the node that is referenced in the CR is made available for new workloads.
Using a NodeMaintenance CR for node maintenance tasks achieves the same results as the oc adm cordon and oc adm drain commands, but does so through standard Red Hat OpenShift CR processing.
7.2. Installing the Node Maintenance Operator
You can install the Node Maintenance Operator using the web console or the OpenShift CLI (oc).
If OpenShift Virtualization version 4.10 or earlier is installed in your cluster, it includes an outdated version of the Node Maintenance Operator.
7.2.1. Installing the Node Maintenance Operator by using the web console
You can use the Red Hat OpenShift web console to install the Node Maintenance Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Node Maintenance Operator, then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed in the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and that its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Operators → Installed Operators → Node Maintenance Operator → Details page, and inspect the Conditions section for errors before pod creation.
- Navigate to the Workloads → Pods page, search for the Node Maintenance Operator pod in the installed namespace, and check the logs in the Logs tab.
7.2.2. Installing the Node Maintenance Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Node Maintenance Operator.
You can install the Node Maintenance Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
- Create a Namespace CR for the Node Maintenance Operator:
  - Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-workload-availability
  - To create the Namespace CR, run the following command:
      $ oc create -f workload-availability-namespace.yaml
- Create an OperatorGroup CR:
  - Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: workload-availability-operator-group
        namespace: openshift-workload-availability
  - To create the OperatorGroup CR, run the following command:
      $ oc create -f workload-availability-operator-group.yaml
- Create a Subscription CR:
  - Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml. In the CR, specify the namespace where you want to install the Node Maintenance Operator.
    Important: To install the Node Maintenance Operator in the openshift-workload-availability namespace, specify openshift-workload-availability in the Subscription CR.
  - To create the Subscription CR, run the following command:
      $ oc create -f node-maintenance-subscription.yaml
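The Subscription CR definition itself is not reproduced in the procedure above. As a minimal sketch only, a Subscription for this Operator might look like the following. The channel, approval strategy, catalog source, and package name shown here are assumptions and are not taken from this document; check them against the catalog available in your cluster before use.

```yaml
# Illustrative sketch, not the definitive manifest.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-maintenance-operator            # assumed Subscription name
  namespace: openshift-workload-availability # namespace where the Operator is installed
spec:
  channel: stable                            # assumption: verify the channel in your catalog
  installPlanApproval: Automatic             # assumption: Manual is the other valid value
  name: node-maintenance-operator            # assumed package name
  source: redhat-operators                   # assumed catalog source
  sourceNamespace: openshift-marketplace     # assumed catalog namespace
```

If you install the Operator in your own namespace instead, the metadata.namespace value is the field to change.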
Verification
- Verify that the installation succeeded by inspecting the CSV resource:
    $ oc get csv -n openshift-workload-availability
  Example output
    NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
    node-maintenance-operator.v5.3.0   Node Maintenance Operator   5.3.0     node-maintenance-operator.v5.2.1   Succeeded
- Verify that the Node Maintenance Operator is running:
    $ oc get deployment -n openshift-workload-availability
  Example output
    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    node-maintenance-operator-controller-manager   1/1     1            1           10d
The Node Maintenance Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.
7.3. Setting a node to maintenance mode
You can place a node into maintenance mode from the web console or from the CLI by using a NodeMaintenance CR.
7.3.1. Setting a node to maintenance mode by using the web console
To set a node to maintenance mode, you can create a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
- Log in as a user with cluster-admin privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
- In the Node Maintenance tab, click Create NodeMaintenance.
- In the Create NodeMaintenance page, select the Form view or the YAML view to configure the NodeMaintenance CR.
- To apply the NodeMaintenance CR that you have configured, click Create.
Verification
In the Node Maintenance tab, inspect the Status column and verify that its status is Succeeded.
7.3.2. Setting a node to maintenance mode by using the CLI
You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, all allowed pods are evicted and the node is rendered unschedulable. Evicted pods are queued to be moved to another node in the cluster.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
- Create a NodeMaintenance CR for the node, and save the file as nodemaintenance-cr.yaml.
- Apply the node maintenance CR by running the following command:
    $ oc apply -f nodemaintenance-cr.yaml
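The body of nodemaintenance-cr.yaml is not reproduced in the procedure above. The following is a minimal sketch of what such a CR might contain. The API group matches the nodemaintenance.medik8s.io group and the maintenance-example name that appear in the delete output later in this chapter; the v1beta1 version and the reason text are assumptions, so confirm the version served in your cluster before applying it.

```yaml
# Illustrative sketch, not the definitive manifest.
apiVersion: nodemaintenance.medik8s.io/v1beta1  # assumption: verify the served API version
kind: NodeMaintenance
metadata:
  name: maintenance-example       # name used in the example output of this chapter
spec:
  nodeName: node-1.example.com    # node to place in maintenance mode
  reason: "Node maintenance"      # free-text reason; the wording here is hypothetical
```

The nodeName value must match the node name exactly as the cluster reports it.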
Verification
- Check the progress of the maintenance task by running the following command:
    $ oc describe node <node-name>
  where <node-name> is the name of your node; for example, node-1.example.com.
- Check the example output:
    Events:
      Type     Reason              Age   From     Message
      ----     ------              ----  ----     -------
      Normal   NodeNotSchedulable  61m   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
7.3.3. Checking status of current NodeMaintenance CR tasks
You can check the status of current NodeMaintenance CR tasks.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
- Check the status of current node maintenance tasks, for example the NodeMaintenance CR or nm object, by running the following command:
    $ oc get nm -o yaml
7.4. Resuming a node from maintenance mode
You can resume a node from maintenance mode from the web console or from the CLI by using a NodeMaintenance CR. Resuming a node brings it out of maintenance mode and makes it schedulable again.
7.4.1. Resuming a node from maintenance mode by using the web console
To resume a node from maintenance mode, you can delete a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
- Log in as a user with cluster-admin privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
- From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
- Select the Node Maintenance Operator from the list of Operators.
- In the Node Maintenance tab, select the NodeMaintenance CR that you want to delete.
- Click the Options menu at the end of the node and select Delete NodeMaintenance.
Verification
- In the Red Hat OpenShift console, click Compute → Nodes.
- Inspect the Status column of the node for which you deleted the NodeMaintenance CR and verify that its status is Ready.
7.4.2. Resuming a node from maintenance mode by using the CLI
You can resume a node from maintenance mode that was initiated with a NodeMaintenance CR by deleting the NodeMaintenance CR.
Prerequisites
- Install the Red Hat OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
- When your node maintenance task is complete, delete the active NodeMaintenance CR:
    $ oc delete -f nodemaintenance-cr.yaml
  Example output
    nodemaintenance.nodemaintenance.medik8s.io "maintenance-example" deleted
Verification
- Check the progress of the maintenance task by running the following command:
    $ oc describe node <node-name>
  where <node-name> is the name of your node; for example, node-1.example.com.
- Check the example output:
    Events:
      Type     Reason           Age   From     Message
      ----     ------           ----  ----     -------
      Normal   NodeSchedulable  2m    kubelet  Node node-1.example.com status is now: NodeSchedulable
7.5. Working with bare-metal nodes
For clusters with bare-metal nodes, you can place a node into maintenance mode, and resume a node from maintenance mode, by using the web console Actions control.
On clusters with bare-metal nodes, you can also place a node into maintenance mode, and resume it from maintenance mode, by using the web console and CLI procedures outlined in the previous sections. The methods that use the web console Actions control apply to bare-metal clusters only.
7.5.1. Maintaining bare-metal nodes
When you deploy Red Hat OpenShift on bare-metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where the cluster nodes are considered ephemeral, reprovisioning a bare-metal node requires significantly more time and effort for maintenance tasks.
When a bare-metal node fails due to a kernel error or a NIC hardware failure, workloads on the failed node need to be restarted on another node in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully turn off nodes, move workloads to other parts of the cluster, and ensure that workloads are not interrupted. Detailed progress and node status information is provided during maintenance.
7.5.2. Setting a bare-metal node to maintenance mode
Set a bare-metal node to maintenance mode by using the Options menu found on each node in the Compute → Nodes list, or by using the Actions control on the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
- You can set the node to maintenance from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
  - Click the Options menu at the end of the node and select Start Maintenance.
  - Click the node name to open the Node Details screen and click Actions → Start Maintenance.
- Click Start Maintenance in the confirmation window.
The node is no longer schedulable. Any virtual machines on the node that use the LiveMigration eviction strategy are live migrated. All other pods and virtual machines on the node are deleted and recreated on another node.
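The eviction strategy mentioned above is configured on the virtual machine, not on the node. As a hedged sketch, assuming OpenShift Virtualization with the kubevirt.io/v1 API, the relevant setting is the evictionStrategy field of the VM template. The VM name example-vm is hypothetical, and this fragment omits the other fields that a complete VirtualMachine definition requires.

```yaml
# Illustrative fragment only; not a complete VirtualMachine definition.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm                   # hypothetical VM name
spec:
  template:
    spec:
      evictionStrategy: LiveMigrate  # VM is live migrated when its node is drained
      # domain, volumes, networks, and other required fields are omitted here
```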
Verification
- Navigate to the Compute → Nodes page and verify that the corresponding node has a status of Under maintenance.
7.5.3. Resuming a bare-metal node from maintenance mode
Resume a bare-metal node from maintenance mode by using the Options menu found on each node in the Compute → Nodes list, or by using the Actions control on the Node Details screen.
Procedure
- From the Administrator perspective of the web console, click Compute → Nodes.
- You can resume the node from this screen, which makes it easier to perform actions on multiple nodes, or from the Node Details screen, where you can view comprehensive details of the selected node:
  - Click the Options menu at the end of the node and select Stop Maintenance.
  - Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
- Click Stop Maintenance in the confirmation window.
The node becomes schedulable. Virtual machine instances that were running on the node before maintenance do not automatically migrate back to it.
Verification
- Navigate to the Compute → Nodes page and verify that the corresponding node has a status of Ready.
7.6. Gathering data about the Node Maintenance Operator
To collect debugging information about the Node Maintenance Operator, use the must-gather tool. For information about the must-gather image for the Node Maintenance Operator, see Gathering data about specific features.