Chapter 11. Node maintenance
11.1. About node maintenance
11.1.1. Understanding node maintenance mode
Nodes can be placed into maintenance mode using the oc adm utility or NodeMaintenance custom resources (CRs).
Placing a node into maintenance marks the node as unschedulable and drains all the virtual machines and pods from it. Virtual machine instances that have a LiveMigrate eviction strategy are live migrated to another node. Virtual machine instances without an eviction strategy are shut down. Virtual machines with a RunStrategy of Running or RerunOnFailure are recreated on another node. Virtual machines with a RunStrategy of Manual are not automatically restarted.
Virtual machines must have a persistent volume claim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.
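When installed as part of OpenShift Virtualization, Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When a new NodeMaintenance CR is detected, the node is cordoned so that no new workloads are scheduled on it, and all pods that can be evicted are evicted from the node. When a NodeMaintenance CR is deleted, the node that the CR references becomes available for new workloads again.
Using a NodeMaintenance CR for node maintenance tasks achieves the same results as the oc adm cordon and oc adm drain commands.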
11.1.2. Maintaining bare metal nodes
When you deploy OpenShift Container Platform on bare metal infrastructure, you must take into account considerations that do not apply to cloud infrastructure. Unlike cloud environments, where cluster nodes are considered ephemeral, re-provisioning a bare metal node requires significantly more time and effort.
When a bare metal node fails, for example because of a fatal kernel error or a NIC hardware failure, workloads on the failed node must be restarted elsewhere on the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully power down nodes, moving workloads to other parts of the cluster so that they are not interrupted. Detailed progress and node status information is provided during maintenance.
11.2. Setting a node to maintenance mode
Place a node into maintenance from the web console, from the CLI, or by using a NodeMaintenance custom resource.
11.2.1. Setting a node to maintenance mode in the web console
Set a node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- In the OpenShift Virtualization console, click Compute → Nodes.
- Set the node to maintenance from this screen, which makes it easier to act on multiple nodes at once, or from the Node Details screen, where you can view comprehensive details of the selected node:
  - Click the Options menu at the end of the node and select Start Maintenance.
  - Click the node name to open the Node Details screen and click Actions → Start Maintenance.
- Click Start Maintenance in the confirmation window.
The node live migrates virtual machine instances that have the LiveMigrate eviction strategy, and the node is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.
11.2.2. Setting a node to maintenance mode in the CLI
Set a node to maintenance mode by marking it as unschedulable and using the oc adm drain command to evict or delete pods from the node.
Procedure
- Mark the node as unschedulable. The node status then includes SchedulingDisabled.
$ oc adm cordon <node1>
- Drain the node in preparation for maintenance. The node live migrates virtual machine instances that have the LiveMigratable condition set to True and the spec.evictionStrategy field set to LiveMigrate. All other pods and virtual machines on the node are deleted and recreated on another node.
$ oc adm drain <node1> --delete-emptydir-data --ignore-daemonsets=true --force
  - The --delete-emptydir-data flag removes any virtual machine instances on the node that use emptyDir volumes. Data in these volumes is ephemeral and is safe to delete after termination.
  - The --ignore-daemonsets=true flag ensures that daemon sets are ignored and pod eviction can continue successfully.
  - The --force flag is required to delete pods that are not managed by a replica set or daemon set controller.
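The cordon-then-drain sequence in this procedure can be scripted. The following is a minimal Python sketch, not part of the product: the helper name maintenance_commands is hypothetical, and it only builds the argument lists from the flags shown in this procedure. Running the commands requires oc and a logged-in session.

```python
from __future__ import annotations

import shlex


def maintenance_commands(node: str) -> list[list[str]]:
    """Return the cordon and drain commands for `node`, in the order to run them."""
    return [
        # Step 1: mark the node as unschedulable.
        ["oc", "adm", "cordon", node],
        # Step 2: drain the node with the flags from this procedure.
        [
            "oc", "adm", "drain", node,
            "--delete-emptydir-data",
            "--ignore-daemonsets=true",
            "--force",
        ],
    ]


# Print the commands only; nothing is executed against a cluster here.
for argv in maintenance_commands("node-1.example.com"):
    print(shlex.join(argv))
```

To execute the commands instead of printing them, each argument list could be passed to subprocess.run(argv, check=True), which stops at the first failing step.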
11.2.3. Setting a node to maintenance mode with a NodeMaintenance custom resource
You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, the node is marked as unschedulable and the pods that can be evicted are moved to other nodes in the cluster.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
- Create the following node maintenance CR, and save the file as nodemaintenance-cr.yaml:
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-example
spec:
  nodeName: node-1.example.com
  reason: "Node maintenance"
- Apply the node maintenance CR by running the following command:
$ oc apply -f nodemaintenance-cr.yaml
- Check the progress of the maintenance task by running the following command, replacing <node-name> with the name of your node:
$ oc describe node <node-name>
Example output
Events:
  Type    Reason              Age   From     Message
  ----    ------              ----  ----     -------
  Normal  NodeNotSchedulable  61m   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
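When several nodes need maintenance, the CR can be generated rather than written by hand. The following is a minimal Python sketch under stated assumptions: the function name node_maintenance_manifest is hypothetical, and it emits only the fields that appear in the example CR in this section. The output is JSON, which oc apply -f accepts as well as YAML.

```python
import json


def node_maintenance_manifest(name: str, node_name: str, reason: str) -> dict:
    """Build a NodeMaintenance CR with the fields used in this section's example."""
    return {
        "apiVersion": "nodemaintenance.kubevirt.io/v1beta1",
        "kind": "NodeMaintenance",
        "metadata": {"name": name},
        "spec": {"nodeName": node_name, "reason": reason},
    }


manifest = node_maintenance_manifest(
    "maintenance-example", "node-1.example.com", "Node maintenance"
)
# Print the manifest; redirect the output to a file to use it with `oc apply -f`.
print(json.dumps(manifest, indent=2))
```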
11.2.3.1. Checking status of current NodeMaintenance CR tasks
You can check the status of current NodeMaintenance CR tasks.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
- Check the status of current node maintenance tasks by running the following command:
$ oc get NodeMaintenance -o yaml
Example output
apiVersion: v1
items:
- apiVersion: nodemaintenance.kubevirt.io/v1beta1
  kind: NodeMaintenance
  metadata:
...
  spec:
    nodeName: node-1.example.com
    reason: Node maintenance
  status:
    evictionPods: 3
    pendingPods:
    - pod-example-workload-0
    - httpd
    - httpd-manual
    phase: Running
    lastError: "Last failure message"
    totalpods: 5
...
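The status fields in the example output can be condensed into one line per node. The following is a minimal Python sketch, assuming the field names shown in the example output (phase, pendingPods, lastError). The helpers fetch_items and summarize are hypothetical names, and fetch_items requires oc and a logged-in session.

```python
import json
import subprocess


def fetch_items() -> list:
    """Run `oc get NodeMaintenance -o json` and return the list of CRs."""
    result = subprocess.run(
        ["oc", "get", "NodeMaintenance", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def summarize(cr: dict) -> str:
    """Describe one NodeMaintenance item using the status fields it reports."""
    spec = cr.get("spec", {})
    status = cr.get("status", {})
    pending = status.get("pendingPods") or []
    line = (
        f"{spec.get('nodeName', '<unknown>')}: "
        f"phase={status.get('phase', '<none>')}, pending={len(pending)}"
    )
    if status.get("lastError"):
        line += f", lastError={status['lastError']}"
    return line


# Sample item copied from the example output, so no cluster is needed here.
sample = {
    "spec": {"nodeName": "node-1.example.com", "reason": "Node maintenance"},
    "status": {
        "evictionPods": 3,
        "pendingPods": ["pod-example-workload-0", "httpd", "httpd-manual"],
        "phase": "Running",
        "lastError": "Last failure message",
        "totalpods": 5,
    },
}
print(summarize(sample))
```

Against a live cluster, the same summary could be produced with a loop such as: for item in fetch_items(): print(summarize(item)).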
11.3. Resuming a node from maintenance mode
Resuming a node brings it out of maintenance mode and makes it schedulable again.
Resume a node from maintenance mode from the web console, from the CLI, or by deleting the NodeMaintenance CR.
11.3.1. Resuming a node from maintenance mode in the web console
Resume a node from maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
- In the OpenShift Virtualization console, click Compute → Nodes.
- Resume the node from this screen, which makes it easier to act on multiple nodes at once, or from the Node Details screen, where you can view comprehensive details of the selected node:
  - Click the Options menu at the end of the node and select Stop Maintenance.
  - Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
- Click Stop Maintenance in the confirmation window.
The node becomes schedulable, but virtual machine instances that were running on the node before maintenance do not automatically migrate back to it.
11.3.2. Resuming a node from maintenance mode in the CLI
Resume a node from maintenance mode by making it schedulable again.
Procedure
Mark the node as schedulable. You can then resume scheduling new workloads on the node.
$ oc adm uncordon <node1>
11.3.3. Resuming a node from maintenance mode that was initiated with a NodeMaintenance CR
You can resume a node by deleting the NodeMaintenance CR.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
- When your node maintenance task is complete, delete the active NodeMaintenance CR:
$ oc delete -f nodemaintenance-cr.yaml
Example output
nodemaintenance.nodemaintenance.kubevirt.io "maintenance-example" deleted
11.4. Automatic renewal of TLS certificates
All TLS certificates for OpenShift Virtualization components are renewed and rotated automatically. You are not required to refresh them manually.
11.4.1. TLS certificates automatic renewal schedules
TLS certificates are automatically deleted and replaced according to the following schedule:
- KubeVirt certificates are renewed daily.
- Containerized Data Importer controller (CDI) certificates are renewed every 15 days.
- MAC pool certificates are renewed every year.
Automatic TLS certificate rotation does not disrupt any operations. For example, the following operations continue to function without any disruption:
- Migrations
- Image uploads
- VNC and console connections
11.5. Managing node labeling for obsolete CPU models
You can schedule a virtual machine (VM) on a node as long as the VM CPU model and policy are supported by the node.
11.5.1. About node labeling for obsolete CPU models
The OpenShift Virtualization Operator uses a predefined list of obsolete CPU models to ensure that a node supports only valid CPU models for scheduled VMs.
By default, the following CPU models are eliminated from the list of labels generated for the node:
Example 11.1. Obsolete CPU models
"486"
Conroe
athlon
core2duo
coreduo
kvm32
kvm64
n270
pentium
pentium2
pentium3
pentiumpro
phenom
qemu32
qemu64
This predefined list is not visible in the HyperConverged CR. You can add to the list by editing the spec.obsoleteCPUs.cpuModels field of the HyperConverged CR.
11.5.2. About node labeling for CPU features
Through the process of iteration, the base CPU features in the minimum CPU model are eliminated from the list of labels generated for the node.
For example:
- An environment might have two supported CPU models: Penryn and Haswell.
- If Penryn is specified as the CPU model for minCPU, each base CPU feature for Penryn is compared to the list of CPU features supported by Haswell.
Example 11.2. CPU features supported by Penryn
apic clflush cmov cx16 cx8 de fpu fxsr lahf_lm lm mca mce mmx msr mtrr nx pae pat pge pni pse pse36 sep sse sse2 sse4.1 ssse3 syscall tsc
Example 11.3. CPU features supported by Haswell
aes apic avx avx2 bmi1 bmi2 clflush cmov cx16 cx8 de erms fma fpu fsgsbase fxsr hle invpcid lahf_lm lm mca mce mmx movbe msr mtrr nx pae pat pcid pclmuldq pge pni popcnt pse pse36 rdtscp rtm sep smep sse sse2 sse4.1 sse4.2 ssse3 syscall tsc tsc-deadline x2apic xsave
- If both Penryn and Haswell support a specific CPU feature, a label is not created for that feature. Labels are generated for CPU features that are supported only by Haswell and not by Penryn.
Example 11.4. Node labels created for CPU features after iteration
aes avx avx2 bmi1 bmi2 erms fma fsgsbase hle invpcid movbe pcid pclmuldq popcnt rdtscp rtm sse4.2 tsc-deadline x2apic xsave
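The comparison described in this section amounts to a set difference: features of the node's CPU model minus the base features of the minimum CPU model. The following is a minimal Python sketch of that idea, not the Operator's implementation. The function name feature_labels is hypothetical, and the two feature sets are shortened subsets of Examples 11.2 and 11.3, chosen for brevity.

```python
from __future__ import annotations


def feature_labels(min_cpu_features: set[str], node_cpu_features: set[str]) -> list[str]:
    """Return the features that the node supports beyond the minimum CPU model."""
    return sorted(node_cpu_features - min_cpu_features)


# Shortened subsets of the documented feature lists (assumption: for illustration only).
PENRYN_SUBSET = {"apic", "clflush", "cmov", "sse", "sse2", "sse4.1", "ssse3"}
HASWELL_SUBSET = {
    "aes", "apic", "avx", "avx2", "clflush", "cmov",
    "sse", "sse2", "sse4.1", "sse4.2", "ssse3", "xsave",
}

# Features shared by both models produce no label; only the extras remain.
print(feature_labels(PENRYN_SUBSET, HASWELL_SUBSET))
# prints ['aes', 'avx', 'avx2', 'sse4.2', 'xsave']
```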
11.5.3. Configuring obsolete CPU models
You can configure a list of obsolete CPU models by editing the HyperConverged custom resource (CR).
Procedure
- Edit the HyperConverged custom resource, specifying the obsolete CPU models in the obsoleteCPUs.cpuModels array. For example:
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  obsoleteCPUs:
    cpuModels: # (1)
      - "<obsolete_cpu_1>"
      - "<obsolete_cpu_2>"
    minCPUModel: "<minimum_cpu_model>" # (2)
(1) Replace the example values in the cpuModels array with obsolete CPU models. Any value that you specify is added to a predefined list of obsolete CPU models. The predefined list is not visible in the CR.
(2) Replace this value with the minimum CPU model that you want to use for basic CPU features. If you do not specify a value, Penryn is used by default.
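Because configured values are added to the predefined list, the effective set of obsolete models is the union of the two. The following is a minimal Python sketch of that merge; how the Operator combines the lists internally is not described here, so the logic is an assumption. The function name effective_obsolete_models is hypothetical, and "Westmere" is an arbitrary placeholder value, not a recommendation.

```python
from __future__ import annotations

# Predefined obsolete CPU models, copied from Example 11.1.
PREDEFINED_OBSOLETE = [
    "486", "Conroe", "athlon", "core2duo", "coreduo", "kvm32", "kvm64", "n270",
    "pentium", "pentium2", "pentium3", "pentiumpro", "phenom", "qemu32", "qemu64",
]


def effective_obsolete_models(configured: list[str]) -> list[str]:
    """Return the predefined models followed by configured models not already listed."""
    merged = list(PREDEFINED_OBSOLETE)
    for model in configured:
        if model not in merged:
            merged.append(model)
    return merged


# "qemu64" is already predefined, so only the placeholder "Westmere" is appended.
print(effective_obsolete_models(["Westmere", "qemu64"]))
```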
11.6. Preventing node reconciliation
Use the skip-node annotation to prevent the node-labeller from reconciling a node.
11.6.1. Using skip-node annotation
If you want the node-labeller to skip a node, annotate that node by using the oc CLI.
Prerequisites
- You have installed the OpenShift CLI (oc).
Procedure
- Annotate the node that you want to skip by running the following command:
$ oc annotate node <node_name> node-labeller.kubevirt.io/skip-node=true
Replace <node_name> with the name of the relevant node to skip.
Reconciliation resumes on the next cycle after the node annotation is removed or set to false.
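The rule stated above can be expressed as a small predicate over a node's annotations. The following is a minimal Python sketch, not the node-labeller's code: the function name is_reconciled is hypothetical, and treating only the exact string "true" as a skip request is an assumption.

```python
from __future__ import annotations

SKIP_ANNOTATION = "node-labeller.kubevirt.io/skip-node"


def is_reconciled(annotations: dict[str, str]) -> bool:
    """Return False only when the skip annotation is present and set to "true"."""
    return annotations.get(SKIP_ANNOTATION) != "true"


# A node without the annotation, or with it set to "false", is reconciled.
print(is_reconciled({}))
print(is_reconciled({SKIP_ANNOTATION: "false"}))
print(is_reconciled({SKIP_ANNOTATION: "true"}))
```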