Chapter 11. Nodes

11.1. Node maintenance

Nodes can be placed into maintenance mode by using the oc adm utility or NodeMaintenance custom resources (CRs).

Note

The node-maintenance-operator (NMO) is no longer shipped with OpenShift Virtualization. It is deployed as a standalone Operator from the OperatorHub in the OpenShift Container Platform web console or by using the OpenShift CLI (oc).

For more information on remediation, fencing, and maintaining nodes, see the Workload Availability for Red Hat OpenShift documentation.

Important

Virtual machines (VMs) must have a persistent volume claim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.

The Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When a new NodeMaintenance CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance CR is deleted, the node that is referenced in the CR is made available for new workloads.

Note

Using a NodeMaintenance CR for node maintenance tasks achieves the same results as the oc adm cordon and oc adm drain commands, but does so through standard OpenShift Container Platform custom resource processing.

11.1.1. Eviction strategies

Placing a node into maintenance marks the node as unschedulable and drains all the VMs and pods from it.

You can configure eviction strategies for virtual machines (VMs) or for the cluster.

VM eviction strategy

The VM LiveMigrate eviction strategy ensures that a virtual machine instance (VMI) is not interrupted if the node is placed into maintenance or drained. VMIs with this eviction strategy will be live migrated to another node.

You can configure eviction strategies for virtual machines (VMs) by using the OpenShift Virtualization web console or the command line.

Important

The default eviction strategy is LiveMigrate. A non-migratable VM with a LiveMigrate eviction strategy might prevent nodes from draining or block an infrastructure upgrade because the VM is not evicted from the node. This situation causes a migration to remain in a Pending or Scheduling state unless you shut down the VM manually.

You must set the eviction strategy of non-migratable VMs to LiveMigrateIfPossible, which does not block an upgrade, or to None, for VMs that should not be migrated.

Cluster eviction strategy

You can configure an eviction strategy for the cluster to prioritize workload continuity or infrastructure upgrade.

Important

Configuring a cluster eviction strategy is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Table 11.1. Cluster eviction strategies
Eviction strategy	Description	Interrupts workflow	Blocks upgrades
`LiveMigrate` ¹	Prioritizes workload continuity over upgrades.	No	Yes ²
`LiveMigrateIfPossible`	Prioritizes upgrades over workload continuity to ensure that the environment is updated.	Yes	No
`None` ³	Shuts down VMs with no eviction strategy.	Yes	No

¹ Default eviction strategy for multi-node clusters.
² If a VM blocks an upgrade, you must shut down the VM manually.
³ Default eviction strategy for single-node OpenShift.

11.1.1.1. Configuring a VM eviction strategy using the command line

You can configure an eviction strategy for a virtual machine (VM) by using the command line.

Important

The default eviction strategy is LiveMigrate. A non-migratable VM with a LiveMigrate eviction strategy might prevent nodes from draining or block an infrastructure upgrade because the VM is not evicted from the node. This situation causes a migration to remain in a Pending or Scheduling state unless you shut down the VM manually.

You must set the eviction strategy of non-migratable VMs to LiveMigrateIfPossible, which does not block an upgrade, or to None, for VMs that should not be migrated.

Procedure

Edit the VirtualMachine resource by running the following command:

$ oc edit vm <vm_name> -n <namespace>

Example eviction strategy

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: <vm_name>
spec:
  template:
    spec:
      evictionStrategy: LiveMigrateIfPossible # 1
# ...

1: Specify the eviction strategy. The default value is LiveMigrate.

Restart the VM to apply the changes:

$ virtctl restart <vm_name> -n <namespace>
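
The edit above changes a single field, spec.template.spec.evictionStrategy. As a minimal sketch, the following Python snippet builds a JSON merge patch document for that field. The function name and the set of accepted values are illustrative assumptions; the values are only the strategy names mentioned in this section, and other values might exist.

```python
import json

# Strategy names taken from this section (assumption: the list is not exhaustive).
KNOWN_STRATEGIES = {"LiveMigrate", "LiveMigrateIfPossible", "None"}


def eviction_strategy_patch(strategy: str) -> str:
    """Return a JSON merge patch that sets spec.template.spec.evictionStrategy."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown eviction strategy: {strategy!r}")
    patch = {"spec": {"template": {"spec": {"evictionStrategy": strategy}}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Print the patch for a non-migratable VM that should not block upgrades.
    print(eviction_strategy_patch("LiveMigrateIfPossible"))
```

The printed document could be passed to a command such as oc patch vm <vm_name> -n <namespace> --type merge -p '<patch>' instead of editing the resource interactively. The restart step described above would still apply.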

11.1.1.2. Configuring a cluster eviction strategy by using the command line

You can configure an eviction strategy for a cluster by using the command line.

Important

Configuring a cluster eviction strategy is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Procedure

Edit the HyperConverged resource by running the following command:

$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv

Set the cluster eviction strategy as shown in the following example:

Example cluster eviction strategy

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  evictionStrategy: LiveMigrate
# ...

11.1.2. Run strategies

A virtual machine (VM) configured with spec.running: true is immediately restarted whenever it stops. The spec.runStrategy key provides greater flexibility for determining how a VM behaves under certain conditions.

Important

The spec.runStrategy and spec.running keys are mutually exclusive. Only one of them can be used.

A VM configuration with both keys is invalid.
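
The rule above can be checked before a manifest is applied. The following Python sketch assumes the VirtualMachine spec has already been loaded into a dictionary; the function name check_run_keys is hypothetical and is not part of any OpenShift Virtualization tooling.

```python
def check_run_keys(vm_spec: dict) -> None:
    """Raise ValueError if both spec.running and spec.runStrategy are set."""
    if "running" in vm_spec and "runStrategy" in vm_spec:
        raise ValueError(
            "spec.running and spec.runStrategy are mutually exclusive"
        )


if __name__ == "__main__":
    # A spec that uses only one of the two keys passes silently.
    check_run_keys({"runStrategy": "Always"})
    # A spec that sets both keys is rejected.
    try:
        check_run_keys({"running": True, "runStrategy": "Always"})
    except ValueError as err:
        print(f"invalid VM configuration: {err}")
```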

11.1.2.1. Run strategies

The spec.runStrategy key has four possible values:

Always: The virtual machine instance (VMI) is always present when a virtual machine (VM) is created on another node. A new VMI is created if the original stops for any reason. This is the same behavior as running: true.
RerunOnFailure: The VMI is re-created on another node if the previous instance fails. The instance is not re-created if the VM stops successfully, such as when it is shut down.
Important
Setting spec.runStrategy: RerunOnFailure is an explicit command to start the VM. If the VM is intentionally stopped and you do not want it to start, do not change the run strategy value to RerunOnFailure.
Manual: You control the VMI state manually with the start, stop, and restart virtctl client commands. The VM is not automatically restarted.
Halted: No VMI is present when a VM is created. This is the same behavior as running: false.

Different combinations of the virtctl start, stop and restart commands affect the run strategy.

The following table describes a VM’s transition between states. The first column shows the VM’s initial run strategy. The remaining columns show a virtctl command and the new run strategy after that command is run.

Table 11.2. Run strategy before and after virtctl commands
Initial run strategy	Start	Stop	Restart
Always	-	Halted	Always
RerunOnFailure	RerunOnFailure	RerunOnFailure	RerunOnFailure
Manual	Manual	Manual	Manual
Halted	Always	-	-
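
The table can also be read as a lookup from an initial run strategy and a virtctl command to the resulting run strategy. The following Python sketch encodes the rows of Table 11.2 as they appear above. The names TRANSITIONS and next_run_strategy are illustrative, and representing a "-" cell as None is an assumption of this sketch.

```python
from typing import Optional

# Rows of Table 11.2: initial run strategy -> {virtctl command: new run strategy}.
# None stands for a "-" cell in the table.
TRANSITIONS = {
    "Always": {"start": None, "stop": "Halted", "restart": "Always"},
    "RerunOnFailure": {
        "start": "RerunOnFailure",
        "stop": "RerunOnFailure",
        "restart": "RerunOnFailure",
    },
    "Manual": {"start": "Manual", "stop": "Manual", "restart": "Manual"},
    "Halted": {"start": "Always", "stop": None, "restart": None},
}


def next_run_strategy(initial: str, command: str) -> Optional[str]:
    """Look up the run strategy that the table lists after a virtctl command."""
    return TRANSITIONS[initial][command]


if __name__ == "__main__":
    for initial, row in TRANSITIONS.items():
        for command in ("start", "stop", "restart"):
            print(initial, command, next_run_strategy(initial, command))
```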

Note

If a node in a cluster installed by using installer-provisioned infrastructure fails the machine health check and is unavailable, VMs with runStrategy: Always or runStrategy: RerunOnFailure are rescheduled on a new node.

11.1.2.2. Configuring a VM run strategy by using the command line

You can configure a run strategy for a virtual machine (VM) by using the command line.

Important

The spec.runStrategy and spec.running keys are mutually exclusive. A VM configuration that contains values for both keys is invalid.

Procedure

Edit the VirtualMachine resource by running the following command:

$ oc edit vm <vm_name> -n <namespace>

Example run strategy

apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  runStrategy: Always
# ...

11.1.3. Maintaining bare metal nodes

When you deploy OpenShift Container Platform on bare metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where cluster nodes are considered ephemeral, reprovisioning a bare metal node requires significantly more time and effort.

When a bare metal node fails, for example because of a fatal kernel error or a network interface card (NIC) hardware failure, workloads on the failed node must be restarted elsewhere in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully power down nodes, moving workloads to other parts of the cluster and ensuring that workloads are not interrupted. Detailed progress and node status details are provided during maintenance.

11.2. Managing node labeling for obsolete CPU models

You can schedule a virtual machine (VM) on a node as long as the VM CPU model and policy are supported by the node.

11.2.1. About node labeling for obsolete CPU models

The OpenShift Virtualization Operator uses a predefined list of obsolete CPU models to ensure that a node supports only valid CPU models for scheduled VMs.

By default, the following CPU models are eliminated from the list of labels generated for the node:

Example 11.1. Obsolete CPU models

"486"
Conroe
athlon
core2duo
coreduo
kvm32
kvm64
n270
pentium
pentium2
pentium3
pentiumpro
phenom
qemu32
qemu64

This predefined list is not visible in the HyperConverged CR. You cannot remove CPU models from this list, but you can add to the list by editing the spec.obsoleteCPUs.cpuModels field of the HyperConverged CR.

11.2.2. About node labeling for CPU features

Through the process of iteration, the base CPU features in the minimum CPU model are eliminated from the list of labels generated for the node.

For example:

An environment might have two supported CPU models: Penryn and Haswell.

If Penryn is specified as the CPU model for minCPUModel, each base CPU feature for Penryn is compared to the list of CPU features supported by Haswell.

Example 11.2. CPU features supported by Penryn

apic
clflush
cmov
cx16
cx8
de
fpu
fxsr
lahf_lm
lm
mca
mce
mmx
msr
mtrr
nx
pae
pat
pge
pni
pse
pse36
sep
sse
sse2
sse4.1
ssse3
syscall
tsc

Example 11.3. CPU features supported by Haswell

aes
apic
avx
avx2
bmi1
bmi2
clflush
cmov
cx16
cx8
de
erms
fma
fpu
fsgsbase
fxsr
hle
invpcid
lahf_lm
lm
mca
mce
mmx
movbe
msr
mtrr
nx
pae
pat
pcid
pclmuldq
pge
pni
popcnt
pse
pse36
rdtscp
rtm
sep
smep
sse
sse2
sse4.1
sse4.2
ssse3
syscall
tsc
tsc-deadline
x2apic
xsave

If both Penryn and Haswell support a specific CPU feature, a label is not created for that feature. Labels are generated for CPU features that are supported only by Haswell and not by Penryn.

Example 11.4. Node labels created for CPU features after iteration

aes
avx
avx2
bmi1
bmi2
erms
fma
fsgsbase
hle
invpcid
movbe
pcid
pclmuldq
popcnt
rdtscp
rtm
sse4.2
tsc-deadline
x2apic
xsave
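
The comparison described above amounts to a set difference. The following Python sketch applies it to the feature lists from Examples 11.2 and 11.3. It is not the node-labeller implementation, and the function name feature_labels is hypothetical.

```python
# Feature lists copied from Example 11.2 (Penryn) and Example 11.3 (Haswell).
PENRYN = set(
    "apic clflush cmov cx16 cx8 de fpu fxsr lahf_lm lm mca mce mmx msr mtrr "
    "nx pae pat pge pni pse pse36 sep sse sse2 sse4.1 ssse3 syscall tsc".split()
)
HASWELL = set(
    "aes apic avx avx2 bmi1 bmi2 clflush cmov cx16 cx8 de erms fma fpu "
    "fsgsbase fxsr hle invpcid lahf_lm lm mca mce mmx movbe msr mtrr nx pae "
    "pat pcid pclmuldq pge pni popcnt pse pse36 rdtscp rtm sep smep sse sse2 "
    "sse4.1 sse4.2 ssse3 syscall tsc tsc-deadline x2apic xsave".split()
)


def feature_labels(min_model_features: set, model_features: set) -> list:
    """Return the features of a model that are not base features of the minimum model."""
    return sorted(model_features - min_model_features)


if __name__ == "__main__":
    print(" ".join(feature_labels(PENRYN, HASWELL)))
```

Note that a plain set difference over the two lists also yields smep, which is listed for Haswell in Example 11.3 but does not appear in Example 11.4. The sketch does not attempt to account for that difference.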

11.2.3. Configuring obsolete CPU models

You can configure a list of obsolete CPU models by editing the HyperConverged custom resource (CR).

Procedure

Edit the HyperConverged custom resource, specifying the obsolete CPU models in the spec.obsoleteCPUs.cpuModels array. For example:
```
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  obsoleteCPUs:
    cpuModels: # 1
      - "<obsolete_cpu_1>"
      - "<obsolete_cpu_2>"
    minCPUModel: "<minimum_cpu_model>" # 2
```
1: Replace the example values in the cpuModels array with obsolete CPU models. Any value that you specify is added to a predefined list of obsolete CPU models. The predefined list is not visible in the CR.
2: Replace this value with the minimum CPU model that you want to use for basic CPU features. If you do not specify a value, Penryn is used by default.

11.3. Preventing node reconciliation

Use the skip-node annotation to prevent the node-labeller from reconciling a node.

11.3.1. Using skip-node annotation

If you want the node-labeller to skip a node, annotate that node by using the OpenShift CLI (oc).

Prerequisites

You have installed the OpenShift CLI (oc).

Procedure

Annotate the node that you want to skip by running the following command:
```
$ oc annotate node <node_name> node-labeller.kubevirt.io/skip-node=true
```
Replace <node_name> with the name of the relevant node to skip.
Reconciliation resumes on the next cycle after the node annotation is removed or set to false.

11.4. Deleting a failed node to trigger virtual machine failover

If a node fails and node health checks are not deployed on your cluster, virtual machines (VMs) with runStrategy: Always configured are not automatically relocated to healthy nodes. To trigger VM failover, you must manually delete the node object.

11.4.1. Prerequisites

A node where a virtual machine was running has the NotReady condition.
The virtual machine that was running on the failed node has runStrategy set to Always.
You have installed the OpenShift CLI (oc).

11.4.2. Deleting nodes from a bare metal cluster

You can delete a node from an OpenShift Container Platform cluster that does not use machine sets by using the oc delete node command and decommissioning the node.

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

The following procedure deletes a node from an OpenShift Container Platform cluster running on bare metal.

Procedure

Mark the node as unschedulable:
```
$ oc adm cordon <node_name>
```
Drain all pods on the node:
```
$ oc adm drain <node_name> --force=true
```
This step might fail if the node is offline or unresponsive. Even if the node does not respond, the node might still be running a workload that writes to shared storage. To avoid data corruption, power down the physical hardware before you proceed.
Delete the node from the cluster:
```
$ oc delete node <node_name>
```
Although the node object is now deleted from the cluster, it can still rejoin the cluster after reboot or if the kubelet service is restarted. To permanently delete the node and all its data, you must decommission the node.
If you powered down the physical hardware, turn it back on so that the node can rejoin the cluster.

11.4.3. Verifying virtual machine failover

After all resources are terminated on the unhealthy node, a new virtual machine instance (VMI) is automatically created on a healthy node for each relocated VM. To confirm that the VMI was created, view all VMIs by using the oc CLI.

11.4.3.1. Listing all virtual machine instances using the CLI

You can list all virtual machine instances (VMIs) in your cluster, including standalone VMIs and those owned by virtual machines, by using the oc command-line interface (CLI).

Procedure

List all VMIs by running the following command:
```
$ oc get vmis -A
```
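
To check where VMIs are running after failover, the listing can be requested as JSON, for example with oc get vmis -A -o json, and grouped by node. The following Python sketch assumes that the output is a list object with an items array and that each VMI reports its node in status.nodeName. The function name, the "<unscheduled>" placeholder, and the sample data are hypothetical.

```python
import json


def vmis_by_node(vmi_list_json: str) -> dict:
    """Group "namespace/name" of each VMI by the node in status.nodeName."""
    result = {}
    for item in json.loads(vmi_list_json).get("items", []):
        meta = item.get("metadata", {})
        # VMIs without a reported node are grouped under a placeholder key.
        node = item.get("status", {}).get("nodeName", "<unscheduled>")
        key = f"{meta.get('namespace')}/{meta.get('name')}"
        result.setdefault(node, []).append(key)
    return result


# Hypothetical sample shaped like the assumed JSON output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"namespace": "demo", "name": "vm-a"},
         "status": {"nodeName": "worker-1", "phase": "Running"}},
        {"metadata": {"namespace": "demo", "name": "vm-b"},
         "status": {"phase": "Scheduling"}},
    ]
})

if __name__ == "__main__":
    for node, vmis in vmis_by_node(SAMPLE).items():
        print(node, vmis)
```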
