Chapter 5. Postinstallation configuration
5.1. Postinstallation configuration
The following procedures are typically performed after OpenShift Virtualization is installed. You can configure the components that are relevant for your environment:
- Node placement rules for OpenShift Virtualization Operators, workloads, and controllers
- Installing the Kubernetes NMState and SR-IOV Operators
- Configuring a Linux bridge network for external access to virtual machines (VMs)
- Configuring a dedicated secondary network for live migration
- Configuring an SR-IOV network
- Enabling the creation of load balancer services by using the OpenShift Container Platform web console
- Defining a default storage class for the Container Storage Interface (CSI)
- Configuring local storage by using the Hostpath Provisioner (HPP)
5.2. Specifying nodes for OpenShift Virtualization components
The default scheduling for virtual machines (VMs) on bare metal nodes is appropriate. Optionally, you can specify the nodes where you want to deploy OpenShift Virtualization Operators, workloads, and controllers by configuring node placement rules.
You can configure node placement rules for some components after installing OpenShift Virtualization, but virtual machines cannot be present if you want to configure node placement rules for workloads.
5.2.1. About node placement rules for OpenShift Virtualization components
You can use node placement rules for the following tasks:
- Deploy virtual machines only on nodes intended for virtualization workloads.
- Deploy Operators only on infrastructure nodes.
- Maintain separation between workloads.
Depending on the object, you can use one or more of the following rule types:
nodeSelector
- Allows pods to be scheduled on nodes that are labeled with the key-value pair or pairs that you specify in this field. The node must have labels that exactly match all listed pairs.
affinity
- Enables you to use more expressive syntax to set rules that match nodes with pods. Affinity also allows for more nuance in how the rules are applied. For example, you can specify that a rule is a preference, not a requirement. If a rule is a preference, pods are still scheduled when the rule is not satisfied.
tolerations
- Allows pods to be scheduled on nodes that have matching taints. If a taint is applied to a node, that node only accepts pods that tolerate the taint.
5.2.2. Applying node placement rules
You can apply node placement rules by editing a Subscription
, HyperConverged
, or HostPathProvisioner
object using the command line.
Prerequisites
-
The
oc
CLI tool is installed. - You are logged in with cluster administrator permissions.
Procedure
Edit the object in your default editor by running the following command:
$ oc edit <resource_type> <resource_name> -n {CNVNamespace}
- Save the file to apply the changes.
5.2.3. Node placement rule examples
You can specify node placement rules for a OpenShift Virtualization component by editing a Subscription
, HyperConverged
, or HostPathProvisioner
object.
5.2.3.1. Subscription object node placement rule examples
To specify the nodes where OLM deploys the OpenShift Virtualization Operators, edit the Subscription
object during OpenShift Virtualization installation.
Currently, you cannot configure node placement rules for the Subscription
object by using the web console.
The Subscription
object does not support the affinity
node pplacement rule.
Example Subscription
object with nodeSelector
rule
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: hco-operatorhub
namespace: openshift-cnv
spec:
source: redhat-operators
sourceNamespace: openshift-marketplace
name: kubevirt-hyperconverged
startingCSV: kubevirt-hyperconverged-operator.v4.16.3
channel: "stable"
config:
nodeSelector:
example.io/example-infra-key: example-infra-value 1
- 1
- OLM deploys the OpenShift Virtualization Operators on nodes labeled
example.io/example-infra-key = example-infra-value
.
Example Subscription
object with tolerations
rule
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: hco-operatorhub
namespace: openshift-cnv
spec:
source: redhat-operators
sourceNamespace: openshift-marketplace
name: kubevirt-hyperconverged
startingCSV: kubevirt-hyperconverged-operator.v4.16.3
channel: "stable"
config:
tolerations:
- key: "key"
operator: "Equal"
value: "virtualization" 1
effect: "NoSchedule"
- 1
- OLM deploys OpenShift Virtualization Operators on nodes labeled
key = virtualization:NoSchedule
taint. Only pods with the matching tolerations are scheduled on these nodes.
5.2.3.2. HyperConverged object node placement rule example
To specify the nodes where OpenShift Virtualization deploys its components, you can edit the nodePlacement
object in the HyperConverged custom resource (CR) file that you create during OpenShift Virtualization installation.
Example HyperConverged
object with nodeSelector
rule
apiVersion: hco.kubevirt.io/v1beta1 kind: HyperConverged metadata: name: kubevirt-hyperconverged namespace: openshift-cnv spec: infra: nodePlacement: nodeSelector: example.io/example-infra-key: example-infra-value 1 workloads: nodePlacement: nodeSelector: example.io/example-workloads-key: example-workloads-value 2
Example HyperConverged
object with affinity
rule
apiVersion: hco.kubevirt.io/v1beta1 kind: HyperConverged metadata: name: kubevirt-hyperconverged namespace: openshift-cnv spec: infra: nodePlacement: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: example.io/example-infra-key operator: In values: - example-infra-value 1 workloads: nodePlacement: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: example.io/example-workloads-key 2 operator: In values: - example-workloads-value preferredDuringSchedulingIgnoredDuringExecution: - weight: 1 preference: matchExpressions: - key: example.io/num-cpus operator: Gt values: - 8 3
- 1
- Infrastructure resources are placed on nodes labeled
example.io/example-infra-key = example-value
. - 2
- workloads are placed on nodes labeled
example.io/example-workloads-key = example-workloads-value
. - 3
- Nodes that have more than eight CPUs are preferred for workloads, but if they are not available, pods are still scheduled.
Example HyperConverged
object with tolerations
rule
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
name: kubevirt-hyperconverged
namespace: openshift-cnv
spec:
workloads:
nodePlacement:
tolerations: 1
- key: "key"
operator: "Equal"
value: "virtualization"
effect: "NoSchedule"
- 1
- Nodes reserved for OpenShift Virtualization components are labeled with the
key = virtualization:NoSchedule
taint. Only pods with matching tolerations are scheduled on reserved nodes.
5.2.3.3. HostPathProvisioner object node placement rule example
You can edit the HostPathProvisioner
object directly or by using the web console.
You must schedule the hostpath provisioner and the OpenShift Virtualization components on the same nodes. Otherwise, virtualization pods that use the hostpath provisioner cannot run. You cannot run virtual machines.
After you deploy a virtual machine (VM) with the hostpath provisioner (HPP) storage class, you can remove the hostpath provisioner pod from the same node by using the node selector. However, you must first revert that change, at least for that specific node, and wait for the pod to run before trying to delete the VM.
You can configure node placement rules by specifying nodeSelector
, affinity
, or tolerations
for the spec.workload
field of the HostPathProvisioner
object that you create when you install the hostpath provisioner.
Example HostPathProvisioner
object with nodeSelector
rule
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
name: hostpath-provisioner
spec:
imagePullPolicy: IfNotPresent
pathConfig:
path: "</path/to/backing/directory>"
useNamingPrefix: false
workload:
nodeSelector:
example.io/example-workloads-key: example-workloads-value 1
- 1
- Workloads are placed on nodes labeled
example.io/example-workloads-key = example-workloads-value
.
5.2.4. Additional resources
5.3. Postinstallation network configuration
By default, OpenShift Virtualization is installed with a single, internal pod network.
After you install OpenShift Virtualization, you can install networking Operators and configure additional networks.
5.3.1. Installing networking Operators
You must install the Kubernetes NMState Operator to configure a Linux bridge network for live migration or external access to virtual machines (VMs). For installation instructions, see Installing the Kubernetes NMState Operator by using the web console.
You can install the SR-IOV Operator to manage SR-IOV network devices and network attachments. For installation instructions, see Installing the SR-IOV Network Operator.
You can add the MetalLB Operator to manage the lifecycle for an instance of MetalLB on your cluster. For installation instructions, see Installing the MetalLB Operator from the OperatorHub using the web console.
5.3.2. Configuring a Linux bridge network
After you install the Kubernetes NMState Operator, you can configure a Linux bridge network for live migration or external access to virtual machines (VMs).
5.3.2.1. Creating a Linux bridge NNCP
You can create a NodeNetworkConfigurationPolicy
(NNCP) manifest for a Linux bridge network.
Prerequisites
- You have installed the Kubernetes NMState Operator.
Procedure
Create the
NodeNetworkConfigurationPolicy
manifest. This example includes sample values that you must replace with your own information.apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: br1-eth1-policy 1 spec: desiredState: interfaces: - name: br1 2 description: Linux bridge with eth1 as a port 3 type: linux-bridge 4 state: up 5 ipv4: enabled: false 6 bridge: options: stp: enabled: false 7 port: - name: eth1 8
- 1
- Name of the policy.
- 2
- Name of the interface.
- 3
- Optional: Human-readable description of the interface.
- 4
- The type of interface. This example creates a bridge.
- 5
- The requested state for the interface after creation.
- 6
- Disables IPv4 in this example.
- 7
- Disables STP in this example.
- 8
- The node NIC to which the bridge is attached.
5.3.2.2. Creating a Linux bridge NAD by using the web console
You can create a network attachment definition (NAD) to provide layer-2 networking to pods and virtual machines by using the OpenShift Container Platform web console.
A Linux bridge network attachment definition is the most efficient method for connecting a virtual machine to a VLAN.
Configuring IP address management (IPAM) in a network attachment definition for virtual machines is not supported.
Procedure
-
In the web console, click Networking
NetworkAttachmentDefinitions. Click Create Network Attachment Definition.
NoteThe network attachment definition must be in the same namespace as the pod or virtual machine.
- Enter a unique Name and optional Description.
- Select CNV Linux bridge from the Network Type list.
- Enter the name of the bridge in the Bridge Name field.
- Optional: If the resource has VLAN IDs configured, enter the ID numbers in the VLAN Tag Number field.
- Optional: Select MAC Spoof Check to enable MAC spoof filtering. This feature provides security against a MAC spoofing attack by allowing only a single MAC address to exit the pod.
- Click Create.
5.3.3. Configuring a network for live migration
After you have configured a Linux bridge network, you can configure a dedicated network for live migration. A dedicated network minimizes the effects of network saturation on tenant workloads during live migration.
5.3.3.1. Configuring a dedicated secondary network for live migration
To configure a dedicated secondary network for live migration, you must first create a bridge network attachment definition (NAD) by using the CLI. Then, you add the name of the NetworkAttachmentDefinition
object to the HyperConverged
custom resource (CR).
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You logged in to the cluster as a user with the
cluster-admin
role. - Each node has at least two Network Interface Cards (NICs).
- The NICs for live migration are connected to the same VLAN.
Procedure
Create a
NetworkAttachmentDefinition
manifest according to the following example:Example configuration file
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: my-secondary-network 1 namespace: openshift-cnv 2 spec: config: '{ "cniVersion": "0.3.1", "name": "migration-bridge", "type": "macvlan", "master": "eth1", 3 "mode": "bridge", "ipam": { "type": "whereabouts", 4 "range": "10.200.5.0/24" 5 } }'
- 1
- Specify the name of the
NetworkAttachmentDefinition
object. - 2 3
- Specify the name of the NIC to be used for live migration.
- 4
- Specify the name of the CNI plugin that provides the network for the NAD.
- 5
- Specify an IP address range for the secondary network. This range must not overlap the IP addresses of the main network.
Open the
HyperConverged
CR in your default editor by running the following command:oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
Add the name of the
NetworkAttachmentDefinition
object to thespec.liveMigrationConfig
stanza of theHyperConverged
CR:Example
HyperConverged
manifestapiVersion: hco.kubevirt.io/v1beta1 kind: HyperConverged metadata: name: kubevirt-hyperconverged spec: liveMigrationConfig: completionTimeoutPerGiB: 800 network: <network> 1 parallelMigrationsPerCluster: 5 parallelOutboundMigrationsPerNode: 2 progressTimeout: 150 # ...
- 1
- Specify the name of the Multus
NetworkAttachmentDefinition
object to be used for live migrations.
-
Save your changes and exit the editor. The
virt-handler
pods restart and connect to the secondary network.
Verification
When the node that the virtual machine runs on is placed into maintenance mode, the VM automatically migrates to another node in the cluster. You can verify that the migration occurred over the secondary network and not the default pod network by checking the target IP address in the virtual machine instance (VMI) metadata.
$ oc get vmi <vmi_name> -o jsonpath='{.status.migrationState.targetNodeAddress}'
5.3.3.2. Selecting a dedicated network by using the web console
You can select a dedicated network for live migration by using the OpenShift Container Platform web console.
Prerequisites
- You configured a Multus network for live migration.
Procedure
- Navigate to Virtualization > Overview in the OpenShift Container Platform web console.
- Click the Settings tab and then click Live migration.
- Select the network from the Live migration network list.
5.3.4. Configuring an SR-IOV network
After you install the SR-IOV Operator, you can configure an SR-IOV network.
5.3.4.1. Configuring SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io
CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy
object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. Reboot only happens in the following cases:
-
With Mellanox NICs (
mlx5
driver) a node reboot happens every time the number of virtual functions (VFs) increase on a physical function (PF). -
With Intel NICs, a reboot only happens if the kernel parameters do not include
intel_iommu=on
andiommu=pt
.
It might take several minutes for a configuration change to apply.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. - You have installed the SR-IOV Network Operator.
- You have enough available nodes in your cluster to handle the evicted workload from drained nodes.
- You have not selected any control plane nodes for SR-IOV network device configuration.
Procedure
Create an
SriovNetworkNodePolicy
object, and then save the YAML in the<name>-sriov-node-network.yaml
file. Replace<name>
with the name for this configuration.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: <name> 1 namespace: openshift-sriov-network-operator 2 spec: resourceName: <sriov_resource_name> 3 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" 4 priority: <priority> 5 mtu: <mtu> 6 numVfs: <num> 7 nicSelector: 8 vendor: "<vendor_code>" 9 deviceID: "<device_id>" 10 pfNames: ["<pf_name>", ...] 11 rootDevices: ["<pci_bus_id>", "..."] 12 deviceType: vfio-pci 13 isRdma: false 14
- 1
- Specify a name for the CR object.
- 2
- Specify the namespace where the SR-IOV Operator is installed.
- 3
- Specify the resource name of the SR-IOV device plugin. You can create multiple
SriovNetworkNodePolicy
objects for a resource name. - 4
- Specify the node selector to select which nodes are configured. Only SR-IOV network devices on selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed only on selected nodes.
- 5
- Optional: Specify an integer value between
0
and99
. A smaller number gets higher priority, so a priority of10
is higher than a priority of99
. The default value is99
. - 6
- Optional: Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
- 7
- Specify the number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than
127
. - 8
- The
nicSelector
mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specifyrootDevices
, you must also specify a value forvendor
,deviceID
, orpfNames
. If you specify bothpfNames
androotDevices
at the same time, ensure that they point to an identical device. - 9
- Optional: Specify the vendor hex code of the SR-IOV network device. The only allowed values are either
8086
or15b3
. - 10
- Optional: Specify the device hex code of SR-IOV network device. The only allowed values are
158b
,1015
,1017
. - 11
- Optional: The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
- 12
- The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format:
0000:02:00.1
. - 13
- The
vfio-pci
driver type is required for virtual functions in OpenShift Virtualization. - 14
- Optional: Specify whether to enable remote direct memory access (RDMA) mode. For a Mellanox card, set
isRdma
tofalse
. The default value isfalse
.
NoteIf
isRDMA
flag is set totrue
, you can continue to use the RDMA enabled VF as a normal network device. A device can be used in either mode.-
Optional: Label the SR-IOV capable cluster nodes with
SriovNetworkNodePolicy.Spec.NodeSelector
if they are not already labeled. For more information about labeling nodes, see "Understanding how to update labels on nodes". Create the
SriovNetworkNodePolicy
object:$ oc create -f <name>-sriov-node-network.yaml
where
<name>
specifies the name for this configuration.After applying the configuration update, all the pods in
sriov-network-operator
namespace transition to theRunning
status.To verify that the SR-IOV network device is configured, enter the following command. Replace
<node_name>
with the name of a node with the SR-IOV network device that you just configured.$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
5.3.5. Enabling load balancer service creation by using the web console
You can enable the creation of load balancer services for a virtual machine (VM) by using the OpenShift Container Platform web console.
Prerequisites
- You have configured a load balancer for the cluster.
-
You are logged in as a user with the
cluster-admin
role.
Procedure
-
Navigate to Virtualization
Overview. - On the Settings tab, click Cluster.
- Expand General settings and SSH configuration.
- Set SSH over LoadBalancer service to on.
5.4. Postinstallation storage configuration
The following storage configuration tasks are mandatory:
- You must configure a default storage class for your cluster. Otherwise, the cluster cannot receive automated boot source updates.
- You must configure storage profiles if your storage provider is not recognized by CDI. A storage profile provides recommended storage settings based on the associated storage class.
Optional: You can configure local storage by using the hostpath provisioner (HPP).
See the storage configuration overview for more options, including configuring the Containerized Data Importer (CDI), data volumes, and automatic boot source updates.
5.4.1. Configuring local storage by using the HPP
When you install the OpenShift Virtualization Operator, the Hostpath Provisioner (HPP) Operator is automatically installed. The HPP Operator creates the HPP provisioner.
The HPP is a local storage provisioner designed for OpenShift Virtualization. To use the HPP, you must create an HPP custom resource (CR).
HPP storage pools must not be in the same partition as the operating system. Otherwise, the storage pools might fill the operating system partition. If the operating system partition is full, performance can be effected or the node can become unstable or unusable.
5.4.1.1. Creating a storage class for the CSI driver with the storagePools stanza
To use the hostpath provisioner (HPP) you must create an associated storage class for the Container Storage Interface (CSI) driver.
When you create a storage class, you set parameters that affect the dynamic provisioning of persistent volumes (PVs) that belong to that storage class. You cannot update a StorageClass
object’s parameters after you create it.
Virtual machines use data volumes that are based on local PVs. Local PVs are bound to specific nodes. While a disk image is prepared for consumption by the virtual machine, it is possible that the virtual machine cannot be scheduled to the node where the local storage PV was previously pinned.
To solve this problem, use the Kubernetes pod scheduler to bind the persistent volume claim (PVC) to a PV on the correct node. By using the StorageClass
value with volumeBindingMode
parameter set to WaitForFirstConsumer
, the binding and provisioning of the PV is delayed until a pod is created using the PVC.
Procedure
Create a
storageclass_csi.yaml
file to define the storage class:apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: hostpath-csi provisioner: kubevirt.io.hostpath-provisioner reclaimPolicy: Delete 1 volumeBindingMode: WaitForFirstConsumer 2 parameters: storagePool: my-storage-pool 3
- 1
- The two possible
reclaimPolicy
values areDelete
andRetain
. If you do not specify a value, the default value isDelete
. - 2
- The
volumeBindingMode
parameter determines when dynamic provisioning and volume binding occur. SpecifyWaitForFirstConsumer
to delay the binding and provisioning of a persistent volume (PV) until after a pod that uses the persistent volume claim (PVC) is created. This ensures that the PV meets the pod’s scheduling requirements. - 3
- Specify the name of the storage pool defined in the HPP CR.
- Save the file and exit.
Create the
StorageClass
object by running the following command:$ oc create -f storageclass_csi.yaml
5.5. Configuring higher VM workload density
To increase the number of virtual machines (VMs), you can configure a higher VM workload density in your cluster by overcommitting the amount of memory (RAM).
Configuring higher workload density is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The following workloads are especially suited for higher workload density:
- Many similar workloads
- Underused workloads
While overcommitted memory can lead to a higher workload density, it can also lower workload performance of a highly utilized system.
5.5.1. Using wasp-agent
to configure higher VM workload density
The wasp-agent
component enables an OpenShift Container Platform cluster to assign swap resources to virtual machine (VM) workloads. Swap usage is only supported on worker nodes.
Swap resources can be only assigned to virtual machine workloads (VM pods) of the Burstable
Quality of Service (QoS) class. VM pods of the Guaranteed
QoS class and pods of any QoS class that do not belong to VMs cannot swap resources.
For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).
Prerequisites
-
The
oc
tool is available. - You are logged into the cluster with the cluster-admin role.
- A memory over-commit ratio is defined.
- The node belongs to a worker pool.
Procedure
Create a privileged service account by entering the following commands:
$ oc adm new-project wasp
$ oc create sa -n wasp wasp
$ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount=wasp:wasp
$ oc adm policy add-scc-to-user -n wasp privileged -z wasp
NoteThe
wasp-agent
component deploys an OCI hook to enable swap usage for containers on the node level. The low-level nature requires theDaemonSet
object to be privileged.Deploy
wasp-agent
by creating aDaemonSet
object as follows:kind: DaemonSet apiVersion: apps/v1 metadata: name: wasp-agent namespace: wasp labels: app: wasp tier: node spec: selector: matchLabels: name: wasp template: metadata: annotations: description: >- Configures swap for workloads labels: name: wasp spec: serviceAccountName: wasp hostPID: true hostUsers: true terminationGracePeriodSeconds: 5 containers: - name: wasp-agent image: >- registry.redhat.io/container-native-virtualization/wasp-agent-rhel9:v4.16 imagePullPolicy: Always env: - name: "FSROOT" value: "/host" resources: requests: cpu: 100m memory: 50M securityContext: privileged: true volumeMounts: - name: host mountPath: "/host" volumes: - name: host hostPath: path: "/" priorityClassName: system-node-critical updateStrategy: type: RollingUpdate rollingUpdate: maxUnavailable: 10% maxSurge: 0 status: {}
Configure the
kubelet
service to permit swap:Create a
KubeletConfiguration
file as shown in the example:Example of a
KubeletConfiguration
fileapiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: custom-config spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/worker: '' # MCP #machine.openshift.io/cluster-api-machine-role: worker # machine #node-role.kubernetes.io/worker: '' # node kubeletConfig: failSwapOn: false evictionSoft: memory.available: "1Gi" evictionSoftGracePeriod: memory.available: "10s"
If the cluster is already using an existing
KubeletConfiguration
file, add the following to thespec
section:apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: custom-config # ... spec # ... kubeletConfig: evictionSoft: memory.available: 1Gi evictionSoftGracePeriod: memory.available: 1m30s failSwapOn: false
Run the following command:
$ oc wait mcp worker --for condition=Updated=True
Create a
MachineConfig
object to provision swap as follows:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 90-worker-swap spec: config: ignition: version: 3.4.0 systemd: units: - contents: | [Unit] Description=Provision and enable swap ConditionFirstBoot=no [Service] Type=oneshot Environment=SWAP_SIZE_MB=5000 ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \ sudo chmod 600 /var/tmp/swapfile && \ sudo mkswap /var/tmp/swapfile && \ sudo swapon /var/tmp/swapfile && \ free -h && \ sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\"" [Install] RequiredBy=kubelet-dependencies.target enabled: true name: swap-provision.service
To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node using the following formula:
NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)
Example:
NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1) = 16 GB * (1.5 - 1) = 16 GB * (0.5) = 8 GB
Deploy alerting rules as follows:
apiVersion: monitoring.openshift.io/v1 kind: AlertingRule metadata: name: wasp-alerts namespace: openshift-monitoring spec: groups: - name: wasp.rules rules: - alert: NodeSwapping annotations: description: Node {{ $labels.instance }} is swapping at a rate of {{ printf "%.2f" $value }} MB/s runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/runbooks/alerts/NodeSwapping.md summary: A node is swapping memory pages expr: | # In MB/s irate(node_memory_SwapFree_bytes{job="node-exporter"}[5m]) / 1024^2 > 0 for: 1m labels: severity: critical
Configure OpenShift Virtualization to use memory overcommit either by using the OpenShift Container Platform web console or by editing the HyperConverged custom resource (CR) file as shown in the following example.
Example:
apiVersion: hco.kubevirt.io/v1beta1 kind: HyperConverged metadata: name: kubevirt-hyperconverged namespace: openshift-cnv spec: higherWorkloadDensity: memoryOvercommitPercentage: 150
Apply all the configurations to compute nodes in your cluster by entering the following command:
$ oc patch --type=merge \ -f <../manifests/hco-set-memory-overcommit.yaml> \ --patch-file <../manifests/hco-set-memory-overcommit.yaml>
NoteAfter applying all configurations, the swap feature is fully available only after all
MachineConfigPool
rollouts are complete.
Verification
To verify the deployment of
wasp-agent
, run the following command:$ oc rollout status ds wasp-agent -n wasp
If the deployment is successful, the following message is displayed:
daemon set "wasp-agent" successfully rolled out
To verify that swap is correctly provisioned, do the following:
Run the following command:
$ oc get nodes -l node-role.kubernetes.io/worker
Select a node from the provided list and run the following command:
$ oc debug node/<selected-node> -- free -m
If swap is provisioned correctly, an amount greater than zero is displayed, similar to the following:
total
used
free
shared
buff/cache
available
Mem:
31846
23155
1044
6014
14483
8690
Swap:
8191
2337
5854
Verify the OpenShift Virtualization memory overcommitment configuration by running the following command:
$ oc get -n openshift-cnv HyperConverged kubevirt-hyperconverged -o jsonpath="{.spec.higherWorkloadDensity.memoryOvercommitPercentage}" 150
The returned value, for example
150
, must match the value you had previously configured.