Chapter 5. Postinstallation configuration


5.1. Postinstallation configuration

The following procedures are typically performed after OpenShift Virtualization is installed. You can configure the components that are relevant to your environment.

5.2. Specifying nodes for OpenShift Virtualization components

The default scheduling for virtual machines (VMs) on bare metal nodes is appropriate, so configuring node placement rules is optional. You can specify the nodes where you want to deploy OpenShift Virtualization Operators, workloads, and controllers by configuring node placement rules.

Note

You can configure node placement rules for some components after you install OpenShift Virtualization. However, to configure node placement rules for workloads, no virtual machines can be present.

5.2.1. About node placement rules for OpenShift Virtualization components

You can use node placement rules for the following tasks:

  • Deploy virtual machines only on nodes intended for virtualization workloads.
  • Deploy Operators only on infrastructure nodes.
  • Maintain separation between workloads.

Depending on the object, you can use one or more of the following rule types:

nodeSelector
Allows pods to be scheduled on nodes that are labeled with the key-value pair or pairs that you specify in this field. The node must have labels that exactly match all listed pairs.
affinity
Enables you to use more expressive syntax to set rules that match nodes with pods. Affinity also allows for more nuance in how the rules are applied. For example, you can specify that a rule is a preference, not a requirement. If a rule is a preference, pods are still scheduled when the rule is not satisfied.
tolerations
Allows pods to be scheduled on nodes that have matching taints. If a taint is applied to a node, that node only accepts pods that tolerate the taint.
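To make the difference between nodeSelector and tolerations concrete, the following Python sketch models how both rule types are evaluated against a single node. It is a simplified, hypothetical model for illustration only: the names `node_selector_matches`, `tolerates`, and `can_schedule` are not part of any Kubernetes or OpenShift API, and details such as `tolerationSeconds` are ignored. The sample labels, taint, and toleration are loosely based on the Subscription examples in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # for example, "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def node_selector_matches(node_labels: dict, selector: dict) -> bool:
    """nodeSelector: the node must carry every listed key with exactly the listed value."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # Exists ignores the value; an empty key tolerates every taint key.
        return toleration.key in ("", taint.key)
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(node_labels, node_taints, selector, tolerations) -> bool:
    """The node must match the nodeSelector, and every NoSchedule or NoExecute
    taint on the node must be tolerated by at least one toleration."""
    if not node_selector_matches(node_labels, selector):
        return False
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in node_taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Hypothetical node, loosely based on the Subscription examples in this section.
infra_node = {"example.io/example-infra-key": "example-infra-value"}
infra_selector = {"example.io/example-infra-key": "example-infra-value"}
reserved_taints = [Taint("key", "virtualization", "NoSchedule")]
operator_tolerations = [Toleration("key", "Equal", "virtualization", "NoSchedule")]

print(can_schedule(infra_node, reserved_taints, infra_selector, operator_tolerations))  # True
print(can_schedule(infra_node, reserved_taints, infra_selector, []))  # False
```

A taint repels pods but does not attract them. To both reserve nodes and steer pods onto them, combine a taint and toleration with a nodeSelector or affinity rule.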

5.2.2. Applying node placement rules

You can apply node placement rules by editing a Subscription, HyperConverged, or HostPathProvisioner object by using the command line.

Prerequisites

  • The oc CLI tool is installed.
  • You are logged in with cluster administrator permissions.

Procedure

  1. Edit the object in your default editor by running the following command:

    $ oc edit <resource_type> <resource_name> -n openshift-cnv
  2. Save the file to apply the changes.

5.2.3. Node placement rule examples

You can specify node placement rules for an OpenShift Virtualization component by editing a Subscription, HyperConverged, or HostPathProvisioner object.

5.2.3.1. Subscription object node placement rule examples

To specify the nodes where OLM deploys the OpenShift Virtualization Operators, edit the Subscription object during OpenShift Virtualization installation.

Currently, you cannot configure node placement rules for the Subscription object by using the web console.

The Subscription object does not support the affinity node placement rule.

Example Subscription object with nodeSelector rule

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.16.5
  channel: "stable"
  config:
    nodeSelector:
      example.io/example-infra-key: example-infra-value 1

1
OLM deploys the OpenShift Virtualization Operators on nodes labeled example.io/example-infra-key = example-infra-value.

Example Subscription object with tolerations rule

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.16.5
  channel: "stable"
  config:
    tolerations:
    - key: "key"
      operator: "Equal"
      value: "virtualization" 1
      effect: "NoSchedule"

1
The toleration allows OLM to deploy the OpenShift Virtualization Operators on nodes that have the key=virtualization:NoSchedule taint. Only pods with matching tolerations are scheduled on these nodes.

5.2.3.2. HyperConverged object node placement rule example

To specify the nodes where OpenShift Virtualization deploys its components, you can edit the nodePlacement object in the HyperConverged custom resource (CR) file that you create during OpenShift Virtualization installation.

Example HyperConverged object with nodeSelector rule

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  infra:
    nodePlacement:
      nodeSelector:
        example.io/example-infra-key: example-infra-value 1
  workloads:
    nodePlacement:
      nodeSelector:
        example.io/example-workloads-key: example-workloads-value 2

1
Infrastructure resources are placed on nodes labeled example.io/example-infra-key = example-infra-value.
2
Workloads are placed on nodes labeled example.io/example-workloads-key = example-workloads-value.

Example HyperConverged object with affinity rule

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  infra:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: example.io/example-infra-key
                operator: In
                values:
                - example-infra-value 1
  workloads:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: example.io/example-workloads-key 2
                operator: In
                values:
                - example-workloads-value
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: example.io/num-cpus
                operator: Gt
                values:
                - "8" 3

1
Infrastructure resources are placed on nodes labeled example.io/example-infra-key = example-infra-value.
2
Workloads are placed on nodes labeled example.io/example-workloads-key = example-workloads-value.
3
Nodes that have more than eight CPUs are preferred for workloads, but if they are not available, pods are still scheduled.
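The following Python sketch shows how the workloads affinity rules in this example are evaluated. The `In` expression is a hard requirement: at least one nodeSelectorTerm must match, and all expressions inside a term must match. The `Gt` preference only adds its weight to a node's score. This is a simplified model with hypothetical function names and node labels. The real scheduler combines these weights with other scoring plugins.

```python
def expression_matches(labels: dict, key: str, operator: str, values: list) -> bool:
    """Evaluate one matchExpression from a node affinity term against node labels."""
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator in ("Gt", "Lt"):
        # Gt and Lt compare the label value as an integer with a single value.
        # Nodes without the label, or with a non-integer value, never match.
        try:
            node_value = int(labels[key])
        except (KeyError, ValueError):
            return False
        limit = int(values[0])
        return node_value > limit if operator == "Gt" else node_value < limit
    raise ValueError(f"unsupported operator: {operator}")


def term_matches(labels: dict, match_expressions: list) -> bool:
    """All expressions in one term must match (logical AND)."""
    return all(expression_matches(labels, **expr) for expr in match_expressions)


def is_eligible(labels: dict, required_terms: list) -> bool:
    """Required rules: at least one nodeSelectorTerm must match (logical OR)."""
    return any(term_matches(labels, term) for term in required_terms)


def preference_score(labels: dict, preferred: list) -> int:
    """Preferred rules: add the weight of every preference that the node satisfies."""
    return sum(p["weight"] for p in preferred if term_matches(labels, p["matchExpressions"]))


# The workloads rules from the affinity example above, written as Python data.
required = [[{"key": "example.io/example-workloads-key", "operator": "In",
              "values": ["example-workloads-value"]}]]
preferred = [{"weight": 1, "matchExpressions": [
    {"key": "example.io/num-cpus", "operator": "Gt", "values": ["8"]}]}]

# Hypothetical node labels.
nodes = {
    "node-a": {"example.io/example-workloads-key": "example-workloads-value", "example.io/num-cpus": "16"},
    "node-b": {"example.io/example-workloads-key": "example-workloads-value", "example.io/num-cpus": "4"},
    "node-c": {"example.io/num-cpus": "32"},
}
scores = {name: preference_score(labels, preferred)
          for name, labels in nodes.items() if is_eligible(labels, required)}
print(scores)  # {'node-a': 1, 'node-b': 0}
```

node-c is never a candidate because it lacks the required label, even though it has the most CPUs. node-b is still eligible with a score of 0, which matches callout 3: the preference does not block scheduling.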

Example HyperConverged object with tolerations rule

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  workloads:
    nodePlacement:
      tolerations: 1
      - key: "key"
        operator: "Equal"
        value: "virtualization"
        effect: "NoSchedule"

1
Nodes reserved for OpenShift Virtualization components have the key=virtualization:NoSchedule taint. Only pods with matching tolerations are scheduled on reserved nodes.

5.2.3.3. HostPathProvisioner object node placement rule example

You can edit the HostPathProvisioner object directly or by using the web console.

Warning

You must schedule the hostpath provisioner and the OpenShift Virtualization components on the same nodes. Otherwise, virtualization pods that use the hostpath provisioner cannot run, and you cannot run virtual machines.

After you deploy a virtual machine (VM) with the hostpath provisioner (HPP) storage class, you can remove the hostpath provisioner pod from the same node by using the node selector. However, before you delete the VM, you must revert that change, at least for that specific node, and wait for the pod to run.

You can configure node placement rules by specifying nodeSelector, affinity, or tolerations for the spec.workload field of the HostPathProvisioner object that you create when you install the hostpath provisioner.

Example HostPathProvisioner object with nodeSelector rule

apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  pathConfig:
    path: "</path/to/backing/directory>"
    useNamingPrefix: false
  workload:
    nodeSelector:
      example.io/example-workloads-key: example-workloads-value 1

1
Workloads are placed on nodes labeled example.io/example-workloads-key = example-workloads-value.
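The scheduling warning for the hostpath provisioner reduces to a set comparison: every node that can run VM workloads must also run the HPP workload. The following Python sketch checks this for plain nodeSelector rules only. Affinity and tolerations are ignored, and the node names and the hpp=true label are hypothetical.

```python
def nodes_matching(nodes: dict, selector: dict) -> set:
    """Return the names of nodes whose labels match every key-value pair in the selector."""
    return {
        name for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in selector.items())
    }


def nodes_missing_hpp(nodes: dict, workloads_selector: dict, hpp_selector: dict) -> set:
    """Nodes that can run VM workloads but where the HPP workload would not be scheduled."""
    return nodes_matching(nodes, workloads_selector) - nodes_matching(nodes, hpp_selector)


# Hypothetical cluster: only worker-1 carries the extra hpp=true label.
nodes = {
    "worker-0": {"example.io/example-workloads-key": "example-workloads-value"},
    "worker-1": {"example.io/example-workloads-key": "example-workloads-value", "hpp": "true"},
}
workloads_selector = {"example.io/example-workloads-key": "example-workloads-value"}

print(nodes_missing_hpp(nodes, workloads_selector, workloads_selector))  # set()
print(nodes_missing_hpp(nodes, workloads_selector, {"hpp": "true"}))     # {'worker-0'}
```

Reusing the same selector for spec.workload in the HostPathProvisioner object and spec.workloads.nodePlacement in the HyperConverged object is the simplest way to keep the result empty.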


5.3. Postinstallation network configuration

By default, OpenShift Virtualization is installed with a single, internal pod network.

After you install OpenShift Virtualization, you can install networking Operators and configure additional networks.

5.3.1. Installing networking Operators

You must install the Kubernetes NMState Operator to configure a Linux bridge network for live migration or external access to virtual machines (VMs). For installation instructions, see Installing the Kubernetes NMState Operator by using the web console.

You can install the SR-IOV Operator to manage SR-IOV network devices and network attachments. For installation instructions, see Installing the SR-IOV Network Operator.

You can install the MetalLB Operator to manage the lifecycle for an instance of MetalLB on your cluster. For installation instructions, see Installing the MetalLB Operator from the OperatorHub using the web console.

5.3.2. Configuring a Linux bridge network

After you install the Kubernetes NMState Operator, you can configure a Linux bridge network for live migration or external access to virtual machines (VMs).

5.3.2.1. Creating a Linux bridge NNCP

You can create a NodeNetworkConfigurationPolicy (NNCP) manifest for a Linux bridge network.

Prerequisites

  • You have installed the Kubernetes NMState Operator.

Procedure

  • Create the NodeNetworkConfigurationPolicy manifest. This example includes sample values that you must replace with your own information.

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: br1-eth1-policy 1
    spec:
      desiredState:
        interfaces:
          - name: br1 2
            description: Linux bridge with eth1 as a port 3
            type: linux-bridge 4
            state: up 5
            ipv4:
              enabled: false 6
            bridge:
              options:
                stp:
                  enabled: false 7
              port:
                - name: eth1 8
    1
    Name of the policy.
    2
    Name of the interface.
    3
    Optional: Human-readable description of the interface.
    4
    The type of interface. This example creates a bridge.
    5
    The requested state for the interface after creation.
    6
    Disables IPv4 in this example.
    7
    Disables STP in this example.
    8
    The node NIC to which the bridge is attached.

5.3.2.2. Creating a Linux bridge NAD by using the web console

You can create a network attachment definition (NAD) to provide layer-2 networking to pods and virtual machines by using the OpenShift Container Platform web console.

A Linux bridge network attachment definition is the most efficient method for connecting a virtual machine to a VLAN.

Warning

Configuring IP address management (IPAM) in a network attachment definition for virtual machines is not supported.

Procedure

  1. In the web console, click Networking > NetworkAttachmentDefinitions.
  2. Click Create Network Attachment Definition.

    Note

    The network attachment definition must be in the same namespace as the pod or virtual machine.

  3. Enter a unique Name and optional Description.
  4. Select CNV Linux bridge from the Network Type list.
  5. Enter the name of the bridge in the Bridge Name field.
  6. Optional: If the resource has VLAN IDs configured, enter the ID numbers in the VLAN Tag Number field.
  7. Optional: Select MAC Spoof Check to enable MAC spoof filtering. This feature provides security against a MAC spoofing attack by allowing only a single MAC address to exit the pod.
  8. Click Create.
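When you click Create, the console stores the values from these fields in the spec.config JSON string of a NetworkAttachmentDefinition object. The following Python sketch assembles a config of that shape. The field names (type cnv-bridge, bridge, vlan, macspoofchk) follow the CLI variant of this procedure in the OpenShift Virtualization documentation and are an assumption about what the console generates. The helper name `linux_bridge_nad_config` is hypothetical.

```python
import json
from typing import Optional


def linux_bridge_nad_config(name: str, bridge: str, vlan: Optional[int] = None,
                            mac_spoof_check: bool = False) -> str:
    """Build a spec.config JSON string for a Linux bridge NAD (hypothetical helper)."""
    config = {
        "cniVersion": "0.3.1",
        "name": name,
        "type": "cnv-bridge",  # assumed plugin name for the CNV Linux bridge network type
        "bridge": bridge,      # must match the bridge created by the NNCP, for example br1
    }
    if vlan is not None:
        # 802.1Q VLAN IDs 0 and 4095 are reserved.
        if not 1 <= vlan <= 4094:
            raise ValueError("VLAN IDs must be between 1 and 4094")
        config["vlan"] = vlan
    if mac_spoof_check:
        config["macspoofchk"] = True
    # No "ipam" key: configuring IPAM in a NAD for virtual machines is not supported.
    return json.dumps(config, indent=2)


print(linux_bridge_nad_config("bridge-network", "br1", vlan=100, mac_spoof_check=True))
```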

5.3.3. Configuring a network for live migration

After you have configured a Linux bridge network, you can configure a dedicated network for live migration. A dedicated network minimizes the effects of network saturation on tenant workloads during live migration.

5.3.3.1. Configuring a dedicated secondary network for live migration

To configure a dedicated secondary network for live migration, you must first create a bridge network attachment definition (NAD) by using the CLI. Then, you add the name of the NetworkAttachmentDefinition object to the HyperConverged custom resource (CR).

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster as a user with the cluster-admin role.
  • Each node has at least two Network Interface Cards (NICs).
  • The NICs for live migration are connected to the same VLAN.

Procedure

  1. Create a NetworkAttachmentDefinition manifest according to the following example:

    Example configuration file

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: my-secondary-network 1
      namespace: openshift-cnv
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "name": "migration-bridge",
        "type": "macvlan",
        "master": "eth1", 2
        "mode": "bridge",
        "ipam": {
          "type": "whereabouts", 3
          "range": "10.200.5.0/24" 4
        }
      }'

    1
    Specify the name of the NetworkAttachmentDefinition object.
    2
    Specify the name of the NIC to be used for live migration.
    3
    Specify the name of the IPAM CNI plugin that assigns IP addresses on the secondary network, for example whereabouts.
    4
    Specify an IP address range for the secondary network. This range must not overlap the IP addresses of the main network.
  2. Open the HyperConverged CR in your default editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
  3. Add the name of the NetworkAttachmentDefinition object to the spec.liveMigrationConfig stanza of the HyperConverged CR:

    Example HyperConverged manifest

    apiVersion: hco.kubevirt.io/v1beta1
    kind: HyperConverged
    metadata:
      name: kubevirt-hyperconverged
    spec:
      liveMigrationConfig:
        completionTimeoutPerGiB: 800
        network: <network> 1
        parallelMigrationsPerCluster: 5
        parallelOutboundMigrationsPerNode: 2
        progressTimeout: 150
    # ...

    1
    Specify the name of the Multus NetworkAttachmentDefinition object to be used for live migrations.
  4. Save your changes and exit the editor. The virt-handler pods restart and connect to the secondary network.
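Before you save these changes, you can sanity-check the values in the example manifests. The following Python sketch verifies that the whereabouts range does not overlap the main network and estimates the migration completion timeout. It makes three assumptions. First, NAD_CONFIG is a copy of the example spec.config without the callout numbers. Second, MAIN_NETWORKS is a hypothetical list; 10.128.0.0/14 and 172.30.0.0/16 are common OpenShift cluster and service network defaults, so replace them with your cluster's values. Third, as in upstream KubeVirt, completionTimeoutPerGiB is a number of seconds per GiB of VM memory.

```python
import ipaddress
import json

# spec.config from the example NAD, without the documentation callout numbers.
NAD_CONFIG = """{
  "cniVersion": "0.3.1",
  "name": "migration-bridge",
  "type": "macvlan",
  "master": "eth1",
  "mode": "bridge",
  "ipam": {"type": "whereabouts", "range": "10.200.5.0/24"}
}"""

# Hypothetical main-network CIDRs; replace them with the values from your cluster.
MAIN_NETWORKS = ["10.128.0.0/14", "172.30.0.0/16", "192.168.10.0/24"]


def overlapping_networks(config_json: str, main_networks: list) -> list:
    """Return the main-network CIDRs that overlap the migration network range."""
    migration_range = ipaddress.ip_network(json.loads(config_json)["ipam"]["range"])
    return [cidr for cidr in main_networks
            if migration_range.overlaps(ipaddress.ip_network(cidr))]


def completion_timeout_seconds(completion_timeout_per_gib: int, vm_memory_gib: float) -> float:
    """Assumed semantics: the timeout scales linearly with the VM memory size."""
    return completion_timeout_per_gib * vm_memory_gib


print(overlapping_networks(NAD_CONFIG, MAIN_NETWORKS))  # []
print(completion_timeout_seconds(800, 6))               # 4800
```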

Verification

  • When the node that the virtual machine runs on is placed into maintenance mode, the VM automatically migrates to another node in the cluster. You can verify that the migration occurred over the secondary network and not the default pod network by checking the target IP address in the virtual machine instance (VMI) metadata.

    $ oc get vmi <vmi_name> -o jsonpath='{.status.migrationState.targetNodeAddress}'
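The check in this verification step is an IP membership test. The following Python sketch compares the value printed by the oc get vmi command with the range from the example NAD. The two sample addresses are hypothetical command outputs.

```python
import ipaddress


def migrated_over_secondary_network(target_address: str, migration_range: str) -> bool:
    """True if the address from .status.migrationState.targetNodeAddress is inside the
    secondary network range, which means the migration did not use the pod network."""
    return ipaddress.ip_address(target_address.strip()) in ipaddress.ip_network(migration_range)


# Hypothetical outputs of the oc get vmi command.
print(migrated_over_secondary_network("10.200.5.17\n", "10.200.5.0/24"))  # True
print(migrated_over_secondary_network("10.129.2.44", "10.200.5.0/24"))    # False
```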

5.3.3.2. Selecting a dedicated network by using the web console

You can select a dedicated network for live migration by using the OpenShift Container Platform web console.

Prerequisites

  • You configured a Multus network for live migration.
  • You created a network attachment definition for the network.

Procedure

  1. Navigate to Virtualization > Overview in the OpenShift Container Platform web console.
  2. Click the Settings tab and then click Live migration.
  3. Select the network from the Live migration network list.

5.3.4. Configuring an SR-IOV network

After you install the SR-IOV Operator, you can configure an SR-IOV network.

5.3.4.1. Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

Note

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes and, in some cases, reboot them. A reboot happens only in the following cases:

  • With Mellanox NICs (mlx5 driver), a node reboot happens every time the number of virtual functions (VFs) increases on a physical function (PF).
  • With Intel NICs, a reboot only happens if the kernel parameters do not include intel_iommu=on and iommu=pt.

It might take several minutes for a configuration change to apply.
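The two reboot cases in this note can be restated as a small decision function. The following Python sketch only restates the documented rules; it is not the SR-IOV Network Operator's actual logic. The function name is hypothetical, and so is passing kernel arguments as the whitespace-split contents of /proc/cmdline. A drain can still happen when no reboot is expected.

```python
MELLANOX = "15b3"  # vendor hex code for Mellanox NICs (mlx5 driver)
INTEL = "8086"     # vendor hex code for Intel NICs


def reboot_expected(vendor: str, current_vfs: int, requested_vfs: int, kernel_args: list) -> bool:
    """Restate the documented reboot cases for applying a SriovNetworkNodePolicy."""
    if vendor == MELLANOX:
        # Any increase in the number of VFs on a PF triggers a reboot.
        return requested_vfs > current_vfs
    if vendor == INTEL:
        # A reboot happens only if the required IOMMU kernel arguments are missing.
        return not {"intel_iommu=on", "iommu=pt"} <= set(kernel_args)
    # Other vendors are outside the two cases described in the note.
    return False


cmdline = "BOOT_IMAGE=/vmlinuz root=/dev/sda4 intel_iommu=on iommu=pt"  # hypothetical /proc/cmdline
print(reboot_expected(MELLANOX, 4, 8, []))           # True
print(reboot_expected(INTEL, 0, 8, cmdline.split()))  # False
```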

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the SR-IOV Network Operator.
  • You have enough available nodes in your cluster to handle the evicted workload from drained nodes.
  • You have not selected any control plane nodes for SR-IOV network device configuration.

Procedure

  1. Create an SriovNetworkNodePolicy object, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: <name> 1
      namespace: openshift-sriov-network-operator 2
    spec:
      resourceName: <sriov_resource_name> 3
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true" 4
      priority: <priority> 5
      mtu: <mtu> 6
      numVfs: <num> 7
      nicSelector: 8
        vendor: "<vendor_code>" 9
        deviceID: "<device_id>" 10
        pfNames: ["<pf_name>", ...] 11
        rootDevices: ["<pci_bus_id>", "..."] 12
      deviceType: vfio-pci 13
      isRdma: false 14
    1
    Specify a name for the CR object.
    2
    Specify the namespace where the SR-IOV Operator is installed.
    3
    Specify the resource name of the SR-IOV device plugin. You can create multiple SriovNetworkNodePolicy objects for a resource name.
    4
    Specify the node selector to select which nodes are configured. Only SR-IOV network devices on selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed only on selected nodes.
    5
    Optional: Specify an integer value between 0 and 99. A smaller number gets higher priority, so a priority of 10 is higher than a priority of 99. The default value is 99.
    6
    Optional: Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
    7
    Specify the number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127.
    8
    The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters, but identify the Ethernet adapter precisely enough to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they point to the same device.
    9
    Optional: Specify the vendor hex code of the SR-IOV network device. The only allowed values are either 8086 or 15b3.
    10
    Optional: Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
    11
    Optional: The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
    12
    The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
    13
    The vfio-pci driver type is required for virtual functions in OpenShift Virtualization.
    14
    Optional: Specify whether to enable remote direct memory access (RDMA) mode. For a Mellanox card, set isRdma to false. The default value is false.
    Note

    If the isRdma flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Optional: Label the SR-IOV capable cluster nodes with SriovNetworkNodePolicy.Spec.NodeSelector if they are not already labeled. For more information about labeling nodes, see "Understanding how to update labels on nodes".
  3. Create the SriovNetworkNodePolicy object:

    $ oc create -f <name>-sriov-node-network.yaml

    where <name> specifies the name for this configuration.

    After you apply the configuration update, all the pods in the openshift-sriov-network-operator namespace transition to the Running status.

  4. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
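The callouts for the SriovNetworkNodePolicy object list several constraints that are easy to violate when you fill in the placeholders. The following Python sketch checks a policy spec, represented as a dictionary, against those constraints before you run oc create. It is a hypothetical pre-flight helper, not part of the SR-IOV Network Operator, and the Operator's own validation remains authoritative. The allowed vendor and deviceID values are taken from callouts 9 and 10.

```python
import re

# PCI address in domain:bus:device.function form, for example 0000:02:00.1.
PCI_ADDRESS = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def validate_policy_spec(spec: dict) -> list:
    """Return a list of constraint violations for a SriovNetworkNodePolicy spec."""
    errors = []
    priority = spec.get("priority", 99)  # the default priority is 99
    if not 0 <= priority <= 99:
        errors.append("priority must be between 0 and 99")
    nic = spec.get("nicSelector", {})
    vendor = nic.get("vendor")
    if vendor is not None and vendor not in ("8086", "15b3"):
        errors.append("vendor must be 8086 or 15b3")
    device_id = nic.get("deviceID")
    if device_id is not None and device_id not in ("158b", "1015", "1017"):
        errors.append("deviceID must be 158b, 1015, or 1017")
    root_devices = nic.get("rootDevices", [])
    if root_devices and not (vendor or device_id or nic.get("pfNames")):
        errors.append("rootDevices requires vendor, deviceID, or pfNames")
    errors += [f"invalid PCI address: {a}" for a in root_devices if not PCI_ADDRESS.match(a)]
    if vendor == "15b3" and spec.get("numVfs", 0) > 127:
        errors.append("Mellanox NICs support at most 127 VFs")
    if spec.get("deviceType") != "vfio-pci":
        errors.append("deviceType must be vfio-pci for OpenShift Virtualization")
    return errors


good = {"resourceName": "intelnics", "numVfs": 8, "priority": 10, "deviceType": "vfio-pci",
        "nicSelector": {"vendor": "8086", "deviceID": "158b", "rootDevices": ["0000:02:00.1"]}}
bad = {"priority": 120, "nicSelector": {"rootDevices": ["02:00.1"]}, "deviceType": "netdevice"}
print(validate_policy_spec(good))     # []
print(len(validate_policy_spec(bad)))  # 4
```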

5.3.5. Enabling load balancer service creation by using the web console

You can enable the creation of load balancer services for a virtual machine (VM) by using the OpenShift Container Platform web console.

Prerequisites

  • You have configured a load balancer for the cluster.
  • You are logged in as a user with the cluster-admin role.
  • You created a network attachment definition for the network.

Procedure

  1. Navigate to Virtualization > Overview.
  2. On the Settings tab, click Cluster.
  3. Expand General settings and SSH configuration.
  4. Set SSH over LoadBalancer service to on.

5.4. Postinstallation storage configuration

The following storage configuration tasks are mandatory:

  • You must configure a default storage class for your cluster. Otherwise, the cluster cannot receive automated boot source updates.
  • You must configure storage profiles if your storage provider is not recognized by CDI. A storage profile provides recommended storage settings based on the associated storage class.

Optional: You can configure local storage by using the hostpath provisioner (HPP).

See the storage configuration overview for more options, including configuring the Containerized Data Importer (CDI), data volumes, and automatic boot source updates.

5.4.1. Configuring local storage by using the HPP

When you install the OpenShift Virtualization Operator, the Hostpath Provisioner (HPP) Operator is automatically installed. The HPP Operator creates the HPP provisioner.

The HPP is a local storage provisioner designed for OpenShift Virtualization. To use the HPP, you must create an HPP custom resource (CR).

Important

HPP storage pools must not be in the same partition as the operating system. Otherwise, the storage pools might fill the operating system partition. If the operating system partition is full, performance can be affected or the node can become unstable or unusable.

5.4.1.1. Creating a storage class for the CSI driver with the storagePools stanza

To use the hostpath provisioner (HPP), you must create an associated storage class for the Container Storage Interface (CSI) driver.

When you create a storage class, you set parameters that affect the dynamic provisioning of persistent volumes (PVs) that belong to that storage class. You cannot update a StorageClass object’s parameters after you create it.

Note

Virtual machines use data volumes that are based on local PVs. Local PVs are bound to specific nodes. While a disk image is prepared for consumption by the virtual machine, it is possible that the virtual machine cannot be scheduled to the node where the local storage PV was previously pinned.

To solve this problem, use the Kubernetes pod scheduler to bind the persistent volume claim (PVC) to a PV on the correct node. If you set the volumeBindingMode parameter of the StorageClass to WaitForFirstConsumer, the binding and provisioning of the PV are delayed until a pod that uses the PVC is created.

Procedure

  1. Create a storageclass_csi.yaml file to define the storage class:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: hostpath-csi
    provisioner: kubevirt.io.hostpath-provisioner
    reclaimPolicy: Delete 1
    volumeBindingMode: WaitForFirstConsumer 2
    parameters:
      storagePool: my-storage-pool 3
    1
    The two possible reclaimPolicy values are Delete and Retain. If you do not specify a value, the default value is Delete.
    2
    The volumeBindingMode parameter determines when dynamic provisioning and volume binding occur. Specify WaitForFirstConsumer to delay the binding and provisioning of a persistent volume (PV) until after a pod that uses the persistent volume claim (PVC) is created. This ensures that the PV meets the pod’s scheduling requirements.
    3
    Specify the name of the storage pool defined in the HPP CR.
  2. Save the file and exit.
  3. Create the StorageClass object by running the following command:

    $ oc create -f storageclass_csi.yaml
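Because StorageClass parameters cannot be updated after creation, checking them first can save a delete-and-recreate cycle. The following Python sketch cross-checks the storage class against an HPP CR, represented as dictionaries. It assumes that the HPP CR lists its pools under spec.storagePools with name and path entries; the pool path shown is hypothetical. The sketch also approximates the partition warning by comparing device IDs with os.stat. This must run on the node itself, for example in a debug shell, and treats "same device as /" as "same partition as the operating system".

```python
import os


def storage_class_problems(storage_class: dict, hpp_cr: dict) -> list:
    """Cross-check the hostpath-csi StorageClass against the storage pools of an HPP CR."""
    problems = []
    if storage_class.get("provisioner") != "kubevirt.io.hostpath-provisioner":
        problems.append("provisioner must be kubevirt.io.hostpath-provisioner")
    if storage_class.get("volumeBindingMode") != "WaitForFirstConsumer":
        problems.append("volumeBindingMode should be WaitForFirstConsumer for local PVs")
    pool = storage_class.get("parameters", {}).get("storagePool")
    pool_names = {p["name"] for p in hpp_cr.get("spec", {}).get("storagePools", [])}
    if pool not in pool_names:
        problems.append(f"storagePool {pool!r} is not defined in the HPP CR")
    return problems


def on_same_partition_as_root(path: str) -> bool:
    """True if path is on the same device as the root file system.
    A storage pool path should return False when checked on the node."""
    return os.stat(path).st_dev == os.stat("/").st_dev


storage_class = {
    "provisioner": "kubevirt.io.hostpath-provisioner",
    "reclaimPolicy": "Delete",
    "volumeBindingMode": "WaitForFirstConsumer",
    "parameters": {"storagePool": "my-storage-pool"},
}
# Hypothetical HPP CR that defines one storage pool; the path is an example only.
hpp_cr = {"spec": {"storagePools": [{"name": "my-storage-pool", "path": "/var/myvolumes"}]}}

print(storage_class_problems(storage_class, hpp_cr))  # []
```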

5.5. Configuring higher VM workload density

To increase the number of virtual machines (VMs) that your cluster can run, you can configure a higher VM workload density by overcommitting memory (RAM).

Important

Configuring higher workload density is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The following workloads are especially suited for higher workload density:

  • Many similar workloads
  • Underused workloads

Note

While overcommitted memory can lead to a higher workload density, it can also lower workload performance of a highly utilized system.

5.5.1. Using wasp-agent to configure higher VM workload density

The wasp-agent component enables an OpenShift Container Platform cluster to assign swap resources to virtual machine (VM) workloads. Swap usage is only supported on worker nodes.

Important

Swap resources can only be assigned to virtual machine workloads (VM pods) of the Burstable Quality of Service (QoS) class. VM pods of the Guaranteed QoS class, and pods of any QoS class that do not belong to VMs, cannot use swap resources.

For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).
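The following Python sketch shows how a pod's QoS class follows from its CPU and memory requests and limits, and how that determines swap eligibility. It is a simplified model with hypothetical function names. Kubernetes parses resource quantities, but this sketch compares them as plain strings, so "1Gi" and "1024Mi" count as different values.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Simplified Kubernetes QoS classification from CPU and memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if any(r in requests or r in limits for r in RESOURCES):
            any_set = True
        for resource in RESOURCES:
            # A missing request defaults to the limit.
            request = requests.get(resource, limits.get(resource))
            if resource not in limits or request != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def may_use_swap(containers: list, is_vm_pod: bool) -> bool:
    """Only VM pods of the Burstable QoS class can be assigned swap."""
    return is_vm_pod and qos_class(containers) == "Burstable"


burstable = [{"requests": {"cpu": "100m", "memory": "2Gi"}}]
guaranteed = [{"requests": {"cpu": "1", "memory": "2Gi"}, "limits": {"cpu": "1", "memory": "2Gi"}}]
print(qos_class(burstable), qos_class(guaranteed), qos_class([{}]))  # Burstable Guaranteed BestEffort
print(may_use_swap(burstable, True), may_use_swap(guaranteed, True))  # True False
```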

Prerequisites

  • The oc tool is available.
  • You are logged in to the cluster with the cluster-admin role.
  • A memory over-commit ratio is defined.
  • The node belongs to a worker pool.

Procedure

  1. Create a privileged service account by entering the following commands:

    $ oc adm new-project wasp
    $ oc create sa -n wasp wasp
    $ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount=wasp:wasp
    $ oc adm policy add-scc-to-user -n wasp privileged -z wasp
    Note

    The wasp-agent component deploys an OCI hook to enable swap usage for containers at the node level. Because the hook operates at this low level, the DaemonSet object must be privileged.

  2. Deploy wasp-agent by creating a DaemonSet object as follows:

    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: wasp-agent
      namespace: wasp
      labels:
        app: wasp
        tier: node
    spec:
      selector:
        matchLabels:
          name: wasp
      template:
        metadata:
          annotations:
            description: >-
              Configures swap for workloads
          labels:
              name: wasp
        spec:
          serviceAccountName: wasp
          hostPID: true
          hostUsers: true
          terminationGracePeriodSeconds: 5
          containers:
            - name: wasp-agent
              image: >-
                registry.redhat.io/container-native-virtualization/wasp-agent-rhel9:v4.16
              imagePullPolicy: Always
              env:
              - name: "FSROOT"
                value: "/host"
              resources:
                requests:
                  cpu: 100m
                  memory: 50M
              securityContext:
                privileged: true
              volumeMounts:
              - name: host
                mountPath: "/host"
          volumes:
          - name: host
            hostPath:
              path: "/"
          priorityClassName: system-node-critical
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 10%
          maxSurge: 0
    status: {}
  3. Configure the kubelet service to permit swap:

    1. Create a KubeletConfig manifest as shown in the example:

      Example KubeletConfig manifest

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ''  # MCP
            #machine.openshift.io/cluster-api-machine-role: worker # machine
            #node-role.kubernetes.io/worker: '' # node
        kubeletConfig:
          failSwapOn: false
          evictionSoft:
            memory.available: "1Gi"
          evictionSoftGracePeriod:
            memory.available: "10s"

      If the cluster already uses a KubeletConfig object, add the following settings to its spec section:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config
      # ...
      spec:
      # ...
        kubeletConfig:
          evictionSoft:
            memory.available: 1Gi
          evictionSoftGracePeriod:
            memory.available: 1m30s
          failSwapOn: false
    2. Run the following command:

      $ oc wait mcp worker --for condition=Updated=True
  4. Create a MachineConfig object to provision swap as follows:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 90-worker-swap
    spec:
      config:
        ignition:
          version: 3.4.0
        systemd:
          units:
          - contents: |
              [Unit]
              Description=Provision and enable swap
              ConditionFirstBoot=no
    
              [Service]
              Type=oneshot
              Environment=SWAP_SIZE_MB=5000
              ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
              sudo chmod 600 /var/tmp/swapfile && \
              sudo mkswap /var/tmp/swapfile && \
              sudo swapon /var/tmp/swapfile && \
              free -h && \
              sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""
    
              [Install]
              RequiredBy=kubelet-dependencies.target
            enabled: true
            name: swap-provision.service

    To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node using the following formula:

    NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)

    Example:

    NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                    = 16 GB * (1.5 - 1)
                    = 16 GB * (0.5)
                    =  8 GB
  5. Deploy alerting rules as follows:

    apiVersion: monitoring.openshift.io/v1
    kind: AlertingRule
    metadata:
      name: wasp-alerts
      namespace: openshift-monitoring
    spec:
      groups:
      - name: wasp.rules
        rules:
        - alert: NodeSwapping
          annotations:
            description: Node {{ $labels.instance }} is swapping at a rate of {{ printf "%.2f" $value }} MB/s
            runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/runbooks/alerts/NodeSwapping.md
            summary: A node is swapping memory pages
          expr: |
            # In MB/s
            irate(node_memory_SwapFree_bytes{job="node-exporter"}[5m]) / 1024^2 > 0
          for: 1m
          labels:
            severity: critical
  6. Configure OpenShift Virtualization to use memory overcommit either by using the OpenShift Container Platform web console or by editing the HyperConverged custom resource (CR) file as shown in the following example.

    Example:

    apiVersion: hco.kubevirt.io/v1beta1
    kind: HyperConverged
    metadata:
      name: kubevirt-hyperconverged
      namespace: openshift-cnv
    spec:
      higherWorkloadDensity:
        memoryOvercommitPercentage: 150
  7. Apply all the configurations to compute nodes in your cluster by entering the following command:

    $ oc patch --type=merge \
      -f <../manifests/hco-set-memory-overcommit.yaml> \
      --patch-file <../manifests/hco-set-memory-overcommit.yaml>
    Note

    After applying all configurations, the swap feature is fully available only after all MachineConfigPool rollouts are complete.
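The swap sizing formula in step 4 can be expressed as a short Python sketch that also converts the result into a candidate SWAP_SIZE_MB value for the MachineConfig. The conversion assumes that the node RAM figure is in GiB. The dd command in the unit writes 1 MiB blocks (bs=1M), so the sketch multiplies by 1024. The function names are hypothetical.

```python
import math


def node_swap_space(node_ram: float, memory_overcommit_percent: float) -> float:
    """NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1), in the unit of node_ram."""
    if memory_overcommit_percent < 100:
        raise ValueError("a percentage below 100 does not overcommit memory")
    return node_ram * (memory_overcommit_percent / 100 - 1)


def swap_size_mib(node_ram_gib: float, memory_overcommit_percent: float) -> int:
    """Candidate SWAP_SIZE_MB value, rounded up to whole 1 MiB dd blocks."""
    return math.ceil(node_swap_space(node_ram_gib, memory_overcommit_percent) * 1024)


print(node_swap_space(16, 150))  # 8.0, matching the worked example
print(swap_size_mib(16, 150))    # 8192
```

For a 16 GiB node at 150% overcommit, the SWAP_SIZE_MB=5000 value in the example MachineConfig is below the worst-case requirement, so adjust it to match your nodes.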

Verification

  1. To verify the deployment of wasp-agent, run the following command:

    $ oc rollout status ds wasp-agent -n wasp

    If the deployment is successful, the following message is displayed:

    daemon set "wasp-agent" successfully rolled out
  2. To verify that swap is correctly provisioned, do the following:

    1. Run the following command:

      $ oc get nodes -l node-role.kubernetes.io/worker
    2. Select a node from the provided list and run the following command:

      $ oc debug node/<selected-node> -- free -m

      If swap is provisioned correctly, an amount greater than zero is displayed, similar to the following:

                    total        used        free      shared  buff/cache   available
      Mem:          31846       23155        1044        6014       14483        8690
      Swap:          8191        2337        5854
  3. Verify the OpenShift Virtualization memory overcommitment configuration by running the following command:

    $ oc get -n openshift-cnv HyperConverged kubevirt-hyperconverged -o jsonpath="{.spec.higherWorkloadDensity.memoryOvercommitPercentage}"
    150

    The returned value, for example 150, must match the value that you configured previously.
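If you run these verification steps from a script, you can parse their output instead of reading it by eye. The following Python sketch extracts the swap totals from `free -m` output and compares the jsonpath output of the oc get command with the configured percentage. The sample output is the example shown in step 2. The function names are hypothetical, and the parser assumes the standard `free -m` column order (total, used, free).

```python
SAMPLE_FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:          31846       23155        1044        6014       14483        8690
Swap:          8191        2337        5854
"""


def swap_totals_mib(free_output: str) -> tuple:
    """Return (total, used) swap in MiB from the output of free -m."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            return int(fields[1]), int(fields[2])
    raise ValueError("no Swap: line found")


def overcommit_matches(jsonpath_output: str, expected_percent: int) -> bool:
    """Compare the jsonpath output of the oc get command with the configured percentage."""
    return int(jsonpath_output.strip()) == expected_percent


total, used = swap_totals_mib(SAMPLE_FREE_OUTPUT)
print(total > 0, total, used)            # True 8191 2337
print(overcommit_matches("150\n", 150))  # True
```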


© 2024 Red Hat, Inc.