Chapter 12. Migrating virtual machines


If the current host of a virtual machine (VM) becomes unsuitable or cannot be used anymore, or if you want to redistribute the hosting workload, you can migrate the VM to another KVM host.

12.1. How migrating virtual machines works

The essential part of virtual machine (VM) migration is copying the XML configuration of a VM to a different host machine. If the migrated VM is not shut down, the migration also transfers the state of the VM’s memory and any virtualized devices to a destination host machine. For the VM to remain functional on the destination host, the VM’s disk images must remain available to it.

By default, the migrated VM is transient on the destination host and also remains defined on the source host.

You can migrate a running VM by using live or non-live migrations. To migrate a shut-off VM, you must use an offline migration. For details, see the following table.

Table 12.1. VM migration types
Each entry lists the migration type, followed by its description, use case, and storage requirements.

Live migration
  Description: The VM continues to run on the source host machine while KVM is transferring the VM’s memory pages to the destination host. When the migration is nearly complete, KVM very briefly suspends the VM, and resumes it on the destination host.
  Use case: Useful for VMs that require constant uptime. However, VMs that modify memory pages faster than KVM can transfer them, such as VMs under heavy I/O load, cannot be live-migrated, and non-live migration must be used instead.
  Storage requirements: The VM’s disk images must be accessible both to the source host and the destination host during the migration. (1)

Non-live migration
  Description: Suspends the VM, copies its configuration and its memory to the destination host, and resumes the VM.
  Use case: Creates downtime for the VM, but is generally more reliable than live migration. Recommended for VMs under heavy memory load.
  Storage requirements: The VM’s disk images must be accessible both to the source host and the destination host during the migration. (1)

Hybrid migration
  Description: Combines live migration and non-live migration. You suspend the source VM during live migration, which prevents additional dirty memory pages from being generated. As a result, the migration is significantly more likely to complete.
  Use case: Recommended for example when live-migrating a VM that uses very many vCPUs or a large amount of memory, which prevents the migration from completing. Based on guest workload and the number of static pages during migration, a hybrid migration might also cause significantly less downtime than a non-live migration.
  Storage requirements: The VM’s disk images must be accessible both to the source host and the destination host during the migration. (1)

Offline migration
  Description: Moves the VM’s configuration to the destination host.
  Use case: Recommended for shut-off VMs and in situations when shutting down the VM does not disrupt your workloads.
  Storage requirements: The VM’s disk images do not have to be available on a shared network, and can be copied or moved manually to the destination host instead.

(1) To achieve this, use one of the following:

  • Storage located on a shared network
  • The --copy-storage-all parameter for the virsh migrate command, which copies disk image contents from the source to the destination over the network.
  • Storage area network (SAN) logical units (LUNs).
  • Ceph storage clusters
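
As a rough illustration of how the migration types in Table 12.1 map to virsh invocations, the following sketch prints a command line for each type without running it. The migrate_cmd function and the VM and host names are hypothetical. The --live, --offline, and --persistent options and the virsh suspend command are the ones used elsewhere in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: print the virsh command(s) for a given migration type.
# It only echoes text; nothing is executed against libvirt.
migrate_cmd() {
    kind=$1
    vm=$2
    dest=$3
    uri="qemu+ssh://${dest}/system"
    case "$kind" in
        live)
            echo "virsh migrate --live ${vm} ${uri}" ;;
        non-live)
            # Without --live, the VM is paused while its memory is copied.
            echo "virsh migrate ${vm} ${uri}" ;;
        hybrid)
            # Start as a live migration, then suspend the VM from a second terminal.
            echo "virsh migrate --live ${vm} ${uri}"
            echo "virsh suspend ${vm}" ;;
        offline)
            echo "virsh migrate --offline --persistent ${vm} ${uri}" ;;
        *)
            echo "unknown migration type: ${kind}" >&2
            return 1 ;;
    esac
}

migrate_cmd live example-VM example-destination
```

The sketch does not cover storage. For every type except offline migration, the disk images still have to be reachable from both hosts by one of the methods listed in note (1).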

12.2. Benefits of migrating virtual machines

Migrating virtual machines (VMs) can be useful for:

Load balancing
VMs can be moved to host machines with lower usage if their host becomes overloaded, or if another host is under-utilized.
Hardware independence
When you need to upgrade, add, or remove hardware devices on the host machine, you can safely relocate VMs to other hosts. This means that VMs do not experience any downtime for hardware improvements.
Energy saving
VMs can be redistributed to other hosts, and the unloaded host systems can thus be powered off to save energy and cut costs during low usage periods.
Geographic migration
VMs can be moved to another physical location for lower latency or when required for other reasons.

12.3. Limitations for migrating virtual machines

Before migrating virtual machines (VMs) in RHEL 9, ensure you are aware of the migration’s limitations.

  • Migrating VMs from or to a session connection of libvirt is unreliable and therefore not recommended.
  • VMs that use certain features and configurations will not work correctly if migrated, or the migration will fail. Such features include:

    • Device passthrough
    • SR-IOV device assignment
    • Mediated devices, such as vGPUs
  • A migration between hosts that use Non-Uniform Memory Access (NUMA) pinning works only if the hosts have similar topology. However, the performance on running workloads might be negatively affected by the migration.
  • The emulated CPUs, both on the source VM and the destination VM, must be identical, otherwise the migration might fail. Any differences between the VMs in the following CPU-related areas can cause problems with the migration:

    • CPU model

    • Firmware settings
    • Microcode version
    • BIOS version
    • BIOS settings
    • QEMU version
    • Kernel version
  • Live migrating a VM that uses more than 1 TB of memory might in some cases be unreliable. For instructions on how to prevent or fix this problem, see Live migration of a VM takes a long time without completing.

12.4. Verifying host CPU compatibility for virtual machine migration

For migrated virtual machines (VMs) to work correctly on the destination host, the CPUs on the source and the destination hosts must be compatible. To ensure that this is the case, calculate a common CPU baseline before you begin the migration.

Note

The instructions in this section use an example migration scenario with the following host CPUs:

  • Source host: Intel Core i7-8650U
  • Destination host: Intel Xeon CPU E5-2620 v2

Prerequisites

  • Virtualization is installed and enabled on your system.
  • You have administrator access to the source host and the destination host for the migration.

Procedure

  1. On the source host, obtain its CPU features and paste them into a new XML file, such as domCaps-CPUs.xml.

    # virsh domcapabilities | xmllint --xpath "//cpu/mode[@name='host-model']" - > domCaps-CPUs.xml
  2. In the XML file, replace the <mode> </mode> tags with <cpu> </cpu>.
  3. Optional: Verify that the content of the domCaps-CPUs.xml file looks similar to the following:

    # cat domCaps-CPUs.xml
    
        <cpu>
              <model fallback="forbid">Skylake-Client-IBRS</model>
              <vendor>Intel</vendor>
              <feature policy="require" name="ss"/>
              <feature policy="require" name="vmx"/>
              <feature policy="require" name="pdcm"/>
              <feature policy="require" name="hypervisor"/>
              <feature policy="require" name="tsc_adjust"/>
              <feature policy="require" name="clflushopt"/>
              <feature policy="require" name="umip"/>
              <feature policy="require" name="md-clear"/>
              <feature policy="require" name="stibp"/>
              <feature policy="require" name="arch-capabilities"/>
              <feature policy="require" name="ssbd"/>
              <feature policy="require" name="xsaves"/>
              <feature policy="require" name="pdpe1gb"/>
              <feature policy="require" name="invtsc"/>
              <feature policy="require" name="ibpb"/>
              <feature policy="require" name="ibrs"/>
              <feature policy="require" name="amd-stibp"/>
              <feature policy="require" name="amd-ssbd"/>
              <feature policy="require" name="rsba"/>
              <feature policy="require" name="skip-l1dfl-vmentry"/>
              <feature policy="require" name="pschange-mc-no"/>
              <feature policy="disable" name="hle"/>
              <feature policy="disable" name="rtm"/>
        </cpu>
  4. On the destination host, use the following command to obtain its CPU features:

    # virsh domcapabilities | xmllint --xpath "//cpu/mode[@name='host-model']" -
    
        <mode name="host-model" supported="yes">
                <model fallback="forbid">IvyBridge-IBRS</model>
                <vendor>Intel</vendor>
                <feature policy="require" name="ss"/>
                <feature policy="require" name="vmx"/>
                <feature policy="require" name="pdcm"/>
                <feature policy="require" name="pcid"/>
                <feature policy="require" name="hypervisor"/>
                <feature policy="require" name="arat"/>
                <feature policy="require" name="tsc_adjust"/>
                <feature policy="require" name="umip"/>
                <feature policy="require" name="md-clear"/>
                <feature policy="require" name="stibp"/>
                <feature policy="require" name="arch-capabilities"/>
                <feature policy="require" name="ssbd"/>
                <feature policy="require" name="xsaveopt"/>
                <feature policy="require" name="pdpe1gb"/>
                <feature policy="require" name="invtsc"/>
                <feature policy="require" name="ibpb"/>
                <feature policy="require" name="amd-ssbd"/>
                <feature policy="require" name="skip-l1dfl-vmentry"/>
                <feature policy="require" name="pschange-mc-no"/>
        </mode>
  5. Add the obtained CPU features from the destination host to the domCaps-CPUs.xml file on the source host. Again, replace the <mode> </mode> tags with <cpu> </cpu> and save the file.
  6. Optional: Verify that the XML file now contains the CPU features from both hosts.

    # cat domCaps-CPUs.xml
    
        <cpu>
              <model fallback="forbid">Skylake-Client-IBRS</model>
              <vendor>Intel</vendor>
              <feature policy="require" name="ss"/>
              <feature policy="require" name="vmx"/>
              <feature policy="require" name="pdcm"/>
              <feature policy="require" name="hypervisor"/>
              <feature policy="require" name="tsc_adjust"/>
              <feature policy="require" name="clflushopt"/>
              <feature policy="require" name="umip"/>
              <feature policy="require" name="md-clear"/>
              <feature policy="require" name="stibp"/>
              <feature policy="require" name="arch-capabilities"/>
              <feature policy="require" name="ssbd"/>
              <feature policy="require" name="xsaves"/>
              <feature policy="require" name="pdpe1gb"/>
              <feature policy="require" name="invtsc"/>
              <feature policy="require" name="ibpb"/>
              <feature policy="require" name="ibrs"/>
              <feature policy="require" name="amd-stibp"/>
              <feature policy="require" name="amd-ssbd"/>
              <feature policy="require" name="rsba"/>
              <feature policy="require" name="skip-l1dfl-vmentry"/>
              <feature policy="require" name="pschange-mc-no"/>
              <feature policy="disable" name="hle"/>
              <feature policy="disable" name="rtm"/>
        </cpu>
        <cpu>
              <model fallback="forbid">IvyBridge-IBRS</model>
              <vendor>Intel</vendor>
              <feature policy="require" name="ss"/>
              <feature policy="require" name="vmx"/>
              <feature policy="require" name="pdcm"/>
              <feature policy="require" name="pcid"/>
              <feature policy="require" name="hypervisor"/>
              <feature policy="require" name="arat"/>
              <feature policy="require" name="tsc_adjust"/>
              <feature policy="require" name="umip"/>
              <feature policy="require" name="md-clear"/>
              <feature policy="require" name="stibp"/>
              <feature policy="require" name="arch-capabilities"/>
              <feature policy="require" name="ssbd"/>
              <feature policy="require" name="xsaveopt"/>
              <feature policy="require" name="pdpe1gb"/>
              <feature policy="require" name="invtsc"/>
              <feature policy="require" name="ibpb"/>
              <feature policy="require" name="amd-ssbd"/>
              <feature policy="require" name="skip-l1dfl-vmentry"/>
              <feature policy="require" name="pschange-mc-no"/>
        </cpu>
  7. Use the XML file to calculate the CPU feature baseline for the VM you intend to migrate.

    # virsh hypervisor-cpu-baseline domCaps-CPUs.xml
    
        <cpu mode='custom' match='exact'>
          <model fallback='forbid'>IvyBridge-IBRS</model>
          <vendor>Intel</vendor>
          <feature policy='require' name='ss'/>
          <feature policy='require' name='vmx'/>
          <feature policy='require' name='pdcm'/>
          <feature policy='require' name='pcid'/>
          <feature policy='require' name='hypervisor'/>
          <feature policy='require' name='arat'/>
          <feature policy='require' name='tsc_adjust'/>
          <feature policy='require' name='umip'/>
          <feature policy='require' name='md-clear'/>
          <feature policy='require' name='stibp'/>
          <feature policy='require' name='arch-capabilities'/>
          <feature policy='require' name='ssbd'/>
          <feature policy='require' name='xsaveopt'/>
          <feature policy='require' name='pdpe1gb'/>
          <feature policy='require' name='invtsc'/>
          <feature policy='require' name='ibpb'/>
          <feature policy='require' name='amd-ssbd'/>
          <feature policy='require' name='skip-l1dfl-vmentry'/>
          <feature policy='require' name='pschange-mc-no'/>
        </cpu>
  8. Open the XML configuration of the VM you intend to migrate, and replace the contents of the <cpu> section with the settings obtained in the previous step.

    # virsh edit VM-name
  9. If the VM is running, restart it.

    # virsh reboot VM-name
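
The tag replacement in steps 2 and 5 can also be scripted. The following is a minimal sketch, assuming the wrapper element is printed as <mode ...> on a line of its own, as in the examples above. The mode_to_cpu function name is hypothetical.

```shell
#!/bin/sh
# Rewrite the <mode ...> wrapper printed by "virsh domcapabilities" into
# <cpu> tags. Reads XML on standard input and writes to standard output.
# The pattern requires whitespace (or ">") right after "<mode" so that
# <model ...> elements are left untouched.
mode_to_cpu() {
    sed -e 's|<mode>|<cpu>|' \
        -e 's|<mode[[:space:]][^>]*>|<cpu>|' \
        -e 's|</mode>|</cpu>|'
}

# Example with a shortened fragment of the destination host output:
printf '%s\n' '<mode name="host-model" supported="yes">' \
              '  <model fallback="forbid">IvyBridge-IBRS</model>' \
              '</mode>' | mode_to_cpu
```

In step 1, the same filter could be placed between the xmllint command and the redirection to domCaps-CPUs.xml.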

12.5. Sharing virtual machine disk images with other hosts

To perform a live migration of a virtual machine (VM) between supported KVM hosts, you must also migrate the storage of the running VM in a way that makes it possible for the VM to read from and write to the storage during the migration process.

One of the methods to do this is using shared VM storage. The following procedure provides instructions for sharing a locally stored VM image with the source host and the destination host by using the NFS protocol.

Prerequisites

  • The VM intended for migration is shut down.
  • Optional: A host system is available for hosting the storage that is not the source or destination host, but both the source and the destination host can reach it through the network. This is the optimal solution for shared storage and is recommended by Red Hat.
  • Make sure that NFS file locking is not used as it is not supported in KVM.
  • The NFS protocol is installed and enabled on the source and destination hosts. See Deploying an NFS server.

Procedure

  1. Connect to the host that will provide shared storage. In this example, it is the example-shared-storage host:

    # ssh root@example-shared-storage
    root@example-shared-storage's password:
    Last login: Mon Sep 24 12:05:36 2019
    root~#
  2. Create a directory on the shared storage host that will hold the disk image and will be shared with the migration hosts:

    # mkdir /var/lib/libvirt/shared-images
  3. Copy the disk image of the VM from the source host to the newly created directory. The following example copies the disk image example-disk-1 of the VM to the /var/lib/libvirt/shared-images/ directory of the example-shared-storage host:

    # scp /var/lib/libvirt/images/example-disk-1.qcow2 root@example-shared-storage:/var/lib/libvirt/shared-images/example-disk-1.qcow2
  4. On the host that you want to use for sharing the storage, add the sharing directory to the /etc/exports file. The following example shares the /var/lib/libvirt/shared-images directory with the example-source-machine and example-destination-machine hosts:

    /var/lib/libvirt/shared-images example-source-machine(rw,no_root_squash) example-destination-machine(rw,no_root_squash)
  5. On both the source and destination host, mount the shared directory in the /var/lib/libvirt/images directory:

    # mount example-shared-storage:/var/lib/libvirt/shared-images /var/lib/libvirt/images
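
The /etc/exports entry from step 4 follows a regular pattern, so it can be generated rather than typed by hand. The following sketch assumes the same rw,no_root_squash options for every host. The exports_entry function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/exports entry that shares a directory
# read-write with each listed host, using the options from step 4.
exports_entry() {
    dir=$1
    shift
    line=$dir
    for host in "$@"; do
        line="${line} ${host}(rw,no_root_squash)"
    done
    echo "$line"
}

exports_entry /var/lib/libvirt/shared-images \
    example-source-machine example-destination-machine
```

The printed line can be appended to /etc/exports on the shared storage host. An NFS server usually has to re-read the file before the mount in step 5 succeeds, for example with exportfs -ra. That step is not part of the procedure above, so treat it as an assumption about your setup.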

Verification

  • Start the VM on the source host and observe if it boots successfully.

12.6. Migrating a virtual machine by using the command-line interface

If the current host of a virtual machine (VM) becomes unsuitable or cannot be used anymore, or if you want to redistribute the hosting workload, you can migrate the VM to another KVM host. The following procedure provides instructions and examples for various scenarios of such migrations.

Prerequisites

  • The source host and the destination host both use the KVM hypervisor.
  • The source host and the destination host are able to reach each other over the network. Use the ping utility to verify this.
  • Ensure the following ports are open on the destination host.

    • Port 22 is needed for connecting to the destination host by using SSH.
    • Port 16509 is needed for connecting to the destination host by using TCP.
    • Port 16514 is needed for connecting to the destination host by using TLS.
    • Ports 49152-49215 are needed by QEMU for transferring the memory and disk migration data.
  • For the migration to be supportable by Red Hat, the source host and destination host must be using specific operating systems and machine types. To ensure this is the case, see Supported hosts for virtual machine migration.
  • The VM must be compatible with the CPU features of the destination host. To ensure this is the case, see Verifying host CPU compatibility for virtual machine migration.
  • The disk images of VMs that will be migrated are accessible to both the source host and the destination host. This is optional for offline migration, but required for migrating a running VM. To ensure storage accessibility for both hosts, one of the following must apply:

    • You are using storage area network (SAN) logical units (LUNs).
    • You are using a Ceph storage cluster.
    • You have copied the disk image of the VM to the destination host, and you will use the --copy-storage-all parameter when migrating the VM. Alternatively, you have created a disk image with the same format and size as the source VM disk.
    • The disk image is located on a separate networked location. For instructions to set up such shared VM storage, see Sharing virtual machine disk images with other hosts.
  • When migrating a running VM, your network bandwidth must be higher than the rate at which the VM generates dirty memory pages.

    To obtain the dirty page rate of your VM before you start the live migration, do the following:

    • Monitor the rate of dirty page generation of the VM for a short period of time.

      # virsh domdirtyrate-calc example-VM 30
    • After the monitoring finishes, obtain its results:

      # virsh domstats example-VM --dirtyrate
      Domain: 'example-VM'
        dirtyrate.calc_status=2
        dirtyrate.calc_start_time=200942
        dirtyrate.calc_period=30
        dirtyrate.megabytes_per_second=2

      In this example, the VM is generating 2 MB of dirty memory pages per second. Attempting to live-migrate such a VM on a network with a bandwidth of 2 MB/s or less will cause the live migration not to progress if you do not pause the VM or lower its workload.

      To ensure that the live migration finishes successfully, Red Hat recommends that your network bandwidth is significantly greater than the VM’s dirty page generation rate.

      Note

      The value of the calc_period option might differ based on the workload and dirty page rate. You can experiment with several calc_period values to determine the most suitable period that aligns with the dirty page rate in your environment.

  • When migrating an existing VM in a public bridge tap network, the source and destination hosts must be located on the same network. Otherwise, the VM network will not work after migration.
  • When performing a VM migration, the virsh client on the source host can use one of several protocols to connect to the libvirt daemon on the destination host. Examples in the following procedure use an SSH connection, but you can choose a different one.

    • If you want libvirt to use an SSH connection, ensure that the virtqemud socket is enabled and running on the destination host.

      # systemctl enable --now virtqemud.socket
    • If you want libvirt to use a TLS connection, ensure that the virtproxyd-tls socket is enabled and running on the destination host.

      # systemctl enable --now virtproxyd-tls.socket
    • If you want libvirt to use a TCP connection, ensure that the virtproxyd-tcp socket is enabled and running on the destination host.

      # systemctl enable --now virtproxyd-tcp.socket
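
The dirty page rate prerequisite above can be turned into a scripted check. The following sketch parses the virsh domstats output shown earlier and compares the rate with the available bandwidth. Both function names are hypothetical, and the default safety factor of 2 is an arbitrary stand-in for "significantly greater".

```shell
#!/bin/sh
# Read "virsh domstats <vm> --dirtyrate" output on standard input and print
# the dirty page rate in MB/s (the dirtyrate.megabytes_per_second field).
dirty_rate_mbps() {
    awk -F= '/dirtyrate\.megabytes_per_second/ { print $2; exit }'
}

# Succeed only if the bandwidth (MB/s) is at least <factor> times the dirty
# page rate (MB/s). The default factor of 2 is an assumption of this sketch.
bandwidth_sufficient() {
    dr=$1
    bw=$2
    factor=${3:-2}
    [ "$bw" -ge $((dr * factor)) ]
}

# Example with the sample output from this section:
sample="Domain: 'example-VM'
  dirtyrate.calc_status=2
  dirtyrate.calc_period=30
  dirtyrate.megabytes_per_second=2"
rate=$(printf '%s\n' "$sample" | dirty_rate_mbps)
if bandwidth_sufficient "$rate" 10; then
    echo "10 MB/s should be enough for a dirty rate of ${rate} MB/s"
fi
```

On a real host, the sample text would be replaced by the output of virsh domstats example-VM --dirtyrate, run after the virsh domdirtyrate-calc measurement finishes.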

Procedure

  1. Use the virsh migrate command with options appropriate for your migration requirements.

    1. The following command migrates the example-VM-1 VM from your local host to the system connection of the example-destination host by using an SSH tunnel. The VM keeps running during the migration.

      # virsh migrate --persistent --live example-VM-1 qemu+ssh://example-destination/system
    2. The following commands enable you to make manual adjustments to the configuration of the example-VM-2 VM running on your local host, and then migrate the VM to the example-destination host. The migrated VM will automatically use the updated configuration.

      # virsh dumpxml --migratable example-VM-2 > example-VM-2.xml
      # vi example-VM-2.xml
      # virsh migrate --live --persistent --xml example-VM-2.xml example-VM-2 qemu+ssh://example-destination/system

      This procedure can be useful for example when the destination host needs to use a different path to access the shared VM storage or when configuring a feature specific to the destination host.

    3. The following command suspends the example-VM-3 VM from the example-source host, migrates it to the example-destination host, and instructs it to use the adjusted XML configuration, provided by the example-VM-3-alt.xml file. When the migration is completed, libvirt resumes the VM on the destination host.

      # virsh migrate example-VM-3 qemu+ssh://example-source/system qemu+ssh://example-destination/system --xml example-VM-3-alt.xml

      After the migration, the VM is in the shut off state on the source host, and the migrated copy is deleted after it is shut down.

    4. The following command deletes the shut-down example-VM-4 VM from the example-source host, and moves its configuration to the example-destination host.

      # virsh migrate --offline --persistent --undefinesource example-VM-4 qemu+ssh://example-source/system qemu+ssh://example-destination/system

      Note that this type of migration does not require moving the VM’s disk image to shared storage. However, for the VM to be usable on the destination host, you also need to migrate the VM’s disk image. For example:

      # scp root@example-source:/var/lib/libvirt/images/example-VM-4.qcow2 root@example-destination:/var/lib/libvirt/images/example-VM-4.qcow2
    5. The following command migrates the example-VM-5 VM to the example-destination host and uses multiple parallel connections, also known as multiple file descriptors (multi-FD) migration. With multi-FD migration, you can speed up the migration by utilizing all of the available network bandwidth for the migration process.

      # virsh migrate --parallel --parallel-connections 4 <example-VM-5> qemu+ssh://<example-destination>/system

      This example uses 4 multi-FD channels to migrate the example-VM-5 VM. It is recommended to use one channel for each 10 Gbps of available network bandwidth. The default value is 2 channels.

  2. Wait for the migration to complete. The process may take some time depending on network bandwidth, system load, and the size of the VM. If the --verbose option is not used for virsh migrate, the CLI does not display any progress indicators except errors.

    When the migration is in progress, you can use the virsh domjobinfo utility to display the migration statistics.
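
The rule of one multi-FD channel for each 10 Gbps of bandwidth can be written as a small calculation. In the following sketch, rounding up to the next whole channel and using at least one channel are assumptions, and the parallel_connections function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: suggest a --parallel-connections value from the
# available bandwidth in Gbps, using one channel per 10 Gbps.
# Rounding up and the minimum of 1 are assumptions of this sketch.
parallel_connections() {
    gbps=$1
    n=$(( (gbps + 9) / 10 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Print (but do not run) a migrate command for a 40 Gbps link:
echo "virsh migrate --parallel --parallel-connections $(parallel_connections 40) example-VM-5 qemu+ssh://example-destination/system"
```

For a 40 Gbps link, this prints the same value of 4 that the multi-FD example in this procedure uses.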

Verification

  • On the destination host, list the available VMs to verify if the VM has been migrated:

    # virsh list
    Id      Name             State
    ----------------------------------
    10    example-VM-1      running

    If the migration is still running, this command will list the VM state as paused.
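
When the verification is part of a script, the state column can be extracted from the virsh list output. The following is a minimal sketch that assumes the three-column layout shown above. The vm_state function name is hypothetical.

```shell
#!/bin/sh
# Read "virsh list" output on standard input and print the state of the
# named VM. The exit status is non-zero if the VM is not listed.
vm_state() {
    awk -v vm="$1" '
        $2 == vm {
            # Drop the Id and Name columns; the rest of the line is the state.
            $1 = ""; $2 = ""
            sub(/^ +/, "")
            print
            found = 1
        }
        END { exit !found }'
}

# Example with the sample output from this section:
printf '%s\n' \
    ' Id      Name             State' \
    '----------------------------------' \
    ' 10    example-VM-1      running' | vm_state example-VM-1
```

On the destination host, the printf command would be replaced by virsh list. A result of paused suggests that the migration is still in progress.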

Troubleshooting

  • In some cases, the target host will not be compatible with certain values of the migrated VM’s XML configuration, such as the network name or CPU type. As a result, the VM will fail to boot on the target host. To fix these problems, you can update the problematic values by using the virsh edit command. After updating the values, you must restart the VM for the changes to be applied.
  • If a live migration is taking a long time to complete, this may be because the VM is under heavy load and too many memory pages are changing for live migration to be possible. To fix this problem, change the migration to a non-live one by suspending the VM.

    # virsh suspend example-VM-1

Additional resources

  • virsh migrate --help command
  • virsh (1) man page on your system

12.7. Live migrating a virtual machine by using the web console

If you wish to migrate a virtual machine (VM) that is performing tasks which require it to be constantly running, you can migrate that VM to another KVM host without shutting it down. This is also known as live migration. The following instructions explain how to do so by using the web console.

Warning

For tasks that modify memory pages faster than KVM can transfer them, such as heavy I/O load tasks, it is recommended that you do not live migrate the VM.

Prerequisites

  • You have installed the RHEL 9 web console.

    For instructions, see Installing and enabling the web console.

  • The web console VM plug-in is installed on your system.
  • The source and destination hosts are running.
  • Ensure the following ports are open on the destination host.

    • Port 22 is needed for connecting to the destination host by using SSH.
    • Port 16509 is needed for connecting to the destination host by using TCP.
    • Port 16514 is needed for connecting to the destination host by using TLS.
    • Ports 49152-49215 are needed by QEMU for transferring the memory and disk migration data.
  • The VM must be compatible with the CPU features of the destination host. To ensure this is the case, see Verifying host CPU compatibility for virtual machine migration.
  • The VM’s disk images are located on a shared storage that is accessible to the source host as well as the destination host.
  • When migrating a running VM, your network bandwidth must be higher than the rate at which the VM generates dirty memory pages.

    To obtain the dirty page rate of your VM before you start the live migration, do the following in your command-line interface:

    1. Monitor the rate of dirty page generation of the VM for a short period of time.

      # virsh domdirtyrate-calc vm-name 30
    2. After the monitoring finishes, obtain its results:

      # virsh domstats vm-name --dirtyrate
      Domain: 'vm-name'
        dirtyrate.calc_status=2
        dirtyrate.calc_start_time=200942
        dirtyrate.calc_period=30
        dirtyrate.megabytes_per_second=2

      In this example, the VM is generating 2 MB of dirty memory pages per second. Attempting to live-migrate such a VM on a network with a bandwidth of 2 MB/s or less will cause the live migration not to progress if you do not pause the VM or lower its workload.

      To ensure that the live migration finishes successfully, Red Hat recommends that your network bandwidth is significantly greater than the VM’s dirty page generation rate.

Note

The value of the calc_period option might differ based on the workload and dirty page rate. You can experiment with several calc_period values to determine the most suitable period that aligns with the dirty page rate in your environment.

Procedure

  1. In the Virtual Machines interface of the web console, click the Menu button of the VM that you want to migrate.

    A drop down menu appears with controls for various VM operations.

    [Image: The virtual machines main page displaying the available options when the VM is running.]
  2. Click Migrate

    The Migrate VM to another host dialog appears.

    [Image: The Migrate VM to another host dialog box with fields to enter the URI of the destination host and set the migration duration.]
  3. Enter the URI of the destination host.
  4. Configure the duration of the migration:

    • Permanent - Leave the box unchecked if you want to migrate the VM permanently. Permanent migration completely removes the VM configuration from the source host.
    • Temporary - Temporary migration migrates a copy of the VM to the destination host. This copy is deleted from the destination host when the VM is shut down. The original VM remains on the source host.
  5. Click Migrate

    Your VM is migrated to the destination host.

Verification

To verify whether the VM has been successfully migrated and is working correctly:

  • Confirm whether the VM appears in the list of VMs available on the destination host.
  • Start the migrated VM and observe if it boots up.

12.8. Live migrating a virtual machine with an attached Mellanox virtual function

As a Technology Preview, you can live migrate a virtual machine (VM) with an attached virtual function (VF) of a Mellanox networking device. Currently, this is only possible when using a Mellanox CX-7 networking device. The VF on the Mellanox CX-7 networking device uses a new mlx5_vfio_pci driver, which adds functionality that is necessary for the live migration, and libvirt binds the new driver to the VF automatically.

Limitations

Currently, some virtualization features cannot be used when live migrating a VM with an attached Mellanox virtual function:

  • Calculating dirty memory page rate generation of the VM.
  • Using a post-copy live migration.
  • Using a virtual I/O Memory Management Unit (vIOMMU) device in the VM.

Important

This feature is included in RHEL 9 only as a Technology Preview, which means it is not supported.

Prerequisites

  • You have a Mellanox CX-7 networking device with a firmware version that is equal to or greater than 28.36.1010.

    Refer to Mellanox documentation for details about firmware versions.

  • The mstflint package is installed on both the source and destination host:

    # dnf install mstflint
  • The Mellanox CX-7 networking device has VF_MIGRATION_MODE set to MIGRATION_ENABLED:

    # mstconfig -d <device_pci_address> query | grep -i VF_migration
    
    VF_MIGRATION_MODE                           MIGRATION_ENABLED(2)
    • You can set VF_MIGRATION_MODE to MIGRATION_ENABLED by using the following command:

      # mstconfig -d <device_pci_address> set VF_MIGRATION_MODE=2
  • The openvswitch package is installed on both the source and destination host:

    # dnf install openvswitch
  • The CPU and the firmware of your host support the I/O Memory Management Unit (IOMMU).

    • If using an Intel CPU, it must support the Intel Virtualization Technology for Directed I/O (VT-d).
    • If using an AMD CPU, it must support the AMD-Vi feature.
  • The host system uses Access Control Service (ACS) to provide direct memory access (DMA) isolation for PCIe topology. Verify this with the system vendor.

    For additional information, see Hardware Considerations for Implementing SR-IOV.

  • The host network interface you want to use for creating VFs is running. For example, to activate the eth1 interface and verify it is running, use the following commands:

    # ip link set eth1 up
    # ip link show eth1
    8: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
       link/ether a0:36:9f:8f:3f:b8 brd ff:ff:ff:ff:ff:ff
       vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
  • For SR-IOV device assignment to work, the IOMMU feature must be enabled in the host BIOS and kernel. To do so:

    • On an Intel host, enable Intel Virtualization Technology for Directed I/O (VT-d):

      1. Regenerate the GRUB configuration with the intel_iommu=on and iommu=pt parameters:

        # grubby --args="intel_iommu=on iommu=pt" --update-kernel=ALL
      2. Reboot the host.
    • On an AMD host, enable AMD-Vi:

      1. Regenerate the GRUB configuration with the iommu=pt parameter:

        # grubby --args="iommu=pt" --update-kernel=ALL
      2. Reboot the host.
  • The source host and the destination host both use the KVM hypervisor.
  • The source host and the destination host are able to reach each other over the network. Use the ping utility to verify this.
  • The following ports are open on the destination host.

    • Port 22 is needed for connecting to the destination host by using SSH.
    • Port 16509 is needed for connecting to the destination host by using TCP.
    • Port 16514 is needed for connecting to the destination host by using TLS.
    • Ports 49152-49215 are needed by QEMU for transferring the memory and disk migration data.
  • The source host and destination host are using operating systems and machine types that allow migration. To ensure this is the case, see Supported hosts for virtual machine migration.
  • The VM must be compatible with the CPU features of the destination host. To ensure this is the case, see Verifying host CPU compatibility for virtual machine migration.
  • The disk images of VMs that will be migrated are located on a separate networked location accessible to both the source host and the destination host. This is optional for offline migration, but required for migrating a running VM.

    For instructions to set up such shared VM storage, see Sharing virtual machine disk images with other hosts.

  • When migrating a running VM, your network bandwidth must be higher than the rate at which the VM generates dirty memory pages.
  • A virtual network socket is enabled that corresponds to the connection protocol.

    When performing a VM migration, the virsh client on the source host can use one of several protocols to connect to the libvirt daemon on the destination host. Examples in the following procedure use an SSH connection, but you can choose a different one.

    • If you want libvirt to use an SSH connection, ensure that the virtqemud socket is enabled and running on the destination host.

      # systemctl enable --now virtqemud.socket
    • If you want libvirt to use a TLS connection, ensure that the virtproxyd-tls socket is enabled and running on the destination host.

      # systemctl enable --now virtproxyd-tls.socket
    • If you want libvirt to use a TCP connection, ensure that the virtproxyd-tcp socket is enabled and running on the destination host.

      # systemctl enable --now virtproxyd-tcp.socket
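
As a rough illustration of how the chosen connection protocol determines both the destination URI and the socket unit from the list above, the following shell sketch maps one to the other. The function name migration_endpoint and the host name dest.example.com are hypothetical. The sketch assumes that the TLS and TCP transports use the qemu+tls and qemu+tcp URI schemes, analogous to the qemu+ssh form used in the procedure.

```shell
# Hypothetical helper: print the destination URI and the systemd socket
# unit that must be enabled on the destination host for a given protocol.
# $1 = protocol (ssh, tls, or tcp), $2 = destination host name or IP address
migration_endpoint() {
    local proto="$1" host="$2" socket
    case "$proto" in
        ssh) socket=virtqemud.socket ;;
        tls) socket=virtproxyd-tls.socket ;;
        tcp) socket=virtproxyd-tcp.socket ;;
        *)   echo "unsupported protocol: $proto" >&2; return 1 ;;
    esac
    printf 'qemu+%s://%s/system %s\n' "$proto" "$host" "$socket"
}
```

For example, migration_endpoint tls dest.example.com prints the URI to pass to --desturi, followed by the socket unit to enable on the destination host.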

Procedure

  1. On the source host, set the Mellanox networking device to the switchdev mode.

    # devlink dev eswitch set pci/<device_pci_address> mode switchdev
  2. On the source host, create a virtual function on the Mellanox device.

    # echo 1 > /sys/bus/pci/devices/0000\:e1\:00.0/sriov_numvfs

    The /0000\:e1\:00.0/ part of the file path is based on the PCI address of the device. In this example, the address is 0000:e1:00.0.

  3. On the source host, unbind the VF from its driver.

    # virsh nodedev-detach <vf_pci_address> --driver pci-stub

    You can view the PCI address of the VF by using the following command:

    # lshw -c network -businfo
    
    Bus info                     Device             Class           Description
    ===========================================================================
    pci@0000:e1:00.0  enp225s0np0    network        MT2910 Family [ConnectX-7]
    pci@0000:e1:00.1  enp225s0v0     network        ConnectX Family mlx5Gen Virtual Function
  4. On the source host, enable the migration function of the VF.

    # devlink port function set pci/0000:e1:00.0/1 migratable enable

    In this example, pci/0000:e1:00.0/1 refers to the first VF on the Mellanox device with the given PCI address.

  5. On the source host, configure Open vSwitch (OVS) for the migration of the VF. If the Mellanox device is in switchdev mode, it cannot transfer data over the network.

    1. Ensure the openvswitch service is running.

      # systemctl start openvswitch
    2. Enable hardware offloading to improve networking performance.

      # ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    3. Increase the maximum idle time to ensure network connections remain open during the migration.

      # ovs-vsctl set Open_vSwitch . other_config:max-idle=300000
    4. Create a new bridge in the OVS instance.

      # ovs-vsctl add-br <bridge_name>
    5. Restart the openvswitch service.

      # systemctl restart openvswitch
    6. Add the physical Mellanox device to the OVS bridge.

      # ovs-vsctl add-port <bridge_name> enp225s0np0

      In this example, <bridge_name> is the name of the bridge you created in step 4 and enp225s0np0 is the network interface name of the Mellanox device.

    7. Add the VF of the Mellanox device to the OVS bridge.

      # ovs-vsctl add-port <bridge_name> enp225s0npf0vf0

      In this example, <bridge_name> is the name of the bridge you created in step 4 and enp225s0npf0vf0 is the network interface name of the VF.

  6. Repeat steps 1-5 on the destination host.
  7. On the source host, open a new file, such as mlx_vf.xml, and add the following XML configuration of the VF:

     <interface type='hostdev' managed='yes'>
          <mac address='52:54:00:56:8c:f7'/>
          <source>
            <address type='pci' domain='0x0000' bus='0xe1' slot='0x00' function='0x1'/>
          </source>
     </interface>

    This example configures a pass-through of the VF as a network interface for the VM. Ensure the MAC address is unique, and use the PCI address of the VF on the source host.

  8. On the source host, attach the VF XML file to the VM.

    # virsh attach-device <vm_name> mlx_vf.xml --live --config

    In this example, mlx_vf.xml is the name of the XML file with the VF configuration. Use the --live option to attach the device to a running VM.

  9. On the source host, start the live migration of the running VM with the attached VF.

    # virsh migrate --live --domain <vm_name> --desturi qemu+ssh://<destination_host_ip_address>/system
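
The PCI address that lshw reports for the VF maps directly onto the attributes of the address element in the XML configuration from step 7. The following is a minimal sketch of that mapping, assuming the address has the usual DDDD:BB:SS.F form. The function name vf_interface_xml is hypothetical and is not part of libvirt.

```shell
# Hypothetical helper: generate the hostdev interface XML for a VF.
# $1 = PCI address of the VF (for example 0000:e1:00.1), $2 = MAC address
vf_interface_xml() {
    local pci="$1" mac="$2"
    local IFS=':.'
    # Split DDDD:BB:SS.F into domain, bus, slot, and function.
    set -- $pci
    if [ "$#" -ne 4 ]; then
        echo "expected DDDD:BB:SS.F, got: $pci" >&2
        return 1
    fi
    cat <<EOF
<interface type='hostdev' managed='yes'>
  <mac address='$mac'/>
  <source>
    <address type='pci' domain='0x$1' bus='0x$2' slot='0x$3' function='0x$4'/>
  </source>
</interface>
EOF
}
```

For example, redirecting the output of vf_interface_xml 0000:e1:00.1 52:54:00:56:8c:f7 to mlx_vf.xml produces a file equivalent to the one shown in step 7.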

Verification

  1. In the migrated VM, view the network interface name of the Mellanox VF.

    # ifconfig
    
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
            inet6 fe80::a00:27ff:fe4e:66a1  prefixlen 64  scopeid 0x20<link>
            ether 08:00:27:4e:66:a1  txqueuelen 1000  (Ethernet)
            RX packets 100000  bytes 6543210 (6.5 MB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 100000  bytes 6543210 (6.5 MB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    enp4s0f0v0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.3.10  netmask 255.255.255.0  broadcast 192.168.3.255
            inet6 fe80::a00:27ff:fe4e:66c3  prefixlen 64  scopeid 0x20<link>
            ether 08:00:27:4e:66:c3  txqueuelen 1000  (Ethernet)
            RX packets 200000  bytes 12345678 (12.3 MB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 200000  bytes 12345678 (12.3 MB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  2. In the migrated VM, check that the Mellanox VF works, for example:

    # ping -I <VF_interface_name> 8.8.8.8
    
    PING 8.8.8.8 (8.8.8.8) from 192.168.3.10 <VF_interface_name>: 56(84) bytes of data.
    64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=27.4 ms
    64 bytes from 8.8.8.8: icmp_seq=2 ttl=57 time=26.9 ms
    
    --- 8.8.8.8 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 26.944/27.046/27.148/0.102 ms

12.9. Troubleshooting virtual machine migrations

If you are facing one of the following problems when migrating virtual machines (VMs), see the provided instructions to fix or avoid the issue.

12.9.1. Live migration of a VM takes a long time without completing

Cause

In some cases, migrating a running VM might cause the VM to generate dirty memory pages faster than they can be migrated. When this occurs, the migration cannot complete successfully.

The following scenarios frequently cause this problem:

  • Live migrating a VM under a heavy load
  • Live migrating a VM that uses a large amount of memory, such as 1 TB or more

    Important

    Red Hat has successfully tested live migration of VMs with up to 6 TB of memory. However, for live migration scenarios that involve VMs with more than 1 TB of memory, customers should reach out to Red Hat technical support.

Diagnosis

If your VM live migration is taking longer than expected, use the virsh domjobinfo command to obtain the memory page data for the VM:

# virsh domjobinfo vm-name

Job type:         Unbounded
Operation:        Outgoing migration
Time elapsed:     168286974    ms
Data processed:   26.106 TiB
Data remaining:   34.383 MiB
Data total:       10.586 TiB
Memory processed: 26.106 TiB
Memory remaining: 34.383 MiB
Memory total:     10.586 TiB
Memory bandwidth: 29.056 MiB/s
Dirty rate: 17225 pages/s
Page size: 4096 bytes

In this output, the product of Dirty rate and Page size (17225 pages/s × 4096 bytes, or about 67.3 MiB/s) is greater than Memory bandwidth (29.056 MiB/s). This means that the VM is generating dirty memory pages faster than the network can migrate them. As a consequence, the state of the VM on the destination host cannot converge with the state of the VM on the source host, which prevents the migration from completing.
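
The comparison described above can be reproduced with a short shell sketch that reads virsh domjobinfo output on standard input. The function name check_convergence is hypothetical, and the sketch assumes that Memory bandwidth is reported in MiB/s, as in the sample output.

```shell
# Hypothetical helper: compare the dirty page rate with the migration
# bandwidth, based on the fields printed by `virsh domjobinfo`.
check_convergence() {
    awk -F': *' '
        /^Dirty rate/       { split($2, f, " "); rate = f[1] }    # pages per second
        /^Page size/        { split($2, f, " "); size = f[1] }    # bytes per page
        /^Memory bandwidth/ { split($2, f, " "); bw = f[1] + 0 }  # assumed to be MiB/s
        END {
            dirty = rate * size / (1024 * 1024)   # dirtied memory in MiB/s
            verdict = (dirty > bw) ? "not converging" : "converging"
            printf "dirty=%.2f MiB/s bandwidth=%.2f MiB/s -> %s\n", dirty, bw, verdict
        }'
}
```

For example, running virsh domjobinfo vm-name | check_convergence against the sample output reports a dirty rate of 67.29 MiB/s against a bandwidth of 29.06 MiB/s, which means the migration is not converging.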

Fix

To improve the chances that a stalled live migration finishes successfully, you can do any of the following:

  • Reduce the workload of the VM, especially memory updates.

    • To do this, stop or cancel non-essential processes in the guest operating system of the source VM.
  • Increase the downtime allowed for the live migration:

    1. Display the current maximum downtime at the end of a live migration for the VM that is being migrated:

      # virsh migrate-getmaxdowntime vm-name
    2. Set a higher maximum downtime:

      # virsh migrate-setmaxdowntime vm-name downtime-in-milliseconds

      The higher you set the maximum downtime, the more likely the migration is to complete.

  • Switch the live migration to post-copy mode.

    # virsh migrate-start-postcopy vm-name
    • This ensures that the memory pages of the VM can converge on the destination host, and that the migration can complete.

      However, when post-copy mode is active, the VM might slow down significantly, due to remote page requests from the destination host to the source host. In addition, if the network connection between the source host and the destination host stops working during post-copy migration, some of the VM processes may halt due to missing memory pages.

      Therefore, do not use post-copy migration if the VM availability is critical or if the migration network is unstable.

  • If your workload allows it, suspend the VM and let the migration finish as a non-live migration. This increases the downtime of the VM, but in most cases ensures that the migration completes successfully.
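
The maximum-downtime adjustment described above can be scripted. The following is a minimal sketch, assuming that virsh migrate-getmaxdowntime prints a bare integer value in milliseconds. The function name raise_max_downtime and the VIRSH override variable are hypothetical and exist only to make the sketch easy to try out.

```shell
# Hypothetical helper: multiply the current maximum downtime of a
# migrating VM by a factor (default 2).
# $1 = VM name, $2 = optional multiplication factor
raise_max_downtime() {
    local vm="$1" factor="${2:-2}" current
    # Assumption: the command prints only the downtime in milliseconds.
    current=$("${VIRSH:-virsh}" migrate-getmaxdowntime "$vm") || return 1
    "${VIRSH:-virsh}" migrate-setmaxdowntime "$vm" "$(( current * factor ))"
}
```

For example, raise_max_downtime vm-name 4 quadruples the allowed downtime. Raising the value in steps, rather than setting a very high limit at once, keeps the final pause of the VM as short as the workload allows.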

Prevention

The probability of successfully completing a live migration of a VM depends on the following:

  • The workload of the VM during the migration

    • Before starting the migration, stop or cancel non-essential processes in the guest operating system of the VM.
  • The network bandwidth that the host can use for migration

    • For optimal results of a live migration, the bandwidth of the network used for the migration must be significantly higher than the dirty page generation rate of the VM. For instructions on obtaining the VM dirty page generation rate, see the Prerequisites in Migrating a virtual machine by using the command-line interface.
    • Both the source host and the destination host must have a dedicated network interface controller (NIC) for the migration. For live migrating a VM with more than 1 TB of memory, Red Hat recommends a NIC with the speed of 25 Gb/s or more.
    • You can also specify the network bandwidth assigned to the live migration by using the --bandwidth option when initiating the migration. For migrating very large VMs, assign as much bandwidth as viable for your deployment.
  • The mode of live migration

    • The default pre-copy migration mode copies memory pages repeatedly if they become dirty.
    • Post-copy migration copies memory pages only once.

      To enable your live migration to switch to post-copy mode if the migration stalls, use the --postcopy option with virsh migrate when starting the migration.

  • The downtime specified for the deployment

    • You can adjust this during the migration by using virsh migrate-setmaxdowntime as described previously.
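
To put the bandwidth recommendation above into perspective, the following sketch estimates the minimum time needed to copy all of the VM memory once over a link of a given speed. This is only a lower bound: it ignores protocol overhead and any pages that must be sent again because they became dirty. The function name single_pass_seconds is hypothetical.

```shell
# Hypothetical helper: lower bound, in seconds, for one full memory copy.
# $1 = VM memory in GiB, $2 = link speed in Gb/s (decimal gigabits)
single_pass_seconds() {
    awk -v gib="$1" -v gbps="$2" 'BEGIN {
        bits = gib * 1024 * 1024 * 1024 * 8          # memory size in bits
        printf "%.0f\n", bits / (gbps * 1000000000)  # one pass, no overhead
    }'
}
```

For example, single_pass_seconds 1024 25 prints 352, which means that a VM with 1 TiB of memory needs about six minutes for a single pass on a 25 Gb/s link, before any dirty pages are copied again.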

12.10. Supported hosts for virtual machine migration

For the virtual machine (VM) migration to work properly and be supported by Red Hat, the source and destination hosts must run specific RHEL versions and use specific machine types. The following table shows supported VM migration paths.

Table 12.2. Live migration compatibility

Migration method | Release type  | Future version example | Support status
Forward          | Minor release | 9.0.1 → 9.1            | On supported RHEL 9 systems: machine type q35.
Backward         | Minor release | 9.1 → 9.0.1            | On supported RHEL 9 systems: machine type q35.

Note

Support level is different for other virtualization solutions provided by Red Hat, including RHOSP and OpenShift Virtualization.

© 2024 Red Hat, Inc.