Chapter 16. Managing GPU devices in virtual machines
To prepare your virtual machine (VM) for running AI workloads or to enhance the graphical performance of the VM, you can assign a RHEL host GPU to the VM.
- You can detach the GPU from the host and pass full control of the GPU directly to the VM.
- You can create multiple devices from a physical GPU, and assign these devices as virtual GPUs (vGPUs) to multiple guests. This is currently only supported on selected NVIDIA GPUs.
GPU assignment is currently only supported on Intel 64 and AMD64 systems.
16.1. Assigning a GPU to a virtual machine
To access and control GPUs that are attached to the host system, you must configure the host system to pass direct control of the GPU to the virtual machine (VM).
Note that directly assigning a GPU to a VM makes it impossible for the host to use the GPU, and for the GPU to be assigned to another VM at the same time.
If you are looking for information about assigning a virtual GPU, see Managing NVIDIA vGPU devices.
Prerequisites
You must enable IOMMU support on the host machine kernel.
On an Intel host, you must enable VT-d:
Regenerate the GRUB configuration with the intel_iommu=on and iommu=pt parameters:

# grubby --args="intel_iommu=on iommu=pt" --update-kernel DEFAULT

- Reboot the host.
On an AMD host, you must enable AMD-Vi.
Note that on AMD hosts, IOMMU is enabled by default. You can add iommu=pt to switch it to pass-through mode:

Regenerate the GRUB configuration with the iommu=pt parameter:

# grubby --args="iommu=pt" --update-kernel DEFAULT

Note: The pt option only enables IOMMU for devices used in pass-through mode and provides better host performance. However, not all hardware supports the option. You can still assign devices even when this option is not enabled.

- Reboot the host.
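After the reboot, you can sanity-check this prerequisite by looking for IOMMU groups in sysfs. The following is a minimal sketch, not part of the official procedure; the helper takes the directory as a parameter, which on a real host is /sys/kernel/iommu_groups:

```shell
# Sketch: confirm that the kernel has populated IOMMU groups in sysfs.
# An empty or missing directory suggests IOMMU is not active and the
# kernel parameters above did not take effect.
check_iommu_groups() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "IOMMU active: $(ls "$dir" | wc -l) groups"
    else
        echo "no IOMMU groups found - check kernel parameters"
    fi
}

check_iommu_groups /sys/kernel/iommu_groups
```

The output depends on your hardware; any non-zero group count indicates that the kernel is enforcing IOMMU isolation.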
Procedure
Prevent the driver from binding to the GPU.
Identify the PCI bus address to which the GPU is attached.
# lspci -Dnn | grep VGA
0000:02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)

Prevent the host’s graphics driver from using the GPU. To do so, use the GPU PCI ID with the pci-stub driver.

For example, the following command prevents the driver from binding to the GPU with the 10de:11fa PCI ID:

# grubby --args="pci-stub.ids=10de:11fa" --update-kernel DEFAULT

- Reboot the host.
Optional: If certain GPU functions, such as audio, cannot be passed through to the VM due to support limitations, you can modify the driver bindings of the endpoints within an IOMMU group to pass through only the necessary GPU functions.
Convert the GPU settings to XML and note the PCI address of the endpoints that you want to prevent from attaching to the host drivers.
To do so, convert the GPU’s PCI bus address to a libvirt-compatible format by adding the pci_ prefix to the address, and converting the delimiters to underscores.

For example, the following command displays the XML configuration of the GPU attached at the 0000:02:00.0 bus address.

# virsh nodedev-dumpxml pci_0000_02_00_0
<device>
 <name>pci_0000_02_00_0</name>
 <path>/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0</path>
 <parent>pci_0000_00_03_0</parent>
 <driver>
  <name>pci-stub</name>
 </driver>
 <capability type='pci'>
  <domain>0</domain>
  <bus>2</bus>
  <slot>0</slot>
  <function>0</function>
  <product id='0x11fa'>GK106GL [Quadro K4000]</product>
  <vendor id='0x10de'>NVIDIA Corporation</vendor>
  <iommuGroup number='13'>
   <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
   <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
  </iommuGroup>
  <pci-express>
   <link validity='cap' port='0' speed='8' width='16'/>
   <link validity='sta' speed='2.5' width='16'/>
  </pci-express>
 </capability>
</device>

Prevent the endpoints from attaching to the host driver.
In this example, to assign the GPU to a VM, prevent the endpoint that corresponds to the audio function, <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>, from attaching to the host audio driver, and instead attach the endpoint to VFIO-PCI:

# driverctl set-override 0000:02:00.1 vfio-pci
Attach the GPU to the VM.
Create an XML configuration file for the GPU by using the PCI bus address.
For example, you can create the following XML file, GPU-Assign.xml, by using parameters from the GPU’s bus address.
<hostdev mode='subsystem' type='pci' managed='yes'>
 <driver name='vfio'/>
 <source>
  <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
 </source>
</hostdev>

- Save the file on the host system.
Merge the file with the VM’s XML configuration.
For example, the following command merges the GPU XML file, GPU-Assign.xml, with the XML configuration file of the System1 VM.

# virsh attach-device System1 --file /home/GPU-Assign.xml --persistent
Device attached successfully.

Note: The GPU is attached as a secondary graphics device to the VM. Assigning a GPU as the primary graphics device is not supported, and Red Hat does not recommend removing the primary emulated graphics device in the VM’s XML configuration.
Verification
- The device is displayed under the <devices> section in the VM’s XML configuration.
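The address conversion used in the procedure above (adding the pci_ prefix and replacing the : and . delimiters with underscores) can be sketched as a small shell helper. This is an illustrative helper, not a libvirt tool; the address is the Quadro K4000 example from this section:

```shell
# Sketch: turn an lspci-style PCI address into the node-device name
# that virsh nodedev-dumpxml expects (pci_ prefix, ':' and '.' -> '_').
pci_to_nodedev() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

pci_to_nodedev 0000:02:00.0
# prints: pci_0000_02_00_0
```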
16.2. Managing NVIDIA vGPU devices
The vGPU feature makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices. These devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.
Assigning a vGPU device to a VM might have a negative impact on the performance of the physical GPU that was used to create the vGPU.
16.2.1. Setting up NVIDIA vGPU devices
To assign an NVIDIA virtual GPU (vGPU) to a virtual machine (VM) on a RHEL 10 host system with modern NVIDIA hardware, use device assignment based on single-root I/O virtualization (SR-IOV).
To configure an SR-IOV-based vGPU, you need to download NVIDIA vGPU drivers for your GPU device, set up a vGPU instance on a virtual function (VF) of the physical GPU device, and assign the instance to a VM.
Prerequisites
- You have installed the NVIDIA software graphics driver. For instructions, see Installing the NVIDIA vGPU software graphics driver on Red Hat distributions from an RPM package in NVIDIA documentation.
Your GPU device supports vendor-specific SR-IOV device assignment. To check whether this is the case, use the nvidia-smi tool, which is installed with the NVIDIA graphics drivers:

# nvidia-smi -q | grep "Host VGPU Mode"

Successful output:

Host VGPU Mode : SR-IOV

To set up a vGPU based on legacy NVIDIA devices that do not support SR-IOV assignment, use mediated devices instead. For instructions, see Setting up NVIDIA vGPU legacy devices.
You have enabled the virtual functions (VFs) for the physical GPU device and obtained the slot, bus, domain, and function identifiers of the VFs that you want to use for NVIDIA vGPU. For instructions, see Preparing the virtual function for an NVIDIA vGPU that supports SR-IOV on a Linux with KVM hypervisor in NVIDIA documentation.
As an example, the following procedure uses a VF with domain 0000, bus 41, slot 02, and function 3. The complete ID of this VF is 0000:41:02.3.
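The VF identifiers can also be read from sysfs, where the kernel exposes each enabled VF as a virtfn* symlink under the physical function. The following is a sketch; the parent address 0000:41:00.0 is an assumption chosen to match the example VF 0000:41:02.3:

```shell
# Sketch: list the PCI addresses of the enabled VFs of a physical GPU
# by resolving the virtfn* symlinks the kernel creates in sysfs.
list_vfs() {
    pf_dir="$1"
    for vf in "$pf_dir"/virtfn*; do
        [ -e "$vf" ] || continue
        basename "$(readlink -f "$vf")"
    done
}

# Hypothetical physical-function address for this example:
list_vfs /sys/bus/pci/devices/0000:41:00.0
```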
Procedure
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in /etc/modprobe.d/, and add the following lines to the file:

blacklist nouveau
options nouveau modeset=0

Change to the directory of the VF that you want to use for vGPU:
# cd /sys/bus/pci/devices/<domain>\:<bus>\:<slot>.<function>/nvidia

For example:

# cd /sys/bus/pci/devices/0000\:41\:02.3/nvidia

Optional: Confirm that the directory contains the creatable_vgpu_types and current_vgpu_type files:

-r--r--r-- 1 root root 4096 Aug  3 00:39 creatable_vgpu_types
-rw-r--r-- 1 root root 4096 Aug  3 00:39 current_vgpu_type

Optional: Confirm that a vGPU instance has not already been created on the VF:
# cat current_vgpu_type

If the output of this command is 0, a vGPU has not already been created on the VF.

Inspect the creatable_vgpu_types file to determine what vGPU types can be created on the VF:

# cat creatable_vgpu_types

Example output:
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561

Select the vGPU type that you want to use and add its respective ID to the current_vgpu_type file.

For example, the following command creates an instance of the A40-4Q vGPU type on the VF:

# echo 560 > current_vgpu_type

Optional: Confirm that the vGPU type has been set correctly on the VF:

# cat current_vgpu_type
560

Optional: Confirm that the creatable_vgpu_types file is empty. This means that no vGPU instance can be created on the VF anymore.

Open the XML configuration of the VM to which you want to assign the vGPU device:
# virsh edit <vm-name>

Set the video model type setting of the VM to none:

<video>
 <model type='none'/>
</video>

Add a <hostdev> section with the ID values of the VF on which you created a vGPU instance in the previous steps:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>

If you want to assign multiple vGPUs to the VM, each must be assigned to a different VF. For example:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x4'/>
 </source>
</hostdev>

- Save the XML configuration.
- If the VM is running, reboot it.
Verification
Start the VM and verify that the guest operating system detects the virtual device as an NVIDIA GPU. For example, if the VM uses Linux:
# lspci -d 10de: -k
07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 12ce
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
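Selecting the type ID from creatable_vgpu_types can also be scripted. The following sketch parses the "<name> <ID>" format shown in the procedure; the sample file and the A40-4Q type are the examples from this section, and the helper itself is an assumption, not an NVIDIA tool:

```shell
# Sketch: look up the numeric ID of a vGPU type by name in the
# creatable_vgpu_types format ("<type name> <numeric ID>" per line),
# so the ID can then be written to current_vgpu_type.
vgpu_type_id() {
    file="$1"; name="$2"
    awk -v name="$name" '{
        id = $NF                          # last field is the numeric ID
        line = $0
        sub(/[ \t]+[0-9]+$/, "", line)    # the rest is the type name
        if (line == name) print id
    }' "$file"
}

# Sample data in the format shown in the procedure above:
cat > /tmp/creatable_vgpu_types.sample <<'EOF'
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-4Q 560
EOF

vgpu_type_id /tmp/creatable_vgpu_types.sample "NVIDIA A40-4Q"
# prints: 560
```

On a host, the result could be redirected into the VF's current_vgpu_type file as shown in the procedure.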
16.2.2. Removing NVIDIA vGPU devices
To change the configuration of assigned SR-IOV-based vGPU devices, you need to remove the vGPU host device from the assigned virtual machine (VM). Optionally, you can also deactivate the vGPU instance in the respective virtual function (VF).
Prerequisites
- You have set up an SR-IOV-based vGPU device in a VM on your host. For instructions, see Setting up NVIDIA vGPU devices.
- The VM from which you want to remove the device is shut down.
Procedure
Identify the VF that is used by the vGPU instance that you want to remove.
Open the XML configuration of the VM.
# virsh edit <vm-name>

Look for a <hostdev> section similar to the following:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>
Remove the vGPU <hostdev> section from the XML configuration.

This removes the vGPU instance from the VM, but the respective VF remains configured with a vGPU. If you want to change the vGPU type or remove the vGPU functionality from the VF, proceed with the remaining steps.
If the <video> section was previously removed from the XML configuration, restore it in the <devices> section to ensure that the VM uses virtualized video drivers. For example:

<video>
 <model type='vga' vram='16384' heads='1' primary='yes'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>

- Save the XML configuration.
Change to the directory of the VF that you use for vGPU assignment:
# cd /sys/bus/pci/devices/<domain>\:<bus>\:<slot>.<function>/nvidia

For example, if the VF is 0000:41:02.3, use the following command:

# cd /sys/bus/pci/devices/0000\:41\:02.3/nvidia

Optional: Confirm that a vGPU type is currently set on the VF:

# cat current_vgpu_type
560

Change the content of the current_vgpu_type file to 0:

# echo 0 > current_vgpu_type
Verification
Confirm that the vGPU type on the VF is now 0:

# cat current_vgpu_type
0

Confirm that the creatable_vgpu_types file now contains the available types of vGPU:

# cat creatable_vgpu_types

Example successful output:

NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561

This indicates that the vGPU instance has been removed and that you can create a new vGPU on the VF.
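The reset-and-verify steps on the VF can be wrapped in a small helper. This is a sketch: on a real host you would pass the VF's .../nvidia sysfs directory, where the NVIDIA driver interprets writing 0 as removing the vGPU instance; the demonstration below uses a scratch directory instead of sysfs:

```shell
# Sketch: set a VF's vGPU type to 0 and verify the result, mirroring
# the manual steps above. $1 is the VF's nvidia sysfs directory.
clear_vgpu() {
    vf_dir="$1"
    echo 0 > "$vf_dir/current_vgpu_type"
    if [ "$(cat "$vf_dir/current_vgpu_type")" = "0" ]; then
        echo "vGPU instance removed"
    else
        echo "failed to clear vGPU type"
    fi
}

# Demonstration on a scratch directory (on a host, use the real VF path):
demo=$(mktemp -d)
echo 560 > "$demo/current_vgpu_type"
clear_vgpu "$demo"
# prints: vGPU instance removed
```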
16.2.3. Setting up NVIDIA vGPU legacy devices
To set up the NVIDIA vGPU feature on NVIDIA devices that do not support device assignment based on single-root I/O virtualization (SR-IOV), you need to download NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines.
For modern vGPU devices, use SR-IOV-based vGPU assignment instead. For instructions, see Setting up NVIDIA vGPU devices.
Prerequisites
Your GPU supports vGPU mediated devices. For an up-to-date list of NVIDIA GPUs that support creating vGPUs, see the NVIDIA vGPU software documentation.
If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA Tesla P4 GPU, which is compatible with vGPU.

# lshw -C display
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: driver=vfio-pci latency=0
       resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff
Your GPU does not support SR-IOV device assignment. To verify, use the following command:
# nvidia-smi -q | grep -i "Host VGPU Mode"

Output for GPUs that are not compatible with SR-IOV:
Host VGPU Mode : Non SR-IOV
Procedure
- Download the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in /etc/modprobe.d/, and add the following lines to the file:

blacklist nouveau
options nouveau modeset=0

Regenerate the initial RAM disk for the current kernel, then reboot:

# dracut --force
# reboot

Check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running:

# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 45011 0
nvidia 14333621 10 nvidia_vgpu_vfio
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
 Main PID: 1553 (nvidia-vgpu-mgr)
 [...]

Generate a device UUID:
# uuidgen
30820a6f-b1a5-4503-91ca-0c10ba58692a

Prepare an XML file with a configuration of the mediated device, based on the detected GPU hardware. For example, the following configures a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card that runs on the 0000:01:00.0 PCI bus and uses the UUID generated in the previous step.

<device>
 <parent>pci_0000_01_00_0</parent>
 <capability type="mdev">
  <type id="nvidia-63"/>
  <uuid>30820a6f-b1a5-4503-91ca-0c10ba58692a</uuid>
 </capability>
</device>

Define a vGPU mediated device based on the XML file you prepared. For example:
# virsh nodedev-define vgpu-test.xml
Node device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 created from vgpu-test.xml

Optional: Verify that the mediated device is listed as inactive:

# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Start the vGPU mediated device you created:

# virsh nodedev-start mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 started

Optional: Ensure that the mediated device is listed as active:

# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Set the vGPU device to start automatically after the host reboots:

# virsh nodedev-autostart mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 marked as autostarted

Attach the mediated device to a VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM.
To attach a single vGPU to a VM:

...
<video>
 <model type='none'/>
</video>
...
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>

Note that each UUID can only be assigned to one VM at a time.

To attach multiple vGPUs to a VM:

...
<video>
 <model type='none'/>
</video>
...
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
 <source>
  <address uuid='751c2bd9-3322-4031-ad07-50a0d1367577'/>
 </source>
</hostdev>
- For full functionality of the vGPU mediated devices to be available on the assigned VMs, set up NVIDIA vGPU guest software licensing on the VMs. For further information and instructions, see the NVIDIA Virtual GPU Software License Server User Guide.
Verification
Query the capabilities of the vGPU you created, and ensure it is listed as active and persistent:

# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         yes
Persistent:     yes
Autostart:      yes

Start the VM and verify that the guest operating system detects the mediated device as an NVIDIA GPU. For example, if the VM uses Linux:
# lspci -d 10de: -k
07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 12ce
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
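The mediated-device XML used in this procedure can be generated from its three inputs. The following is a minimal sketch using the example values from this section; the helper name is hypothetical:

```shell
# Sketch: generate the mediated-device XML from the parent PCI address,
# the vGPU type ID, and a UUID (e.g. from uuidgen), in the format shown
# in the procedure above.
mdev_xml() {
    parent="pci_$(printf '%s' "$1" | tr ':.' '__')"
    printf '<device>\n'
    printf '  <parent>%s</parent>\n' "$parent"
    printf '  <capability type="mdev">\n'
    printf '    <type id="%s"/>\n' "$2"
    printf '    <uuid>%s</uuid>\n' "$3"
    printf '  </capability>\n'
    printf '</device>\n'
}

mdev_xml 0000:01:00.0 nvidia-63 30820a6f-b1a5-4503-91ca-0c10ba58692a
```

The output could be saved to a file such as vgpu-test.xml and passed to virsh nodedev-define, as in the procedure.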
Troubleshooting
- When you use RHEL 10 in a VM, the only available display protocol is Wayland. However, Wayland is currently not supported by the NVIDIA vGPU guest driver. As a consequence, you cannot start a GNOME desktop session running on a vGPU. For more information, see the NVIDIA vGPU documentation.
16.2.4. Removing NVIDIA vGPU legacy devices
To change the configuration of assigned vGPU mediated devices, you need to remove the existing devices from the assigned VMs.
Prerequisites
- The VM from which you want to remove the device is shut down.
Procedure
Obtain the ID of the mediated device that you want to remove.
# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Stop the running instance of the vGPU mediated device:
# virsh nodedev-destroy mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Destroyed node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'

Optional: Ensure the mediated device has been deactivated:

# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         no
Persistent:     yes
Autostart:      yes

Remove the device from the XML configuration of the VM. To do so, use the virsh edit utility to edit the XML configuration of the VM, and remove the mdev’s configuration segment. The segment will look similar to the following:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>

Note that stopping and detaching the mediated device does not delete it, but rather keeps it as defined. As such, you can restart and attach the device to a different VM.
If the <video> segment was previously removed from the XML configuration, restore it in the <devices> section to ensure that the VM uses virtualized video drivers. For example:

<video>
 <model type='vga' vram='16384' heads='1' primary='yes'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>

- Save the XML configuration.
Optional: To delete the stopped mediated device, remove its definition.
# virsh nodedev-undefine mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Undefined node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
Verification
If you only stopped and detached the device, ensure the mediated device is listed as inactive.
# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

If you also deleted the device, ensure that the following command does not display it:
# virsh nodedev-list --cap mdev
16.2.5. Obtaining NVIDIA legacy vGPU information about your system
To evaluate the capabilities of the legacy vGPU features available to you, you can obtain additional information about the mediated devices on your system.
You can obtain the following information:
- How many mediated devices of a given type can be created.
- What mediated devices are already configured on your system.
Prerequisites
Your GPU does not support SR-IOV device assignment. To verify, use the following command:
# nvidia-smi -q | grep -i "Host VGPU Mode"

Output for GPUs that are not compatible with SR-IOV:
Host VGPU Mode : Non SR-IOV
Procedure
To see the available GPU devices on your host that can support vGPU mediated devices, use the virsh nodedev-list --cap mdev_types command. For example, the following shows a system with two NVIDIA Quadro RTX6000 devices.

# virsh nodedev-list --cap mdev_types
pci_0000_5b_00_0
pci_0000_9b_00_0

To display the vGPU types supported by a specific GPU device, as well as additional metadata, use the virsh nodedev-dumpxml command.

# virsh nodedev-dumpxml pci_0000_9b_00_0
<device>
 <name>pci_0000_9b_00_0</name>
 <path>/sys/devices/pci0000:9a/0000:9a:00.0/0000:9b:00.0</path>
 <parent>pci_0000_9a_00_0</parent>
 <driver>
  <name>nvidia</name>
 </driver>
 <capability type='pci'>
  <class>0x030000</class>
  <domain>0</domain>
  <bus>155</bus>
  <slot>0</slot>
  <function>0</function>
  <product id='0x1e30'>TU102GL [Quadro RTX 6000/8000]</product>
  <vendor id='0x10de'>NVIDIA Corporation</vendor>
  <capability type='mdev_types'>
   <type id='nvidia-346'>
    <name>GRID RTX6000-12C</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>2</availableInstances>
   </type>
   <type id='nvidia-439'>
    <name>GRID RTX6000-3A</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>8</availableInstances>
   </type>
   [...]
   <type id='nvidia-440'>
    <name>GRID RTX6000-4A</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>6</availableInstances>
   </type>
   <type id='nvidia-261'>
    <name>GRID RTX6000-8Q</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>3</availableInstances>
   </type>
  </capability>
  <iommuGroup number='216'>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x3'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x1'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x2'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
  </iommuGroup>
  <numa node='2'/>
  <pci-express>
   <link validity='cap' port='0' speed='8' width='16'/>
   <link validity='sta' speed='2.5' width='8'/>
  </pci-express>
 </capability>
</device>
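The long nodedev-dumpxml output can be condensed into one line per supported vGPU type. The following is a sketch that relies on naive text matching rather than a real XML parser, so it assumes the line-per-element layout shown above; the helper name is hypothetical:

```shell
# Sketch: summarize the mdev types from virsh nodedev-dumpxml output
# as "type-id: name (available: N)" lines.
summarize_mdev_types() {
    awk -F"'" '
        /<type id=/            { id = $2 }
        /<name>/               { sub(/.*<name>/, ""); sub(/<\/name>.*/, ""); name = $0 }
        /<availableInstances>/ { sub(/.*<availableInstances>/, "")
                                 sub(/<\/availableInstances>.*/, "")
                                 print id ": " name " (available: " $0 ")" }
    '
}

# Demonstration on a fragment of the output above:
summarize_mdev_types <<'EOF'
<type id='nvidia-261'>
<name>GRID RTX6000-8Q</name>
<availableInstances>3</availableInstances>
EOF
# prints: nvidia-261: GRID RTX6000-8Q (available: 3)
```

On a host, you would pipe the real listing into the helper, for example: virsh nodedev-dumpxml pci_0000_9b_00_0 | summarize_mdev_types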
16.2.6. Live migrating a virtual machine with an attached NVIDIA vGPU
If you use virtual GPUs (vGPUs) in your virtualization workloads, you can live migrate a running virtual machine (VM) with an attached vGPU to another KVM host. Currently, this is only possible with NVIDIA GPUs.
Prerequisites
- You have an NVIDIA GPU with an NVIDIA Virtual GPU Software Driver version that supports this functionality. Refer to the relevant NVIDIA vGPU documentation for more details.
- You have a correctly configured NVIDIA vGPU assigned to a VM. For instructions, see: Managing NVIDIA vGPU devices.
It is also possible to live migrate a VM with multiple vGPU devices attached.
- The host uses the Intel 64 or AMD64 CPU architecture.
- You have met all of the vGPU migration prerequisites that are documented by NVIDIA. Refer to the relevant NVIDIA vGPU documentation for more details.
- You have met all of the general VM migration prerequisites. For details, see Migrating a virtual machine by using the command line.
Limitations
- Certain NVIDIA GPU features can prevent the migration. For more information, see the specific NVIDIA documentation for your graphics card.
- Some GPU workloads are not compatible with the downtime that happens during a migration. As a consequence, the GPU workloads might stop or crash. It is recommended to test if your workloads are compatible with the downtime before attempting a vGPU live migration.
Currently, some general virtualization features cannot be used when live migrating a VM with an attached vGPU:
Calculating the dirty memory page rate of the VM.

Currently, live migration data and statistics provided by the virsh domjobinfo and virsh domdirtyrate-calc commands are inaccurate when migrating a VM with an attached vGPU, because the calculations only count guest RAM without including the vRAM from the vGPU.

- Using a post-copy live migration.
- Using a virtual I/O Memory Management Unit (vIOMMU) device in the VM.
Procedure
For instructions on how to proceed with the live migration, see Migrating a virtual machine by using the command line.
In most cases, no additional parameters for the migration command are required for the attached vGPU device.
To live migrate a VM with a legacy vGPU based on a mediated device to a host that supports SR-IOV-based vGPUs, for example when migrating from a RHEL 9 host to a RHEL 10 host, you can change the vGPU type during migration:
Create a copy of the XML configuration file of the VM:

# cp /etc/libvirt/qemu/<vm_config>.xml /etc/libvirt/qemu/<vm_new_config>.xml

- Open the new copy of the XML configuration.
Adjust the <hostdev> section of the mediated device (type='mdev') to instead use an SR-IOV device that is available on the destination host. For example:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>

- Save the XML configuration.
When performing the VM live migration, set the VM to use the adjusted XML on the destination host:
# virsh migrate --live \
    --xml <vm_new_config>.xml \
    <vm-name> \
    qemu+ssh://<destination_host>/system
16.2.7. Remote desktop streaming services for NVIDIA vGPU
The following remote desktop streaming services are supported on the RHEL 10 hypervisor with NVIDIA vGPU or NVIDIA GPU passthrough enabled:
- HP ZCentral Remote Boost/Teradici
- NICE DCV
- Mechdyne TGX
For support details, see the appropriate vendor support matrix.