Chapter 16. Managing GPU devices in virtual machines
To prepare your virtual machine (VM) for running AI workloads or to enhance the graphical performance of the VM, you can assign a RHEL host GPU to the VM.
- You can detach the GPU from the host and pass full control of the GPU directly to the VM.
- You can create multiple devices from a physical GPU, and assign these devices as virtual GPUs (vGPUs) to multiple guests. This is currently only supported on selected NVIDIA GPUs.
GPU assignment is currently only supported on Intel 64 and AMD64 systems.
16.1. Assigning a GPU to a virtual machine
To access and control GPUs that are attached to the host system, you must configure the host system to pass direct control of the GPU to the virtual machine (VM).
Note that directly assigning a GPU to a VM makes it impossible for the host to use the GPU, and for the GPU to be assigned to another VM at the same time.
If you are looking for information about assigning a virtual GPU, see Managing NVIDIA vGPU devices.
Prerequisites
You must enable IOMMU support on the host machine kernel.
On an Intel host, you must enable VT-d:
Regenerate the GRUB configuration with the intel_iommu=on and iommu=pt parameters:

# grubby --args="intel_iommu=on iommu=pt" --update-kernel DEFAULT

- Reboot the host.
On an AMD host, you must enable AMD-Vi.
Note that on AMD hosts, IOMMU is enabled by default. You can add iommu=pt to switch it to pass-through mode:

Regenerate the GRUB configuration with the iommu=pt parameter:

# grubby --args="iommu=pt" --update-kernel DEFAULT

Note: The pt option only enables IOMMU for devices used in pass-through mode and provides better host performance. However, not all hardware supports the option. You can still assign devices even when this option is not enabled.

- Reboot the host.
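After the reboot, you can sanity-check this prerequisite by looking for IOMMU groups in sysfs. The following is a minimal sketch, not part of the official procedure; the helper takes the directory as a parameter, which on a real host is /sys/kernel/iommu_groups:

```shell
# Sketch: confirm that the kernel has populated IOMMU groups in sysfs.
# An empty or missing directory suggests IOMMU is not active and the
# kernel parameters above did not take effect.
check_iommu_groups() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "IOMMU active: $(ls "$dir" | wc -l) groups"
    else
        echo "no IOMMU groups found - check kernel parameters"
    fi
}

check_iommu_groups /sys/kernel/iommu_groups
```

The output depends on your hardware; any non-zero group count indicates that the kernel is enforcing IOMMU isolation.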
Procedure
Prevent the driver from binding to the GPU.
Identify the PCI bus address to which the GPU is attached.
# lspci -Dnn | grep VGA
0000:02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)

Prevent the host’s graphics driver from using the GPU. To do so, use the GPU PCI ID with the pci-stub driver.

For example, the following command prevents the driver from binding to the GPU with the 10de:11fa PCI ID:

# grubby --args="pci-stub.ids=10de:11fa" --update-kernel DEFAULT

- Reboot the host.
Optional: If certain GPU functions, such as audio, cannot be passed through to the VM due to support limitations, you can modify the driver bindings of the endpoints within an IOMMU group to pass through only the necessary GPU functions.
Convert the GPU settings to XML and note the PCI address of the endpoints that you want to prevent from attaching to the host drivers.
To do so, convert the GPU’s PCI bus address to a libvirt-compatible format by adding the pci_ prefix to the address, and converting the delimiters to underscores.

For example, the following command displays the XML configuration of the GPU attached at the 0000:02:00.0 bus address.

# virsh nodedev-dumpxml pci_0000_02_00_0
<device>
 <name>pci_0000_02_00_0</name>
 <path>/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0</path>
 <parent>pci_0000_00_03_0</parent>
 <driver>
  <name>pci-stub</name>
 </driver>
 <capability type='pci'>
  <domain>0</domain>
  <bus>2</bus>
  <slot>0</slot>
  <function>0</function>
  <product id='0x11fa'>GK106GL [Quadro K4000]</product>
  <vendor id='0x10de'>NVIDIA Corporation</vendor>
  <iommuGroup number='13'>
   <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
   <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
  </iommuGroup>
  <pci-express>
   <link validity='cap' port='0' speed='8' width='16'/>
   <link validity='sta' speed='2.5' width='16'/>
  </pci-express>
 </capability>
</device>

Prevent the endpoints from attaching to the host driver.
In this example, to assign the GPU to a VM, prevent the endpoint that corresponds to the audio function, <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>, from attaching to the host audio driver, and instead attach the endpoint to VFIO-PCI:

# driverctl set-override 0000:02:00.1 vfio-pci
Attach the GPU to the VM.
Create an XML configuration file for the GPU by using the PCI bus address.
For example, you can create the following XML file, GPU-Assign.xml, by using parameters from the GPU’s bus address.
<hostdev mode='subsystem' type='pci' managed='yes'>
 <driver name='vfio'/>
 <source>
  <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
 </source>
</hostdev>

- Save the file on the host system.
Merge the file with the VM’s XML configuration.
For example, the following command merges the GPU XML file, GPU-Assign.xml, with the XML configuration file of the System1 VM.

# virsh attach-device System1 --file /home/GPU-Assign.xml --persistent
Device attached successfully.

Note: The GPU is attached as a secondary graphics device to the VM. Assigning a GPU as the primary graphics device is not supported, and Red Hat does not recommend removing the primary emulated graphics device in the VM’s XML configuration.
Verification
- The device is displayed under the <devices> section in the VM’s XML configuration.
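The address conversion used in the procedure above (adding the pci_ prefix and replacing the : and . delimiters with underscores) can be sketched as a small shell helper. This is an illustrative helper, not a libvirt tool; the address is the Quadro K4000 example from this section:

```shell
# Sketch: turn an lspci-style PCI address into the node-device name
# that virsh nodedev-dumpxml expects (pci_ prefix, ':' and '.' -> '_').
pci_to_nodedev() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

pci_to_nodedev 0000:02:00.0
# prints: pci_0000_02_00_0
```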
16.2. Managing NVIDIA vGPU devices
The vGPU feature makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices. These devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.
Assigning a vGPU device to a VM might have a negative impact on the performance of the physical GPU that was used to create the vGPU.
16.2.1. Setting up NVIDIA vGPU devices
To assign an NVIDIA virtual GPU (vGPU) to a virtual machine (VM) on a RHEL 10 host system with modern NVIDIA hardware, use device assignment based on single-root I/O virtualization (SR-IOV).
To configure an SR-IOV-based vGPU, you need to download NVIDIA vGPU drivers for your GPU device, set up a vGPU instance on a virtual function (VF) of the physical GPU device, and assign the instance to a VM.
Prerequisites
- You have installed the NVIDIA software graphics driver. For instructions, see Installing the NVIDIA vGPU software graphics driver on Red Hat distributions from an RPM package in NVIDIA documentation.
Your GPU device supports vendor-specific SR-IOV device assignment. To check whether this is the case, use the nvidia-smi tool, which is installed with the NVIDIA graphics drivers:

# nvidia-smi -q | grep "Host VGPU Mode"

Successful output:

Host VGPU Mode : SR-IOV

To set up a vGPU based on legacy NVIDIA devices that do not support SR-IOV assignment, use mediated devices instead. For instructions, see Setting up NVIDIA vGPU legacy devices.
You have enabled the virtual functions (VFs) for the physical GPU device and obtained the slot, bus, domain, and function identifiers of the VFs that you want to use for NVIDIA vGPU. For instructions, see Preparing the virtual function for an NVIDIA vGPU that supports SR-IOV on a Linux with KVM hypervisor in NVIDIA documentation.
As an example, the following procedure uses a VF with domain 0000, bus 41, slot 02, and function 3. The complete ID of this VF is 0000:41:02.3.
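The VF identifiers can also be read from sysfs, where the kernel exposes each enabled VF as a virtfn* symlink under the physical function. The following is a sketch; the parent address 0000:41:00.0 is an assumption chosen to match the example VF 0000:41:02.3:

```shell
# Sketch: list the PCI addresses of the enabled VFs of a physical GPU
# by resolving the virtfn* symlinks the kernel creates in sysfs.
list_vfs() {
    pf_dir="$1"
    for vf in "$pf_dir"/virtfn*; do
        [ -e "$vf" ] || continue
        basename "$(readlink -f "$vf")"
    done
}

# Hypothetical physical-function address for this example:
list_vfs /sys/bus/pci/devices/0000:41:00.0
```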
Procedure
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in /etc/modprobe.d/, and add the following lines to the file:

blacklist nouveau
options nouveau modeset=0

Change to the directory of the VF that you want to use for vGPU:
# cd /sys/bus/pci/devices/<domain>\:<bus>\:<slot>.<function>/nvidia

For example:

# cd /sys/bus/pci/devices/0000\:41\:02.3/nvidia

Optional: Confirm that the directory contains the creatable_vgpu_types and current_vgpu_type files:

-r--r--r-- 1 root root 4096 Aug  3 00:39 creatable_vgpu_types
-rw-r--r-- 1 root root 4096 Aug  3 00:39 current_vgpu_type

Optional: Confirm that a vGPU instance has not already been created on the VF:
# cat current_vgpu_type

If the output of this command is 0, a vGPU has not already been created on the VF.

Inspect the creatable_vgpu_types file to determine what vGPU types can be created on the VF:

# cat creatable_vgpu_types

Example output:
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561

Select the vGPU type that you want to use and add its respective ID to the current_vgpu_type file.

For example, the following command creates an instance of the A40-4Q vGPU type on the VF:

# echo 560 > current_vgpu_type

Optional: Confirm that the vGPU type has been set correctly on the VF:

# cat current_vgpu_type
560

Optional: Confirm that the creatable_vgpu_types file is empty. This means that no vGPU instance can be created on the VF anymore.

Open the XML configuration of the VM to which you want to assign the vGPU device:
# virsh edit <vm-name>

Set the video model type setting of the VM to none:

<video>
 <model type='none'/>
</video>

Add a <hostdev> section with the ID values of the VF on which you created a vGPU instance in the previous steps:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>

If you want to assign multiple vGPUs to the VM, each must be assigned to a different VF. For example:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x4'/>
 </source>
</hostdev>

- Save the XML configuration.
- If the VM is running, reboot it.
Verification
Start the VM and verify that the guest operating system detects the virtual device as an NVIDIA GPU. For example, if the VM uses Linux:
# lspci -d 10de: -k
07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 12ce
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
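Selecting the type ID from creatable_vgpu_types can also be scripted. The following sketch parses the "<name> <ID>" format shown in the procedure; the sample file and the A40-4Q type are the examples from this section, and the helper itself is an assumption, not an NVIDIA tool:

```shell
# Sketch: look up the numeric ID of a vGPU type by name in the
# creatable_vgpu_types format ("<type name> <numeric ID>" per line),
# so the ID can then be written to current_vgpu_type.
vgpu_type_id() {
    file="$1"; name="$2"
    awk -v name="$name" '{
        id = $NF                          # last field is the numeric ID
        line = $0
        sub(/[ \t]+[0-9]+$/, "", line)    # the rest is the type name
        if (line == name) print id
    }' "$file"
}

# Sample data in the format shown in the procedure above:
cat > /tmp/creatable_vgpu_types.sample <<'EOF'
NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-4Q 560
EOF

vgpu_type_id /tmp/creatable_vgpu_types.sample "NVIDIA A40-4Q"
# prints: 560
```

On a host, the result could be redirected into the VF's current_vgpu_type file as shown in the procedure.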
16.2.2. Removing NVIDIA vGPU devices
To change the configuration of assigned SR-IOV-based vGPU devices, you need to remove the vGPU host device from the assigned virtual machine (VM). Optionally, you can also deactivate the vGPU instance in the respective virtual function (VF).
Prerequisites
- You have set up an SR-IOV-based vGPU device in a VM on your host. For instructions, see Setting up NVIDIA vGPU devices.
- The VM from which you want to remove the device is shut down.
Procedure
Identify the VF that is used by the vGPU instance that you want to remove.
Open the XML configuration of the VM.
# virsh edit <vm-name>

Look for a <hostdev> section similar to the following:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>
Remove the vGPU <hostdev> section from the XML configuration.

This removes the vGPU instance from the VM, but the respective VF remains configured with a vGPU. If you want to change the vGPU type or remove the vGPU functionality from the VF, proceed with the remaining steps.
If the <video> section was previously removed from the XML configuration, restore it in the <devices> section to ensure that the VM uses virtualized video drivers. For example:

<video>
 <model type='vga' vram='16384' heads='1' primary='yes'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>

- Save the XML configuration.
Change to the directory of the VF that you use for vGPU assignment:
# cd /sys/bus/pci/devices/<domain>\:<bus>\:<slot>.<function>/nvidia

For example, if the VF is 0000:41:02.3, use the following command:

# cd /sys/bus/pci/devices/0000\:41\:02.3/nvidia

Optional: Confirm that a vGPU type is currently set on the VF:

# cat current_vgpu_type
560

Change the content of the current_vgpu_type file to 0:

# echo 0 > current_vgpu_type
Verification
Confirm that the vGPU type on the VF is now 0:

# cat current_vgpu_type
0

Confirm that the creatable_vgpu_types file now contains the available types of vGPU:

# cat creatable_vgpu_types

Example successful output:

NVIDIA A40-1Q 557
NVIDIA A40-2Q 558
NVIDIA A40-3Q 559
NVIDIA A40-4Q 560
NVIDIA A40-6Q 561

This indicates that the vGPU instance has been removed and that you can create a new vGPU on the VF.
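The reset-and-verify steps on the VF can be wrapped in a small helper. This is a sketch: on a real host you would pass the VF's .../nvidia sysfs directory, where the NVIDIA driver interprets writing 0 as removing the vGPU instance; the demonstration below uses a scratch directory instead of sysfs:

```shell
# Sketch: set a VF's vGPU type to 0 and verify the result, mirroring
# the manual steps above. $1 is the VF's nvidia sysfs directory.
clear_vgpu() {
    vf_dir="$1"
    echo 0 > "$vf_dir/current_vgpu_type"
    if [ "$(cat "$vf_dir/current_vgpu_type")" = "0" ]; then
        echo "vGPU instance removed"
    else
        echo "failed to clear vGPU type"
    fi
}

# Demonstration on a scratch directory (on a host, use the real VF path):
demo=$(mktemp -d)
echo 560 > "$demo/current_vgpu_type"
clear_vgpu "$demo"
# prints: vGPU instance removed
```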
16.2.3. Setting up NVIDIA vGPU legacy devices
To set up the NVIDIA vGPU feature on NVIDIA devices that do not support device assignment based on single-root I/O virtualization (SR-IOV), you need to download NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines.
For modern vGPU devices, use SR-IOV-based vGPU assignment instead. For instructions, see Setting up NVIDIA vGPU devices.
Prerequisites
Your GPU supports vGPU mediated devices. For an up-to-date list of NVIDIA GPUs that support creating vGPUs, see the NVIDIA vGPU software documentation.
If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA Tesla P4 GPU, which is compatible with vGPU.

# lshw -C display
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: driver=vfio-pci latency=0
       resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff
Your GPU does not support SR-IOV device assignment. To verify, use the following command:
# nvidia-smi -q | grep -i "Host VGPU Mode"

Output for GPUs that are not compatible with SR-IOV:
Host VGPU Mode : Non SR-IOV
Procedure
- Download the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in /etc/modprobe.d/, and add the following lines to the file:

blacklist nouveau
options nouveau modeset=0

Regenerate the initial RAM disk for the current kernel, then reboot:

# dracut --force
# reboot

Check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running:

# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 45011 0
nvidia 14333621 10 nvidia_vgpu_vfio
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
 Main PID: 1553 (nvidia-vgpu-mgr)
 [...]

Generate a device UUID:
# uuidgen
30820a6f-b1a5-4503-91ca-0c10ba58692a

Prepare an XML file with a configuration of the mediated device, based on the detected GPU hardware. For example, the following configures a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card that runs on the 0000:01:00.0 PCI bus and uses the UUID generated in the previous step.

<device>
 <parent>pci_0000_01_00_0</parent>
 <capability type="mdev">
  <type id="nvidia-63"/>
  <uuid>30820a6f-b1a5-4503-91ca-0c10ba58692a</uuid>
 </capability>
</device>

Define a vGPU mediated device based on the XML file you prepared. For example:
# virsh nodedev-define vgpu-test.xml
Node device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 created from vgpu-test.xml

Optional: Verify that the mediated device is listed as inactive:

# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Start the vGPU mediated device you created:

# virsh nodedev-start mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 started

Optional: Ensure that the mediated device is listed as active:

# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Set the vGPU device to start automatically after the host reboots:

# virsh nodedev-autostart mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 marked as autostarted

Attach the mediated device to a VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM.
To attach a single vGPU to a VM:

...
<video>
 <model type='none'/>
</video>
...
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>

Note that each UUID can only be assigned to one VM at a time.

To attach multiple vGPUs to a VM:

...
<video>
 <model type='none'/>
</video>
...
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
 <source>
  <address uuid='751c2bd9-3322-4031-ad07-50a0d1367577'/>
 </source>
</hostdev>
- For full functionality of the vGPU mediated devices to be available on the assigned VMs, set up NVIDIA vGPU guest software licensing on the VMs. For further information and instructions, see the NVIDIA Virtual GPU Software License Server User Guide.
Verification
Query the capabilities of the vGPU you created, and ensure it is listed as active and persistent:

# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         yes
Persistent:     yes
Autostart:      yes

Start the VM and verify that the guest operating system detects the mediated device as an NVIDIA GPU. For example, if the VM uses Linux:
# lspci -d 10de: -k
07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 12ce
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
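The mediated-device XML used in this procedure can be generated from its three inputs. The following is a minimal sketch using the example values from this section; the helper name is hypothetical:

```shell
# Sketch: generate the mediated-device XML from the parent PCI address,
# the vGPU type ID, and a UUID (e.g. from uuidgen), in the format shown
# in the procedure above.
mdev_xml() {
    parent="pci_$(printf '%s' "$1" | tr ':.' '__')"
    printf '<device>\n'
    printf '  <parent>%s</parent>\n' "$parent"
    printf '  <capability type="mdev">\n'
    printf '    <type id="%s"/>\n' "$2"
    printf '    <uuid>%s</uuid>\n' "$3"
    printf '  </capability>\n'
    printf '</device>\n'
}

mdev_xml 0000:01:00.0 nvidia-63 30820a6f-b1a5-4503-91ca-0c10ba58692a
```

The output could be saved to a file such as vgpu-test.xml and passed to virsh nodedev-define, as in the procedure.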
Troubleshooting
- When you use RHEL 10 in a VM, the only available display protocol is Wayland. However, Wayland is currently not supported by the NVIDIA vGPU guest driver. As a consequence, you cannot start a GNOME desktop session running on a vGPU. For more information, see the NVIDIA vGPU documentation.
16.2.4. Removing NVIDIA vGPU legacy devices
To change the configuration of assigned vGPU mediated devices, you need to remove the existing devices from the assigned VMs.
Prerequisites
- The VM from which you want to remove the device is shut down.
Procedure
Obtain the ID of the mediated device that you want to remove.
# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

Stop the running instance of the vGPU mediated device:
# virsh nodedev-destroy mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Destroyed node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'

Optional: Ensure the mediated device has been deactivated:

# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         no
Persistent:     yes
Autostart:      yes

Remove the device from the XML configuration of the VM. To do so, use the virsh edit utility to edit the XML configuration of the VM, and remove the mdev’s configuration segment. The segment will look similar to the following:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
 <source>
  <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
 </source>
</hostdev>

Note that stopping and detaching the mediated device does not delete it, but rather keeps it as defined. As such, you can restart and attach the device to a different VM.
If the <video> segment was previously removed from the XML configuration, restore it in the <devices> section to ensure that the VM uses virtualized video drivers. For example:

<video>
 <model type='vga' vram='16384' heads='1' primary='yes'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>

- Save the XML configuration.
Optional: To delete the stopped mediated device, remove its definition.
# virsh nodedev-undefine mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Undefined node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
Verification
If you only stopped and detached the device, ensure the mediated device is listed as inactive.
# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0

If you also deleted the device, ensure that the following command does not display it:
# virsh nodedev-list --cap mdev
16.2.5. Obtaining NVIDIA legacy vGPU information about your system
To evaluate the capabilities of the legacy vGPU features available to you, you can obtain additional information about the mediated devices on your system.
You can obtain the following information:
- How many mediated devices of a given type can be created.
- What mediated devices are already configured on your system.
Prerequisites
Your GPU does not support SR-IOV device assignment. To verify, use the following command:
# nvidia-smi -q | grep -i "Host VGPU Mode"

Output for GPUs that are not compatible with SR-IOV:
Host VGPU Mode : Non SR-IOV
Procedure
To see the available GPU devices on your host that can support vGPU mediated devices, use the virsh nodedev-list --cap mdev_types command. For example, the following shows a system with two NVIDIA Quadro RTX6000 devices.

# virsh nodedev-list --cap mdev_types
pci_0000_5b_00_0
pci_0000_9b_00_0

To display the vGPU types supported by a specific GPU device, as well as additional metadata, use the virsh nodedev-dumpxml command.

# virsh nodedev-dumpxml pci_0000_9b_00_0
<device>
 <name>pci_0000_9b_00_0</name>
 <path>/sys/devices/pci0000:9a/0000:9a:00.0/0000:9b:00.0</path>
 <parent>pci_0000_9a_00_0</parent>
 <driver>
  <name>nvidia</name>
 </driver>
 <capability type='pci'>
  <class>0x030000</class>
  <domain>0</domain>
  <bus>155</bus>
  <slot>0</slot>
  <function>0</function>
  <product id='0x1e30'>TU102GL [Quadro RTX 6000/8000]</product>
  <vendor id='0x10de'>NVIDIA Corporation</vendor>
  <capability type='mdev_types'>
   <type id='nvidia-346'>
    <name>GRID RTX6000-12C</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>2</availableInstances>
   </type>
   <type id='nvidia-439'>
    <name>GRID RTX6000-3A</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>8</availableInstances>
   </type>
   [...]
   <type id='nvidia-440'>
    <name>GRID RTX6000-4A</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>6</availableInstances>
   </type>
   <type id='nvidia-261'>
    <name>GRID RTX6000-8Q</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>3</availableInstances>
   </type>
  </capability>
  <iommuGroup number='216'>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x3'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x1'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x2'/>
   <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
  </iommuGroup>
  <numa node='2'/>
  <pci-express>
   <link validity='cap' port='0' speed='8' width='16'/>
   <link validity='sta' speed='2.5' width='8'/>
  </pci-express>
 </capability>
</device>
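The long nodedev-dumpxml output can be condensed into one line per supported vGPU type. The following is a sketch that relies on naive text matching rather than a real XML parser, so it assumes the line-per-element layout shown above; the helper name is hypothetical:

```shell
# Sketch: summarize the mdev types from virsh nodedev-dumpxml output
# as "type-id: name (available: N)" lines.
summarize_mdev_types() {
    awk -F"'" '
        /<type id=/            { id = $2 }
        /<name>/               { sub(/.*<name>/, ""); sub(/<\/name>.*/, ""); name = $0 }
        /<availableInstances>/ { sub(/.*<availableInstances>/, "")
                                 sub(/<\/availableInstances>.*/, "")
                                 print id ": " name " (available: " $0 ")" }
    '
}

# Demonstration on a fragment of the output above:
summarize_mdev_types <<'EOF'
<type id='nvidia-261'>
<name>GRID RTX6000-8Q</name>
<availableInstances>3</availableInstances>
EOF
# prints: nvidia-261: GRID RTX6000-8Q (available: 3)
```

On a host, you would pipe the real listing into the helper, for example: virsh nodedev-dumpxml pci_0000_9b_00_0 | summarize_mdev_types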
16.2.6. Live migrating a virtual machine with an attached NVIDIA vGPU
If you use virtual GPUs (vGPUs) in your virtualization workloads, you can live migrate a running virtual machine (VM) with an attached vGPU to another KVM host. Currently, this is only possible with NVIDIA GPUs.
Prerequisites
- You have an NVIDIA GPU with an NVIDIA Virtual GPU Software Driver version that supports this functionality. Refer to the relevant NVIDIA vGPU documentation for more details.
- You have a correctly configured NVIDIA vGPU assigned to a VM. For instructions, see: Managing NVIDIA vGPU devices.
It is also possible to live migrate a VM with multiple vGPU devices attached.
- The host uses the Intel 64 or AMD64 CPU architecture.
- You have met all of the vGPU migration prerequisites that are documented by NVIDIA. Refer to the relevant NVIDIA vGPU documentation for more details.
- You have met all of the general VM migration prerequisites. For details, see Migrating a virtual machine by using the command line.
Limitations
- Certain NVIDIA GPU features can prevent the migration. For more information, see the specific NVIDIA documentation for your graphics card.
- Some GPU workloads are not compatible with the downtime that happens during a migration. As a consequence, the GPU workloads might stop or crash. It is recommended to test if your workloads are compatible with the downtime before attempting a vGPU live migration.
Currently, some general virtualization features cannot be used when live migrating a VM with an attached vGPU:
Calculating the dirty memory page rate of the VM.

Currently, live migration data and statistics provided by the virsh domjobinfo and virsh domdirtyrate-calc commands are inaccurate when migrating a VM with an attached vGPU, because the calculations only count guest RAM without including the vRAM from the vGPU.

- Using a post-copy live migration.
- Using a virtual I/O Memory Management Unit (vIOMMU) device in the VM.
Procedure
For instructions on how to proceed with the live migration, see Migrating a virtual machine by using the command line.
In most cases, no additional parameters for the migration command are required for the attached vGPU device.
To live migrate a VM with a legacy vGPU based on a mediated device to a host that supports SR-IOV-based vGPUs, for example when migrating from a RHEL 9 host to a RHEL 10 host, you can change the vGPU type during migration:
Create a copy of the XML configuration file of the VM:

# cp /etc/libvirt/qemu/<vm_config>.xml /etc/libvirt/qemu/<vm_new_config>.xml

- Open the new copy of the XML configuration.
Adjust the <hostdev> section of the mediated device (type='mdev') to instead use an SR-IOV device that is available on the destination host. For example:

<hostdev mode='subsystem' type='pci' managed='no' display='on' ramfb='on'>
 <source>
  <address domain='0x0000' bus='0x41' slot='0x02' function='0x3'/>
 </source>
</hostdev>

- Save the XML configuration.
When performing the VM live migration, set the VM to use the adjusted XML on the destination host:
# virsh migrate --live \
    --xml <vm_new_config>.xml \
    <vm-name> \
    qemu+ssh://<destination_host>/system
16.2.7. Remote desktop streaming services for NVIDIA vGPU
The following remote desktop streaming services are supported on the RHEL 10 hypervisor with NVIDIA vGPU or NVIDIA GPU passthrough enabled:
- HP ZCentral Remote Boost/Teradici
- NICE DCV
- Mechdyne TGX
For support details, see the appropriate vendor support matrix.