Chapter 13. Managing GPU devices in virtual machines
To enhance the graphical performance of your virtual machines (VMs) on a RHEL 8 host, you can assign a host GPU to a VM.
- You can detach the GPU from the host and pass full control of the GPU directly to the VM.
- You can create multiple mediated devices from a physical GPU, and assign these devices as virtual GPUs (vGPUs) to multiple guests. This is currently only supported on selected NVIDIA GPUs, and only one mediated device can be assigned to a single guest.
GPU assignment is currently only supported on Intel 64 and AMD64 systems.
13.1. Assigning a GPU to a virtual machine
To access and control GPUs that are attached to the host system, you must configure the host system to pass direct control of the GPU to the virtual machine (VM).
If you are looking for information about assigning a virtual GPU, see Managing NVIDIA vGPU devices.
Prerequisites
IOMMU support must be enabled on the host machine kernel.
On an Intel host, you must enable VT-d:
Regenerate the GRUB configuration with the intel_iommu=on and iommu=pt parameters:
# grubby --args="intel_iommu=on iommu=pt" --update-kernel DEFAULT
- Reboot the host.
On an AMD host, you must enable AMD-Vi.
Note that on AMD hosts, IOMMU is enabled by default; you can add iommu=pt to switch it to pass-through mode:
Regenerate the GRUB configuration with the iommu=pt parameter:
# grubby --args="iommu=pt" --update-kernel DEFAULT
Note: The pt option only enables IOMMU for devices used in pass-through mode and provides better host performance. However, not all hardware supports the option. You can still assign devices even when this option is not enabled.
- Reboot the host.
Procedure
Prevent the driver from binding to the GPU.
Identify the PCI bus address to which the GPU is attached.
# lspci -Dnn | grep VGA
0000:02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
Prevent the host’s graphics driver from using the GPU. To do so, use the GPU’s PCI ID with the pci-stub driver.
For example, the following command prevents the driver from binding to the GPU with the 10de:11fa vendor and device ID:
# grubby --args="pci-stub.ids=10de:11fa" --update-kernel DEFAULT
- Reboot the host.
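Optionally, after the reboot, you can check which driver the GPU is bound to. This is a sketch that reuses the 10de:11fa ID from the example above; substitute your own vendor and device ID. The "Kernel driver in use" line in the output should show pci-stub.
# lspci -nnk -d 10de:11fa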
Optional: If certain GPU functions, such as audio, cannot be passed through to the VM due to support limitations, you can modify the driver bindings of the endpoints within an IOMMU group to pass through only the necessary GPU functions.
Convert the GPU settings to XML and note the PCI address of the endpoints that you want to prevent from attaching to the host drivers.
To do so, convert the GPU’s PCI bus address to a libvirt-compatible format by adding the pci_ prefix to the address and converting the delimiters to underscores. For example, the following command displays the XML configuration of the GPU attached at the 0000:02:00.0 bus address.
# virsh nodedev-dumpxml pci_0000_02_00_0
<device>
  <name>pci_0000_02_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0</path>
  <parent>pci_0000_00_03_0</parent>
  <driver>
    <name>pci-stub</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>2</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x11fa'>GK106GL [Quadro K4000]</product>
    <vendor id='0x10de'>NVIDIA Corporation</vendor>
    <iommuGroup number='13'>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='2.5' width='16'/>
    </pci-express>
  </capability>
</device>
Prevent the endpoints from attaching to the host driver.
In this example, to assign the GPU to a VM, prevent the endpoint that corresponds to the audio function, <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>, from attaching to the host audio driver, and instead attach the endpoint to VFIO-PCI.
# driverctl set-override 0000:02:00.1 vfio-pci
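Optionally, you can confirm that the override took effect. As a sketch: driverctl list-overrides prints the persisted driver overrides, and lspci -nnk shows the driver currently in use for the audio function from the example above.
# driverctl list-overrides
# lspci -nnk -s 0000:02:00.1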
Attach the GPU to the VM.
Create an XML configuration file for the GPU by using the PCI bus address.
For example, you can create the following XML file, GPU-Assign.xml, by using parameters from the GPU’s bus address.
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </source>
</hostdev>
- Save the file on the host system.
Merge the file with the VM’s XML configuration.
For example, the following command merges the GPU XML file, GPU-Assign.xml, with the XML configuration file of the System1 VM.
# virsh attach-device System1 --file /home/GPU-Assign.xml --persistent
Device attached successfully.
Note: The GPU is attached as a secondary graphics device to the VM. Assigning a GPU as the primary graphics device is not supported, and Red Hat does not recommend removing the primary emulated graphics device in the VM’s XML configuration.
Verification
- The device appears under the <devices> section in the VM’s XML configuration. For more information, see Sample virtual machine XML configuration.
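For example, you can inspect the configuration with virsh dumpxml. The following sketch filters for the assigned PCI host device and assumes the System1 VM name used in the earlier example; the number of context lines shown by grep -A is arbitrary.
# virsh dumpxml System1 | grep -A 5 "<hostdev"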
Known Issues
The number of GPUs that can be attached to a VM is limited by the maximum number of assigned PCI devices, which in RHEL 8 is currently 64. However, attaching multiple GPUs to a VM is likely to cause problems with memory-mapped I/O (MMIO) on the guest, which may result in the GPUs not being available to the VM.
To work around these problems, set a larger 64-bit MMIO space and configure the vCPU physical address bits to make the extended 64-bit MMIO space addressable (see the configuration sketch after this list).
- Attaching an NVIDIA GPU device to a VM that uses a RHEL 8 guest operating system currently disables the Wayland session on that VM, and loads an Xorg session instead. This is because of incompatibilities between NVIDIA drivers and Wayland.
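As a sketch of the vCPU physical address bits part of this workaround, assuming your libvirt version supports the <maxphysaddr> element, you can add an element similar to the following to the <cpu> configuration of the VM. Enlarging the 64-bit MMIO window itself depends on the guest firmware and is not shown here.
<cpu mode='host-passthrough' check='none'>
  <!-- Sketch only: pass the host CPU's physical address bits through to the guest,
       so that an extended 64-bit MMIO space is addressable. Requires a libvirt
       release that supports the maxphysaddr element. -->
  <maxphysaddr mode='passthrough'/>
</cpu>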
13.2. Managing NVIDIA vGPU devices
The vGPU feature makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices, referred to as mediated devices. These mediated devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.
Assigning a physical GPU to VMs, with or without using mediated devices, makes it impossible for the host to use the GPU.
13.2.1. Setting up NVIDIA vGPU devices
To set up the NVIDIA vGPU feature, you need to download NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines. For detailed instructions, see below.
Prerequisites
Your GPU supports vGPU mediated devices. For an up-to-date list of NVIDIA GPUs that support creating vGPUs, see the NVIDIA vGPU software documentation.
If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows that the system is using an NVIDIA Tesla P4 GPU, which is compatible with vGPU.
# lshw -C display
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: driver=vfio-pci latency=0
       resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff
Procedure
- Download the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a conf file of any name in /etc/modprobe.d/, and add the following lines to the file:
blacklist nouveau
options nouveau modeset=0
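For example, the following sketch creates such a file; the nvidia-disable-nouveau.conf filename is only an illustrative choice, as any .conf name in /etc/modprobe.d/ works.
# cat << EOF > /etc/modprobe.d/nvidia-disable-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF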
Regenerate the initial ramdisk for the current kernel, then reboot.
# dracut --force
# reboot
Check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running.
# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio       45011  0
nvidia              14333621  10 nvidia_vgpu_vfio
mdev                   20414  2 vfio_mdev,nvidia_vgpu_vfio
vfio                   32695  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
# systemctl status nvidia-vgpu-mgr.service
nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
 Main PID: 1553 (nvidia-vgpu-mgr)
 [...]
In addition, if you are creating vGPUs based on an NVIDIA Ampere GPU device, ensure that virtual functions are enabled for the physical GPU. For instructions, see the NVIDIA documentation.
Generate a device UUID.
# uuidgen
30820a6f-b1a5-4503-91ca-0c10ba58692a
Prepare an XML file with a configuration of the mediated device, based on the detected GPU hardware. For example, the following configures a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card that runs on the 0000:01:00.0 PCI bus and uses the UUID generated in the previous step.
<device>
    <parent>pci_0000_01_00_0</parent>
    <capability type="mdev">
        <type id="nvidia-63"/>
        <uuid>30820a6f-b1a5-4503-91ca-0c10ba58692a</uuid>
    </capability>
</device>
Define a vGPU mediated device based on the XML file you prepared. For example:
# virsh nodedev-define vgpu-test.xml
Node device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 created from vgpu-test.xml
Optional: Verify that the mediated device is listed as inactive.
# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Start the vGPU mediated device you created.
# virsh nodedev-start mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 started
Optional: Ensure that the mediated device is listed as active.
# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Set the vGPU device to start automatically after the host reboots.
# virsh nodedev-autostart mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 marked as autostarted
Attach the mediated device to a VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM.
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>
Note that each UUID can only be assigned to one VM at a time. In addition, if the VM does not have QEMU video devices, such as virtio-vga, also add the ramfb='on' parameter on the <hostdev> line, as shown in the sketch after this list.
- For full functionality of the vGPU mediated devices to be available on the assigned VMs, set up NVIDIA vGPU guest software licensing on the VMs. For further information and instructions, see the NVIDIA Virtual GPU Software License Server User Guide.
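For example, assuming a VM without an emulated QEMU video device, the <hostdev> element from the previous step would look similar to the following sketch; the UUID is the example value used throughout this procedure.
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on' ramfb='on'>
  <!-- ramfb='on' provides a boot framebuffer when no emulated video device is present -->
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>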
Verification
Query the capabilities of the vGPU you created, and ensure it is listed as active and persistent.
# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         yes
Persistent:     yes
Autostart:      yes
Start the VM and verify that the guest operating system detects the mediated device as an NVIDIA GPU. For example, if the VM uses Linux:
# lspci -d 10de: -k
07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 12ce
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
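In addition, if the NVIDIA guest driver is already installed in the VM, you can list the vGPU from inside the guest. This is only a sketch; the reported model name depends on the assigned vGPU type.
# nvidia-smi -L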
Known Issues
- Assigning an NVIDIA vGPU mediated device to a VM that uses a RHEL 8 guest operating system currently disables the Wayland session on that VM, and loads an Xorg session instead. This is because of incompatibilities between NVIDIA drivers and Wayland.
Additional resources
- NVIDIA vGPU software documentation
- The man virsh command
13.2.2. Removing NVIDIA vGPU devices
To change the configuration of assigned vGPU mediated devices, you need to remove the existing devices from the assigned VMs. For instructions, see below:
Prerequisites
- The VM from which you want to remove the device is shut down.
Procedure
Obtain the ID of the mediated device that you want to remove.
# virsh nodedev-list --cap mdev
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Stop the running instance of the vGPU mediated device.
# virsh nodedev-destroy mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Destroyed node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
Optional: Ensure the mediated device has been deactivated.
# virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         no
Persistent:     yes
Autostart:      yes
Remove the device from the XML configuration of the VM. To do so, use the virsh edit utility to edit the XML configuration of the VM, and remove the mdev’s configuration segment. The segment will look similar to the following (an alternative using virsh detach-device is sketched below):
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>
Note that stopping and detaching the mediated device does not delete it, but rather keeps it as defined. As such, you can restart and attach the device to a different VM.
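Alternatively, if you saved the <hostdev> segment shown above to a file, you can remove it without editing the configuration by hand. The following sketch assumes the segment is stored in a hypothetical vgpu-remove.xml file and that the VM is named System1, as in the earlier example.
# virsh detach-device System1 --file vgpu-remove.xml --persistent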
Optional: To delete the stopped mediated device, remove its definition.
# virsh nodedev-undefine mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Undefined node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
Verification
If you only stopped and detached the device, ensure the mediated device is listed as inactive.
# virsh nodedev-list --cap mdev --inactive
mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
If you also deleted the device, ensure the following command does not display it.
# virsh nodedev-list --cap mdev
Additional resources
- The man virsh command
13.2.3. Obtaining NVIDIA vGPU information about your system
To evaluate the capabilities of the vGPU features available to you, you can obtain additional information about the mediated devices on your system, such as:
- How many mediated devices of a given type can be created.
- What mediated devices are already configured on your system.
Procedure
To see the available GPU devices on your host that can support vGPU mediated devices, use the virsh nodedev-list --cap mdev_types command. For example, the following shows a system with two NVIDIA Quadro RTX6000 devices.
# virsh nodedev-list --cap mdev_types
pci_0000_5b_00_0
pci_0000_9b_00_0
To display vGPU types supported by a specific GPU device, as well as additional metadata, use the virsh nodedev-dumpxml command.
# virsh nodedev-dumpxml pci_0000_9b_00_0
<device>
  <name>pci_0000_9b_00_0</name>
  <path>/sys/devices/pci0000:9a/0000:9a:00.0/0000:9b:00.0</path>
  <parent>pci_0000_9a_00_0</parent>
  <driver>
    <name>nvidia</name>
  </driver>
  <capability type='pci'>
    <class>0x030000</class>
    <domain>0</domain>
    <bus>155</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1e30'>TU102GL [Quadro RTX 6000/8000]</product>
    <vendor id='0x10de'>NVIDIA Corporation</vendor>
    <capability type='mdev_types'>
      <type id='nvidia-346'>
        <name>GRID RTX6000-12C</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>2</availableInstances>
      </type>
      <type id='nvidia-439'>
        <name>GRID RTX6000-3A</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>8</availableInstances>
      </type>
      [...]
      <type id='nvidia-440'>
        <name>GRID RTX6000-4A</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>6</availableInstances>
      </type>
      <type id='nvidia-261'>
        <name>GRID RTX6000-8Q</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>3</availableInstances>
      </type>
    </capability>
    <iommuGroup number='216'>
      <address domain='0x0000' bus='0x9b' slot='0x00' function='0x3'/>
      <address domain='0x0000' bus='0x9b' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x9b' slot='0x00' function='0x2'/>
      <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='2'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='2.5' width='8'/>
    </pci-express>
  </capability>
</device>
Additional resources
- The man virsh command
13.2.4. Remote desktop streaming services for NVIDIA vGPU
The following remote desktop streaming services are supported on the RHEL 8 hypervisor with NVIDIA vGPU or NVIDIA GPU passthrough enabled:
- HP ZCentral Remote Boost/Teradici
- NICE DCV
- Mechdyne TGX
For support details, see the appropriate vendor support matrix.