15.2. Managing NVIDIA vGPU devices

The vGPU feature makes it possible to divide a physical NVIDIA GPU device into multiple virtual devices, referred to as mediated devices. These mediated devices can then be assigned to multiple virtual machines (VMs) as virtual GPUs. As a result, these VMs can share the performance of a single physical GPU.

Important

Assigning a physical GPU to VMs, with or without using mediated devices, makes it impossible for the host to use the GPU.

15.2.1. Setting up NVIDIA vGPU devices

To set up the NVIDIA vGPU feature, you need to download NVIDIA vGPU drivers for your GPU device, create mediated devices, and assign them to the intended virtual machines. For detailed instructions, see below.

Prerequisites

  • Your GPU supports vGPU mediated devices. For an up-to-date list of NVIDIA GPUs that support creating vGPUs, see the NVIDIA vGPU software documentation.

    • If you do not know which GPU your host is using, install the lshw package and use the lshw -C display command. The following example shows the system is using an NVIDIA Tesla P4 GPU, compatible with vGPU.

      # lshw -C display
      
      *-display
             description: 3D controller
             product: GP104GL [Tesla P4]
             vendor: NVIDIA Corporation
             physical id: 0
             bus info: pci@0000:01:00.0
             version: a1
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi pciexpress cap_list
             configuration: driver=vfio-pci latency=0
             resources: irq:16 memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff

Procedure

  1. Download the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
  2. If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a .conf file with any name in /etc/modprobe.d/, and add the following lines to the file:

    blacklist nouveau
    options nouveau modeset=0
  3. Regenerate the initial ramdisk for the current kernel, then reboot.

    # dracut --force
    # reboot
  4. Check that the kernel has loaded the nvidia_vgpu_vfio module and that the nvidia-vgpu-mgr.service service is running.

    # lsmod | grep nvidia_vgpu_vfio
    nvidia_vgpu_vfio 45011 0
    nvidia 14333621 10 nvidia_vgpu_vfio
    mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
    vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
    
    # systemctl status nvidia-vgpu-mgr.service
    nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
       Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
     Main PID: 1553 (nvidia-vgpu-mgr)
     [...]

    In addition, if you are creating a vGPU based on an NVIDIA Ampere GPU device, ensure that virtual functions are enabled for the physical GPU. For instructions, see the NVIDIA documentation.

  5. Generate a device UUID.

    # uuidgen
    30820a6f-b1a5-4503-91ca-0c10ba58692a
  6. Prepare an XML file with a configuration of the mediated device, based on the detected GPU hardware. For example, the following configures a mediated device of the nvidia-63 vGPU type on an NVIDIA Tesla P4 card that is located at the 0000:01:00.0 PCI address and uses the UUID generated in the previous step.

    <device>
        <parent>pci_0000_01_00_0</parent>
        <capability type="mdev">
            <type id="nvidia-63"/>
            <uuid>30820a6f-b1a5-4503-91ca-0c10ba58692a</uuid>
        </capability>
    </device>
  7. Define a vGPU mediated device based on the XML file you prepared. For example:

    # virsh nodedev-define vgpu-test.xml
    Node device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 created from vgpu-test.xml
  8. Optional: Verify that the mediated device is listed as inactive.

    # virsh nodedev-list --cap mdev --inactive
    mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
  9. Start the vGPU mediated device you created.

    # virsh nodedev-start mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 started
  10. Optional: Ensure that the mediated device is listed as active.

    # virsh nodedev-list --cap mdev
    mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
  11. Set the vGPU device to start automatically after the host reboots.

    # virsh nodedev-autostart mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Device mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0 marked as autostarted
  12. Attach the mediated device to the VM with which you want to share the vGPU resources. To do so, add the following lines, along with the previously generated UUID, to the <devices/> section in the XML configuration of the VM.

    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
      <source>
        <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
      </source>
    </hostdev>

    Note that each UUID can only be assigned to one VM at a time. In addition, if the VM does not have QEMU video devices, such as virtio-vga, also add the ramfb='on' parameter to the <hostdev> line.

  13. For full functionality of the vGPU mediated devices to be available on the assigned VMs, set up NVIDIA vGPU guest software licensing on the VMs. For further information and instructions, see the NVIDIA Virtual GPU Software License Server User Guide.
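
The naming used in steps 5 to 7 can be sketched as a few shell helpers. This is only an illustration: the function names are hypothetical and not part of libvirt, and the mdev naming pattern is inferred from the sample virsh output in this procedure rather than from a documented guarantee.

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical, not part of libvirt.

# Convert a PCI address such as 0000:01:00.0 to the node device name of the
# parent GPU as used in the <parent> element, for example pci_0000_01_00_0.
pci_to_nodedev() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# Build the mediated device name in the form seen in the sample output:
# mdev_<UUID with underscores>_<PCI address with underscores>.
mdev_nodedev_name() {
    uuid_part=$(printf '%s' "$1" | sed 's/-/_/g')
    pci_part=$(printf '%s' "$2" | tr ':.' '__')
    printf 'mdev_%s_%s\n' "$uuid_part" "$pci_part"
}

# Print a mediated device definition for a UUID, vGPU type, and PCI address.
print_mdev_xml() {
    cat <<EOF
<device>
    <parent>$(pci_to_nodedev "$3")</parent>
    <capability type="mdev">
        <type id="$2"/>
        <uuid>$1</uuid>
    </capability>
</device>
EOF
}

# Example values taken from the procedure above.
print_mdev_xml 30820a6f-b1a5-4503-91ca-0c10ba58692a nvidia-63 0000:01:00.0
```

Redirecting the output to a file such as vgpu-test.xml gives an input suitable for virsh nodedev-define, and mdev_nodedev_name gives the name to expect in later virsh nodedev-* commands.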

Verification

  1. Query the capabilities of the vGPU you created, and ensure it is listed as active and persistent.

    # virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Parent:         pci_0000_01_00_0
    Active:         yes
    Persistent:     yes
    Autostart:      yes
  2. Start the VM and verify that the guest operating system detects the mediated device as an NVIDIA GPU. For example, if the VM uses Linux:

    # lspci -d 10de: -k
    07:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
            Subsystem: NVIDIA Corporation Device 12ce
            Kernel driver in use: nvidia
            Kernel modules: nouveau, nvidia_drm, nvidia
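
The check in the first verification step can also be scripted. The following is a minimal sketch, assuming the output of virsh nodedev-info uses the "Active:" and "Persistent:" field labels shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh nodedev-info` output on standard input and
# succeeds only if the device is reported as both active and persistent.
mdev_is_active_and_persistent() {
    awk '
        $1 == "Active:"     { active = $2 }
        $1 == "Persistent:" { persistent = $2 }
        END { exit !(active == "yes" && persistent == "yes") }
    '
}

# Sample input that mirrors the fields shown in the verification step.
if mdev_is_active_and_persistent <<'EOF'
Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
Parent:         pci_0000_01_00_0
Active:         yes
Persistent:     yes
Autostart:      yes
EOF
then
    echo "mediated device is active and persistent"
else
    echo "mediated device is not ready" >&2
fi
```

On a real host, the same function could take its input from a pipe, for example by piping the output of virsh nodedev-info for your device into it.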

Known issues

  • Assigning an NVIDIA vGPU mediated device to a VM that uses a RHEL 9 guest operating system currently disables the Wayland session on that VM, and loads an Xorg session instead. This is because of incompatibilities between NVIDIA drivers and Wayland.

15.2.2. Removing NVIDIA vGPU devices

To change the configuration of assigned vGPU mediated devices, you need to remove the existing devices from the assigned VMs. For instructions, see below:

Prerequisites

  • The VM from which you want to remove the device is shut down.

Procedure

  1. Obtain the ID of the mediated device that you want to remove.

    # virsh nodedev-list --cap mdev
    mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
  2. Stop the running instance of the vGPU mediated device.

    # virsh nodedev-destroy mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Destroyed node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
  3. Optional: Ensure the mediated device has been deactivated.

    # virsh nodedev-info mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Name:           mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Parent:         pci_0000_01_00_0
    Active:         no
    Persistent:     yes
    Autostart:      yes
  4. Remove the device from the XML configuration of the VM. To do so, use the virsh edit utility to edit the XML configuration of the VM, and remove the mdev’s configuration segment. The segment will look similar to the following:

    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
      <source>
        <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
      </source>
    </hostdev>

    Note that stopping and detaching the mediated device does not delete it, but rather keeps it as defined. As such, you can restart and attach the device to a different VM.

  5. Optional: To delete the stopped mediated device, remove its definition.

    # virsh nodedev-undefine mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
    Undefined node device 'mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0'
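
When editing the VM configuration in step 4, it can help to know which UUID belongs to the mediated device you stopped. The sketch below recovers the UUID from the device name, assuming the name follows the mdev_<UUID>_<PCI address> pattern seen in the sample output; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the UUID from a mediated device name of the
# form mdev_<8>_<4>_<4>_<4>_<12 hex digits>_<PCI address>.
mdev_name_to_uuid() {
    printf '%s\n' "$1" | sed -E \
        's/^mdev_([0-9a-f]{8})_([0-9a-f]{4})_([0-9a-f]{4})_([0-9a-f]{4})_([0-9a-f]{12})_.*$/\1-\2-\3-\4-\5/'
}

# Example value taken from the procedure above.
mdev_name_to_uuid mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
```

The resulting UUID is the value to look for in the <address uuid='...'/> element of the <hostdev> segment, for example by searching the output of virsh dumpxml for the VM in question.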

Verification

  • If you only stopped and detached the device, ensure the mediated device is listed as inactive.

    # virsh nodedev-list --cap mdev --inactive
    mdev_30820a6f_b1a5_4503_91ca_0c10ba58692a_0000_01_00_0
  • If you also deleted the device, ensure the following command does not display it.

    # virsh nodedev-list --cap mdev

Additional resources

  • The man virsh command

15.2.3. Obtaining NVIDIA vGPU information about your system

To evaluate the capabilities of the vGPU features available to you, you can obtain additional information about the mediated devices on your system, such as:

  • How many mediated devices of a given type can be created
  • What mediated devices are already configured on your system

Procedure

  • To see the available GPU devices on your host that can support vGPU mediated devices, use the virsh nodedev-list --cap mdev_types command. For example, the following shows a system with two NVIDIA Quadro RTX6000 devices.

    # virsh nodedev-list --cap mdev_types
    pci_0000_5b_00_0
    pci_0000_9b_00_0
  • To display vGPU types supported by a specific GPU device, as well as additional metadata, use the virsh nodedev-dumpxml command.

    # virsh nodedev-dumpxml pci_0000_9b_00_0
    <device>
      <name>pci_0000_9b_00_0</name>
      <path>/sys/devices/pci0000:9a/0000:9a:00.0/0000:9b:00.0</path>
      <parent>pci_0000_9a_00_0</parent>
      <driver>
        <name>nvidia</name>
      </driver>
      <capability type='pci'>
        <class>0x030000</class>
        <domain>0</domain>
        <bus>155</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x1e30'>TU102GL [Quadro RTX 6000/8000]</product>
        <vendor id='0x10de'>NVIDIA Corporation</vendor>
        <capability type='mdev_types'>
          <type id='nvidia-346'>
            <name>GRID RTX6000-12C</name>
            <deviceAPI>vfio-pci</deviceAPI>
            <availableInstances>2</availableInstances>
          </type>
          <type id='nvidia-439'>
            <name>GRID RTX6000-3A</name>
            <deviceAPI>vfio-pci</deviceAPI>
            <availableInstances>8</availableInstances>
          </type>
          [...]
          <type id='nvidia-440'>
            <name>GRID RTX6000-4A</name>
            <deviceAPI>vfio-pci</deviceAPI>
            <availableInstances>6</availableInstances>
          </type>
          <type id='nvidia-261'>
            <name>GRID RTX6000-8Q</name>
            <deviceAPI>vfio-pci</deviceAPI>
            <availableInstances>3</availableInstances>
          </type>
        </capability>
        <iommuGroup number='216'>
          <address domain='0x0000' bus='0x9b' slot='0x00' function='0x3'/>
          <address domain='0x0000' bus='0x9b' slot='0x00' function='0x1'/>
          <address domain='0x0000' bus='0x9b' slot='0x00' function='0x2'/>
          <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
        </iommuGroup>
        <numa node='2'/>
        <pci-express>
          <link validity='cap' port='0' speed='8' width='16'/>
          <link validity='sta' speed='2.5' width='8'/>
        </pci-express>
      </capability>
    </device>
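
To answer the question of how many mediated devices of each type can still be created, the XML above can be condensed into one line per vGPU type. This is a rough sketch that assumes the one-element-per-line layout shown in the sample output; it is a line-based filter, not a real XML parser, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh nodedev-dumpxml` output on standard input
# and prints "type id<TAB>name<TAB>available instances" for each vGPU type.
list_mdev_types() {
    awk '
        /<type id=/ {
            # Attribute values can be quoted with single or double quotes.
            split($0, q, "[\047\"]")
            id = q[2]; name = ""; avail = ""; in_type = 1
        }
        in_type && /<name>/ {
            name = $0
            gsub(/^.*<name>|<\/name>.*$/, "", name)
        }
        in_type && /<availableInstances>/ {
            avail = $0
            gsub(/^.*<availableInstances>|<\/availableInstances>.*$/, "", avail)
        }
        /<\/type>/ {
            if (in_type) printf "%s\t%s\t%s\n", id, name, avail
            in_type = 0
        }
    '
}

# Sample input copied from part of the output above.
list_mdev_types <<'EOF'
<device>
  <name>pci_0000_9b_00_0</name>
  <capability type='pci'>
    <capability type='mdev_types'>
      <type id='nvidia-346'>
        <name>GRID RTX6000-12C</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>2</availableInstances>
      </type>
    </capability>
  </capability>
</device>
EOF
```

On a real host, the function could read from a pipe instead, for example from virsh nodedev-dumpxml pci_0000_9b_00_0.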

Additional resources

  • The man virsh command

15.2.4. Remote desktop streaming services for NVIDIA vGPU

The following remote desktop streaming services are supported on the RHEL 9 hypervisor with NVIDIA vGPU or NVIDIA GPU passthrough enabled:

  • HP ZCentral Remote Boost/Teradici
  • NICE DCV
  • Mechdyne TGX

For support details, see the appropriate vendor support matrix.
