Chapter 10. Configuring virtual GPUs for instances
To support GPU-based rendering on your instances, you can define and manage virtual GPU (vGPU) resources according to your available physical GPU devices and your hypervisor type. You can use this configuration to divide the rendering workloads between all your physical GPU devices more effectively, and to have more control over scheduling your vGPU-enabled instances.
To enable vGPU in the Compute (nova) service, create flavors that your cloud users can use to create Red Hat Enterprise Linux (RHEL) instances with vGPU devices. Each instance can then support GPU workloads with virtual GPU devices that correspond to the physical GPU devices.
The Compute service tracks the number of vGPU devices that are available for each GPU profile you define on each host. The Compute service schedules instances to these hosts based on the flavor, attaches the devices, and monitors usage on an ongoing basis. When an instance is deleted, the Compute service adds the vGPU devices back to the available pool.
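For example, after you complete the configuration in this chapter, you can inspect the vGPU inventory that the Compute service reports to the Placement service. The following is a minimal sketch, assuming the osc-placement CLI plugin is available and using a hypothetical resource provider name; depending on your release, the VGPU inventory might be reported by nested child resource providers, one for each physical GPU:
(overcloud)$ openstack resource provider list --name overcloud-computegpu-0.localdomain
(overcloud)$ openstack resource provider inventory list <provider_uuid>
The second command lists the VGPU resource class and its total capacity for the selected resource provider.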
Red Hat enables the use of NVIDIA vGPU in RHOSP without the requirement for support exceptions. However, Red Hat does not provide technical support for the NVIDIA vGPU drivers. The NVIDIA vGPU drivers are shipped and supported by NVIDIA. You require an NVIDIA Certified Support Services subscription to obtain NVIDIA Enterprise Support for NVIDIA vGPU software. For issues that result from the use of NVIDIA vGPUs where you are unable to reproduce the issue on a supported component, the following support policies apply:
- When Red Hat does not suspect that the third-party component is involved in the issue, the normal Scope of Support and Red Hat SLA apply.
- When Red Hat suspects that the third-party component is involved in the issue, the customer will be directed to NVIDIA in line with the Red Hat third party support and certification policies. For more information, see the Knowledge Base article Obtaining Support from NVIDIA.
10.1. Supported configurations and limitations
Supported GPU cards
For a list of supported NVIDIA GPU cards, see Virtual GPU Software Supported Products on the NVIDIA website.
Limitations when using vGPU devices
- Each instance can use only one vGPU resource.
- Live migration of vGPU instances between hosts is not supported.
- Evacuation of vGPU instances is not supported.
If you need to reboot the Compute node that hosts the vGPU instances, the vGPUs are not automatically reassigned to the recreated instances. You must either cold migrate the instances before you reboot the Compute node, or manually allocate each vGPU to the correct instance after the reboot. To manually allocate each vGPU, you must retrieve the mdev UUID from the instance XML of each vGPU instance that runs on the Compute node before you reboot. You can use the following command to discover the mdev UUID for each instance:
# virsh dumpxml <instance_name> | grep mdev
Replace <instance_name> with the libvirt instance name, OS-EXT-SRV-ATTR:instance_name, returned in a /servers request to the Compute API. A minimal sketch that collects these UUIDs for every instance on a node is shown after this list.
- Suspend operations on a vGPU-enabled instance are not supported due to a libvirt limitation. Instead, you can snapshot or shelve the instance.
- By default, vGPU types on Compute hosts are not exposed to API users. To expose the vGPU types on Compute hosts to API users, you must configure resource provider traits and create flavors that require the traits. For more information, see Creating a custom vGPU resource provider trait. Alternatively, if you only have one vGPU type, you can grant access by adding the hosts to a host aggregate. For more information, see Creating and managing host aggregates.
- If you use NVIDIA accelerator hardware, you must comply with the NVIDIA licensing requirements. For example, NVIDIA vGPU GRID requires a licensing server. For more information about the NVIDIA licensing requirements, see NVIDIA License Server Release Notes on the NVIDIA website.
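The following is a minimal sketch, not part of the official procedure, of how you might record the mdev UUID of every instance on a Compute node before a reboot. It assumes you run it as root on the Compute node:
# Record the mdev hostdev entries (including the UUID) for every instance.
for dom in $(virsh list --all --name); do
    echo "== ${dom} =="
    virsh dumpxml "${dom}" | grep -A2 "type='mdev'"
done
The UUID of each vGPU appears in the address element that follows the mdev hostdev entry.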
10.2. Configuring vGPU on the Compute nodes
To enable your cloud users to create instances that use a virtual GPU (vGPU), you must configure the Compute nodes that have the physical GPUs:
- Designate Compute nodes for vGPU.
- Configure the Compute node for vGPU.
- Deploy the overcloud.
- Optional: Create custom traits for vGPU types.
- Optional: Create a custom GPU instance image.
- Create a vGPU flavor for launching instances that have vGPU.
If the GPU hardware is limited, you can also configure a host aggregate to optimize scheduling on the vGPU Compute nodes. To schedule only instances that request vGPUs on the vGPU Compute nodes, create a host aggregate of the vGPU Compute nodes, and configure the Compute scheduler to place only vGPU instances on the host aggregate. For more information, see Creating and managing host aggregates and Filtering by isolating host aggregates.
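For example, the following is a hedged sketch of grouping the vGPU Compute nodes into a host aggregate. The aggregate name, property key, and host name are illustrative only; the scheduler-side isolation is configured as described in Filtering by isolating host aggregates:
(overcloud)$ openstack aggregate create --property vgpu=true vgpu_hosts
(overcloud)$ openstack aggregate add host vgpu_hosts overcloud-computegpu-0.localdomain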
To use an NVIDIA GRID vGPU, you must comply with the NVIDIA GRID licensing requirements and you must have the URL of your self-hosted license server. For more information, see the NVIDIA License Server Release Notes web page.
10.2.1. Prerequisites
- You have downloaded the NVIDIA GRID host driver RPM package that corresponds to your GPU device from the NVIDIA website. To determine which driver you need, see the NVIDIA Driver Downloads Portal. You must be a registered NVIDIA customer to download the drivers from the portal.
- You have built a custom overcloud image that has the NVIDIA GRID host driver installed.
10.2.2. Designating Compute nodes for vGPU
To designate Compute nodes for vGPU workloads, you must create a new role file to configure the vGPU role, and configure the bare metal nodes with a GPU resource class to use to tag the GPU-enabled Compute nodes.
The following procedure applies to new overcloud nodes that have not yet been provisioned. To assign a resource class to an existing overcloud node that has already been provisioned, you must use the scale down procedure to unprovision the node, then use the scale up procedure to reprovision the node with the new resource class assignment. For more information, see Scaling overcloud nodes.
Procedure
- Log in to the undercloud as the stack user.
- Source the stackrc file:
[stack@director ~]$ source ~/stackrc
- Generate a new roles data file named roles_data_gpu.yaml that includes the Controller, Compute, and ComputeGpu roles, along with any other roles that you need for the overcloud:
(undercloud)$ openstack overcloud roles \
 generate -o /home/stack/templates/roles_data_gpu.yaml \
 Compute:ComputeGpu Compute Controller
- Open roles_data_gpu.yaml and edit or add the following parameters and sections:

  Section/Parameter            Current value              New value
  Role comment                 Role: Compute              Role: ComputeGpu
  Role name                    name: Compute              name: ComputeGpu
  description                  Basic Compute Node role    GPU Compute Node role
  HostnameFormatDefault        -compute-                  -computegpu-
  deprecated_nic_config_name   compute.yaml               compute-gpu.yaml
- Register the GPU-enabled Compute nodes for the overcloud by adding them to your node definition template, node.json or node.yaml. For more information, see Registering nodes for the overcloud in the Installing and managing Red Hat OpenStack Platform with director guide.
- Inspect the node hardware:
(undercloud)$ openstack overcloud node introspect --all-manageable \
 --provide
For more information, see Creating an inventory of the bare-metal node hardware in the Installing and managing Red Hat OpenStack Platform with director guide.
- Tag each bare metal node that you want to designate for GPU workloads with a custom GPU resource class:
(undercloud)$ openstack baremetal node set \
 --resource-class baremetal.GPU <node>
Replace <node> with the ID of the bare metal node.
- Add the ComputeGpu role to your node definition file, overcloud-baremetal-deploy.yaml, and define any predictive node placements, resource classes, network topologies, or other attributes that you want to assign to your nodes:
- name: Controller
  count: 3
- name: Compute
  count: 3
- name: ComputeGpu
  count: 1
  defaults:
    resource_class: baremetal.GPU
    network_config:
      template: /home/stack/templates/nic-config/myRoleTopology.j2   1
1 You can reuse an existing network topology or create a new custom network interface template for the role. For more information, see Custom network interface templates in the Installing and managing Red Hat OpenStack Platform with director guide. If you do not define the network definitions by using the network_config property, then the default network definitions are used.
For more information about the properties you can use to configure node attributes in your node definition file, see Bare metal node provisioning attributes. For an example node definition file, see Example node definition file.
- Run the provisioning command to provision the new nodes for your role:
(undercloud)$ openstack overcloud node provision \
 --stack <stack> \
 [--network-config \]
 --output /home/stack/templates/overcloud-baremetal-deployed.yaml \
 /home/stack/templates/overcloud-baremetal-deploy.yaml
- Replace <stack> with the name of the stack for which the bare-metal nodes are provisioned. If not specified, the default is overcloud.
- Include the --network-config optional argument to provide the network definitions to the cli-overcloud-node-network-config.yaml Ansible playbook. If you do not define the network definitions by using the network_config property, then the default network definitions are used.
- Monitor the provisioning progress in a separate terminal. When provisioning is successful, the node state changes from available to active:
(undercloud)$ watch openstack baremetal node list
- If you did not run the provisioning command with the --network-config option, then configure the <Role>NetworkConfigTemplate parameters in your network-environment.yaml file to point to your NIC template files:
parameter_defaults:
  ComputeNetworkConfigTemplate: /home/stack/templates/nic-configs/compute.j2
  ComputeGpuNetworkConfigTemplate: /home/stack/templates/nic-configs/<gpu_net_top>.j2
  ControllerNetworkConfigTemplate: /home/stack/templates/nic-configs/controller.j2
Replace <gpu_net_top> with the name of the file that contains the network topology of the ComputeGpu role, for example, compute.yaml to use the default network topology.
10.2.3. Configuring the Compute node for vGPU and deploying the overcloud
You need to retrieve and assign the vGPU type that corresponds to the physical GPU device in your environment, and prepare the environment files to configure the Compute node for vGPU.
Procedure
- Install Red Hat Enterprise Linux and the NVIDIA GRID driver on a temporary Compute node and launch the node.
- Virtual GPUs are mediated devices, or mdev type devices. Retrieve the PCI address for each mdev device on each Compute node:
$ ls /sys/class/mdev_bus/
The PCI address is used as the device driver directory name, for example, 0000:84:00.0.
- Review the supported mdev types for each available pGPU device on each Compute node to discover the available vGPU types:
$ ls /sys/class/mdev_bus/<mdev_device>/mdev_supported_types
Replace <mdev_device> with the PCI address for the mdev device, for example, 0000:84:00.0.
For example, the following Compute node has 4 pGPUs, and each pGPU supports the same 11 vGPU types:
[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types:
nvidia-35  nvidia-36  nvidia-37  nvidia-38  nvidia-39  nvidia-40  nvidia-41  nvidia-42  nvidia-43  nvidia-44  nvidia-45
[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000:85:00.0/mdev_supported_types:
nvidia-35  nvidia-36  nvidia-37  nvidia-38  nvidia-39  nvidia-40  nvidia-41  nvidia-42  nvidia-43  nvidia-44  nvidia-45
[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000:86:00.0/mdev_supported_types:
nvidia-35  nvidia-36  nvidia-37  nvidia-38  nvidia-39  nvidia-40  nvidia-41  nvidia-42  nvidia-43  nvidia-44  nvidia-45
[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000:87:00.0/mdev_supported_types:
nvidia-35  nvidia-36  nvidia-37  nvidia-38  nvidia-39  nvidia-40  nvidia-41  nvidia-42  nvidia-43  nvidia-44  nvidia-45
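- Optional: Each supported type directory also exposes metadata that can help you choose a vGPU type. The following is a minimal sketch, assuming the standard sysfs layout of the mdev framework, that prints the human-readable name and remaining capacity of every vGPU type on one pGPU:
# Print "<type>: <marketing name>, available: <remaining instances>" per type.
for type in /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types/*; do
    echo "$(basename ${type}): $(cat ${type}/name), available: $(cat ${type}/available_instances)"
done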
- Create a gpu.yaml file to specify the vGPU types that each GPU device supports:
parameter_defaults:
  ComputeGpuExtraConfig:
    nova::compute::vgpu::enabled_vgpu_types:
      - nvidia-35
      - nvidia-36
- Optional: To configure more than one vGPU type, map the supported vGPU types to the pGPUs:
parameter_defaults:
  ComputeGpuExtraConfig:
    nova::compute::vgpu::enabled_vgpu_types:
      - nvidia-35
      - nvidia-36
  NovaVGPUTypesDeviceAddressesMapping: {'vgpu_<vgpu_type>': ['<pci_address>', '<pci_address>'],
                                        'vgpu_<vgpu_type>': ['<pci_address>', '<pci_address>']}
- Replace <vgpu_type> with the name of the vGPU type to create a label for the vGPU group, for example, vgpu_nvidia-35. Use a comma-separated list of vgpu_<vgpu_type> definitions to map additional vGPU types.
- Replace <pci_address> with the PCI address of a pGPU device that supports the vGPU type, for example, 0000:84:00.0. Use a comma-separated list of <pci_address> definitions to map the vGPU group to additional pGPUs.
Example:
NovaVGPUTypesDeviceAddressesMapping: {'vgpu_nvidia-35': ['0000:84:00.0', '0000:85:00.0'],'vgpu_nvidia-36': ['0000:86:00.0']}
In this example:
- The nvidia-35 vGPU type is supported by the pGPUs at the PCI addresses 0000:84:00.0 and 0000:85:00.0.
- The nvidia-36 vGPU type is supported only by the pGPU at the PCI address 0000:86:00.0.
- Save the updates to your Compute environment file.
- Add your new role and environment files to the stack with your other environment files and deploy the overcloud:
(undercloud)$ openstack overcloud deploy --templates \
 -e [your environment files] \
 -r /home/stack/templates/roles_data_gpu.yaml \
 -e /home/stack/templates/network-environment.yaml \
 -e /home/stack/templates/gpu.yaml \
 -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
 -e /home/stack/templates/node-info.yaml
10.3. Creating a custom vGPU resource provider trait
You can create custom resource provider traits for each vGPU type that your RHOSP environment supports. You can then create flavors that your cloud users can use to launch instances on hosts that have those custom traits. Custom traits are defined in uppercase letters, and must begin with the prefix CUSTOM_. For more information on resource provider traits, see Filtering by resource provider traits.
Procedure
- Create a new trait:
(overcloud)$ openstack --os-placement-api-version 1.6 trait \
 create CUSTOM_<TRAIT_NAME>
Replace <TRAIT_NAME> with the name of the trait. The name can contain only the letters A through Z, the numbers 0 through 9, and the underscore "_" character.
- Collect the existing resource provider traits of each host:
(overcloud)$ existing_traits=$(openstack --os-placement-api-version 1.6 resource provider trait list -f value <host_uuid> | sed 's/^/--trait /')
- Check the existing resource provider traits for the traits you require a host or host aggregate to have:
(overcloud)$ echo $existing_traits
- If the traits you require are not already added to the resource provider, then add the existing traits and your required traits to the resource providers for each host:
(overcloud)$ openstack --os-placement-api-version 1.6 \
 resource provider trait set $existing_traits \
 --trait CUSTOM_<TRAIT_NAME> \
 <host_uuid>
Replace <TRAIT_NAME> with the name of the trait that you want to add to the resource provider. You can use the --trait option more than once to add additional traits, as required.
Note: This command performs a full replacement of the traits for the resource provider. Therefore, you must retrieve the list of existing resource provider traits on the host and set them again to prevent them from being removed.
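To confirm that the custom trait is now present on the resource provider, you can list its traits again. A minimal check, reusing the trait list command from the previous steps:
(overcloud)$ openstack --os-placement-api-version 1.6 resource provider trait list <host_uuid> | grep CUSTOM_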
10.4. Creating a custom GPU instance image
To enable your cloud users to create instances that use a virtual GPU (vGPU), you can create a custom vGPU-enabled image for launching instances. Use the following procedure to create a custom vGPU-enabled instance image with the NVIDIA GRID guest driver and license file.
Prerequisites
- You have configured and deployed the overcloud with GPU-enabled Compute nodes.
Procedure
- Log in to the undercloud as the stack user.
- Source the overcloudrc credential file:
$ source ~/overcloudrc
- Create an instance with the hardware and software profile that your vGPU instances require:
(overcloud)$ openstack server create --flavor <flavor> \
 --image <image> temp_vgpu_instance
- Replace <flavor> with the name or ID of the flavor that has the hardware profile that your vGPU instances require. For information about creating a vGPU flavor, see Creating a vGPU flavor for instances.
- Replace <image> with the name or ID of the image that has the software profile that your vGPU instances require. For information about downloading RHEL cloud images, see Creating RHEL KVM or RHOSP-compatible images in Creating and managing images.
- Log in to the instance as a cloud-user.
- Create the gridd.conf NVIDIA GRID license file on the instance, following the NVIDIA guidance: Licensing an NVIDIA vGPU on Linux by Using a Configuration File.
- Install the GPU driver on the instance. For more information about installing an NVIDIA driver, see Installing the NVIDIA vGPU Software Graphics Driver on Linux.
Note: Use the hw_video_model image property to define the GPU driver type. You can choose none if you want to disable the emulated GPUs for your vGPU instances. For more information about supported drivers, see Image configuration parameters. A hedged example of setting this property on the snapshot image follows this procedure.
- Create an image snapshot of the instance:
(overcloud)$ openstack server image create \
 --name vgpu_image temp_vgpu_instance
- Optional: Delete the instance.
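Building on the note about the hw_video_model image property, the following is a hedged example, using the vgpu_image snapshot created in this procedure, of disabling the emulated GPU for instances launched from the image:
(overcloud)$ openstack image set --property hw_video_model=none vgpu_image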
10.5. Creating a vGPU flavor for instances
To enable your cloud users to create instances for GPU workloads, you can create a GPU flavor that can be used to launch vGPU instances, and assign the vGPU resource to that flavor.
Prerequisites
- You have configured and deployed the overcloud with GPU-designated Compute nodes.
Procedure
- Create an NVIDIA GPU flavor, for example:
(overcloud)$ openstack flavor create --vcpus 6 \
 --ram 8192 --disk 100 m1.small-gpu
- Assign a vGPU resource to the flavor:
(overcloud)$ openstack flavor set m1.small-gpu \
 --property "resources:VGPU=1"
Note: You can assign only one vGPU for each instance.
- Optional: To customize the flavor for a specific vGPU type, add a required trait to the flavor:
(overcloud)$ openstack flavor set m1.small-gpu \
 --property trait:CUSTOM_NVIDIA_11=required
For information on how to create custom resource provider traits for each vGPU type, see Creating a custom vGPU resource provider trait.
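To confirm that the flavor requests the vGPU resource and, if you set one, the required trait, you can review its properties. A minimal check using the flavor created in this procedure:
(overcloud)$ openstack flavor show m1.small-gpu -c properties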
10.6. Launching a vGPU instance
You can create a GPU-enabled instance for GPU workloads.
Procedure
- Create an instance using a GPU flavor and image, for example:
(overcloud)$ openstack server create --flavor m1.small-gpu \
 --image vgpu_image --security-group web --nic net-id=internal0 \
 --key-name lambda vgpu-instance
- Log in to the instance as a cloud-user.
To verify that the GPU is accessible from the instance, enter the following command from the instance:
$ lspci -nn | grep <gpu_name>
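If your vGPU type requires NVIDIA GRID licensing, you can also query the guest driver from inside the instance. The following is a hedged example, assuming the NVIDIA guest driver and licensing configuration are installed as described in Creating a custom GPU instance image; the exact output depends on your driver version:
$ nvidia-smi -q | grep -i -A2 license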
10.7. Enabling PCI passthrough for a GPU device
You can use PCI passthrough to attach a physical PCI device, such as a graphics card, to an instance. If you use PCI passthrough for a device, the instance reserves exclusive access to the device for performing tasks, and the device is not available to the host.
Prerequisites
- The pciutils package is installed on the physical servers that have the PCI cards.
- The driver for the GPU device must be installed on the instance that the device is passed through to. Therefore, you need to have created a custom instance image that has the required GPU driver installed. For more information about how to create a custom instance image with the GPU driver installed, see Creating a custom GPU instance image.
Procedure
- To determine the vendor ID and product ID for each passthrough device type, enter the following command on the physical server that has the PCI cards:
# lspci -nn | grep -i <gpu_name>
For example, to determine the vendor and product ID for an NVIDIA GPU, enter the following command:
# lspci -nn | grep -i nvidia
3b:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
d8:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1db4] (rev a1)
- To determine if each PCI device has Single Root I/O Virtualization (SR-IOV) capabilities, enter the following command on the physical server that has the PCI cards:
# lspci -v -s 3b:00.0
3b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
...
Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
...
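Optional: To repeat this check for every NVIDIA PCI device on the host in one pass, you can use a loop similar to the following hedged sketch, which is not part of the official procedure; run it as root on the physical server:
# Check each NVIDIA device for the SR-IOV capability.
for addr in $(lspci -nn | awk '/NVIDIA/ {print $1}'); do
    echo "== ${addr} =="
    lspci -v -s "${addr}" | grep -i "Single Root I/O Virtualization" || echo "  no SR-IOV capability"
done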
- To configure the Controller node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_controller.yaml.
- Add PciPassthroughFilter to the NovaSchedulerEnabledFilters parameter in pci_passthru_controller.yaml:
parameter_defaults:
  NovaSchedulerEnabledFilters:
    - AvailabilityZoneFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
    - ServerGroupAntiAffinityFilter
    - ServerGroupAffinityFilter
    - PciPassthroughFilter
    - NUMATopologyFilter
- To specify the PCI alias for the devices on the Controller node, add the following configuration to pci_passthru_controller.yaml:
If the PCI device has SR-IOV capabilities:
ControllerExtraConfig:
  nova::pci::aliases:
    - name: "t4"
      product_id: "1eb8"
      vendor_id: "10de"
      device_type: "type-PF"
    - name: "v100"
      product_id: "1db4"
      vendor_id: "10de"
      device_type: "type-PF"
If the PCI device does not have SR-IOV capabilities:
ControllerExtraConfig:
  nova::pci::aliases:
    - name: "t4"
      product_id: "1eb8"
      vendor_id: "10de"
    - name: "v100"
      product_id: "1db4"
      vendor_id: "10de"
For more information on configuring the device_type field, see PCI passthrough device type field.
Note: If the nova-api service is running in a role other than the Controller, then replace ControllerExtraConfig with the user role, in the format <Role>ExtraConfig.
- To configure the Compute node on the overcloud for PCI passthrough, create an environment file, for example, pci_passthru_compute.yaml.
- To specify the available PCIs for the devices on the Compute node, add the following to pci_passthru_compute.yaml:
parameter_defaults:
  NovaPCIPassthrough:
    - vendor_id: "10de"
      product_id: "1eb8"
- You must create a copy of the PCI alias on the Compute node for instance migration and resize operations. To specify the PCI alias for the devices on the Compute node, add the following to pci_passthru_compute.yaml:
If the PCI device has SR-IOV capabilities:
ComputeExtraConfig:
  nova::pci::aliases:
    - name: "t4"
      product_id: "1eb8"
      vendor_id: "10de"
      device_type: "type-PF"
    - name: "v100"
      product_id: "1db4"
      vendor_id: "10de"
      device_type: "type-PF"
If the PCI device does not have SR-IOV capabilities:
ComputeExtraConfig:
  nova::pci::aliases:
    - name: "t4"
      product_id: "1eb8"
      vendor_id: "10de"
    - name: "v100"
      product_id: "1db4"
      vendor_id: "10de"
Note: The Compute node aliases must be identical to the aliases on the Controller node.
- To enable IOMMU in the server BIOS of the Compute nodes to support PCI passthrough, add the KernelArgs parameter to pci_passthru_compute.yaml:
parameter_defaults:
  ...
  ComputeParameters:
    KernelArgs: "intel_iommu=on iommu=pt"
Note: When you first add the KernelArgs parameter to the configuration of a role, the overcloud nodes are automatically rebooted. If required, you can disable the automatic rebooting of nodes and instead perform node reboots manually after each overcloud deployment. For more information, see Configuring manual node reboot to define KernelArgs.
- Add your custom environment files to the stack with your other environment files and deploy the overcloud:
(undercloud)$ openstack overcloud deploy --templates \
 -e [your environment files] \
 -e /home/stack/templates/pci_passthru_controller.yaml \
 -e /home/stack/templates/pci_passthru_compute.yaml
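After the deployment completes, you can confirm on a GPU Compute node that the IOMMU kernel arguments took effect. A minimal check, using the node name from earlier in this chapter as an example:
[root@overcloud-computegpu-0 ~]# grep -o "intel_iommu=on iommu=pt" /proc/cmdline
[root@overcloud-computegpu-0 ~]# dmesg | grep -i -e DMAR -e IOMMU | head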
- Configure a flavor to request the PCI devices. The following example requests two devices that match the t4 alias, which has a vendor ID of 10de and a product ID of 1eb8:
# openstack flavor set m1.large \
 --property "pci_passthrough:alias"="t4:2"
Verification
- Create an instance with a PCI passthrough device:
# openstack server create --flavor m1.large \
 --image <custom_gpu> --wait test-pci
Replace <custom_gpu> with the name of your custom instance image that has the required GPU driver installed.
- Log in to the instance as a cloud user. For more information, see Connecting to an instance.
To verify that the GPU is accessible from the instance, enter the following command from the instance:
$ lspci -nn | grep <gpu_name>
To check the NVIDIA System Management Interface status, enter the following command from the instance:
$ nvidia-smi
Example output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:01:00.0 Off |                    0 |
| N/A   43C    P0    20W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+