Chapter 11. Configuring virtual GPUs for instances
To support GPU-based rendering on your instances, you can define and manage virtual GPU (vGPU) resources according to your available physical GPU devices and your hypervisor type. You can use this configuration to spread rendering workloads effectively across all your physical GPU devices, and to control the scheduling of your vGPU-enabled instances.
To enable vGPU in the Compute service (nova), perform the following tasks:
- Identify the nodes on which you want to configure vGPUs.
- Retrieve the PCI address for each physical GPU on each Compute node, or for each SR-IOV virtual function (VF) if the GPU supports SR-IOV.
- Configure the GPU profiles on each Compute node.
Each instance hosted on the configured Compute nodes can support GPU workloads with vGPU devices that correspond to the physical GPU devices.
The Compute service (nova) tracks the number of vGPU devices that are available for each GPU profile you define on each host. The Compute service schedules instances to these hosts, attaches the devices, and monitors the use of vGPU. When an instance is deleted, the Compute service adds the vGPU devices back to the available pool.
Red Hat enables the use of NVIDIA vGPU in RHOSO without the requirement for support exceptions. However, Red Hat does not provide technical support for the NVIDIA vGPU drivers. The NVIDIA vGPU drivers are shipped and supported by NVIDIA. You require an NVIDIA Certified Support Services subscription to obtain NVIDIA Enterprise Support for NVIDIA vGPU software. For issues that result from the use of NVIDIA vGPUs where you are unable to reproduce the issue on a supported component, the following support policies apply:
- When Red Hat does not suspect that the third-party component is involved in the issue, the normal Scope of Support and Red Hat SLA apply.
- When Red Hat suspects that the third-party component is involved in the issue, the customer will be directed to NVIDIA in line with the Red Hat third party support and certification policies. For more information, see the Knowledge Base article Obtaining Support from NVIDIA.
11.1. Supported configurations and limitations for vGPU devices
Supported GPU cards
- For a list of supported NVIDIA GPU cards, see Virtual GPU Software Supported Products on the NVIDIA website.

Limitations when using vGPU devices
- Each instance can use only one vGPU resource.
- Live migration of vGPU instances between hosts is not supported.
- Evacuation of vGPU instances is not supported.
- If you need to reboot the Compute node that hosts the vGPU instances, the vGPUs are not automatically reassigned to the recreated instances. You must either cold migrate the instances before you reboot the Compute node, or manually allocate each vGPU to the correct instance after reboot. To manually allocate each vGPU, you must retrieve the mdev UUID from the instance XML for each vGPU instance that runs on the Compute node before you reboot. You can use the following command to discover the mdev UUID for each instance:

  # virsh dumpxml <instance_name> | grep mdev

  Replace <instance_name> with the libvirt instance name, OS-EXT-SRV-ATTR:instance_name, returned in a /servers request to the Compute API. An illustrative example of the mdev element in the instance XML follows this list.
- By default, vGPU types on Compute hosts are not exposed to API users. To expose the vGPU types on Compute hosts to API users, you must configure resource provider traits and create flavors that require the traits. Alternatively, if you only have one vGPU type, you can grant access by adding the hosts to a host aggregate. For more information, see Creating and managing host aggregates.
- If you use NVIDIA accelerator hardware, you must comply with the NVIDIA licensing requirements. For example, NVIDIA vGPU GRID requires a licensing server. For more information about the NVIDIA licensing requirements, see NVIDIA License Server Release Notes on the NVIDIA website.
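In the instance XML, each vGPU appears as a hostdev element of type mdev, and the mdev UUID is carried in the nested address element. The following snippet is illustrative only; the UUID is a placeholder and the exact attributes vary by deployment:

<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>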
11.2. Preparing to configure the Compute service for vGPU
Before you configure the Compute service for vGPU, you must prepare the data plane nodes that you want to use for vGPU and you must download and install the NVIDIA device driver.
Procedure
Access the remote shell for openstackclient:

$ oc rsh openstackclient

Identify a node that you want to use for vGPU:

- Retrieve the IP address of the Compute node that you want to use for vGPU:

  $ openstack hypervisor list

- Use SSH to connect to the data plane node:

  $ ssh <node_ipaddress>
Create the file /etc/modprobe.d/blacklist-nouveau.conf. Disable the nouveau driver by adding the following configuration to blacklist-nouveau.conf:

blacklist nouveau
options nouveau modeset=0
Regenerate the initramfs:

$ dracut --force
$ grub2-mkconfig -o /boot/grub2/grub.cfg --update-bls-cmdline

- Download and install the NVIDIA driver from the NVIDIA portal. For more information, see NVIDIA DOCS HUB.
Reboot the node:
$ sudo reboot
- Repeat this procedure for all nodes that you want to allocate for vGPU instances.
11.3. Configuring the Compute service for vGPU
Retrieve the vGPU types that correspond to the physical GPU devices in your environment, and configure the Compute service (nova) with the vGPU types that you want to enable.
You can configure only whole node sets. Reconfiguring a subset of the nodes within a node set is not supported. If you need to reconfigure a subset of nodes within a node set, you must scale the node set down, and create a new node set from the previously removed nodes.
Prerequisites
- The oc command line tool is installed on your workstation.
- You are logged in to Red Hat OpenStack Services on OpenShift (RHOSO) as a user with cluster-admin privileges.
- You have selected the OpenStackDataPlaneNodeSet CR that defines the nodes that you can configure vGPU on. For more information about creating an OpenStackDataPlaneNodeSet CR, see Creating an OpenStackDataPlaneNodeSet CR with pre-provisioned nodes in the Deploying Red Hat OpenStack Services on OpenShift guide.
Procedure
Virtual GPUs are mediated devices. Retrieve the PCI address for each device that can create mediated devices on each Compute node:
$ ls /sys/class/mdev_bus/

Note: The PCI address of the GPU, or of the GPU SR-IOV virtual function (VF) that can create vGPUs, is used as the device driver directory name, for example, 0000:84:00.0. In this procedure, the vGPU-capable resource is called an mdev device.

Note: Recent generations of NVIDIA cards now support SR-IOV. Refer to the NVIDIA documentation to discover if your GPU is SR-IOV-capable.
Review the supported mdev types for each available pGPU device on each Compute node to discover the available vGPU types:
$ ls /sys/class/mdev_bus/<mdev_device>/mdev_supported_types

Replace <mdev_device> with the PCI address for the mdev device, for example, 0000:84:00.0. For example, the following Compute node has 4 pGPUs, and each pGPU supports the same 11 vGPU types:
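The following listing is illustrative only; the type names nvidia-35 through nvidia-45 are assumed, and the actual names depend on your GPU model and driver version:

$ ls /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types
nvidia-35  nvidia-36  nvidia-37  nvidia-38  nvidia-39  nvidia-40  nvidia-41  nvidia-42  nvidia-43  nvidia-44  nvidia-45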
Create or update the ConfigMap CR named nova-extra-config.yaml and set the values of the parameters under [devices]:
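A minimal sketch of the ConfigMap, assuming a single nvidia-35 vGPU type; the 25-nova-vgpu.conf key name is an arbitrary example, and you must substitute the mdev type that your own GPUs report:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  25-nova-vgpu.conf: |
    [devices]
    enabled_mdev_types = nvidia-35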
For more information about creating ConfigMap objects, see Creating and using config maps.

Optional: To configure more than one vGPU type, map the supported vGPU types to the pGPUs:
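For example, the following [devices] configuration, sketched with the type names and PCI addresses used elsewhere in this procedure, enables two vGPU types and maps each type to specific pGPUs:

[devices]
enabled_mdev_types = nvidia-35, nvidia-36

[mdev_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[mdev_nvidia-36]
device_addresses = 0000:86:00.0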
The nvidia-35 vGPU type is supported by the pGPUs at the PCI addresses 0000:84:00.0 and 0000:85:00.0. The nvidia-36 vGPU type is supported only by the pGPU at the PCI address 0000:86:00.0.
Create a new OpenStackDataPlaneDeployment CR to configure the services on the data plane nodes and deploy the data plane, and save it to a file named compute_vgpu_deploy.yaml on your workstation:

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: compute-vgpu
In the compute_vgpu_deploy.yaml CR, specify nodeSets to include all the OpenStackDataPlaneNodeSet CRs you want to deploy. Ensure that you include the OpenStackDataPlaneNodeSet CR that you selected as a prerequisite. That OpenStackDataPlaneNodeSet CR defines the nodes that you want to use for vGPU.

Warning: If your deployment has more than one node set, changes to the nova-extra-config.yaml ConfigMap might directly affect more than one node set, depending on how the node sets and the DataPlaneServices are configured. To check if a node set uses the nova-extra-config ConfigMap and therefore will be affected by the reconfiguration, complete the following steps:

- Check the services list of the node set and find the name of the DataPlaneService that points to nova. Ensure that the value of the edpmServiceType field of the DataPlaneService is set to nova.
- If the dataSources list of the DataPlaneService contains a configMapRef named nova-extra-config, then this node set uses this ConfigMap and therefore will be affected by the configuration changes in this ConfigMap. If some of the node sets that are affected should not be reconfigured, you must create a new DataPlaneService pointing to a separate ConfigMap for these node sets.
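For example, a minimal sketch of the deployment CR with the nodeSets field populated; add one entry for each node set that you want to deploy:

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: compute-vgpu
spec:
  nodeSets:
    - <nodeSet_name>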
Replace <nodeSet_name> with the names of the OpenStackDataPlaneNodeSet CRs that you want to include in your data plane deployment.

Save the compute_vgpu_deploy.yaml deployment file.

Deploy the data plane:
$ oc create -f compute_vgpu_deploy.yaml

Verify that the data plane is deployed:
$ oc get openstackdataplanenodeset
NAME           STATUS   MESSAGE
compute-vgpu   True     Deployed

Tip: Append the -w option to the end of the get command to track deployment progress.
Access the remote shell for openstackclient and verify that the deployed Compute nodes are visible on the control plane:

$ oc rsh -n openstack openstackclient
$ openstack hypervisor list

- Optional: Enable SR-IOV VFs of the GPUs. For more information, see Preparing virtual function for SRIOV vGPU on the NVIDIA DOCS HUB.
11.4. Setting the maximum number of vGPUs that an SR-IOV NVIDIA GPU can create
If you are using NVIDIA SR-IOV GPUs, the Compute service (nova) cannot discover the maximum number of virtual GPUs (vGPUs) those GPUs can create. Therefore, you must retrieve this number manually from NVIDIA and then set the max_instances
configuration option to define the maximum number of vGPUs your SR-IOV NVIDIA GPU can create.
You cannot reconfigure a subset of the nodes within a node set. If you need to do this, you must scale the node set down, and create a new node set from the previously removed nodes.
Prerequisites
- You know whether your NVIDIA GPU supports SR-IOV and how many Virtual Functions (VFs) it supports. For example, the NVIDIA L4 GPU Accelerator provides SR-IOV support for 32 VFs. For more information, see www.nvidia.com.
- You have the oc command line tool installed on your workstation.
- You are logged on to a workstation that has access to the RHOSO control plane as a user with cluster-admin privileges.
- You have selected the OpenStackDataPlaneNodeSet CR that defines the nodes on which you want to configure the maximum number of vGPUs for your SR-IOV NVIDIA GPU. For more information about creating an OpenStackDataPlaneNodeSet CR, see Creating an OpenStackDataPlaneNodeSet CR with pre-provisioned nodes in Deploying Red Hat OpenStack Services on OpenShift.
Procedure
To define the maximum number of vGPUs your SR-IOV NVIDIA GPU can create for a specific vGPU type, create or update the ConfigMap CR named nova-extra-config.yaml. You must set the value of the enabled_mdev_types parameter and the max_instances parameter under the specific mdev section for the vGPU type. This example configuration is for the A40-2Q NVIDIA GPU type, which can create up to 24 vGPUs:
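A minimal sketch of the ConfigMap, assuming that the A40-2Q profile is reported as the nvidia-558 mdev type on your nodes and that the 25-nova-vgpu.conf key name matches your deployment; check mdev_supported_types for the actual type name:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  25-nova-vgpu.conf: |
    [devices]
    enabled_mdev_types = nvidia-558

    [mdev_nvidia-558]
    max_instances = 24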
For more information about creating ConfigMap objects, see Creating and using config maps in Nodes.

Save the nova-extra-config.yaml file.

Create a new OpenStackDataPlaneDeployment CR to configure the services on the data plane nodes and deploy the data plane, and save it to a file named compute_vgpus_max_instance_deploy.yaml on your workstation:

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: compute-vgpus-max-instance
In the compute_vgpus_max_instance_deploy.yaml, specify nodeSets to include all the OpenStackDataPlaneNodeSet CRs you want to deploy. Ensure that you include the OpenStackDataPlaneNodeSet CR that you selected as a prerequisite.

Warning: If your deployment has more than one node set, changes to the nova-extra-config.yaml ConfigMap might directly affect more than one node set, depending on how the node sets and the DataPlaneServices are configured. To check if a node set uses the nova-extra-config ConfigMap and therefore will be affected by the reconfiguration, complete the following steps:

- Check the services list of the node set and find the name of the DataPlaneService that points to nova. Ensure that the value of the edpmServiceType field of the DataPlaneService is set to nova.
- If the dataSources list of the DataPlaneService contains a configMapRef named nova-extra-config, then this node set uses this ConfigMap and therefore will be affected by the configuration changes in this ConfigMap. If some of the node sets that are affected should not be reconfigured, you must create a new DataPlaneService pointing to a separate ConfigMap for these node sets and use that custom service in the required node sets instead.
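For example, a sketch of the nodeSets field for this deployment; add one entry for each node set that you want to deploy:

spec:
  nodeSets:
    - <nodeSet_name>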
Replace <nodeSet_name> with the names of the OpenStackDataPlaneNodeSet CRs that you want to include in your data plane deployment.

Save the compute_vgpus_max_instance_deploy.yaml deployment file.

Deploy the data plane:
$ oc create -f compute_vgpus_max_instance_deploy.yaml

Verify that the data plane is deployed:
$ oc get openstackdataplanenodeset
NAME                         STATUS   MESSAGE
compute-vgpus-max-instance   True     Deployed

Access the remote shell for openstackclient and verify that the deployed Compute nodes are visible on the control plane:

$ oc rsh -n openstack openstackclient
$ openstack hypervisor list
11.5. Configuring Compute nodes for NVIDIA GPU passthrough
You can use PCI passthrough (NVIDIA GPU passthrough) to attach a physical PCI device, such as a graphics card, to an instance. If you use PCI passthrough for a device, the instance reserves exclusive access to the device for performing tasks, and the device is not available to the host. To use NVIDIA GPU passthrough as PCI passthrough, you must prepare the data plane nodes that you want to use for NVIDIA GPU passthrough, and you must download and install the NVIDIA device driver.
Prerequisites
- You have configured PCI passthrough. For more information, see Configuring PCI passthrough.
- You have enabled IOMMU in the server BIOS of the Compute nodes to support PCI passthrough as part of the PCI passthrough configuration. For more information, see Creating an OpenStackDataPlaneNodeSet CR for PCI passthrough.
Procedure
Access the remote shell for openstackclient:

$ oc rsh openstackclient

Create an instance and install the NVIDIA device driver:

$ openstack server create --flavor <flavor> \
  --image <image> --network <network> \
  --wait myInstanceFromImage
- Replace <flavor> with the name or ID of the flavor.
- Replace <image> with the name or ID of the image.
- Replace <network> with the name or ID of the network. You can use the --network option more than once to connect your instance to several networks, as required.

For more information about creating an instance, see Creating an instance in Creating and managing instances.
Create the file /etc/modprobe.d/blacklist-nouveau.conf. Disable the nouveau device driver by adding the following configuration to blacklist-nouveau.conf:

blacklist nouveau
options nouveau modeset=0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Regenerate the
initramfs
:dracut --force grub2-mkconfig -o /boot/grub2/grub.cfg --update-bls-cmdline
$ dracut --force $ grub2-mkconfig -o /boot/grub2/grub.cfg --update-bls-cmdline
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Download and install the NVIDIA device driver from the product portal. For more information, see NVIDIA DOCS HUB.
Reboot the node:

$ sudo reboot
- Repeat this procedure for all instances that you want to use for GPU passthrough.
Verification
- To verify that the GPU is correctly configured for PCI passthrough, see Creating a nodeset for PCI passthrough.
11.6. Creating a custom vGPU resource provider trait
You can create custom resource provider traits for each vGPU type that your RHOSO environment supports. You can then create flavors that your cloud users can use to launch instances on hosts that have those custom traits. Custom traits are defined in uppercase letters, and must begin with the prefix CUSTOM_. For more information on resource provider traits, see Filtering by resource provider traits.
Procedure
Create a new trait:

$ openstack --os-placement-api-version 1.6 trait \
  create CUSTOM_<TRAIT_NAME>
Replace <TRAIT_NAME> with the name of the trait. The name can contain only the letters A through Z, the numbers 0 through 9, and the underscore "_" character.
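For example, the following command creates the CUSTOM_NVIDIA_11 trait that is used as an example later in this chapter; the trait name itself is your choice:

$ openstack --os-placement-api-version 1.6 trait \
  create CUSTOM_NVIDIA_11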
Collect the existing resource provider traits of each host:
$ existing_traits=$(openstack --os-placement-api-version 1.6 resource provider trait list -f value <host_uuid> | sed 's/^/--trait /')

Check the existing resource provider traits for the traits you require a host or host aggregate to have:
$ echo $existing_traits

If the traits you require are not already added to the resource provider, then add the existing traits and your required traits to the resource providers for each host:
$ openstack --os-placement-api-version 1.6 \
  resource provider trait set $existing_traits \
  --trait CUSTOM_<TRAIT_NAME> \
  <host_uuid>

Replace <TRAIT_NAME> with the name of the trait that you want to add to the resource provider. You can use the --trait option more than once to add additional traits, as required.

Note: This command performs a full replacement of the traits for the resource provider. Therefore, you must retrieve the list of existing resource provider traits on the host and set them again to prevent them from being removed.
11.7. Creating a custom GPU instance image
To enable your cloud users to create instances that use a virtual GPU (vGPU), you can create a custom vGPU-enabled image for launching instances. Use the following procedure to create a custom vGPU-enabled instance image with the NVIDIA GRID guest driver and license file.
Prerequisites
- You have configured and deployed RHOSO with GPU-enabled Compute nodes.
Procedure
Create an instance with the hardware and software profile that your vGPU instances require:
$ openstack server create --flavor <flavor> \
  --image <image> temp_vgpu_instance
- Replace <flavor> with the name or ID of the flavor that has the hardware profile that your vGPU instances require.
- Replace <image> with the name or ID of the image that has the software profile that your vGPU instances require. For information about downloading RHEL cloud images, see Creating RHEL KVM or RHOSP-compatible images in Creating and managing images.
- Log in to the instance as a cloud user.
- Create the gridd.conf NVIDIA GRID license file on the instance, following the NVIDIA guidance: Licensing an NVIDIA vGPU on Linux by Using a Configuration File. A sketch of this file follows the next step.
- Install the GPU driver on the instance. For more information about installing an NVIDIA driver, see Installing the NVIDIA vGPU Software Graphics Driver on Linux.
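For the licensing step, a minimal gridd.conf sketch, assuming a legacy NVIDIA vGPU license server at the example address 192.0.2.10; newer NVIDIA licensing releases use a client configuration token instead, so confirm the required settings in the NVIDIA documentation for your vGPU software version:

# /etc/nvidia/gridd.conf (illustrative values)
ServerAddress=192.0.2.10
ServerPort=7070
FeatureType=1
EnableUI=FALSE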
Note: Use the hw_video_model image property to define the GPU driver type. You can choose none if you want to disable the emulated GPUs for your vGPU instances. For more information about supported drivers, see Image configuration parameters.

Create an image snapshot of the instance:
$ openstack server image create \
  --name vgpu_image temp_vgpu_instance

- Optional: Delete the instance.
11.8. Creating a vGPU flavor for instances
To enable your cloud users to create instances for GPU workloads, you can create a GPU flavor that can be used to launch vGPU instances, and assign the vGPU resource to that flavor.
Prerequisites
- You have configured and deployed RHOSO with GPU-designated Compute nodes.
Procedure
Create an NVIDIA GPU flavor, for example:
$ openstack --os-compute-api=2.86 flavor create --vcpus 6 \
  --ram 8192 --disk 100 m1.small-gpu

Assign a vGPU resource to the flavor:
$ openstack --os-compute-api=2.86 flavor set m1.small-gpu \
  --property "resources:VGPU=1"

Note: You can assign only one vGPU for each instance.
Optional: To customize the flavor for a specific vGPU type, add a required trait to the flavor:
$ openstack --os-compute-api=2.86 flavor set m1.small-gpu \
  --property trait:CUSTOM_NVIDIA_11=required

For information on how to create custom resource provider traits for each vGPU type, see Creating a custom vGPU resource provider trait.
11.9. Launching a vGPU instance
You can create a GPU-enabled instance for GPU workloads.
Procedure
Create an instance using a GPU flavor and image, for example:
$ openstack --os-compute-api=2.86 server create --flavor m1.small-gpu \
  --image vgpu_image --security-group web --nic net-id=internal0 \
  --key-name lambda vgpu-instance

- Log in to the instance as a cloud-user.
To verify that the GPU is accessible from the instance, enter the following command from the instance:
$ lspci -nn | grep <gpu_name>
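Output similar to the following confirms that the GPU is visible inside the instance; the controller class, device name, and PCI IDs shown here are illustrative and vary by GPU model and vGPU type:

00:05.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [A40] [10de:2235] (rev a1)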