Chapter 8. Configuring Virtual GPU for guest instances
To support GPU-based rendering on your guest instances, you can define and manage virtual GPU (vGPU) resources according to your available physical GPU devices and your hypervisor type. This configuration allows you to divide the rendering workloads between all your physical GPU devices more effectively, and to have more control over scheduling, tuning, and monitoring your vGPU-enabled guest instances.
To enable vGPU in OpenStack Compute, you create flavors that you can use to request Red Hat Enterprise Linux guests with vGPU devices, and then you assign those flavors to Compute instances. Each instance can then support GPU workloads with virtual GPU devices that correspond to the physical GPU devices.
The OpenStack Compute service tracks the number and size of the vGPU devices that are available on each host, schedules guests to these hosts based on the flavor, attaches the devices, and monitors usage on an ongoing basis. When a guest is deleted or is no longer available, OpenStack Compute returns the vGPU devices to the available pool.
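For example, after deployment you can check the vGPU inventory that the Compute service reports to the Placement service for each GPU-enabled host. This is a minimal sketch; it assumes that the osc-placement client plugin is installed and that you substitute the resource provider UUID of your own Compute node:
(overcloud) $ openstack resource provider list
(overcloud) $ openstack resource provider inventory list <resource_provider_uuid>
Look for the VGPU resource class in the inventory output.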
8.1. Supported configurations and limitations
This section lists currently supported virtual GPU (vGPU) graphics cards, as well as considerations and limitations for setting up vGPU devices in OpenStack Compute.
Supported GPU cards
For a list of supported NVIDIA GPU cards, see Virtual GPU Software Supported Products on the NVIDIA website.
Limitations and considerations
- You can use only one vGPU type for each Compute host.
- You can use only one vGPU resource for each Compute instance.
- Live migration of vGPU between hosts is not supported.
- Suspend operations on a vGPU-enabled guest are not supported due to a libvirt limitation. Instead, you can snapshot or shelve the instance.
- Resize and cold migration operations on an instance with a vGPU flavor do not automatically re-allocate the vGPU resources to the instance. After you resize or migrate the instance, you must rebuild it manually to re-allocate the vGPU resources.
- By default, vGPU types on Compute hosts are not exposed to API users. To allow access, you can add the hosts to a host aggregate, as shown in the example after this list. For general information about host aggregates, see Section 4.4, “Manage Host Aggregates”.
- If you use NVIDIA accelerator hardware, you must comply with the NVIDIA licensing requirements. For example, NVIDIA vGPU GRID requires a licensing server. For more information about the NVIDIA licensing requirements, see the NVIDIA License Server Release Notes web page.
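The following minimal sketch shows how you might group vGPU hosts in a host aggregate, as mentioned in the list above. The aggregate name vgpu-hosts and the host name overcloud-computegpu-0.localdomain are placeholder values for your own environment:
(overcloud) $ openstack aggregate create vgpu-hosts
(overcloud) $ openstack aggregate add host vgpu-hosts overcloud-computegpu-0.localdomain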
8.2. Deploying NVIDIA GRID vGPU
This section describes how to deploy virtual GPU (vGPU) for NVIDIA devices on your Compute node hosts and on your guest instances. This end-to-end process includes the following steps:
- Building a custom GPU-enabled overcloud image
- Preparing the GPU role, profile, and flavor
- Configuring and deploying the overcloud
- Building a custom vGPU-enabled guest image
- Preparing the vGPU flavor for the instances
- Launching and configuring the vGPU-enabled instances
Prerequisites
Before you deploy NVIDIA GRID vGPU on your overcloud, make sure that your environment meets the following requirements:
- Your deployment must meet the requirements for vGPU devices, as described in Section 8.1, “Supported configurations and limitations”.
- Your undercloud must be deployed and the default overcloud image must be uploaded to Glance.
- You must comply with the NVIDIA GRID licensing requirements and you must have the URL of your self-hosted license server. For more information about the NVIDIA licensing requirements and self-hosted server installation, see the NVIDIA License Server Release Notes web page.
8.2.1. Build a custom GPU overcloud image
Perform the following steps on the undercloud to install the NVIDIA GRID host driver on an overcloud Compute image and upload the image to Glance.
Copy the overcloud image and add the gpu suffix to the copied image.
$ cp overcloud-full.qcow2 overcloud-full-gpu.qcow2
Install an ISO image generator tool by using yum.
$ sudo yum install genisoimage -y
Download the NVIDIA GRID host driver RPM package that corresponds to your GPU device from the NVIDIA website. To determine which driver you need, see the NVIDIA Driver Downloads Portal.
Note: You must be a registered NVIDIA customer to download the drivers from the portal.
Copy the driver RPM package into a directory named nvidia-guest, and then create an ISO image from that directory. You will use this ISO image to install the driver on your Compute nodes in subsequent steps.
$ genisoimage -o nvidia-guest.iso -R -J -V NVIDIA nvidia-guest/
I: -input-charset not specified, using utf-8 (detected in locale settings)
  9.06% done, estimate finish Wed Oct 31 11:24:46 2018
 18.08% done, estimate finish Wed Oct 31 11:24:46 2018
 27.14% done, estimate finish Wed Oct 31 11:24:46 2018
 36.17% done, estimate finish Wed Oct 31 11:24:46 2018
 45.22% done, estimate finish Wed Oct 31 11:24:46 2018
 54.25% done, estimate finish Wed Oct 31 11:24:46 2018
 63.31% done, estimate finish Wed Oct 31 11:24:46 2018
 72.34% done, estimate finish Wed Oct 31 11:24:46 2018
 81.39% done, estimate finish Wed Oct 31 11:24:46 2018
 90.42% done, estimate finish Wed Oct 31 11:24:46 2018
 99.48% done, estimate finish Wed Oct 31 11:24:46 2018
Total translation table size: 0
Total rockridge attributes bytes: 358
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
55297 extents written (108 MB)
Create a driver installation script for your Compute nodes. This script installs the NVIDIA GRID host driver on each Compute node that you run it on. In this example the script is named install_nvidia.sh.
#!/bin/bash
# NVIDIA GRID package
# Mount the attached NVIDIA ISO and install the host driver RPM
mkdir /tmp/mount
mount LABEL=NVIDIA /tmp/mount
rpm -ivh /tmp/mount/NVIDIA-vGPU-rhel-8.0-430.27.x86_64.rpm
Customize the overcloud image by attaching the ISO image that you generated and running the driver installation script that you created. For example:
$ virt-customize --attach nvidia-guest.iso -a overcloud-full-gpu.qcow2 -v --run install_nvidia.sh
[   0.0] Examining the guest ...
libguestfs: launch: program=virt-customize
libguestfs: launch: version=1.36.10rhel=8,release=6.el8_5.2,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
Relabel the customized image with SELinux.
$ virt-customize -a overcloud-full-gpu.qcow2 --selinux-relabel
[   0.0] Examining the guest ...
[   2.2] Setting a random seed
[   2.2] SELinux relabelling
[  27.4] Finishing off
Prepare the custom image files for a Glance upload. For example:
$ mkdir /var/image/x86_64/image
$ guestmount -a overcloud-full-gpu.qcow2 -i --ro image
$ cp image/boot/vmlinuz-3.10.0-862.14.4.el8.x86_64 ./overcloud-full-gpu.vmlinuz
$ cp image/boot/initramfs-3.10.0-862.14.4.el8.x86_64.img ./overcloud-full-gpu.initrd
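When you finish copying the kernel and ramdisk files, you can unmount the image. This is an optional cleanup step and assumes the guestunmount utility from the libguestfs tools:
$ guestunmount image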
From the undercloud, upload the custom image to Glance.
(undercloud) $ openstack overcloud image upload --update-existing --os-image-name overcloud-full-gpu.qcow2
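Optionally, verify that the custom image, kernel, and ramdisk are now available in Glance. This is a minimal check; the exact image names depend on your upload:
(undercloud) $ openstack image list | grep gpu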
8.2.2. Configure the vGPU role, profile, and flavor
After you build the custom GPU overcloud image, you prepare the Compute nodes for GPU-enabled overcloud deployment. This section describes how to configure the role, profile, and flavor for the GPU-enabled Compute nodes.
Create the new ComputeGpu role file by copying the file /home/stack/templates/roles/Compute.yaml to /home/stack/templates/roles/ComputeGpu.yaml and editing the following file sections:

Table 8.1. ComputeGpu role file edits

Section                      Current value              New value
Role comment                 Role: Compute              Role: ComputeGpu
Role name                    name: Compute              name: ComputeGpu
Description                  Basic Compute Node role    GPU role
CountDefault                 1                          0
ImageDefault                 overcloud-full             overcloud-gpu
HostnameFormatDefault        -compute-                  -computegpu-
deprecated_nic_config_name   compute.yaml               compute-gpu.yaml
Generate a new roles data file named gpu_roles_data.yaml that includes the Controller, Compute, and ComputeGpu roles.
(undercloud) [stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/gpu_roles_data.yaml Controller Compute ComputeGpu
The following example shows the ComputeGpu role details:
#####################################################################
# Role: ComputeGpu                                                  #
#####################################################################
- name: ComputeGpu
  description: |
    GPU Compute Node role
  CountDefault: 1
  ImageDefault: overcloud-gpu
  networks:
    - InternalApi
    - Tenant
    - Storage
  HostnameFormatDefault: '%stackname%-computegpu-%index%'
  RoleParametersDefault:
    TunedProfileName: "virtual-host"
  # Deprecated & backward-compatible values (FIXME: Make parameters consistent)
  # Set uses_deprecated_params to True if any deprecated params are used.
  uses_deprecated_params: True
  deprecated_param_image: 'NovaImage'
  deprecated_param_extraconfig: 'NovaComputeExtraConfig'
  deprecated_param_metadata: 'NovaComputeServerMetadata'
  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
  deprecated_param_ips: 'NovaComputeIPs'
  deprecated_server_resource_name: 'NovaCompute'
  deprecated_nic_config_name: 'compute-gpu.yaml'
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Fluentd
    - OS::TripleO::Services::Ipsec
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::LoginDefs
    - OS::TripleO::Services::MetricsQdr
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronBgpVpnBagpipe
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaLibvirtGuests
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::RsyslogSidecar
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::SkydiveAgent
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
    - OS::TripleO::Services::OVNMetadataAgent
    - OS::TripleO::Services::Ptp
Create the compute-vgpu-nvidia flavor to tag nodes that you want to designate for vGPU workloads.
(undercloud) [stack@director templates]$ openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 compute-vgpu-nvidia
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 40                                   |
| id                         | 9cb47954-be00-47c6-a57f-44db35be3e69 |
| name                       | compute-vgpu-nvidia                  |
| os-flavor-access:is_public | True                                 |
| properties                 |                                      |
| ram                        | 6144                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 4                                    |
+----------------------------+--------------------------------------+
Tag each node that you want to designate for GPU workloads with the compute-vgpu-nvidia profile.
(undercloud) [stack@director templates]$ openstack baremetal node set --property capabilities='profile:compute-vgpu-nvidia,boot_option:local' 9d07a673-b6bf-4a20-a538-3b05e8fa2c13
- Register the overcloud and run the standard hardware introspection on your nodes.
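After you register and introspect the nodes, you can optionally verify that the tagged node is matched to the compute-vgpu-nvidia profile. This sketch assumes the default director tooling:
(undercloud) [stack@director templates]$ openstack overcloud profiles list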
8.2.3. Prepare configuration files and deploy the overcloud
After you prepare your overcloud for vGPU, you retrieve and assign the vGPU type that corresponds to the physical GPU device in your environment and prepare the configuration templates.
Configure the vGPU type for your NVIDIA device
To determine the vGPU type for your physical GPU device, you must check the device types that it exposes from a running machine. You can perform these steps on a temporary Red Hat Enterprise Linux machine, such as an unused Compute node, and delete the node afterwards. You do not need to deploy the overcloud to perform these steps.
- Install Red Hat Enterprise Linux and the NVIDIA GRID driver on one Compute node and launch the node. For information on installing the NVIDIA GRID driver, see Section 8.2.1, “Build a custom GPU overcloud image”.
On the Compute node, locate the vGPU type of the physical GPU device that you want to enable. For libvirt, virtual GPUs are seen as mediated devices, or mdev type devices. To discover the supported mdev devices, run the following commands:
[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/
nvidia-11  nvidia-12  nvidia-13  nvidia-14  nvidia-15  nvidia-16  nvidia-17  nvidia-18  nvidia-19  nvidia-20  nvidia-21  nvidia-210  nvidia-22

[root@overcloud-computegpu-0 ~]# cat /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/nvidia-18/description
num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=4096x2160, max_instance=4
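To compare all of the vGPU types that the device supports at once, you can print the name and description of each mdev type. The following loop is a minimal sketch that assumes the same PCI address as the previous example:
[root@overcloud-computegpu-0 ~]# for t in /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/*; do echo "$(basename $t): $(cat $t/name) - $(cat $t/description)"; done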
Prepare the configuration templates
Add the compute-gpu.yaml file to the network-environment.yaml file. For example:
resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::ComputeGpu::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute-gpu.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
  #OS::TripleO::AllNodes::Validation: OS::Heat::None
Add the OvercloudComputeGpuFlavor flavor to the node-info.yaml file. For example:
parameter_defaults:
  OvercloudControllerFlavor: control
  OvercloudComputeFlavor: compute
  OvercloudComputeGpuFlavor: compute-vgpu-nvidia
  ControllerCount: 1
  ComputeCount: 0
  ComputeGpuCount: 1
  NtpServer: `NTP_SERVER_URL`
  NeutronNetworkType: vxlan,vlan
  NeutronTunnelTypes: vxlan
Replace the NTP_SERVER_URL variable with the address of your NTP server.
Create a gpu.yaml file with the vGPU type that you retrieved for your GPU device. For example:
parameter_defaults:
  ComputeGpuExtraConfig:
    nova::compute::vgpu::enabled_vgpu_types:
      - nvidia-18
Note: Only one virtual GPU type is supported per physical GPU. If you specify multiple vGPU types in this property, only the first type is used.
Deploy the overcloud
Run the overcloud deploy command with the custom GPU image and the configuration templates that you prepared.
$ openstack overcloud deploy -r /home/stack/templates/nvidia/gpu_roles_data.yaml -e /home/stack/templates/nvidia/gpu.yaml
8.2.4. Build a custom GPU guest image
After you deploy the overcloud with GPU-enabled Compute nodes, you build a custom vGPU-enabled instance image with the NVIDIA GRID guest driver and license file.
Create the NVIDIA GRID license file
On the overcloud host, create a gridd.conf file that contains the NVIDIA GRID license information. Use the license server information from the self-hosted NVIDIA GRID license server that you installed previously. For example:
# /etc/nvidia/gridd.conf.template - Configuration file for NVIDIA Grid Daemon

# This is a template for the configuration file for NVIDIA Grid Daemon.
# For details on the file format, please refer to the nvidia-gridd(1)
# man page.

# Description: Set License Server Address
# Data type: string
# Format:  "<address>"
ServerAddress=[NVIDIA_LICENSE_SERVER_URL]

# Description: Set License Server port number
# Data type: integer
# Format:  <port>, default is 7070
ServerPort=[PORT_NUMBER]

# Description: Set Backup License Server Address
# Data type: string
# Format:  "<address>"
#BackupServerAddress=

# Description: Set Backup License Server port number
# Data type: integer
# Format:  <port>, default is 7070
#BackupServerPort=

# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
# 0 => for unlicensed state
# 1 => for GRID vGPU
# 2 => for Quadro Virtual Datacenter Workstation
FeatureType=[TYPE_ID]

# Description: Parameter to enable or disable Grid Licensing tab in nvidia-settings
# Data type: boolean
# Possible values: TRUE or FALSE, default is FALSE
EnableUI=TRUE

# Description: Set license borrow period in minutes
# Data type: integer
# Possible values: 10 to 10080 mins(7 days), default is 1440 mins(1 day)
#LicenseInterval=1440

# Description: Set license linger period in minutes
# Data type: integer
# Possible values: 0 to 10080 mins(7 days), default is 0 mins
#LingerInterval=10
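For example, a completed gridd.conf for a GRID vGPU guest might contain only the following active settings. The server address and port here are placeholder values; replace them with the details of your own license server:
ServerAddress=gridlicense.example.com
ServerPort=7070
FeatureType=1
EnableUI=TRUE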
Prepare the guest image and the NVIDIA GRID guest driver
Download the NVIDIA GRID guest driver RPM package that corresponds to your GPU device from the NVIDIA website. To determine which driver you need, see the NVIDIA Driver Downloads Portal.
Note: You must be a registered NVIDIA customer to download the drivers from the portal.
Copy the guest driver package and the gridd.conf license file that you created into a directory named nvidia-guest, and then create an ISO image from that directory. You will use this ISO image to install the driver and the license file on your guest instances in subsequent steps.
[root@virtlab607 guest]# genisoimage -o nvidia-guest.iso -R -J -V NVIDIA nvidia-guest/
I: -input-charset not specified, using utf-8 (detected in locale settings)
  9.06% done, estimate finish Wed Oct 31 10:59:50 2018
 18.08% done, estimate finish Wed Oct 31 10:59:50 2018
 27.14% done, estimate finish Wed Oct 31 10:59:50 2018
 36.17% done, estimate finish Wed Oct 31 10:59:50 2018
 45.22% done, estimate finish Wed Oct 31 10:59:50 2018
 54.25% done, estimate finish Wed Oct 31 10:59:50 2018
 63.31% done, estimate finish Wed Oct 31 10:59:50 2018
 72.34% done, estimate finish Wed Oct 31 10:59:50 2018
 81.39% done, estimate finish Wed Oct 31 10:59:50 2018
 90.42% done, estimate finish Wed Oct 31 10:59:50 2018
 99.48% done, estimate finish Wed Oct 31 10:59:50 2018
Total translation table size: 0
Total rockridge attributes bytes: 358
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
55297 extents written (108 MB)
Copy the guest image that you want to customize for GPU instances. For example:
[root@virtlab607 guest]# cp rhel-server-8.0-update-4-x86_64-kvm.qcow2 rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2
Create and run the customization script
By default, you must install the NVIDIA GRID drivers on each instance that you want to designate for GPU workloads. This process involves modifying the guest image, rebooting, and then installing the guest drivers. You can create a script to automate this process for the guest instances.
Create a script named nvidia-prepare-guest.sh to enable the required repositories, update the instance to the latest kernel, install the NVIDIA GRID guest driver, and attach the gridd.conf license file to the instance.
#!/bin/bash
# Add build tooling
subscription-manager register --username [USERNAME] --password [PASSWORD]
subscription-manager attach --pool=8a85f98c651a88990165399d8eea03e7
subscription-manager repos --disable='*'
subscription-manager repos --enable=rhel-8-server-rpms
dnf upgrade -y
dnf install -y gcc make kernel-devel cpp glibc-devel glibc-headers kernel-headers libmpc mpfr elfutils-libelf-devel

# NVIDIA GRID guest script
# Mount the attached NVIDIA ISO, run the guest driver installer,
# and copy the gridd.conf license file into place
mkdir /tmp/mount
mount LABEL=NVIDIA /tmp/mount
/bin/sh /tmp/mount/NVIDIA-Linux-x86_64-430.24-grid.run
mkdir -p /etc/nvidia
cp /tmp/mount/gridd.conf /etc/nvidia
Run the script on the guest image that you copied previously. For example:
$ virt-customize --attach nvidia-guest.iso -a rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2 -v --run nvidia-prepare-guest.sh
Upload the custom guest image to Glance.
(overcloud) [stack@director ~]$ openstack image create rhelgpu --file /var/images/x86_64/rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2 --disk-format qcow2 --container-format bare --public
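Optionally, confirm that the image is active before you launch instances:
(overcloud) [stack@director ~]$ openstack image show rhelgpu -c name -c status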
8.2.5. Create a vGPU profile for instances
After you build the custom guest image, you create a GPU flavor and assign a vGPU resource to that flavor. When you later launch instances with this flavor, the vGPU resource will be available to each instance.
You can assign only one vGPU resource for each instance.
Create an NVIDIA GPU flavor to tag each instance that you want to designate for GPU workloads. For example:
(overcloud) [stack@virtlab-director2 ~]$ openstack flavor create --vcpus 6 --ram 8192 --disk 100 m1.small-gpu
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 100                                  |
| id                         | a27b14dd-c42d-4084-9b6a-225555876f68 |
| name                       | m1.small-gpu                         |
| os-flavor-access:is_public | True                                 |
| properties                 |                                      |
| ram                        | 8192                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 6                                    |
+----------------------------+--------------------------------------+
Assign a vGPU resource to the flavor that you created. Currently you can assign only one vGPU for each instance.
(overcloud) [stack@virtlab-director2 ~]$ openstack flavor set m1.small-gpu --property "resources:VGPU=1"

(overcloud) [stack@virtlab-director2 ~]$ openstack flavor show m1.small-gpu
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 100                                  |
| id                         | a27b14dd-c42d-4084-9b6a-225555876f68 |
| name                       | m1.small-gpu                         |
| os-flavor-access:is_public | True                                 |
| properties                 | resources:VGPU='1'                   |
| ram                        | 8192                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 6                                    |
+----------------------------+--------------------------------------+
8.2.6. Launch and test a vGPU instance
After you prepare the guest image and create the GPU flavor, you launch the GPU-enabled instance and install the NVIDIA guest driver from the ISO that you attached to the custom image in Section 8.2.4, “Build a custom GPU guest image”.
Launch a new instance with the GPU flavor that you created in Section 8.2.5, “Create a vGPU profile for instances”. For example:
(overcloud) [stack@virtlab-director2 ~]$ openstack server create --flavor m1.small-gpu --image rhelgpu --security-group web --nic net-id=internal0 --key-name lambda instance0
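Optionally, wait for the instance to reach the ACTIVE state before you log in:
(overcloud) [stack@virtlab-director2 ~]$ openstack server show instance0 -c status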
Log in to the instance and install the NVIDIA GRID driver. The exact installer name is available from the files that you attached to the guest image. For example:
[root@instance0 tmp]# sh NVIDIA-Linux-x86_64-430.24-grid.run
Check the status of the NVIDIA GRID daemon.
[root@instance0 nvidia]# systemctl status nvidia-gridd.service
● nvidia-gridd.service - NVIDIA Grid Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-gridd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-10-31 20:00:41 EDT; 15s ago
  Process: 18143 ExecStopPost=/bin/rm -rf /var/run/nvidia-gridd (code=exited, status=0/SUCCESS)
  Process: 18145 ExecStart=/usr/bin/nvidia-gridd (code=exited, status=0/SUCCESS)
 Main PID: 18146 (nvidia-gridd)
   CGroup: /system.slice/nvidia-gridd.service
           └─18146 /usr/bin/nvidia-gridd

Oct 31 20:00:41 instance0 systemd[1]: Stopped NVIDIA Grid Daemon.
Oct 31 20:00:41 instance0 systemd[1]: Starting NVIDIA Grid Daemon...
Oct 31 20:00:41 instance0 systemd[1]: Started NVIDIA Grid Daemon.
Oct 31 20:00:41 instance0 nvidia-gridd[18146]: Started (18146)
Oct 31 20:00:41 instance0 nvidia-gridd[18146]: Ignore Service Provider Licensing.
Oct 31 20:00:41 instance0 nvidia-gridd[18146]: Calling load_byte_array(tra)
Oct 31 20:00:42 instance0 nvidia-gridd[18146]: Acquiring license for GRID vGPU Edition.
Oct 31 20:00:42 instance0 nvidia-gridd[18146]: Calling load_byte_array(tra)
Oct 31 20:00:45 instance0 nvidia-gridd[18146]: License acquired successfully. (Info: http://dhcp158-15.virt.lab.eng.bos.redhat.com:7070/request; GRID-Virtual-WS,2.0)
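As a final check, you can confirm that the guest driver detects the vGPU device inside the instance. This assumes that the driver installation completed successfully:
[root@instance0 ~]# nvidia-smi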