Chapter 8. Configuring Virtual GPU for guest instances
To support GPU-based rendering on your guest instances, you can define and manage virtual GPU (vGPU) resources according to your available physical GPU devices and your hypervisor type. This configuration allows you to divide the rendering workloads between all your physical GPU devices more effectively, and to have more control over scheduling, tuning, and monitoring your vGPU-enabled guest instances.
To enable vGPU in OpenStack Compute, you create flavors that you can use to request Red Hat Enterprise Linux guests with vGPU devices, and then you assign those flavors to Compute instances. Each instance can then support GPU workloads with virtual GPU devices that correspond to the physical GPU devices.
The OpenStack Compute service tracks the number and size of the vGPU devices that are available on each host, schedules guests to these hosts based on the flavor, attaches the devices, and monitors usage on an ongoing basis. When a guest is no longer available, OpenStack Compute returns its vGPU devices to the available pool.
8.1. Supported configurations and limitations
This section lists currently supported virtual GPU (vGPU) graphics cards, as well as considerations and limitations for setting up vGPU devices in OpenStack Compute.
Supported GPU cards
For a list of supported NVIDIA GPU cards, see Virtual GPU Software Supported Products on the NVIDIA website.
Limitations and considerations
- You can use only one vGPU type for each Compute host.
- You can use only one vGPU resource for each Compute instance.
- Live migration of vGPU between hosts is not supported.
- Suspend operations on a vGPU-enabled guest are not supported due to a libvirt limitation. Instead, you can snapshot or shelve the instance.
- Resize and cold migration operations on an instance with a vGPU flavor do not automatically re-allocate the vGPU resources to the instance. After you resize or migrate the instance, you must rebuild it manually to re-allocate the vGPU resources.
- By default, vGPU types on Compute hosts are not exposed to API users. To allow access, you can add the hosts to a host aggregate. For general information about host aggregates, see Section 4.4, “Manage Host Aggregates”.
- If you use NVIDIA accelerator hardware, you must comply with the NVIDIA licensing requirements. For example, NVIDIA vGPU GRID requires a licensing server. For more information about the NVIDIA licensing requirements, see the NVIDIA License Server Release Notes web page.
8.2. Deploying NVIDIA GRID vGPU
This section describes how to deploy virtual GPU (vGPU) for NVIDIA devices on your Compute node hosts and on your guest instances. This end-to-end process includes the following steps:
- Building a custom GPU-enabled overcloud image
- Preparing the GPU role, profile, and flavor
- Configuring and deploying the overcloud
- Building a custom vGPU-enabled guest image
- Preparing the vGPU flavor for the instances
- Launching and configuring the vGPU-enabled instances
Prerequisites
Before you deploy NVIDIA GRID vGPU on your overcloud, make sure that your environment meets the following requirements:
- Your deployment must meet the requirements for vGPU devices, as described in Section 8.1, “Supported configurations and limitations”.
- Your undercloud must be deployed and the default overcloud image must be uploaded to Glance.
- You must comply with the NVIDIA GRID licensing requirements and you must have the URL of your self-hosted license server. For more information about the NVIDIA licensing requirements and self-hosted server installation, see the NVIDIA License Server Release Notes web page.
8.2.1. Build a custom GPU overcloud image
Perform the following steps on the undercloud to install the NVIDIA GRID host driver on an overcloud Compute image and upload the image to Glance.
Copy the overcloud image and add the gpu suffix to the copied image.

$ cp overcloud-full.qcow2 overcloud-full-gpu.qcow2
Install an ISO image generator tool from YUM.

$ sudo yum install genisoimage -y
Download the NVIDIA GRID host driver RPM package that corresponds to your GPU device from the NVIDIA website. To determine which driver you need, see the NVIDIA Driver Downloads Portal.

Note: You must be a registered NVIDIA customer to download the drivers from the portal.
Create an ISO image from the driver RPM package and save the image in the nvidia-guest directory. You will use this ISO image to install the driver on your Compute nodes in subsequent steps.
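The exact command depends on the driver package that you downloaded. The following is a minimal sketch, assuming the host driver RPM is named NVIDIA-vGPU-rhel-host.rpm and that the ISO is labeled nvidia-gpu-host so that the installation script in the next step can mount it by label; adjust both names to your environment:

$ genisoimage -o nvidia-guest/nvidia-gpu-host.iso -R -J -V nvidia-gpu-host NVIDIA-vGPU-rhel-host.rpm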
Create a driver installation script for your Compute nodes. This script installs the NVIDIA GRID host driver on each Compute node that you run it on. In this example the script is named install_nvidia.sh.
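A minimal sketch of install_nvidia.sh, assuming the ISO label and RPM file name used in the previous step; adjust both to match your actual driver package:

#!/bin/bash
# Mount the attached driver ISO by its volume label and install the host driver RPM.
mkdir -p /tmp/mount
mount LABEL=nvidia-gpu-host /tmp/mount
rpm -ivh /tmp/mount/NVIDIA-vGPU-rhel-host.rpm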
Customize the overcloud image by attaching the ISO image that you generated and running the driver installation script that you created. For example:
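A sketch of the customization command, assuming the ISO and script names from the previous steps:

$ virt-customize --attach nvidia-guest/nvidia-gpu-host.iso -a overcloud-full-gpu.qcow2 -v --run install_nvidia.sh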
Relabel the customized image with SELinux.
$ virt-customize -a overcloud-full-gpu.qcow2 --selinux-relabel
[   0.0] Examining the guest ...
[   2.2] Setting a random seed
[   2.2] SELinux relabelling
[  27.4] Finishing off
Prepare the custom image files for a Glance upload. For example:
$ mkdir /var/image/x86_64/image
$ guestmount -a overcloud-full-gpu.qcow2 -i --ro image
$ cp image/boot/vmlinuz-3.10.0-862.14.4.el8.x86_64 ./overcloud-full-gpu.vmlinuz
$ cp image/boot/initramfs-3.10.0-862.14.4.el8.x86_64.img ./overcloud-full-gpu.initrd
From the undercloud, upload the custom image to Glance.
(undercloud) $ openstack overcloud image upload --update-existing --os-image-name overcloud-full-gpu.qcow2
8.2.2. Configure the vGPU role, profile, and flavor
After you build the custom GPU overcloud image, you prepare the Compute nodes for GPU-enabled overcloud deployment. This section describes how to configure the role, profile, and flavor for the GPU-enabled Compute nodes.
Create the new ComputeGPU role file by copying the file /home/stack/templates/roles/Compute.yaml to /home/stack/templates/roles/ComputeGPU.yaml and editing the following file sections:

Table 8.1. ComputeGPU role file edits

Section                      Current value              New value
Role comment                 Role: Compute              Role: ComputeGpu
Role name                    name: Compute              name: ComputeGpu
Description                  Basic Compute Node role    GPU role
CountDefault                 1                          0
ImageDefault                 overcloud-full             overcloud-gpu
HostnameFormatDefault        -compute-                  -computegpu-
deprecated_nic_config_name   compute.yaml               compute-gpu.yaml
Generate a new roles data file named gpu_roles_data.yaml that includes the Controller, Compute, and ComputeGpu roles.

(undercloud) [stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/gpu_roles_data.yaml Controller Compute ComputeGpu
The following example shows the ComputeGpu role details:
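The role details were not captured here. The following sketch applies the edits from Table 8.1 to a copy of the Compute role; the hostname format follows the standard TripleO pattern, and the service list is abridged because it is inherited unchanged from Compute.yaml:

# Role: ComputeGpu
- name: ComputeGpu
  description: |
    GPU role
  CountDefault: 0
  ImageDefault: overcloud-gpu
  HostnameFormatDefault: '%stackname%-computegpu-%index%'
  deprecated_nic_config_name: compute-gpu.yaml
  ServicesDefault:
    # Same service list as the Compute role in Compute.yaml (abridged here)
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    # ...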
Create the compute-vgpu-nvidia flavor to tag nodes that you want to designate for vGPU workloads.
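The flavor-creation commands were not captured here. A sketch follows, in which the RAM, disk, and vCPU sizes are illustrative; the capabilities:profile property matches the profile used to tag the nodes in the next step:

(undercloud) [stack@director templates]$ openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 compute-vgpu-nvidia
(undercloud) [stack@director templates]$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-vgpu-nvidia" compute-vgpu-nvidia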
Tag each node that you want to designate for GPU workloads with the compute-vgpu-nvidia profile.

(undercloud) [stack@director templates]$ openstack baremetal node set --property capabilities='profile:compute-vgpu-nvidia,boot_option:local' 9d07a673-b6bf-4a20-a538-3b05e8fa2c13
- Register the overcloud and run the standard hardware introspection on your nodes.
8.2.3. Configure the vGPU type and deploy the overcloud

After you prepare your overcloud for vGPU, you retrieve and assign the vGPU type that corresponds to the physical GPU device in your environment and prepare the configuration templates.
Configure the vGPU type for your NVIDIA device
To determine the vGPU type for your physical GPU device, you must check the available device types from a separate machine. You can perform these steps on any temporary, unused Red Hat Enterprise Linux Compute node, and then delete the node. You do not need to deploy the overcloud to perform these steps.
- Install Red Hat Enterprise Linux and the NVIDIA GRID driver on one Compute node and launch the node. For information on installing the NVIDIA GRID driver, see Section 8.2.1, “Build a custom GPU overcloud image”.
On the Compute node, locate the vGPU type of the physical GPU device that you want to enable. For libvirt, virtual GPUs are seen as mediated devices, or mdev type devices. To discover the supported mdev devices, run the following commands:

[root@overcloud-computegpu-0 ~]# ls /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/
nvidia-11 nvidia-12 nvidia-13 nvidia-14 nvidia-15 nvidia-16 nvidia-17 nvidia-18 nvidia-19 nvidia-20 nvidia-21 nvidia-210 nvidia-22
[root@overcloud-computegpu-0 ~]# cat /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/nvidia-18/description
num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=4096x2160, max_instance=4
Prepare the configuration templates
Add the compute-gpu.yaml file to the network-environment.yaml file. For example:
resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::ComputeGpu::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute-gpu.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
  #OS::TripleO::AllNodes::Validation: OS::Heat::None
Add the OvercloudComputeGpuFlavor flavor to the node-info.yaml file. For example:
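The node-info.yaml entries were not captured here. A minimal sketch follows, assuming the compute-vgpu-nvidia flavor created earlier, an illustrative node count, and NTP_SERVER_URL as a placeholder:

parameter_defaults:
  OvercloudComputeGpuFlavor: compute-vgpu-nvidia
  ComputeGpuCount: 1
  NtpServer: NTP_SERVER_URL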
Replace the NTP_SERVER_URL variable with the address of your NTP server.

Create a gpu.yaml file with the vGPU type that you retrieved for your GPU device. For example:
parameter_defaults:
  ComputeGpuExtraConfig:
    nova::compute::vgpu::enabled_vgpu_types:
      - nvidia-18

Note: Only one virtual GPU type is supported per physical GPU. If you specify multiple vGPU types in this property, only the first type is used.
Deploy the overcloud
Run the overcloud deploy command with the custom GPU image and the configuration templates that you prepared.

$ openstack overcloud deploy -r /home/stack/templates/nvidia/gpu_roles_data.yaml -e /home/stack/templates/nvidia/gpu.yaml
8.2.4. Build a custom GPU guest image
After you deploy the overcloud with GPU-enabled Compute nodes, you build a custom vGPU-enabled instance image with the NVIDIA GRID guest driver and license file.
Create the NVIDIA GRID license file
On the overcloud host, create a gridd.conf file that contains the NVIDIA GRID license information. Use the license server information from the self-hosted NVIDIA GRID license server that you installed previously. For example:
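The gridd.conf contents were not captured here. A minimal sketch follows, with placeholder values for the license server address and port; FeatureType=1 requests a vGPU license, and you should check the NVIDIA GRID licensing documentation for the values that apply to your deployment:

# gridd.conf - sample NVIDIA GRID licensing configuration (placeholder values)
ServerAddress=license.example.com
ServerPort=7070
FeatureType=1
EnableUI=FALSE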
Prepare the guest image and the NVIDIA GRID guest driver
Download the NVIDIA GRID guest driver RPM package that corresponds to your GPU device from the NVIDIA website. To determine which driver you need, see the NVIDIA Driver Downloads Portal.
Note: You must be a registered NVIDIA customer to download the drivers from the portal.
Create an ISO image from the driver RPM package. You will use this ISO image to install the driver on your instances in subsequent steps.
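A sketch of the ISO creation, assuming the guest driver RPM is named NVIDIA-vGPU-guest.rpm; the output file name nvidia-guest.iso matches the ISO that is attached to the guest image later in this procedure, and the volume label is used by the preparation script:

$ genisoimage -o nvidia-guest.iso -R -J -V nvidia-guest NVIDIA-vGPU-guest.rpm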
Copy the guest image that you want to customize for GPU instances. For example:
[root@virtlab607 guest]# cp rhel-server-8.0-update-4-x86_64-kvm.qcow2 rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2
Create and run the customization script
By default, you must install the NVIDIA GRID drivers on each instance that you want to designate for GPU workloads. This process involves modifying the guest image, rebooting, and then installing the guest drivers. You can create a script to automate this process for the guest instances.
Create a script named nvidia-prepare-guest.sh to enable the required repositories, update the instance to the latest kernel, install the NVIDIA GRID guest driver, and attach the gridd.conf license file to the instance.
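A sketch of nvidia-prepare-guest.sh under stated assumptions: the repository IDs are illustrative, the ISO label matches the nvidia-guest.iso created earlier, the guest driver RPM name is a placeholder, and the license values mirror the gridd.conf example above; adjust all of these to your environment:

#!/bin/bash
# Enable repositories needed to build the NVIDIA kernel modules (IDs are illustrative).
subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms \
                           --enable=rhel-8-for-x86_64-appstream-rpms
# Update to the latest kernel and install build dependencies for the driver.
yum update -y
yum install -y kernel-devel kernel-headers gcc make
# Mount the attached driver ISO and install the NVIDIA GRID guest driver.
mkdir -p /tmp/mount
mount LABEL=nvidia-guest /tmp/mount
rpm -ivh /tmp/mount/NVIDIA-vGPU-guest.rpm
# Install the NVIDIA GRID license configuration (placeholder values; see gridd.conf above).
mkdir -p /etc/nvidia
cat > /etc/nvidia/gridd.conf <<'EOF'
ServerAddress=license.example.com
ServerPort=7070
FeatureType=1
EOF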
Run the script on the guest image that you copied previously. For example:
$ virt-customize --attach nvidia-guest.iso -a rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2 -v --run nvidia-prepare-guest.sh
Upload the custom guest image to Glance.
(overcloud) [stack@director ~]$ openstack image create rhelgpu --file /var/images/x86_64/rhel-server-8.0-update-4-x86_64-kvm-gpu.qcow2 --disk-format qcow2 --container-format bare --public
8.2.5. Create a vGPU profile for instances
After you build the custom guest image, you create a GPU flavor and assign a vGPU resource to that flavor. When you later launch instances with this flavor, the vGPU resource will be available to each instance.
You can assign only one vGPU resource for each instance.
Create an NVIDIA GPU flavor to tag each instance that you want to designate for GPU workloads. For example:
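The flavor-creation command was not captured here. A sketch follows, with illustrative sizes; the m1.small-gpu flavor name matches the flavor used when launching the instance later in this procedure:

(overcloud) [stack@director ~]$ openstack flavor create --vcpus 4 --ram 4096 --disk 40 m1.small-gpu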
Assign a vGPU resource to the flavor that you created. Currently you can assign only one vGPU for each instance.
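A sketch of the assignment, using the resources:VGPU flavor property that the Compute service uses to request a vGPU from the placement service:

(overcloud) [stack@director ~]$ openstack flavor set m1.small-gpu --property "resources:VGPU=1"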
8.2.6. Launch and test a vGPU instance
After you prepare the guest image and create the GPU flavor, you launch the GPU-enabled instance and install the NVIDIA guest driver from the ISO that you attached to the custom image in Section 8.2.4, “Build a custom GPU guest image”.
Launch a new instance with the GPU flavor that you created in Section 8.2.5, “Create a vGPU profile for instances”. For example:
(overcloud) [stack@virtlab-director2 ~]$ openstack server create --flavor m1.small-gpu --image rhelgpu --security-group web --nic net-id=internal0 --key-name lambda instance0
Log in to the instance and install the NVIDIA GRID driver. The exact installer name is available from the files that you attached to the guest image. For example:
[root@instance0 tmp]# sh NVIDIA-Linux-x86_64-430.24-grid.run
Check the status of the NVIDIA GRID daemon.
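The command output was not captured here; querying the nvidia-gridd service with systemd is a reasonable way to check the daemon. For example:

[root@instance0 tmp]# systemctl status nvidia-gridd.service

The service should report active (running) once the guest driver is installed and the instance can reach your license server.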