Chapter 5. NVIDIA GPU architecture overview

NVIDIA supports the use of graphics processing unit (GPU) resources on Red Hat OpenShift Service on AWS. Red Hat OpenShift Service on AWS is a security-focused and hardened Kubernetes platform developed and supported by Red Hat for deploying and managing Kubernetes clusters at scale. Red Hat OpenShift Service on AWS includes enhancements to Kubernetes so that users can easily configure and use NVIDIA GPU resources to accelerate workloads.

The NVIDIA GPU Operator leverages the Operator framework within Red Hat OpenShift Service on AWS to manage the full lifecycle of NVIDIA software components required to run GPU-accelerated workloads.

These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node tagging using GPU feature discovery (GFD), DCGM-based monitoring, and others.

Note

The NVIDIA GPU Operator is only supported by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.

5.1. NVIDIA GPU prerequisites

A working OpenShift cluster with at least one GPU worker node.
Access to the OpenShift cluster as a cluster-admin to perform the required steps.
OpenShift CLI (oc) is installed.
The node feature discovery (NFD) Operator is installed and a nodefeaturediscovery instance is created.
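
As an illustration of the first prerequisite, the following sketch checks whether a cluster has at least one node that NFD has marked as carrying an NVIDIA PCI device. It assumes the node list has already been retrieved as JSON (for example, from `oc get nodes -o json`) and that NFD applies the label `feature.node.kubernetes.io/pci-10de.present=true` to such nodes, where `10de` is the NVIDIA PCI vendor ID. The function name and the sample data are hypothetical.

```python
import json

# Label that NFD is assumed to set on nodes with an NVIDIA PCI device (vendor ID 10de).
NVIDIA_PCI_LABEL = "feature.node.kubernetes.io/pci-10de.present"


def gpu_node_names(node_list):
    """Return the names of nodes whose labels mark an NVIDIA PCI device as present."""
    names = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels", {})
        if labels.get(NVIDIA_PCI_LABEL) == "true":
            names.append(metadata.get("name"))
    return names


# Hypothetical sample shaped like the output of `oc get nodes -o json`.
sample_nodes = json.loads("""
{"items": [
  {"metadata": {"name": "worker-a",
                "labels": {"feature.node.kubernetes.io/pci-10de.present": "true"}}},
  {"metadata": {"name": "worker-b", "labels": {}}}
]}
""")

print(gpu_node_names(sample_nodes))
```

An empty result would suggest that either no GPU worker node exists or the nodefeaturediscovery instance has not yet labeled the nodes.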

5.2. GPUs and ROSA

You can deploy Red Hat OpenShift Service on AWS on NVIDIA GPU instance types.

The compute instance must be GPU-accelerated, and its GPU type must appear in the list of supported GPUs from NVIDIA AI Enterprise. For example, T4, V100, and A100 are on this list.

You can choose one of the following methods to access the containerized GPUs:

GPU passthrough to access and use GPU hardware within a virtual machine (VM).
Virtual GPU (vGPU) time-slicing when the entire GPU is not required.

5.4. NVIDIA GPU features for Red Hat OpenShift Service on AWS

NVIDIA Container Toolkit

NVIDIA Container Toolkit enables you to create and run GPU-accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to use NVIDIA GPUs.
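
To make the idea of a GPU-accelerated container concrete, the sketch below builds a minimal Pod manifest that asks the scheduler for one GPU. It assumes the NVIDIA device plugin advertises GPUs under the extended resource name `nvidia.com/gpu`. The helper name `gpu_pod_manifest`, the pod name, and the image reference are hypothetical placeholders, not values taken from this documentation.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a minimal Pod manifest that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under `limits`.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image; substitute a real CUDA-enabled image.
manifest = gpu_pod_manifest("gpu-test", "registry.example.com/cuda-sample:latest")
print(json.dumps(manifest, indent=2))
```

Because the manifest is plain JSON, the printed output could be piped to `oc apply -f -` on a cluster where the GPU Operator is already running.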

NVIDIA AI Enterprise

NVIDIA AI Enterprise is an end-to-end, cloud-native suite of AI and data analytics software optimized, certified, and supported with NVIDIA-Certified systems.

NVIDIA AI Enterprise includes support for Red Hat OpenShift Service on AWS. The following installation methods are supported:

Red Hat OpenShift Service on AWS on bare metal or VMware vSphere with GPU Passthrough.
Red Hat OpenShift Service on AWS on VMware vSphere with NVIDIA vGPU.

GPU Feature Discovery

NVIDIA GPU Feature Discovery for Kubernetes is a software component that enables you to automatically generate labels for the GPUs available on a node. GPU Feature Discovery uses node feature discovery (NFD) to perform this labeling.

The Node Feature Discovery Operator (NFD) manages the discovery of hardware features and configurations in an OpenShift Container Platform cluster by labeling nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, OS version, and so on.

You can find the NFD Operator in OperatorHub by searching for "Node Feature Discovery".
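
The labels that GPU Feature Discovery generates can be read like any other node label. The following sketch summarizes them for a single node, assuming GFD publishes the keys `nvidia.com/gpu.product`, `nvidia.com/gpu.count`, and `nvidia.com/gpu.memory`, and assuming the memory value is expressed in MiB. The function name and the sample label values are hypothetical.

```python
def summarize_gpu_labels(labels):
    """Summarize GFD-style labels for one node, or return None if no GPU is labeled."""
    product = labels.get("nvidia.com/gpu.product")
    if product is None:
        return None
    return {
        "product": product,
        # Label values are strings, so convert the numeric ones explicitly.
        "count": int(labels.get("nvidia.com/gpu.count", "0")),
        "memory_mib": int(labels.get("nvidia.com/gpu.memory", "0")),
    }


# Hypothetical labels for a node with a single T4 GPU.
sample_labels = {
    "nvidia.com/gpu.product": "Tesla-T4",
    "nvidia.com/gpu.count": "1",
    "nvidia.com/gpu.memory": "15360",
}

print(summarize_gpu_labels(sample_labels))
print(summarize_gpu_labels({"kubernetes.io/os": "linux"}))
```

A workload can target these same labels through a node selector, for example to schedule only onto nodes that report a specific GPU product.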

NVIDIA GPU Operator with OpenShift Virtualization

Previously, the GPU Operator provisioned worker nodes only to run GPU-accelerated containers. The GPU Operator can now also provision worker nodes for running GPU-accelerated virtual machines (VMs).

You can configure the GPU Operator to deploy different software components to worker nodes depending on which GPU workload is configured to run on those nodes.

GPU Monitoring dashboard

You can install a monitoring dashboard to display GPU usage information on the cluster Observe page in the Red Hat OpenShift Service on AWS web console. GPU utilization information includes the number of available GPUs, power consumption (in watts), temperature (in degrees Celsius), utilization (in percent), and other metrics for each GPU.
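
The per-GPU metrics described above come from DCGM-based monitoring. As a rough sketch of what that data looks like, the code below groups readings by GPU from text in the Prometheus exposition format. It assumes the exporter emits metrics named `DCGM_FI_DEV_GPU_UTIL`, `DCGM_FI_DEV_GPU_TEMP`, and `DCGM_FI_DEV_POWER_USAGE` with a `gpu` label. The parser is deliberately simplified (it skips lines that carry a timestamp), and the function name and sample values are hypothetical.

```python
import re

# Matches lines of the form: metric_name{label="value",...} number
METRIC_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
GPU_LABEL = re.compile(r'gpu="([^"]*)"')


def parse_gpu_metrics(text):
    """Group metric values by the `gpu` label: {gpu_index: {metric_name: value}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = METRIC_LINE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        gpu = GPU_LABEL.search(match.group("labels"))
        if gpu is None:
            continue  # skip metrics that are not tied to a specific GPU
        result.setdefault(gpu.group(1), {})[match.group("name")] = float(match.group("value"))
    return result


# Hypothetical scrape output for one GPU.
sample_metrics = """
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
DCGM_FI_DEV_GPU_UTIL{gpu="0",modelName="Tesla T4"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="0",modelName="Tesla T4"} 41
DCGM_FI_DEV_POWER_USAGE{gpu="0",modelName="Tesla T4"} 27.5
"""

print(parse_gpu_metrics(sample_metrics))
```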
