Chapter 2. NVIDIA GPU architecture
NVIDIA supports the use of graphics processing unit (GPU) resources on OpenShift Container Platform. OpenShift Container Platform is a security-focused and hardened Kubernetes platform developed and supported by Red Hat for deploying and managing Kubernetes clusters at scale. OpenShift Container Platform includes enhancements to Kubernetes so that users can easily configure and use NVIDIA GPU resources to accelerate workloads.
The NVIDIA GPU Operator uses the Operator framework within OpenShift Container Platform to manage the full lifecycle of NVIDIA software components required to run GPU-accelerated workloads.
These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node tagging using GPU feature discovery (GFD), DCGM-based monitoring, and others.
The NVIDIA GPU Operator is supported only by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.
2.1. NVIDIA GPU prerequisites
- A working OpenShift cluster with at least one GPU worker node.
- Access to the OpenShift cluster as a cluster-admin to perform the required steps.
- OpenShift CLI (oc) is installed.
- The node feature discovery (NFD) Operator is installed and a nodefeaturediscovery instance is created.
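
The prerequisites above can be checked before installing the GPU Operator. The following is a minimal sketch, not an official procedure. It assumes that `oc` is already logged in to the cluster, that `oc auth can-i '*' '*' --all-namespaces` is an acceptable approximation of holding cluster-admin, and that NFD has labeled GPU nodes with the NVIDIA PCI vendor ID label `feature.node.kubernetes.io/pci-10de.present=true`. The helper names `run_oc`, `item_names`, and `check_prerequisites` are hypothetical and exist only for this illustration.

```python
import json
import shutil
import subprocess

# Assumption: NFD labels nodes that contain a PCI device from vendor 0x10de (NVIDIA).
GPU_NODE_LABEL = "feature.node.kubernetes.io/pci-10de.present=true"


def run_oc(args):
    """Run `oc` with the given arguments and return (exit_code, stdout)."""
    proc = subprocess.run(["oc", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def item_names(json_text):
    """Extract metadata.name from the JSON list printed by `oc get ... -o json`."""
    if not json_text.strip():
        return []
    items = json.loads(json_text).get("items", [])
    return [item["metadata"]["name"] for item in items]


def check_prerequisites(run=run_oc, oc_installed=None):
    """Return a dict summarizing which prerequisites appear to be met.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    if oc_installed is None:
        oc_installed = shutil.which("oc") is not None
    results = {"oc_installed": oc_installed}
    if not oc_installed:
        return results

    # Approximation of cluster-admin: permission for every verb on every resource.
    code, _ = run(["auth", "can-i", "*", "*", "--all-namespaces"])
    results["cluster_admin"] = code == 0

    # At least one NodeFeatureDiscovery instance should exist.
    code, out = run(["get", "nodefeaturediscovery", "--all-namespaces", "-o", "json"])
    results["nfd_instances"] = item_names(out) if code == 0 else []

    # GPU nodes only carry this label after NFD has run on them.
    code, out = run(["get", "nodes", "-l", GPU_NODE_LABEL, "-o", "json"])
    results["gpu_nodes"] = item_names(out) if code == 0 else []
    return results
```

Calling `check_prerequisites()` with no arguments runs the real `oc` commands and returns a dictionary with the keys `oc_installed`, `cluster_admin`, `nfd_instances`, and `gpu_nodes`. An empty `gpu_nodes` list can mean either that the cluster has no GPU worker node or that NFD has not labeled the nodes yet, so it is worth checking `nfd_instances` first.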