Chapter 8. Enabling NVIDIA GPUs
Before you can use NVIDIA GPUs in OpenShift AI, you must install the NVIDIA GPU Operator.
If you are using OpenShift AI in a disconnected self-managed environment, see Enabling NVIDIA GPUs instead.
Prerequisites
- You have logged in to your OpenShift cluster.
- You have the cluster-admin role in your OpenShift cluster.
Procedure
- To enable GPU support on an OpenShift cluster, follow the instructions in NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.
Delete the migration-gpu-status ConfigMap.
- In the OpenShift web console, switch to the Administrator perspective.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
- Search for the migration-gpu-status ConfigMap.
- Click the action menu (⋮) and select Delete ConfigMap from the list. The Delete ConfigMap dialog appears.
- Inspect the dialog and confirm that you are deleting the correct ConfigMap.
- Click Delete.
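If you prefer the command line to the web console, the ConfigMap deletion above can be sketched as follows. This is a minimal sketch, not part of the documented procedure: it assumes the oc CLI is installed and logged in, and that the ConfigMap is in the redhat-ods-applications project. The helper names delete_configmap_cmd and run are hypothetical. By default the script only prints the command.

```python
import shlex
import subprocess

# Assumption: the ConfigMap lives in the project named in the steps above.
NAMESPACE = "redhat-ods-applications"
CONFIGMAP = "migration-gpu-status"


def delete_configmap_cmd(name=CONFIGMAP, namespace=NAMESPACE):
    """Build the oc argument list that deletes one ConfigMap in one project."""
    return ["oc", "delete", "configmap", name, "-n", namespace]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run: prints the command without contacting the cluster.
run(delete_configmap_cmd())
```

To actually delete the ConfigMap, call run(delete_configmap_cmd(), dry_run=False) after confirming that the printed command targets the correct ConfigMap.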
Restart the dashboard replicaset.
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Deployments.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
- Search for the rhods-dashboard deployment.
- Click the action menu (⋮) and select Restart Rollout from the list.
- Wait until the Status column indicates that all pods in the rollout have fully restarted.
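The dashboard restart above can also be approximated from the command line. The following is a minimal sketch under stated assumptions: the oc CLI is installed and logged in, the rhods-dashboard deployment is in the redhat-ods-applications project, and a five-minute timeout is an arbitrary illustrative value. The helper names restart_cmds and run_all are hypothetical. By default the script only prints the commands.

```python
import shlex
import subprocess

# Assumption: the deployment lives in the project named in the steps above.
NAMESPACE = "redhat-ods-applications"
DEPLOYMENT = "deployment/rhods-dashboard"


def restart_cmds(target=DEPLOYMENT, namespace=NAMESPACE, timeout="5m"):
    """Build the oc commands that restart a rollout and then wait for it."""
    return [
        # Trigger a new rollout of the deployment.
        ["oc", "rollout", "restart", target, "-n", namespace],
        # Block until the rollout completes or the timeout expires.
        ["oc", "rollout", "status", target, "-n", namespace, f"--timeout={timeout}"],
    ]


def run_all(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: prints both commands without contacting the cluster.
run_all(restart_cmds())
```

The second command plays the role of watching the Status column in the web console: it returns only after the rollout has finished or the timeout has passed.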
Verification
- The NVIDIA GPU Operator appears on the Operators → Installed Operators page in the OpenShift web console.
- The reset migration-gpu-status instance is present on the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
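The verification checks above can be approximated from the command line. The following is a minimal sketch, not an official verification method. It assumes the oc CLI is installed and logged in, that the operator's ClusterServiceVersion name contains the string gpu-operator, and that AcceleratorProfile resources can be listed with the plural name acceleratorprofiles in the redhat-ods-applications project. The helper names verification_cmds and mentions_gpu_operator are hypothetical.

```python
import shlex

# Assumption: AcceleratorProfile instances live in this project.
NAMESPACE = "redhat-ods-applications"


def verification_cmds(namespace=NAMESPACE):
    """Build oc commands that list installed operators and accelerator profiles."""
    return [
        # csv is the short name for ClusterServiceVersion (installed operators).
        ["oc", "get", "csv", "--all-namespaces"],
        # Assumption: the CRD's plural resource name is acceleratorprofiles.
        ["oc", "get", "acceleratorprofiles", "-n", namespace],
    ]


def mentions_gpu_operator(csv_listing):
    """Return True if the listing text contains 'gpu-operator' (assumed CSV name fragment)."""
    return "gpu-operator" in csv_listing.lower()


# Print the commands so they can be reviewed and run manually.
for cmd in verification_cmds():
    print(shlex.join(cmd))
```

After running the first command manually, you can pass its output to mentions_gpu_operator as a quick check. The web console pages described above remain the reference for verification.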
In OpenShift AI 2-latest, Red Hat supports the use of accelerators within the same cluster only. Red Hat does not support remote direct memory access (RDMA) between accelerators, or the use of accelerators across a network, for example, by using technology such as NVIDIA GPUDirect or NVLink.
After installing the NVIDIA GPU Operator, create an accelerator profile as described in Working with accelerator profiles.