Chapter 7. Enabling GPU support in OpenShift AI
Optionally, to ensure that your data scientists can run compute-heavy workloads with their models, you can enable graphics processing units (GPUs) in OpenShift AI.
If you are using OpenShift AI in a disconnected self-managed environment, see Enabling GPU support in OpenShift AI instead.
Prerequisites
- You have logged in to your OpenShift Container Platform cluster.
- You have the cluster-admin role in your OpenShift Container Platform cluster.
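Both prerequisites can also be checked from a terminal. The following is a minimal sketch, assuming the oc CLI is installed and on your PATH. The function names are hypothetical and not part of any Red Hat tooling. A passing `oc auth can-i '*' '*' --all-namespaces` check only approximates holding the cluster-admin role; it does not inspect role bindings directly.

```python
import subprocess


def prerequisite_commands():
    """Return the oc invocations that approximate the two prerequisites."""
    return [
        # Prints the current user; exits non-zero if you are not logged in.
        ["oc", "whoami"],
        # Exits 0 only if the current user may perform any verb on any
        # resource in all namespaces (an approximation of cluster-admin).
        ["oc", "auth", "can-i", "*", "*", "--all-namespaces"],
    ]


def check_prerequisites(run=subprocess.run):
    """Run each check; return True only if every command exits with status 0.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    for cmd in prerequisite_commands():
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False
    return True
```

Calling `check_prerequisites()` on a workstation that is logged in to the cluster returns True only when both commands succeed.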
Procedure
- To enable GPU support on an OpenShift cluster, follow the instructions in NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.
Delete the migration-gpu-status ConfigMap.
- In the OpenShift Container Platform web console, switch to the Administrator perspective.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
- Search for the migration-gpu-status ConfigMap.
- Click the action menu (⋮) and select Delete ConfigMap from the list. The Delete ConfigMap dialog appears.
- Inspect the dialog and confirm that you are deleting the correct ConfigMap.
- Click Delete.
Restart the dashboard ReplicaSet.
- In the OpenShift Container Platform web console, switch to the Administrator perspective.
- Click Workloads → Deployments.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
- Search for the rhods-dashboard deployment.
- Click the action menu (⋮) and select Restart Rollout from the list.
- Wait until the Status column indicates that all pods in the rollout have fully restarted.
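The ConfigMap deletion and dashboard restart above can also be done with the oc CLI instead of the web console. The sketch below assumes oc is installed, that you are logged in with sufficient privileges, and that both resources live in the redhat-ods-applications project named in the steps. The function names are hypothetical. It does not cover installing the NVIDIA GPU Operator.

```python
import subprocess

# Project named in the procedure; adjust if your installation differs.
NAMESPACE = "redhat-ods-applications"


def procedure_commands(namespace=NAMESPACE):
    """Return the oc invocations that mirror the console steps, in order."""
    return [
        # Delete the migration-gpu-status ConfigMap.
        ["oc", "delete", "configmap", "migration-gpu-status", "-n", namespace],
        # Restart the rollout of the dashboard deployment.
        ["oc", "rollout", "restart", "deployment/rhods-dashboard", "-n", namespace],
        # Block until the restarted rollout has completed.
        ["oc", "rollout", "status", "deployment/rhods-dashboard", "-n", namespace],
    ]


def run_procedure(run=subprocess.run, namespace=NAMESPACE):
    """Run each command in order, stopping at the first failure.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit status, so later steps are skipped after an error.
    """
    for cmd in procedure_commands(namespace):
        run(cmd, check=True)
```

Because the commands are built as a plain list, you can print `procedure_commands()` and review it before calling `run_procedure()`.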
Verification
- The NVIDIA GPU Operator appears on the Operators → Installed Operators page in the OpenShift Container Platform web console.
- The reset migration-gpu-status instance is present in the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
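The same two checks can be approximated from the command line. This is a sketch under assumptions, not a documented procedure. It assumes the oc CLI is available, that the operator's ClusterServiceVersion name contains the string "gpu-operator", and that AcceleratorProfile resources can be listed as `acceleratorprofiles` in the redhat-ods-applications project. The function names are hypothetical.

```python
import subprocess

# Project assumed to hold the AcceleratorProfile instances.
NAMESPACE = "redhat-ods-applications"


def verification_commands(namespace=NAMESPACE):
    """Return the oc invocations used for the two verification checks."""
    return {
        # Installed operators are represented by ClusterServiceVersions.
        "operators": ["oc", "get", "csv", "--all-namespaces", "-o", "name"],
        # Resource name assumed from the AcceleratorProfile CRD.
        "accelerator_profiles": [
            "oc", "get", "acceleratorprofiles", "-n", namespace, "-o", "name",
        ],
    }


def names_matching(output, fragment):
    """Return the non-empty output lines that contain fragment (case-insensitive)."""
    needle = fragment.lower()
    return [line.strip() for line in output.splitlines() if needle in line.lower()]


def verify(run=subprocess.run, namespace=NAMESPACE):
    """Collect GPU operator CSV names and AcceleratorProfile names."""
    cmds = verification_commands(namespace)
    csv_out = run(cmds["operators"], capture_output=True, text=True, check=True).stdout
    profile_out = run(
        cmds["accelerator_profiles"], capture_output=True, text=True, check=True
    ).stdout
    return {
        "gpu_operator_csvs": names_matching(csv_out, "gpu-operator"),
        "accelerator_profiles": [
            line.strip() for line in profile_out.splitlines() if line.strip()
        ],
    }
```

An empty list under either key suggests that the corresponding verification item is not yet satisfied. Confirm in the web console before drawing conclusions, because the name match is only a heuristic.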
After installing the NVIDIA GPU Operator, create an accelerator profile as described in Working with accelerator profiles.