Search

Chapter 8. Enabling NVIDIA GPUs

download PDF

Before you can use NVIDIA GPUs in OpenShift AI, you must install the NVIDIA GPU Operator.

Important

If you are using OpenShift AI in a disconnected self-managed environment, see Enabling NVIDIA GPUs instead.

Prerequisites

  • You have logged in to your OpenShift cluster.
  • You have the cluster-admin role in your OpenShift cluster.

Procedure

  1. To enable GPU support on an OpenShift cluster, follow the instructions here: NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.
  2. Delete the migration-gpu-status ConfigMap.

    1. In the OpenShift web console, switch to the Administrator perspective.
    2. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
    3. Search for the migration-gpu-status ConfigMap.
    4. Click the action menu (⋮) and select Delete ConfigMap from the list.

      The Delete ConfigMap dialog appears.

    5. Inspect the dialog and confirm that you are deleting the correct ConfigMap.
    6. Click Delete.
  3. Restart the dashboard replicaset.

    1. In the OpenShift web console, switch to the Administrator perspective.
    2. Click Workloads Deployments.
    3. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
    4. Search for the rhods-dashboard deployment.
    5. Click the action menu (⋮) and select Restart Rollout from the list.
    6. Wait until the Status column indicates that all pods in the rollout have fully restarted.

Verification

  • The NVIDIA GPU Operator appears on the Operators Installed Operators page in the OpenShift web console.
  • The reset migration-gpu-status instance is present on the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
Note

In OpenShift AI 2-latest, Red Hat supports the use of accelerators within the same cluster only. Red Hat does not support remote direct memory access (RDMA) between accelerators, or the use of accelerators across a network, for example, by using technology such as NVIDIA GPUDirect or NVLink.

After installing the NVIDIA GPU Operator, create an accelerator profile as described in Working with accelerator profiles.

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.