Ce contenu n'est pas disponible dans la langue sélectionnée.

Chapter 3. Enabling NVIDIA GPUs


Before you can use NVIDIA GPUs in OpenShift AI, you must install the NVIDIA GPU Operator.

Important

The NVIDIA GPU add-on is no longer supported. Instead, enable GPUs by installing the NVIDIA GPU Operator. If your deployment has a previously-installed NVIDIA GPU add-on, before you install the NVIDIA GPU Operator, use Red Hat OpenShift Cluster Manager to uninstall the NVIDIA GPU add-on from your cluster.

Prerequisites

  • You have logged in to your OpenShift cluster.
  • You have the cluster-admin role in your OpenShift cluster.
  • You have installed an NVIDIA GPU and confirmed that it is detected in your environment.

Procedure

  1. To enable GPU support on an OpenShift cluster, follow the instructions here: NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.

    Important

    After you install the Node Feature Discovery (NFD) Operator, you must create an instance of NodeFeatureDiscovery. In addition, after you install the NVIDIA GPU Operator, you must create a ClusterPolicy and populate it with default values.

  2. Delete the migration-gpu-status ConfigMap.

    1. In the OpenShift web console, switch to the Administrator perspective.
    2. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
    3. Search for the migration-gpu-status ConfigMap.
    4. Click the action menu (⋮) and select Delete ConfigMap from the list.

      The Delete ConfigMap dialog opens.

    5. Inspect the dialog and confirm that you are deleting the correct ConfigMap.
    6. Click Delete.
  3. Restart the dashboard replicaset.

    1. In the OpenShift web console, switch to the Administrator perspective.
    2. Click Workloads Deployments.
    3. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
    4. Search for the rhods-dashboard deployment.
    5. Click the action menu (⋮) and select Restart Rollout from the list.
    6. Wait until the Status column indicates that all pods in the rollout have fully restarted.

Verification

  • The reset migration-gpu-status instance is present on the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
  • From the Administrator perspective, go to the Operators Installed Operators page. Confirm that the following Operators appear:

    • NVIDIA GPU
    • Node Feature Discovery (NFD)
    • Kernel Module Management (KMM)
  • The GPU is correctly detected a few minutes after full installation of the Node Feature Discovery (NFD) and NVIDIA GPU Operators. The OpenShift CLI (oc) displays the appropriate output for the GPU worker node. For example:

    # Expected output when the GPU is detected properly
    oc describe node <node name>
    ...
    Capacity:
      cpu:                4
      ephemeral-storage:  313981932Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             16076568Ki
      nvidia.com/gpu:     1
      pods:               250
    Allocatable:
      cpu:                3920m
      ephemeral-storage:  288292006229
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             12828440Ki
      nvidia.com/gpu:     1
      pods:               250
    Copy to Clipboard Toggle word wrap
Note

In OpenShift AI, Red Hat supports the use of accelerators within the same cluster only.

Red Hat supports remote direct memory access (RDMA) for NVIDIA GPUs only, enabling them to communicate directly with each other by using NVIDIA GPUDirect RDMA across either Ethernet or InfiniBand networks.

After installing the NVIDIA GPU Operator, create an accelerator profile as described in Working with accelerator profiles.

Important

By default, hardware profiles are hidden in the dashboard navigation menu and user interface, while accelerator profiles remain visible. In addition, user interface components associated with the deprecated accelerator profiles functionality are still displayed. To show the Settings Hardware profiles option in the dashboard navigation menu, and the user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift. For more information about setting dashboard configuration options, see Customizing the dashboard.

Retour au début
Red Hat logoGithubredditYoutubeTwitter

Apprendre

Essayez, achetez et vendez

Communautés

À propos de la documentation Red Hat

Nous aidons les utilisateurs de Red Hat à innover et à atteindre leurs objectifs grâce à nos produits et services avec un contenu auquel ils peuvent faire confiance. Découvrez nos récentes mises à jour.

Rendre l’open source plus inclusif

Red Hat s'engage à remplacer le langage problématique dans notre code, notre documentation et nos propriétés Web. Pour plus de détails, consultez le Blog Red Hat.

À propos de Red Hat

Nous proposons des solutions renforcées qui facilitent le travail des entreprises sur plusieurs plates-formes et environnements, du centre de données central à la périphérie du réseau.

Theme

© 2025 Red Hat