Chapter 3. Installing the NVIDIA GPU Operator
Install the NVIDIA GPU Operator to use the underlying NVIDIA CUDA AI accelerators that are available in the cluster.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed the Node Feature Discovery Operator.
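The prerequisites can be checked from a terminal before you start. The following is a minimal sketch, not an official check: the function name check_prerequisites is hypothetical, the openshift-nfd namespace is an assumption about where the Node Feature Discovery Operator was installed, and the "oc auth can-i" query only approximates a cluster-admin check.

```shell
#!/usr/bin/env bash

# Assumption: the Node Feature Discovery Operator runs in this namespace.
NFD_NS="openshift-nfd"

# Hypothetical helper: reports the first prerequisite that appears to be missing.
check_prerequisites() {
  # The oc binary must be available.
  if ! command -v oc >/dev/null 2>&1; then
    echo "oc CLI not found in PATH"
    return 1
  fi
  # "oc auth can-i" prints "yes" when the current user may perform the action.
  if [ "$(oc auth can-i '*' '*' --all-namespaces 2>/dev/null)" != "yes" ]; then
    echo "current user lacks cluster-wide permissions"
    return 1
  fi
  # At least one pod is expected in the assumed NFD namespace.
  if [ -z "$(oc get pods -n "$NFD_NS" --no-headers 2>/dev/null)" ]; then
    echo "no Node Feature Discovery pods found in $NFD_NS"
    return 1
  fi
  echo "prerequisites look satisfied"
}
```

After sourcing the snippet, run check_prerequisites. Adjust NFD_NS if the Node Feature Discovery Operator is installed in a different namespace.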
Procedure
1. Create the Namespace CR for the NVIDIA GPU Operator:

$ oc apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
EOF

2. Create the OperatorGroup CR:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
  - nvidia-gpu-operator
EOF

3. Create the Subscription CR:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: "stable"
  installPlanApproval: Manual
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
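Because the Subscription sets installPlanApproval to Manual, Operator Lifecycle Manager creates an InstallPlan that waits for approval before it installs the Operator. The following is a minimal sketch of how pending install plans might be listed and approved with oc. The function names are hypothetical helpers, not part of the product, and the sketch approves every pending plan in the namespace, which may be broader than you want.

```shell
#!/usr/bin/env bash

NS="nvidia-gpu-operator"

# Hypothetical helper: print the names of install plans whose spec.approved is false.
pending_installplans() {
  oc get installplan -n "$NS" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.approved}{"\n"}{end}' |
    awk '$2 == "false" {print $1}'
}

# Hypothetical helper: approve every pending install plan in the namespace.
approve_pending_installplans() {
  local plan
  for plan in $(pending_installplans); do
    oc patch installplan "$plan" -n "$NS" \
      --type merge --patch '{"spec":{"approved":true}}'
  done
}
```

After sourcing the snippet, run pending_installplans to review the plans, then approve_pending_installplans to approve them.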
Verification
Verify that the NVIDIA GPU Operator deployment is successful by running the following command:
$ oc get pods -n nvidia-gpu-operator
Example output
NAME                                                  READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-c2rfm                           1/1     Running     0          6m28s
gpu-operator-84b7f5bcb9-vqds7                         1/1     Running     0          39m
nvidia-container-toolkit-daemonset-pgcrf              1/1     Running     0          6m28s
nvidia-cuda-validator-p8gv2                           0/1     Completed   0          99s
nvidia-dcgm-exporter-kv6k8                            1/1     Running     0          6m28s
nvidia-dcgm-tpsps                                     1/1     Running     0          6m28s
nvidia-device-plugin-daemonset-gbn55                  1/1     Running     0          6m28s
nvidia-device-plugin-validator-z7ltr                  0/1     Completed   0          82s
nvidia-driver-daemonset-410.84.202203290245-0-xxgdv   2/2     Running     0          6m28s
nvidia-node-status-exporter-snmsm                     1/1     Running     0          6m28s
nvidia-operator-validator-6pfk6                       1/1     Running     0          6m28s
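In the example output, every pod is either Running or Completed. That check can be scripted. The following is a minimal sketch with hypothetical function names. It assumes the default "oc get pods" column layout, in which STATUS is the third column, and it does not verify that any particular pod exists.

```shell
#!/usr/bin/env bash

NS="nvidia-gpu-operator"

# Hypothetical helper: print the name and status of pods that are
# neither Running nor Completed.
unhealthy_pods() {
  oc get pods -n "$NS" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" {print $1, $3}'
}

# Hypothetical helper: succeed only when no unhealthy pods are reported.
gpu_operator_ready() {
  local bad
  bad=$(unhealthy_pods)
  if [ -n "$bad" ]; then
    echo "Pods not ready:"
    echo "$bad"
    return 1
  fi
  echo "All pods are Running or Completed"
}
```

After sourcing the snippet, run gpu_operator_ready. A nonzero exit status means that at least one pod is in another state, such as Pending or CrashLoopBackOff.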