Hardware accelerators
Abstract
Chapter 1. About hardware accelerators
Specialized hardware accelerators play a key role in the emerging generative artificial intelligence and machine learning (AI/ML) industry. Specifically, hardware accelerators are essential to the training and serving of large language and other foundational models that power this new technology. Data scientists, data engineers, ML engineers, and developers can take advantage of the specialized hardware acceleration for data-intensive transformations and model development and serving. Much of that ecosystem is open source, with several contributing partners and open source foundations.
Red Hat OpenShift Container Platform supports cards and peripheral hardware that add the processing units that make up hardware accelerators:
- Graphics processing units (GPUs)
- Neural processing units (NPUs)
- Application-specific integrated circuits (ASICs)
- Data processing units (DPUs)
Specialized hardware accelerators provide a rich set of benefits for AI/ML development:
- One platform for all
- A collaborative environment for developers, data engineers, data scientists, and DevOps
- Extended capabilities with Operators
- Operators let you bring AI/ML capabilities to OpenShift Container Platform
- Hybrid-cloud support
- On-premises support for model development, delivery, and deployment
- Support for AI/ML workloads
- Model testing, iteration, integration, promotion, and serving into production as services
Red Hat provides an optimized platform to enable these specialized hardware accelerators in Red Hat Enterprise Linux (RHEL) and OpenShift Container Platform at the Linux (kernel and user space) and Kubernetes layers. To do this, Red Hat combines the proven capabilities of Red Hat OpenShift AI and Red Hat OpenShift Container Platform in a single enterprise-ready AI application platform.
Hardware accelerator Operators use the Operator framework of a Kubernetes cluster to enable the required accelerator resources. You can also deploy the provided device plugin manually or as a daemon set. The device plugin registers the GPU in the cluster.
Certain specialized hardware accelerators are designed to work within disconnected environments where a secure environment must be maintained for development and testing.
1.1. Hardware accelerators
Red Hat OpenShift Container Platform enables the following hardware accelerators:
- NVIDIA GPU
- AMD Instinct® GPU
- Intel® Gaudi®
Chapter 2. NVIDIA GPU architecture
NVIDIA supports the use of graphics processing unit (GPU) resources on OpenShift Container Platform. OpenShift Container Platform is a security-focused and hardened Kubernetes platform developed and supported by Red Hat for deploying and managing Kubernetes clusters at scale. OpenShift Container Platform includes enhancements to Kubernetes so that users can easily configure and use NVIDIA GPU resources to accelerate workloads.
The NVIDIA GPU Operator uses the Operator framework within OpenShift Container Platform to manage the full lifecycle of NVIDIA software components required to run GPU-accelerated workloads.
These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node tagging using GPU feature discovery (GFD), DCGM-based monitoring, and others.
The NVIDIA GPU Operator is only supported by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.
2.1. NVIDIA GPU prerequisites
- A working OpenShift cluster with at least one GPU worker node.
- Access to the OpenShift cluster as a cluster-admin to perform the required steps.
- OpenShift CLI (oc) is installed.
- The Node Feature Discovery (NFD) Operator is installed and a nodefeaturediscovery instance is created.
2.2. NVIDIA GPU enablement
The following diagram shows how the GPU architecture is enabled for OpenShift:
Figure 2.1. NVIDIA GPU enablement
MIG is supported on GPUs starting with the NVIDIA Ampere generation. For a list of GPUs that support MIG, see the NVIDIA MIG User Guide.
2.2.1. GPUs and bare metal
You can deploy OpenShift Container Platform on an NVIDIA-certified bare metal server but with some limitations:
- Control plane nodes can be CPU nodes.
- Worker nodes must be GPU nodes, provided that AI/ML workloads are executed on these worker nodes. In addition, the worker nodes can host one or more GPUs, but they must be of the same type. For example, a node can have two NVIDIA A100 GPUs, but a node with one A100 GPU and one T4 GPU is not supported. The NVIDIA Device Plugin for Kubernetes does not support mixing different GPU models on the same node.
- When using OpenShift Container Platform, note that one server, or three or more servers, are required. Clusters with two servers are not supported. The single-server deployment is called single-node OpenShift (SNO), and using this configuration results in a non-high-availability OpenShift environment.
You can choose one of the following methods to access the containerized GPUs:
- GPU passthrough
- Multi-Instance GPU (MIG)
2.2.2. GPUs and virtualization
Many developers and enterprises are moving to containerized applications and serverless infrastructures, but there is still a lot of interest in developing and maintaining applications that run on virtual machines (VMs). Red Hat OpenShift Virtualization provides this capability, enabling enterprises to incorporate VMs into containerized workflows within clusters.
You can choose one of the following methods to connect the worker nodes to the GPUs:
- GPU passthrough to access and use GPU hardware within a virtual machine (VM).
- GPU (vGPU) time-slicing, when GPU compute capacity is not saturated by workloads.
2.2.3. GPUs and vSphere
You can deploy OpenShift Container Platform on an NVIDIA-certified VMware vSphere server that can host different GPU types.
An NVIDIA GPU driver must be installed in the hypervisor in case vGPU instances are used by the VMs. For VMware vSphere, this host driver is provided in the form of a VIB file.
The maximum number of vGPUs that can be allocated to worker node VMs depends on the version of vSphere:
- vSphere 7.0: maximum 4 vGPUs per VM
- vSphere 8.0: maximum 8 vGPUs per VM
Note: vSphere 8.0 introduced support for multiple full or fractional heterogeneous profiles associated with a VM.
You can choose one of the following methods to attach the worker nodes to the GPUs:
- GPU passthrough for accessing and using GPU hardware within a virtual machine (VM)
- GPU (vGPU) time-slicing, when not all of the GPU is needed
Similar to bare metal deployments, one or three or more servers are required. Clusters with two servers are not supported.
2.2.4. GPUs and Red Hat KVM
You can use OpenShift Container Platform on an NVIDIA-certified kernel-based virtual machine (KVM) server.
Similar to bare-metal deployments, one or three or more servers are required. Clusters with two servers are not supported.
However, unlike bare-metal deployments, you can use different types of GPUs in the server. This is because you can assign these GPUs to different VMs that act as Kubernetes nodes. The only limitation is that each Kubernetes node must use a single GPU type at its own level.
You can choose one of the following methods to access the containerized GPUs:
- GPU passthrough for accessing and using GPU hardware within a virtual machine (VM)
- GPU (vGPU) time-slicing when not all of the GPU is needed
To enable the vGPU capability, a special driver must be installed at the host level. This driver is delivered as an RPM package. The host driver is not required for GPU passthrough allocation.
2.2.5. GPUs and CSPs
You can deploy OpenShift Container Platform to one of the major cloud service providers (CSPs): Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
Two modes of operation are available: a fully managed deployment and a self-managed deployment.
- In a fully managed deployment, everything is automated by Red Hat in collaboration with the CSP. You can request an OpenShift instance through the CSP web console, and the cluster is automatically created and fully managed by Red Hat. You do not have to worry about node failures or errors in the environment. Red Hat is fully responsible for maintaining the uptime of the cluster. The fully managed services are available on AWS, Azure, and GCP. For AWS, the service is called Red Hat OpenShift Service on AWS (ROSA). For Azure, the service is called Azure Red Hat OpenShift. For GCP, the service is called OpenShift Dedicated on GCP.
- In a self-managed deployment, you are responsible for instantiating and maintaining the OpenShift cluster. Red Hat provides the openshift-install utility to support the deployment of the OpenShift cluster in this case. The self-managed services are available globally to all CSPs.
It is important that the compute instance is a GPU-accelerated compute instance and that the GPU type matches the list of supported GPUs from NVIDIA AI Enterprise. For example, T4, V100, and A100 are part of this list.
You can choose one of the following methods to access the containerized GPUs:
- GPU passthrough to access and use GPU hardware within a virtual machine (VM).
- GPU (vGPU) time slicing when the entire GPU is not required.
2.2.6. GPUs and Red Hat Device Edge
Red Hat Device Edge provides access to MicroShift. MicroShift provides the simplicity of a single-node deployment with the functionality and services you need for resource-constrained (edge) computing. Red Hat Device Edge meets the needs of bare-metal, virtual, containerized, or Kubernetes workloads deployed in resource-constrained environments.
You can enable NVIDIA GPUs on containers in a Red Hat Device Edge environment.
You use GPU passthrough to access the containerized GPUs.
2.3. GPU sharing methods
Red Hat and NVIDIA have developed GPU concurrency and sharing mechanisms to simplify GPU-accelerated computing on an enterprise-level OpenShift Container Platform cluster.
Applications typically have different compute requirements that can leave GPUs underutilized. Providing the right amount of compute resources for each workload is critical to reduce deployment cost and maximize GPU utilization.
Concurrency mechanisms for improving GPU utilization exist that range from programming model APIs to system software and hardware partitioning, including virtualization. The following list shows the GPU concurrency mechanisms:
- Compute Unified Device Architecture (CUDA) streams
- Time-slicing
- CUDA Multi-Process Service (MPS)
- Multi-instance GPU (MIG)
- Virtualization with vGPU
Consider the following GPU sharing suggestions when using the GPU concurrency mechanisms for different OpenShift Container Platform scenarios:
- Bare metal
- vGPU is not available. Consider using MIG-enabled cards.
- VMs
- vGPU is the best choice.
- Older NVIDIA cards with no MIG on bare metal
- Consider using time-slicing.
- VMs with multiple GPUs where you want both passthrough and vGPU
- Consider using separate VMs.
- Bare metal with OpenShift Virtualization and multiple GPUs
- Consider using pass-through for hosted VMs and time-slicing for containers.
2.3.1. CUDA streams
Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs.
A stream is a sequence of operations that executes in issue-order on the GPU. CUDA commands are typically executed sequentially in a default stream and a task does not start until a preceding task has completed.
Asynchronous processing of operations across different streams allows for parallel execution of tasks. A task issued in one stream can run before, during, or after a task issued in another stream. This allows the GPU to run multiple tasks simultaneously in no prescribed order, leading to improved performance.
2.3.2. Time-slicing
GPU time-slicing interleaves workloads scheduled on overloaded GPUs when you are running multiple CUDA applications.
You can enable time-slicing of GPUs on Kubernetes by defining a set of replicas for a GPU, each of which can be independently distributed to a pod to run workloads on. Unlike multi-instance GPU (MIG), there is no memory or fault isolation between replicas, but for some workloads this is better than not sharing at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.
You can apply a cluster-wide default configuration for time-slicing. You can also apply node-specific configurations. For example, you can apply a time-slicing configuration only to nodes with Tesla T4 GPUs and not modify nodes with other GPU models.
You can combine these two approaches by applying a cluster-wide default configuration and then labeling nodes to give those nodes a node-specific configuration.
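The following is a minimal sketch of a time-slicing configuration for the NVIDIA GPU Operator, shown for illustration only; the ConfigMap name, the tesla-t4 key, and the replica count of 4 are assumptions and not values taken from this document:

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  tesla-t4: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4

After the ConfigMap exists, the cluster policy is typically pointed at it, for example with oc patch clusterpolicies.nvidia.com/gpu-cluster-policy --type merge -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "tesla-t4"}}}}'.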
2.3.3. CUDA Multi-Process Service
CUDA Multi-Process Service (MPS) allows a single GPU to use multiple CUDA processes. The processes run in parallel on the GPU, eliminating saturation of the GPU compute resources. MPS also enables concurrent execution, or overlapping, of kernel operations and memory copying from different processes to enhance utilization.
2.3.4. Multi-instance GPU
Using Multi-instance GPU (MIG), you can split GPU compute units and memory into multiple MIG instances. Each of these instances represents a standalone GPU device from a system perspective and can be connected to any application, container, or virtual machine running on the node. The software that uses the GPU treats each of these MIG instances as an individual GPU.
MIG is useful when you have an application that does not require the full power of an entire GPU. The MIG feature of the new NVIDIA Ampere architecture enables you to split your hardware resources into multiple GPU instances, each of which is available to the operating system as an independent CUDA-enabled GPU.
NVIDIA GPU Operator version 1.7.0 and higher provides MIG support for the A100 and A30 Ampere cards. These GPU instances are designed to support up to seven multiple independent CUDA applications so that they operate completely isolated with dedicated hardware resources.
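For illustration, MIG geometry is usually selected per node through a label that the GPU Operator MIG manager consumes. The node name and profile below are placeholders, not values from this document:

$ oc label node <node-name> nvidia.com/mig.config=all-1g.5gb --overwrite

With this example profile, the MIG manager reconfigures an A100 into seven 1g.5gb instances, each of which is advertised to Kubernetes as a schedulable resource.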
2.3.5. Virtualization with vGPU
Virtual machines (VMs) can directly access a single physical GPU using NVIDIA vGPU. You can create virtual GPUs that can be shared by VMs across the enterprise and accessed by other devices.
This capability combines the power of GPU performance with the management and security benefits provided by vGPU. Additional benefits provided by vGPU include proactive management and monitoring for your VM environment, workload balancing for mixed VDI and compute workloads, and resource sharing across multiple VMs.
2.4. NVIDIA GPU features for OpenShift Container Platform
- NVIDIA Container Toolkit
- NVIDIA Container Toolkit enables you to create and run GPU-accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to use NVIDIA GPUs.
- NVIDIA AI Enterprise
NVIDIA AI Enterprise is an end-to-end, cloud-native suite of AI and data analytics software optimized, certified, and supported with NVIDIA-Certified systems.
NVIDIA AI Enterprise includes support for Red Hat OpenShift Container Platform. The following installation methods are supported:
- OpenShift Container Platform on bare metal or VMware vSphere with GPU Passthrough.
- OpenShift Container Platform on VMware vSphere with NVIDIA vGPU.
- GPU Feature Discovery
NVIDIA GPU Feature Discovery for Kubernetes is a software component that enables you to automatically generate labels for the GPUs available on a node. GPU Feature Discovery uses node feature discovery (NFD) to perform this labeling.
The Node Feature Discovery Operator (NFD) manages the discovery of hardware features and configurations in an OpenShift Container Platform cluster by labeling nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, OS version, and so on.
You can find the NFD Operator in the Operator Hub by searching for “Node Feature Discovery”.
- NVIDIA GPU Operator with OpenShift Virtualization
Up until this point, the GPU Operator only provisioned worker nodes to run GPU-accelerated containers. Now, the GPU Operator can also be used to provision worker nodes for running GPU-accelerated virtual machines (VMs).
You can configure the GPU Operator to deploy different software components to worker nodes depending on which GPU workload is configured to run on those nodes.
- GPU Monitoring dashboard
- You can install a monitoring dashboard to display GPU usage information on the cluster Observe page in the OpenShift Container Platform web console. GPU utilization information includes the number of available GPUs, power consumption (in watts), temperature (in degrees Celsius), utilization (in percent), and other metrics for each GPU.
Chapter 3. AMD GPU Operator
AMD Instinct GPU accelerators combined with the AMD GPU Operator within your OpenShift Container Platform cluster let you seamlessly harness computing capabilities for machine learning, generative AI, and GPU-accelerated applications.
This documentation provides the information you need to enable, configure, and test the AMD GPU Operator. For more information, see AMD Instinct™ Accelerators.
3.1. About the AMD GPU Operator
The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers using Red Hat OpenShift AI for creating artificial intelligence and machine learning (AI/ML) applications. Accelerating specific areas of GPU functions can minimize CPU processing and memory usage, improving overall application speed, memory consumption, and bandwidth restrictions.
3.2. Installing the AMD GPU Operator
As a cluster administrator, you can install the AMD GPU Operator by using the OpenShift CLI and the web console. This is a multi-step procedure that requires the installation of the Node Feature Discovery Operator, the Kernel Module Management Operator, and then the AMD GPU Operator. Use the following steps in succession to install the AMD community release of the Operator.
Next steps
- Install the Node Feature Discovery Operator.
- Install the Kernel Module Management Operator.
- Install and configure the AMD GPU Operator.
3.3. Testing the AMD GPU Operator
Use the following procedure to test the ROCmInfo installation and view the logs for the AMD MI210 GPU.
Procedure
Create a YAML file that tests ROCmInfo:
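The original YAML is not reproduced in this document. The following is a minimal sketch of such a pod, assuming the AMD device plugin exposes the amd.com/gpu resource; the image name is a placeholder for any image that ships the rocminfo utility:

apiVersion: v1
kind: Pod
metadata:
  name: rocminfo
spec:
  restartPolicy: Never
  containers:
  - name: rocminfo
    image: <rocm-image>   # placeholder: any image that includes rocminfo
    command: ["/bin/sh", "-c", "rocminfo"]
    resources:
      limits:
        amd.com/gpu: 1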
Create the rocminfo pod:
$ oc create -f rocminfo.yaml
Example output
pod/rocminfo created
Check the rocminfo log with one MI210 GPU:
$ oc logs rocminfo | grep -A5 "Agent"
Delete the pod:
$ oc delete -f rocminfo.yaml
Example output
pod "rocminfo" deleted
Chapter 4. Intel Gaudi AI accelerators
You can use Intel Gaudi AI accelerators for your OpenShift Container Platform generative AI and machine learning (AI/ML) applications. Intel Gaudi AI accelerators offer a cost-efficient, flexible, and scalable solution for optimized deep learning workloads.
Red Hat supports Intel Gaudi 2 and Intel Gaudi 3 devices. Intel Gaudi 3 devices provide significant improvements in training speed and energy efficiency.
4.1. Intel Gaudi AI accelerators prerequisites
- You have a working OpenShift Container Platform cluster with at least one GPU worker node.
- You have access to the OpenShift Container Platform cluster as a cluster-admin to perform the required steps.
- You have installed the OpenShift CLI (oc).
- You have installed the Node Feature Discovery (NFD) Operator and created a NodeFeatureDiscovery instance.
Chapter 5. NVIDIA GPUDirect Remote Direct Memory Access (RDMA)
NVIDIA GPUDirect Remote Direct Memory Access (RDMA) allows an application in one computer to directly access the memory of another computer without going through the operating system. This bypasses kernel intervention, freeing up resources and greatly reducing the CPU overhead normally needed to process network communications. This is useful for distributing GPU-accelerated workloads across clusters. Because RDMA is well suited to high-bandwidth, low-latency applications, it is ideal for big data and machine learning applications.
There are currently three configuration methods for NVIDIA GPUDirect RDMA:
- Shared device
- This method allows for an NVIDIA GPUDirect RDMA device to be shared among multiple pods on the OpenShift Container Platform worker node where the device is exposed.
- Host device
- This method provides direct physical Ethernet access on the worker node by creating an additional host network on a pod. A plugin allows the network device to be moved from the host network namespace to the network namespace on the pod.
- SR-IOV legacy device
- The Single Root IO Virtualization (SR-IOV) method can share a single network device, such as an Ethernet adapter, with multiple pods. SR-IOV segments the device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device.
Each of these methods can be used across either the NVIDIA GPUDirect RDMA over Converged Ethernet (RoCE) or Infiniband infrastructures, providing an aggregate total of six methods of configuration.
5.1. NVIDIA GPUDirect RDMA prerequisites
All methods of NVIDIA GPUDirect RDMA configuration require the installation of specific Operators. Use the following steps to install the Operators:
- Install the Node Feature Discovery Operator.
- Install the SR-IOV Operator.
- Install the NVIDIA Network Operator (NVIDIA documentation).
- Install the NVIDIA GPU Operator (NVIDIA documentation).
5.2. Disabling the IRDMA kernel module
On some systems, including the Dell R750xa, the IRDMA kernel module creates problems for the NVIDIA Network Operator when unloading and loading the DOCA drivers. Use the following procedure to disable the module.
Procedure
Generate the following machine configuration file by running the following command:
$ cat <<EOF > 99-machine-config-blacklist-irdma.yaml
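# The original file content is not shown in this document. A minimal sketch (assumed,
# not authoritative) that matches the object name in the output below: a MachineConfig
# that blacklists the irdma module on worker nodes through a kernel argument.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-blacklist-irdma
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - "module_blacklist=irdma"
EOF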
Create the machine configuration on the cluster and wait for the nodes to reboot by running the following command:
$ oc create -f 99-machine-config-blacklist-irdma.yaml
Example output
machineconfig.machineconfiguration.openshift.io/99-worker-blacklist-irdma created
Validate in a debug pod on each node that the module has not loaded.
5.3. Creating persistent naming rules
In some cases, device names do not persist following a reboot. For example, on R760xa systems, Mellanox devices might be renamed after a reboot. You can avoid this problem by using a MachineConfig object to set persistence.
Procedure
Gather the MAC addresses from the worker nodes into a file, and provide names for the interfaces that need to persist. This example uses the file 70-persistent-net.rules and stashes the details in it.
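Each entry in the rules file pins a MAC address to a persistent interface name. The first two entries, decoded from the base64 string shown in the next step, have the following format:

SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:3b:51:28",ATTR{type}=="1",NAME="ibs2f0"
SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="b8:3f:d2:3b:51:29",ATTR{type}=="1",NAME="ens8f0np0"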
Convert that file into a base64 string without line breaks, and set the output to the variable PERSIST:
$ PERSIST=`cat 70-persistent-net.rules| base64 -w 0`
$ echo $PERSIST
U1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjozYjo1MToyOCIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImliczJmMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjozYjo1MToyOSIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImVuczhmMG5wMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjpmMDozNjpkMCIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImliczJmMCIKU1VCU1lTVEVNPT0ibmV0IixBQ1RJT049PSJhZGQiLEFUVFJ7YWRkcmVzc309PSJiODozZjpkMjpmMDozNjpkMSIsQVRUUnt0eXBlfT09IjEiLE5BTUU9ImVuczhmMG5wMCIK
Create a machine configuration and set the base64 encoding in the custom resource file by running the following command:
$ cat <<EOF > 99-machine-config-udev-network.yaml
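# The original file content is not shown in this document. A minimal sketch (assumed,
# not authoritative) that writes the base64-encoded rules file to /etc/udev/rules.d on
# worker nodes; $PERSIST is the variable set in the previous step.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-machine-config-udev-network
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/udev/rules.d/70-persistent-net.rules
        mode: 420
        contents:
          source: data:text/plain;base64,$PERSIST
EOF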
Create the machine configuration on the cluster by running the following command:
$ oc create -f 99-machine-config-udev-network.yaml
Example output
machineconfig.machineconfiguration.openshift.io/99-machine-config-udev-network created
Use the oc get mcp command to view the machine configuration status:
$ oc get mcp
Example output
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9adfe851c2c14d9598eea5ec3df6c187   True      False      False      1              1                   1                      0                      6h21m
worker   rendered-worker-4568f1b174066b4b1a4de794cf538fee   False     True       False      2              0                   0                      0                      6h21m
The nodes reboot, and when the UPDATING field returns to false, you can validate the devices on the nodes in a debug pod.
5.4. Configuring the NFD Operator
The Node Feature Discovery (NFD) Operator manages the detection of hardware features and configuration in an OpenShift Container Platform cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.
Prerequisites
- You have installed the NFD Operator.
Procedure
Validate that the Operator is installed and running by looking at the pods in the openshift-nfd namespace by running the following command:
$ oc get pods -n openshift-nfd
Example output
NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-8698c88cdd-t8gbc   2/2     Running   0          2m
With the NFD controller running, generate the NodeFeatureDiscovery instance and add it to the cluster. The ClusterServiceVersion specification for the NFD Operator provides default values, including the NFD operand image that is part of the Operator payload. Retrieve its value by running the following command:
$ NFD_OPERAND_IMAGE=`echo $(oc get csv -n openshift-nfd -o json | jq -r '.items[0].metadata.annotations["alm-examples"]') | jq -r '.[] | select(.kind == "NodeFeatureDiscovery") | .spec.operand.image'`
Optional: Add entries to the default deviceClassWhiteList field to support more network adapters, such as the NVIDIA BlueField DPUs.
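The original instance file is not reproduced in this document. A minimal sketch of an nfd-instance.yaml, assuming the operand image retrieved above and a PCI device class whitelist extended for Mellanox and BlueField devices; treat the values as assumptions rather than defaults:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    image: <value of $NFD_OPERAND_IMAGE>
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "02"
            - "03"
            - "0200"
            - "0207"
          deviceLabelFields:
            - "vendor"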
Create the NodeFeatureDiscovery instance by running the following command:
$ oc create -f nfd-instance.yaml
Example output
nodefeaturediscovery.nfd.openshift.io/nfd-instance created
Validate that the instance is up and running by looking at the pods under the openshift-nfd namespace by running the following command:
$ oc get pods -n openshift-nfd
Wait a short period of time and then verify that NFD has added labels to the node. The NFD labels are prefixed with feature.node.kubernetes.io, so you can easily filter them.
Confirm there is a network device that is discovered:
$ oc describe node | grep -E 'Roles|pci' | grep pci-15b3
Example output
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
feature.node.kubernetes.io/pci-15b3.present=true
feature.node.kubernetes.io/pci-15b3.sriov.capable=true
5.5. Configuring the SR-IOV Operator
Single root I/O virtualization (SR-IOV) enhances the performance of NVIDIA GPUDirect RDMA by providing sharing across multiple pods from a single device.
Prerequisites
- You have installed the SR-IOV Operator.
Procedure
Validate that the Operator is installed and running by looking at the pods in the openshift-sriov-network-operator namespace by running the following command:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME                                      READY   STATUS    RESTARTS   AGE
sriov-network-operator-7cb6c49868-89486   1/1     Running   0          22s
For the default SriovOperatorConfig CR to work with the MLNX_OFED container, run this command to update the following values:
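The original values are not shown in this document. A minimal sketch of a sriov-operator-config.yaml that enables the injector and Operator webhook on the default CR; the field values are assumptions:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true
  enableOperatorWebhook: true
  logLevel: 2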
Create the resource on the cluster by running the following command:
$ oc create -f sriov-operator-config.yaml
Example output
sriovoperatorconfig.sriovnetwork.openshift.io/default created
Patch the sriov-operator so that the MOFED container can work with it by running the following command:
$ oc patch sriovoperatorconfig default --type=merge -n openshift-sriov-network-operator --patch '{ "spec": { "configDaemonNodeSelector": { "network.nvidia.com/operator.mofed.wait": "false", "node-role.kubernetes.io/worker": "", "feature.node.kubernetes.io/pci-15b3.sriov.capable": "true" } } }'
Example output
sriovoperatorconfig.sriovnetwork.openshift.io/default patched
5.6. Configuring the NVIDIA network Operator
The NVIDIA network Operator manages NVIDIA networking resources and networking-related components, such as drivers and device plugins, to enable NVIDIA GPUDirect RDMA workloads.
Prerequisites
- You have installed the NVIDIA network Operator.
Procedure
Validate that the network Operator is installed and running by confirming that the controller is running in the nvidia-network-operator namespace by running the following command:
$ oc get pods -n nvidia-network-operator
Example output
NAME                                                           READY   STATUS    RESTARTS   AGE
nvidia-network-operator-controller-manager-6f7d6956cd-fw5wg   1/1     Running   0          5m
With the Operator running, create the NicClusterPolicy custom resource file. The device you choose depends on your system configuration. In this example, the Infiniband interface ibs2f0 is hard coded and is used as the shared NVIDIA GPUDirect RDMA device.
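The original NicClusterPolicy is not reproduced in this document. A minimal sketch follows, assuming the DOCA/OFED driver and RDMA shared device plugin are deployed and that the shared device selects ibs2f0; image names and versions are placeholders for values from your environment:

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: <driver-version>
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: <plugin-version>
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_ib",
            "rdmaHcaMax": 63,
            "selectors": {
              "ifNames": ["ibs2f0"]
            }
          }
        ]
      }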
Create the NicClusterPolicy custom resource on the cluster by running the following command:
$ oc create -f network-sharedrdma-nic-cluster-policy.yaml
Example output
nicclusterpolicy.mellanox.com/nic-cluster-policy created
Validate the NicClusterPolicy by running the following command in the DOCA/MOFED container:
$ oc get pods -n nvidia-network-operator
rsh into the mofed container to check the status by running the following commands:
$ MOFED_POD=$(oc get pods -n nvidia-network-operator -o name | grep mofed)
$ oc rsh -n nvidia-network-operator -c mofed-container ${MOFED_POD}
sh-5.1# ofed_info -s
Example output
OFED-internal-24.07-0.6.1:
sh-5.1# ibdev2netdev -v
Example output
0000:0d:00.0 mlx5_0 (MT41692 - 900-9D3B4-00EN-EA0) BlueField-3 E-series SuperNIC 400GbE/NDR single port QSFP112, PCIe Gen5.0 x16 FHHL, Crypto Enabled, 16GB DDR5, BMC, Tall Bracket fw 32.42.1000 port 1 (ACTIVE) ==> ibs2f0 (Up)
0000:a0:00.0 mlx5_1 (MT41692 - 900-9D3B4-00EN-EA0) BlueField-3 E-series SuperNIC 400GbE/NDR single port QSFP112, PCIe Gen5.0 x16 FHHL, Crypto Enabled, 16GB DDR5, BMC, Tall Bracket fw 32.42.1000 port 1 (ACTIVE) ==> ens8f0np0 (Up)
Create an IPoIBNetwork custom resource file:
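The original file is not reproduced in this document. A minimal sketch of an ipoib-network.yaml, assuming whereabouts IPAM and the ibs2f0 master interface used earlier; the address range is a placeholder:

apiVersion: mellanox.com/v1alpha1
kind: IPoIBNetwork
metadata:
  name: example-ipoibnetwork
spec:
  networkNamespace: default
  master: ibs2f0
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.6.0/24"
    }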
Create the IPoIBNetwork resource on the cluster by running the following command:
$ oc create -f ipoib-network.yaml
Example output
ipoibnetwork.mellanox.com/example-ipoibnetwork created
Create a MacvlanNetwork custom resource file for your other interface:
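A minimal sketch of a macvlan-network.yaml for the Ethernet interface, with an assumed whereabouts range; only the structure is meant to be illustrative:

apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
  name: rdmashared-net
spec:
  networkNamespace: default
  master: ens8f0np0
  mode: bridge
  mtu: 1500
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.2.0/24"
    }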
Create the resource on the cluster by running the following command:
$ oc create -f macvlan-network.yaml
Example output
macvlannetwork.mellanox.com/rdmashared-net created
5.7. Configuring the GPU Operator
The GPU Operator automates the management of the NVIDIA drivers, device plugins for GPUs, the NVIDIA Container Toolkit, and other components required for GPU provisioning.
Prerequisites
- You have installed the GPU Operator.
Procedure
Check that the Operator pod is running by looking at the pods under the nvidia-gpu-operator namespace by running the following command:
$ oc get pods -n nvidia-gpu-operator
Example output
NAME                          READY   STATUS    RESTARTS   AGE
gpu-operator-b4cb7d74-zxpwq   1/1     Running   0          32s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a GPU cluster policy custom resource file similar to the following example:
When the GPU ClusterPolicy custom resource has been generated, create the resource on the cluster by running the following command:
$ oc create -f gpu-cluster-policy.yaml
Example output
clusterpolicy.nvidia.com/gpu-cluster-policy created
Validate that the Operator is installed and running by running the following command:
$ oc get pods -n nvidia-gpu-operator
Optional: When you have verified the pods are running, remote shell into the NVIDIA driver daemonset pod and confirm that the NVIDIA modules are loaded. Specifically, ensure the nvidia_peermem module is loaded.
$ oc rsh -n nvidia-gpu-operator $(oc -n nvidia-gpu-operator get pod -o name -l app.kubernetes.io/component=nvidia-driver)
sh-4.4# lsmod|grep nvidia
Optional: Run the nvidia-smi utility to show the details about the driver and the hardware:
sh-4.4# nvidia-smi
While you are still in the driver pod, set the GPU clock to maximum using the nvidia-smi command:
$ oc rsh -n nvidia-gpu-operator nvidia-driver-daemonset-416.94.202410172137-0-ndhzc
sh-4.4# nvidia-smi -i 0 -lgc $(nvidia-smi -i 0 --query-supported-clocks=graphics --format=csv,noheader,nounits | sort -h | tail -n 1)
Example output
GPU clocks set to "(gpuClkMin 1740, gpuClkMax 1740)" for GPU 00000000:61:00.0
All done.
sh-4.4# nvidia-smi -i 1 -lgc $(nvidia-smi -i 1 --query-supported-clocks=graphics --format=csv,noheader,nounits | sort -h | tail -n 1)
Example output
GPU clocks set to "(gpuClkMin 1740, gpuClkMax 1740)" for GPU 00000000:E1:00.0
All done.
Validate that the resource is available from a node describe perspective by running the following command:
$ oc describe node -l node-role.kubernetes.io/worker= | grep -E 'Capacity:|Allocatable:' -A9
5.8. Creating the machine configuration
Before you create the resource pods, you need to create the machineconfig.yaml custom resource (CR) that provides access to the GPU and networking resources without the need for user privileges.
Procedure
Generate a MachineConfig CR:
5.9. Creating the workload pods
Use the procedures in this section to create the workload pods for the shared and host devices.
5.9.2. Creating a host device RDMA on RoCE
Create the workload pods for a host device Remote Direct Memory Access (RDMA) for the NVIDIA Network Operator and test the pod configuration.
Prerequisites
- Ensure that the Operator is running.
- Delete the NicClusterPolicy custom resource (CR), if it exists.
Procedure
Generate a new host device NicClusterPolicy (CR), as shown below:
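The original CR is not reproduced in this document. A minimal sketch, assuming the SR-IOV network device plugin exposes the Mellanox ports as hostdev resources; image coordinates and the resource name are assumptions:

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: <driver-version>
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: ghcr.io/k8snetworkplumbingwg
    version: <plugin-version>
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "hostdev",
            "selectors": {
              "vendors": ["15b3"],
              "isRdma": true
            }
          }
        ]
      }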
Create the NicClusterPolicy CR on the cluster by using the following command:
$ oc create -f network-hostdev-nic-cluster-policy.yaml
Example output
nicclusterpolicy.mellanox.com/nic-cluster-policy created
Verify the host device NicClusterPolicy CR by using the following command in the DOCA/MOFED container:
$ oc get pods -n nvidia-network-operator
Confirm that the resources appear in the cluster oc describe node section by using the following command:
$ oc describe node -l node-role.kubernetes.io/worker= | grep -E 'Capacity:|Allocatable:' -A7
Create a HostDeviceNetwork CR file:
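A minimal sketch of a hostdev-network.yaml, assuming the hostdev resource name from the NicClusterPolicy sketch above and a placeholder whereabouts range:

apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdev-net
spec:
  networkNamespace: default
  resourceName: hostdev
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.3.0/24"
    }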
Create the HostDeviceNetwork resource on the cluster by using the following command:
$ oc create -f hostdev-network.yaml
Example output
hostdevicenetwork.mellanox.com/hostdev-net created
Confirm that the resources appear in the cluster oc describe node section by using the following command:
$ oc describe node -l node-role.kubernetes.io/worker= | grep -E 'Capacity:|Allocatable:' -A8
5.9.3. Creating an SR-IOV legacy mode RDMA on RoCE
Configure a Single Root I/O Virtualization (SR-IOV) legacy mode host device RDMA on RoCE.
Procedure
Generate a new host device NicClusterPolicy custom resource (CR):
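The original CR is not reproduced in this document. For SR-IOV legacy mode, the NicClusterPolicy typically only needs to deploy the DOCA/OFED driver, because the SR-IOV Network Operator provides the device plugin. A minimal sketch with placeholder image coordinates:

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: <driver-version>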
Create the policy on the cluster by using the following command:
$ oc create -f network-sriovleg-nic-cluster-policy.yaml
Example output
nicclusterpolicy.mellanox.com/nic-cluster-policy created
Verify the pods by using the following command in the DOCA/MOFED container:
$ oc get pods -n nvidia-network-operator
Example output
NAME                                                           READY   STATUS    RESTARTS      AGE
mofed-rhcos4.16-696886fcb4-ds-4mb42                            2/2     Running   0             40s
mofed-rhcos4.16-696886fcb4-ds-8knwq                            2/2     Running   0             40s
nvidia-network-operator-controller-manager-68d547dbbd-qsdkf    1/1     Running   13 (4d ago)   4d21h
Create an SriovNetworkNodePolicy CR that generates the Virtual Functions (VFs) for the device you want to operate in SR-IOV legacy mode. See the following example:
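The original example is not reproduced in this document. A minimal sketch, assuming eight VFs on the Mellanox Ethernet port used earlier; the interface name, VF count, and resource name are assumptions:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-legacy-policy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    vendor: "15b3"
    pfNames: ["ens8f0np0"]
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.sriov.capable: "true"
  numVfs: 8
  priority: 99
  resourceName: sriovlegacy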
Create the CR on the cluster by using the following command:
Note: Ensure that SR-IOV Global Enable is enabled. For more information, see Unable to enable SR-IOV and receiving the message "not enough MMIO resources for SR-IOV" in Red Hat Enterprise Linux.
$ oc create -f sriov-network-node-policy.yaml
Example output
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-legacy-policy created
Each node has scheduling disabled. The nodes reboot to apply the configuration. You can view the nodes by using the following command:
$ oc get nodes
Example output
NAME                                       STATUS                        ROLES                         AGE     VERSION
edge-19.edge.lab.eng.rdu2.redhat.com       Ready                         control-plane,master,worker   5d      v1.29.8+632b078
nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com   Ready                         worker                        4d22h   v1.29.8+632b078
nvd-srv-33.nvidia.eng.rdu2.dc.redhat.com   NotReady,SchedulingDisabled   worker                        4d22h   v1.29.8+632b078
After the nodes have rebooted, verify that the VF interfaces exist by opening a debug pod on each node. Run the following command:
$ oc debug node/nvd-srv-33.nvidia.eng.rdu2.dc.redhat.com
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Repeat the previous steps on the second node, if necessary.
Optional: Confirm that the resources appear in the cluster oc describe node section by using the following command:
$ oc describe node -l node-role.kubernetes.io/worker= | grep -E 'Capacity:|Allocatable:' -A8
After the VFs for SR-IOV legacy mode are in place, generate the SriovNetwork CR file. See the following example:
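A minimal sketch of a sriov-network.yaml that attaches the VFs created by the policy above; the resource name must match the SriovNetworkNodePolicy, and the address range is an assumption chosen to match the pod IP addresses used in the verification section:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: default
  resourceName: sriovlegacy
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.4.0/24"
    }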
Create the custom resource on the cluster by using the following command:
$ oc create -f sriov-network.yaml
Example output
sriovnetwork.sriovnetwork.openshift.io/sriov-network created
5.10. Verifying RDMA connectivity
Confirm Remote Direct Memory Access (RDMA) connectivity is working between the systems, specifically for Legacy Single Root I/O Virtualization (SR-IOV) Ethernet.
Procedure
Connect to each rdma-workload-client pod by using the following command:
$ oc rsh -n default rdma-sriov-32-workload
Example output
sh-5.1#
Check the IP address assigned to the first workload pod by using the following command. In this example, the first workload pod is the RDMA test server.
sh-5.1# ip a
The IP address of the RDMA server assigned to this pod is the net1 interface. In this example, the IP address is 192.168.4.225.
Run the ibstatus command to get the link_layer type, Ethernet or Infiniband, associated with each RDMA device mlx5_x. The output also shows the status of all of the RDMA devices by checking the state field, which shows either ACTIVE or DOWN.
sh-5.1# ibstatus
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To get the
link_layer
for each RDMAmlx5
device on your worker node, run theibstat
command:ibstat | egrep "Port|Base|Link"
sh-5.1# ibstat | egrep "Port|Base|Link"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow For RDMA Shared Device or Host Device workload pods, the RDMA device named
mlx5_x
is already known and is typicallymlx5_0
ormlx5_1
. For RDMA Legacy SR-IOV workload pods, you need to determine which RDMA device is associated with which Virtual Function (VF) subinterface. Provide this information by using the following command:rdma link show
sh-5.1# rdma link show
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In this example, the RDMA device names
mlx5_7
is associated with thenet1
interface. This output is used in the next command to perform the RDMA bandwidth test, which also verifies RDMA connectivity between worker nodes.Run the following
ib_write_bw
RDMA bandwidth test command:/root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60 -d mlx5_7 -p 10000 --source_ip 192.168.4.225 --use_cuda=0 --use_cuda_dmabuf
sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60 -d mlx5_7 -p 10000 --source_ip 192.168.4.225 --use_cuda=0 --use_cuda_dmabuf
Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
-
The
mlx5_7
RDMA device is passed in the-d
switch. -
The source IP address is
192.168.4.225
to start the RDMA server. -
The
--use_cuda=0
,--use_cuda_dmabuf
switches indicate that the use of GPUDirect RDMA.
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
The
Open another terminal window and run the oc rsh command on the second workload pod that acts as the RDMA test client pod:
$ oc rsh -n default rdma-sriov-33-workload
Example output
sh-5.1#
Obtain the RDMA test client pod IP address from the net1 interface by using the following command:
sh-5.1# ip a
Obtain the link_layer type associated with each RDMA device mlx5_x by using the following command:
sh-5.1# ibstatus
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Optional: Obtain the firmware version of the Mellanox cards by using the ibstat command:
sh-5.1# ibstat
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
To determine which RDMA device is associated with the Virtual Function subinterface that the client workload pod uses, run the following command. In this example, the net1 interface is using the RDMA device mlx5_2.
sh-5.1# rdma link show
Run the following ib_write_bw RDMA bandwidth test command:
sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60 -d mlx5_2 -p 10000 --source_ip 192.168.4.226 --use_cuda=0 --use_cuda_dmabuf 192.168.4.225
where:
- The mlx5_2 RDMA device is passed in the -d switch.
- The source IP address is 192.168.4.226, and the destination IP address of the RDMA server is 192.168.4.225.
- The --use_cuda=0 and --use_cuda_dmabuf switches indicate the use of GPUDirect RDMA.
A positive test is seeing an expected BW average and MsgRate in Mpps.
Upon completion of the ib_write_bw command, the server-side output also appears on the server pod.
Chapter 6. Dynamic Accelerator Slicer (DAS) Operator
Dynamic Accelerator Slicer Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The Dynamic Accelerator Slicer (DAS) Operator allows you to dynamically slice GPU accelerators in OpenShift Container Platform, instead of relying on statically sliced GPUs defined when the node is booted. This allows you to dynamically slice GPUs based on specific workload demands, ensuring efficient resource utilization.
Dynamic slicing is useful if you do not know all the accelerator partitions needed in advance on every node on the cluster.
The DAS Operator currently includes a reference implementation for NVIDIA Multi-Instance GPU (MIG) and is designed to support additional technologies such as NVIDIA MPS or GPUs from other vendors in the future.
Limitations
The following limitations apply when using the Dynamic Accelerator Slicer Operator:
- You must identify potential incompatibilities and verify that the system works with your specific GPU drivers and operating system versions.
- The Operator works only with specific MIG-compatible NVIDIA GPUs and drivers, such as the H100 and A100.
- The Operator cannot manage only a subset of the GPUs on a node.
- The NVIDIA device plugin cannot be used together with the Dynamic Accelerator Slicer Operator to manage the GPU resources of a cluster.
The DAS Operator is designed to work with MIG-enabled GPUs. It allocates MIG slices instead of whole GPUs. Installing the DAS Operator prevents the use of the standard resource request through the NVIDIA device plugin, such as nvidia.com/gpu: "1", for allocating an entire GPU. Instead, workloads request individual MIG slice resources, as shown in the sketch after this paragraph.
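For illustration only: a container that would previously have requested nvidia.com/gpu: "1" instead requests a MIG profile resource. The profile name used below (nvidia.com/mig-1g.5gb) is an assumption based on common A100/H100 profiles; the resource names actually advertised on your cluster depend on the GPU model and the MIG geometry in use.

# Fragment of a container spec: request one MIG slice instead of a whole GPU.
# The profile name is an example; inspect the extended resources on your nodes
# (for example, with "oc describe node <node-name>") to see the advertised names.
resources:
  limits:
    nvidia.com/mig-1g.5gb: "1"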
6.1. Installing the Dynamic Accelerator Slicer Operator
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator by using the OpenShift Container Platform web console or the OpenShift CLI.
6.1.1. Installing the Dynamic Accelerator Slicer Operator using the web console
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift Container Platform web console.
Prerequisites
-
You have access to an OpenShift Container Platform cluster using an account with
cluster-admin
permissions. You have installed the required prerequisites:
- cert-manager Operator for Red Hat OpenShift
- Node Feature Discovery (NFD) Operator
- NVIDIA GPU Operator
- NodeFeatureDiscovery CR
Procedure
Configure the NVIDIA GPU Operator for MIG support:
- In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
- Select the NVIDIA GPU Operator from the list of installed operators.
- Click the ClusterPolicy tab and then click Create ClusterPolicy.
In the YAML editor, replace the default content with a cluster policy configuration that disables the default NVIDIA device plugin and enables MIG support. An example of this configuration is sketched in the CLI installation procedure in Section 6.1.2.
- Click Create to apply the cluster policy.
- Navigate to Workloads → Pods and select the nvidia-gpu-operator namespace to monitor the cluster policy deployment.
- Wait for the NVIDIA GPU Operator cluster policy to reach the Ready state. You can monitor this by:
  - Navigating to Operators → Installed Operators → NVIDIA GPU Operator.
  - Clicking the ClusterPolicy tab and checking that the status shows ready.
- Verify that all pods in the NVIDIA GPU Operator namespace are running by selecting the nvidia-gpu-operator namespace and navigating to Workloads → Pods.
Label nodes with MIG-capable GPUs to enable MIG mode:
- Navigate to Compute → Nodes.
- Select a node that has MIG-capable GPUs.
- Click Actions → Edit Labels.
- Add the label nvidia.com/mig.config=all-enabled.
- Click Save.
- Repeat for each node with MIG-capable GPUs.
Important: After applying the MIG label, the labeled nodes reboot to enable MIG mode. Wait for the nodes to come back online before proceeding.
- Verify that MIG mode is successfully enabled on the GPU nodes by checking that the nvidia.com/mig.config=all-enabled label appears in the Labels section. To locate the label, navigate to Compute → Nodes, select the GPU node, and click the Details tab.
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Search for Dynamic Accelerator Slicer or DAS in the filter box to locate the DAS Operator.
- Select the Dynamic Accelerator Slicer and click Install.
On the Install Operator page:
- Select All namespaces on the cluster (default) for the installation mode.
- Select Installed Namespace → Operator recommended Namespace: Project das-operator.
- If creating a new namespace, enter das-operator as the namespace name.
- Select an update channel.
- Select Automatic or Manual for the approval strategy.
- Click Install.
- In the OpenShift Container Platform web console, click Operators → Installed Operators.
- Select DAS Operator from the list.
- In the Provided APIs table column, click DASOperator. This takes you to the DASOperator tab of the Operator details page.
- Click Create DASOperator. This takes you to the Create DASOperator YAML view.
In the YAML editor, paste the following example:
Example DASOperator CR

1. The name of the DASOperator CR must be cluster.
- Click Create.
Verification
To verify that the DAS Operator installed successfully:
- Navigate to the Operators → Installed Operators page.
- Ensure that Dynamic Accelerator Slicer is listed in the das-operator namespace with a Status of Succeeded.
To verify that the DASOperator CR installed successfully:
- After you create the DASOperator CR, the web console brings you to the DASOperator list view. The Status field of the CR changes to Available when all of the components are running.
- Optional: You can verify that the DASOperator CR installed successfully by running the following command in the OpenShift CLI:

$ oc get dasoperator -n das-operator

Example output

NAME      STATUS      AGE
cluster   Available   3m
During installation, an Operator might display a Failed status. If the installation later succeeds with a Succeeded message, you can ignore the Failed message.
You can also verify the installation by checking the pods:
- Navigate to the Workloads → Pods page and select the das-operator namespace.
- Verify that all DAS Operator component pods are running:
  - das-operator pods (main Operator controllers)
  - das-operator-webhook pods (webhook servers)
  - das-scheduler pods (scheduler plugins)
  - das-daemonset pods (only on nodes with MIG-compatible GPUs)

The das-daemonset pods only appear on nodes that have MIG-compatible GPU hardware. If you do not see any daemonset pods, verify that your cluster has nodes with supported GPU hardware and that the NVIDIA GPU Operator is properly configured.
Troubleshooting
Use the following procedure if the Operator does not appear to be installed:
- Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the das-operator namespace.
6.1.2. Installing the Dynamic Accelerator Slicer Operator using the CLI
As a cluster administrator, you can install the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift CLI.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have installed the OpenShift CLI (oc).
- You have installed the required prerequisites:
- cert-manager Operator for Red Hat OpenShift
- Node Feature Discovery (NFD) Operator
- NVIDIA GPU Operator
- NodeFeatureDiscovery CR
Procedure
Configure the NVIDIA GPU Operator for MIG support:
Apply the following cluster policy to disable the default NVIDIA device plugin and enable MIG support. Create a file named gpu-cluster-policy.yaml with the cluster policy content; a partial sketch follows.
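The sketch below shows only the MIG-related settings that this procedure depends on. The field names follow the NVIDIA GPU Operator ClusterPolicy API and should be verified against the version installed in your cluster; merge these settings into the default ClusterPolicy that the Operator generates rather than applying this fragment on its own.

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  devicePlugin:
    enabled: false      # the DAS Operator replaces the default NVIDIA device plugin
  mig:
    strategy: mixed     # expose individual MIG profiles rather than whole GPUs
  migManager:
    enabled: true       # applies the MIG geometry requested by node labels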
Apply the cluster policy by running the following command:

$ oc apply -f gpu-cluster-policy.yaml
Verify that the NVIDIA GPU Operator cluster policy reaches the Ready state by running the following command:

$ oc get clusterpolicies.nvidia.com gpu-cluster-policy -w

Wait until the STATUS column shows ready.

Example output

NAME                 STATUS   AGE
gpu-cluster-policy   ready    2025-08-14T08:56:45Z
Verify that all pods in the NVIDIA GPU Operator namespace are running by running the following command:

$ oc get pods -n nvidia-gpu-operator

All pods should show a Running or Completed status.

Label nodes with MIG-capable GPUs to enable MIG mode by running the following command:

$ oc label node $NODE_NAME nvidia.com/mig.config=all-enabled --overwrite

Replace $NODE_NAME with the name of each node that has MIG-capable GPUs.

Important: After applying the MIG label, the labeled nodes reboot to enable MIG mode. Wait for the nodes to come back online before proceeding.
Verify that the nodes have successfully enabled MIG mode by running the following command:
$ oc get nodes -l nvidia.com/mig.config=all-enabled
Create a namespace for the DAS Operator:
Create the following Namespace custom resource (CR) that defines the das-operator namespace, and save the YAML in the das-namespace.yaml file. A minimal sketch follows.
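The following is a minimal sketch of such a Namespace CR; the cluster-monitoring label is an assumption and can be omitted or adjusted for your cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: das-operator          # namespace where the DAS Operator is installed
  labels:
    openshift.io/cluster-monitoring: "true"   # assumption: enables cluster monitoring for this namespace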
Create the namespace by running the following command:

$ oc create -f das-namespace.yaml
Install the DAS Operator in the namespace you created in the previous step by creating the following objects:
Create the following OperatorGroup CR and save the YAML in the das-operatorgroup.yaml file. A sketch follows.
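A minimal sketch of the OperatorGroup CR, assuming the all-namespaces install mode used in the web console procedure; if you want to scope the Operator to the das-operator namespace instead, add spec.targetNamespaces:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: das-operator
  namespace: das-operator
# No spec.targetNamespaces: the Operator watches all namespaces (all-namespaces mode).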
Create the OperatorGroup CR by running the following command:

$ oc create -f das-operatorgroup.yaml
Create the following Subscription CR and save the YAML in the das-sub.yaml file:

Example Subscription
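A sketch of the Subscription, using the package, source, and channel values shown in the subscription listing later in this chapter; verify the channel available in your catalog before applying:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: das-operator
  namespace: das-operator
spec:
  channel: stable                          # channel shown in the subscription listing in Section 6.2.2
  name: das-operator                       # package name
  source: redhat-operators                 # catalog source
  sourceNamespace: openshift-marketplace   # assumption: default catalog namespace
  installPlanApproval: Automatic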
Create the subscription object by running the following command:

$ oc create -f das-sub.yaml
Change to the das-operator project:

$ oc project das-operator
Create the following DASOperator CR and save the YAML in the das-dasoperator.yaml file:

Example DASOperator CR

1. The name of the DASOperator CR must be cluster.
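If you are unsure which apiVersion and spec fields to use in the DASOperator CR, you can query them from the cluster once the Operator is installed; for example:

$ oc api-resources | grep -i dasoperator
$ oc explain dasoperator.spec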
Create the dasoperator CR by running the following command:

$ oc create -f das-dasoperator.yaml
Verification
Verify that the Operator deployment is successful by running the following command:
$ oc get pods

Example output

A successful deployment shows all pods with a Running status. The deployment includes:
- das-operator: Main Operator controller pods
- das-operator-webhook: Webhook server pods for mutating pod requests
- das-scheduler: Scheduler plugin pods for MIG slice allocation
- das-daemonset: Daemonset pods that run only on nodes with MIG-compatible GPUs
Note: The das-daemonset pods only appear on nodes that have MIG-compatible GPU hardware. If you do not see any daemonset pods, verify that your cluster has nodes with supported GPU hardware and that the NVIDIA GPU Operator is properly configured.
6.2. Uninstalling the Dynamic Accelerator Slicer Operator
Use one of the following procedures to uninstall the Dynamic Accelerator Slicer (DAS) Operator, depending on how the Operator was installed.
6.2.1. Uninstalling the Dynamic Accelerator Slicer Operator using the web console
You can uninstall the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift Container Platform web console.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- The DAS Operator is installed in your cluster.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
- Locate the Dynamic Accelerator Slicer in the list of installed Operators.
- Click the Options menu for the DAS Operator and select Uninstall Operator.
- In the confirmation dialog, click Uninstall to confirm the removal.
- Navigate to Home → Projects.
- Search for das-operator in the search box to locate the DAS Operator project.
- Click the Options menu next to the das-operator project, and select Delete Project.
- In the confirmation dialog, type das-operator in the dialog box, and click Delete to confirm the deletion.
Verification
- Navigate to the Operators → Installed Operators page.
- Verify that the Dynamic Accelerator Slicer (DAS) Operator is no longer listed.
- Optional: Verify that the das-operator namespace and its resources have been removed by running the following command:

$ oc get namespace das-operator

The command should return an error indicating that the namespace is not found.
Uninstalling the DAS Operator removes all GPU slice allocations and might cause running workloads that depend on GPU slices to fail. Ensure that no critical workloads are using GPU slices before proceeding with the uninstallation.
6.2.2. Uninstalling the Dynamic Accelerator Slicer Operator using the CLI
You can uninstall the Dynamic Accelerator Slicer (DAS) Operator using the OpenShift CLI.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have installed the OpenShift CLI (oc).
- The DAS Operator is installed in your cluster.
Procedure
List the installed operators to find the DAS Operator subscription by running the following command:
$ oc get subscriptions -n das-operator

Example output

NAME           PACKAGE        SOURCE             CHANNEL
das-operator   das-operator   redhat-operators   stable

Delete the subscription by running the following command:
$ oc delete subscription das-operator -n das-operator

List and delete the cluster service version (CSV) by running the following commands:

$ oc get csv -n das-operator

$ oc delete csv <csv-name> -n das-operator

Remove the operator group by running the following command:

$ oc delete operatorgroup das-operator -n das-operator

Delete any remaining AllocationClaim resources by running the following command:

$ oc delete allocationclaims --all -n das-operator

Remove the DAS Operator namespace by running the following command:

$ oc delete namespace das-operator
Verification
Verify that the DAS Operator resources have been removed by running the following command:
$ oc get namespace das-operator

The command should return an error indicating that the namespace is not found.

Verify that no AllocationClaim custom resource definitions remain by running the following command:

$ oc get crd | grep allocationclaim

The command should return no output, indicating that no AllocationClaim custom resource definitions remain.
Uninstalling the DAS Operator removes all GPU slice allocations and might cause running workloads that depend on GPU slices to fail. Ensure that no critical workloads are using GPU slices before proceeding with the uninstallation.
6.3. Deploying GPU workloads with the Dynamic Accelerator Slicer Operator
You can deploy workloads that request GPU slices managed by the Dynamic Accelerator Slicer (DAS) Operator. The Operator dynamically partitions GPU accelerators and schedules workloads to available GPU slices.
Prerequisites
- You have MIG-supported GPU hardware available in your cluster.
- The NVIDIA GPU Operator is installed and the ClusterPolicy shows a Ready state.
- You have installed the DAS Operator.
Procedure
Create a namespace by running the following command:
$ oc new-project cuda-workloads

Create a deployment that requests GPU resources by using the NVIDIA MIG resource and save it as cuda-vectoradd-deployment.yaml. A sketch of the deployment follows.
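The following is an illustrative sketch rather than the exact manifest from the product documentation. The MIG resource name (nvidia.com/mig-1g.5gb) and the CUDA vectorAdd sample image are assumptions; substitute the MIG profile and image available in your environment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-vectoradd
  namespace: cuda-workloads
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cuda-vectoradd
  template:
    metadata:
      labels:
        app: cuda-vectoradd
    spec:
      containers:
      - name: cuda-vectoradd
        # assumption: any CUDA vectorAdd sample image works here
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
        resources:
          limits:
            # assumption: request one MIG slice; profile names depend on
            # the GPU model and the MIG geometry configured on the node
            nvidia.com/mig-1g.5gb: "1"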
Apply the deployment configuration by running the following command:
$ oc apply -f cuda-vectoradd-deployment.yaml

Verify that the deployment is created and pods are scheduled by running the following command:
$ oc get deployment cuda-vectoradd

Example output

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
cuda-vectoradd   2/2     2            2           2m

Check the status of the pods by running the following command:

$ oc get pods -l app=cuda-vectoradd

Example output

NAME                              READY   STATUS    RESTARTS   AGE
cuda-vectoradd-6b8c7d4f9b-abc12   1/1     Running   0          2m
cuda-vectoradd-6b8c7d4f9b-def34   1/1     Running   0          2m
Verification
Check that AllocationClaim resources were created for your deployment pods by running the following command:

$ oc get allocationclaims -n das-operator

Example output

NAME                                                                                           AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0   2m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0   2m
Verify that the GPU slices are properly allocated by checking the resource allocation of one of the pods. Run the following command:

$ oc describe pod -l app=cuda-vectoradd

Check the logs to verify that the CUDA sample application runs successfully by running the following command:

$ oc logs -l app=cuda-vectoradd

Example output

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED

Check the environment variables to verify that the GPU devices are properly exposed to the container by running the following command:

$ oc exec deployment/cuda-vectoradd -- env | grep -E "(NVIDIA_VISIBLE_DEVICES|CUDA_VISIBLE_DEVICES)"

Example output

NVIDIA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
CUDA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652

These environment variables indicate that the GPU MIG slice has been properly allocated and is visible to the CUDA runtime within the container.
6.4. Troubleshooting the Dynamic Accelerator Slicer Operator
If you experience issues with the Dynamic Accelerator Slicer (DAS) Operator, use the following troubleshooting steps to diagnose and resolve problems.
Prerequisites
- You have installed the DAS Operator.
- You have access to the OpenShift Container Platform cluster as a user with the cluster-admin role.
6.4.1. Debugging DAS Operator components
Procedure
Check the status of all DAS Operator components by running the following command:
$ oc get pods -n das-operator

Example output

Inspect the logs of the DAS Operator controller by running the following command:

$ oc logs -n das-operator deployment/das-operator

Check the logs of the webhook server by running the following command:

$ oc logs -n das-operator deployment/das-operator-webhook

Check the logs of the scheduler plugin by running the following command:

$ oc logs -n das-operator deployment/das-scheduler

Check the logs of the device plugin daemonset by running the following command:

$ oc logs -n das-operator daemonset/das-daemonset
6.4.2. Monitoring AllocationClaims
Procedure
Inspect active AllocationClaim resources by running the following command:

$ oc get allocationclaims -n das-operator

Example output

NAME                                                                                           AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0   5m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0   5m

View detailed information about a specific AllocationClaim by running the following command:

$ oc get allocationclaims -n das-operator -o yaml

Example output (truncated)

Check for claims in different states by running the following command:

$ oc get allocationclaims -n das-operator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\n"}{end}'

Example output

13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0    inUse
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0    inUse

View events related to AllocationClaim resources by running the following command:

$ oc get events -n das-operator --field-selector involvedObject.kind=AllocationClaim

Check NodeAccelerator resources to verify GPU hardware detection by running the following command:

$ oc get nodeaccelerator -n das-operator

Example output

NAME                                     AGE
harpatil000034jma-qh5fm-worker-f-57md9   96m
harpatil000034jma-qh5fm-worker-f-fl4wg   96m

The NodeAccelerator resources represent the GPU-capable nodes detected by the DAS Operator.
Additional information
The AllocationClaim custom resource tracks the following information:
- GPU UUID: The unique identifier of the GPU device.
- Slice position: The position of the MIG slice on the GPU.
- Pod reference: The pod that requested the GPU slice.
- State: The current state of the claim (staged, created, or released).

Claims start in the staged state and transition to created when all requests are satisfied. When a pod is deleted, the associated claim is automatically cleaned up.
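For a quick summary of how many claims are in each state, you can aggregate the same status.state field used in the earlier jsonpath query; for example:

$ oc get allocationclaims -n das-operator \
    -o jsonpath='{range .items[*]}{.status.state}{"\n"}{end}' | sort | uniq -c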
6.4.3. Verifying GPU device availability
Procedure
On a node with GPU hardware, verify that CDI devices were created by running the following command:
$ oc debug node/<node-name>

sh-4.4# chroot /host
sh-4.4# ls -l /var/run/cdi/
Check the NVIDIA GPU Operator status by running the following command:

$ oc get clusterpolicies.nvidia.com -o jsonpath='{.items[0].status.state}'

The output should show ready.
6.4.4. Increasing log verbosity
Procedure
To get more detailed debugging information:
Edit the DASOperator resource to increase log verbosity by running the following command:

$ oc edit dasoperator -n das-operator

Set the operatorLogLevel field to Debug or Trace:

spec:
  operatorLogLevel: Debug

Save the changes and verify that the operator pods restart with increased verbosity.
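One way to confirm that the pods restart with the new setting is to watch the das-operator namespace while the change rolls out; for example:

$ oc get pods -n das-operator -w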
6.4.5. Common issues and solutions
Due to kubernetes/kubernetes#128043, pods might enter an UnexpectedAdmissionError state if admission fails. Pods managed by higher-level controllers such as Deployments are recreated automatically. Naked pods, however, must be cleaned up manually with oc delete pod. Using controllers is recommended until the upstream issue is resolved. A sketch of how to find and delete such pods follows.
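For example, to find pods across the cluster that are stuck in this state and delete one of them manually (the pod name and namespace below are placeholders), you could run:

$ oc get pods -A | grep UnexpectedAdmissionError
$ oc delete pod <pod-name> -n <namespace>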
Prerequisites not met
If the DAS Operator fails to start or function properly, verify that all prerequisites are installed:
- cert-manager Operator for Red Hat OpenShift
- Node Feature Discovery (NFD) Operator
- NVIDIA GPU Operator
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.