Chapter 11. DPU Operator
11.1. DPU Operator
As a cluster administrator, you can add the DPU Operator to your cluster to manage DPU devices and network attachments.
The DPU Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
11.1.1. Orchestrating DPUs with the DPU Operator
You can use the Data Processing Unit (DPU) Operator to manage DPUs that offload networking, storage, and security workloads from host CPUs to improve cluster performance and efficiency.
A DPU is a type of programmable processor that represents one of the three fundamental pillars of computing, alongside CPUs and GPUs. While CPUs handle general computing tasks and GPUs accelerate specific workloads, the primary role of the DPU is to offload and accelerate data-centric workloads, such as networking, storage, and security functions.
DPUs are typically used in data centers and cloud environments to improve performance, reduce latency, and enhance security by offloading these tasks from the CPU. You can also use DPUs to create a more efficient and flexible infrastructure by enabling the deployment of specialized workloads closer to the data source.
The DPU Operator manages the DPU devices and network attachments. The DPU Operator deploys the DPU daemon onto OpenShift Container Platform compute nodes, where it interfaces through an API that controls the DPU daemon running on the DPU. The DPU Operator is also responsible for the lifecycle management of the ovn-kube components and the necessary host network initialization on the DPU.
The following table describes the currently supported DPU devices.
| Vendor | Device | Firmware | Description |
|---|---|---|---|
| Intel | IPU E2100 | Version 2.0.0.11126 or later | A DPU designed to offload networking, storage, and security tasks from host CPUs in data centers, improving efficiency and performance. For instructions on deploying a full end-to-end solution, see the Red Hat Knowledgebase solution Accelerating Confidential AI on OpenShift with the Intel E2100 IPU, DPU Operator, and F5 NGINX. |
| Senao | SX904 | 35.23.47.0008 or later | A SmartNIC designed to offload compute and network services from the host CPUs in data centers and edge computing environments, improving efficiency and isolation of workloads. |
| Marvell | Marvell Octeon 10 CN106 | SDK12.25.01 or later | A DPU designed to offload workloads that require high-speed data processing from host CPUs in data centers and edge computing environments, improving performance and energy efficiency. |
The NVIDIA BlueField-3 is not supported.
11.1.2. Installing the DPU Operator
Cluster administrators can install the Data Processing Unit (DPU) Operator by using the OpenShift Container Platform CLI or the web console. The DPU Operator manages the lifecycle, DPU devices, and network attachments for all supported DPUs.
You need to install the DPU Operator on the host cluster and each of the DPU clusters.
11.1.2.1. Installing the DPU Operator by using the CLI
As a cluster administrator, you can install the DPU Operator by using the CLI to set up DPU device management on host clusters.
On the DPU cluster, you must use the CLI to install the DPU Operator.
Prerequisites
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
- Create the openshift-dpu-operator namespace by entering the following command:
- Create an OperatorGroup custom resource (CR) by entering the following command:
- Create a Subscription CR for the DPU Operator by entering the following command:
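As a minimal sketch of the three resources that these steps create, the manifests might look like the following. The namespace name and the `dpu-operators` and `openshift-dpu-operator-subscription` names come from the uninstall procedure in this chapter, and the namespace annotation comes from the web console troubleshooting notes. The `targetNamespaces`, `channel`, `name`, `source`, and `sourceNamespace` values are assumptions; check them against the catalog on your cluster before applying anything.

```shell
# Sketch only: write the three manifests to a local file for review.
cat > dpu-operator-install.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-dpu-operator
  annotations:
    workload.openshift.io/allowed: management   # see Troubleshooting notes
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: dpu-operators                           # name used when uninstalling
  namespace: openshift-dpu-operator
spec:
  targetNamespaces:                             # assumption
  - openshift-dpu-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-dpu-operator-subscription     # name used when uninstalling
  namespace: openshift-dpu-operator
spec:
  channel: stable                               # assumption
  name: dpu-operator                            # assumption: package name
  source: redhat-operators                      # assumption: catalog source
  sourceNamespace: openshift-marketplace        # assumption
EOF

# After reviewing the file, it could be applied with:
#   oc apply -f dpu-operator-install.yaml
```

Keeping the manifests in one reviewed file, rather than piping them straight to the cluster, makes it easier to reuse the same definitions on the host cluster and on each DPU cluster.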
Verification
- To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:
  $ oc get csv -n openshift-dpu-operator \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase
- Change to the openshift-dpu-operator project:
  $ oc project openshift-dpu-operator
- Verify the DPU Operator is running by entering the following command:
  $ oc get pods -n openshift-dpu-operator
  Example output
  NAME                                               READY   STATUS    RESTARTS   AGE
  dpu-operator-controller-manager-6b7bbb5db8-7lvkj   2/2     Running   0          2m9s
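The phase check in the first verification step can also be scripted. The following is a sketch under stated assumptions: `csv_succeeded` is a hypothetical helper name, the CSV name prefix `dpu-operator` is taken from the example output in the uninstall procedure, and the sample text stands in for real cluster output so that the block runs without a cluster.

```shell
# csv_succeeded (hypothetical helper) reads "Name Phase" rows on stdin, as
# produced by the custom-columns command in the verification step. It exits 0
# only if at least one dpu-operator CSV is listed and all of them report
# the Succeeded phase.
csv_succeeded() {
  awk 'NR > 1 && $1 ~ /^dpu-operator/ { found = 1; if ($2 != "Succeeded") bad = 1 }
       END { exit ((found && !bad) ? 0 : 1) }'
}

# Hypothetical sample output, used here instead of a live cluster.
sample='Name                                Phase
dpu-operator.v4.20.0-202503130333   Succeeded'

if printf '%s\n' "$sample" | csv_succeeded; then
  echo "DPU Operator CSV is in the Succeeded phase"
else
  echo "DPU Operator CSV is not ready yet"
fi
```

On a live cluster, you would pipe the `oc get csv` command from the verification step into `csv_succeeded` instead of the sample text.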
11.1.2.2. Installing the DPU Operator using the web console
As a cluster administrator, you can install the DPU Operator by using the web console to set up DPU device management on host clusters.
Prerequisites
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, click Ecosystem → Software Catalog.
- Select DPU Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, the Operator recommended Namespace option is preselected by default. No action is required.
- Click Install.
Verification
- Navigate to the Ecosystem → Installed Operators page.
- Ensure that the openshift-dpu-operator project lists DPU Operator with a Status of InstallSucceeded.
  Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
Troubleshooting
- Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the openshift-dpu-operator project.
- Check the namespace of the YAML file. If the annotation is missing, you can add the annotation workload.openshift.io/allowed=management to the Operator namespace with the following command:
  $ oc annotate ns/openshift-dpu-operator workload.openshift.io/allowed=management
  Note: For single-node OpenShift clusters, the annotation workload.openshift.io/allowed=management is required for the namespace.
11.1.3. Configuring the DPU Operator
You can configure the DPU Operator after installation to manage the DPU devices and network attachments in your cluster, in both dual cluster and single cluster deployment modes.
To configure the DPU Operator, follow these steps:
Procedure
- Create the DpuOperatorConfig custom resource (CR) based on your deployment mode:
  - Dual cluster deployment: You must create the DpuOperatorConfig CR on both the host OpenShift Container Platform cluster and on each of the Red Hat build of MicroShift (MicroShift) DPU clusters.
  - Single cluster deployment: This deployment uses a standard OpenShift Container Platform cluster. You only need to create the DpuOperatorConfig CR once on this cluster.
  The content of the CR is the same for all clusters.
- Create a file named dpu-operator-config.yaml by using the following YAML:
  - metadata.name: Specifies the name of the custom resource, which must be dpu-operator-config.
  - spec.logLevel: Sets the desired logging verbosity in the Operator container logs. The value 0 is the default setting.
- Create the resource by running the following command:
  $ oc apply -f dpu-operator-config.yaml
- Label all nodes that either have an attached DPU or are functioning as a DPU. You can apply this label by running the following command:
  $ oc label node <node_name> dpu=true
  where:
  node_name: Refers to the name of your node, such as worker-1.
  Note: There are two ways to deploy clusters that are compatible with DPUs:
  - Dual cluster deployment: This consists of OpenShift Container Platform running on the hosts and Red Hat build of MicroShift (MicroShift) running on the DPU. In this mode, the MicroShift instance also needs to deploy the DPU Operator, and you must set the label dpu=true on the node.
  - Single cluster deployment: This consists of only OpenShift Container Platform running on hosts, where the DPUs are integrated into the main cluster. This mode requires only the label dpu=true on both the host nodes with DPUs installed and the DPU nodes themselves. The DPU Operator automatically detects whether a node is running as a DPU or as a host with an attached DPU.
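The field descriptions in the procedure above suggest a very small CR. The following sketch writes such a file; the `metadata.name` and `spec.logLevel` values come from those descriptions, while the `apiVersion` is an assumption that you should confirm on your cluster, for example with `oc api-resources`.

```shell
# Sketch only: write a DpuOperatorConfig manifest that matches the field
# descriptions in the procedure.
cat > dpu-operator-config.yaml <<'EOF'
apiVersion: config.openshift.io/v1   # assumption: confirm with oc api-resources
kind: DpuOperatorConfig
metadata:
  name: dpu-operator-config          # must be exactly this name
spec:
  logLevel: 0                        # default logging verbosity
EOF

# The file can then be applied as described in the procedure:
#   oc apply -f dpu-operator-config.yaml
```

Because the content of the CR is the same for all clusters, the same file can be reused on the host cluster and on each DPU cluster in a dual cluster deployment.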
11.1.4. Running a workload on the host with DPU
You can deploy workloads on a host with a DPU so that specialized infrastructure tasks such as networking, security, and storage are offloaded to a dedicated processing unit. This improves performance, enforces a stronger security boundary between infrastructure and application workloads, and frees up host CPU resources.
Follow these steps to deploy a workload on the host with DPU. This is the standard deployment model where the application runs on the host's x86 CPU but uses the DPU for network acceleration and offload.
Prerequisites
- The OpenShift CLI (oc) is installed.
- An account with cluster-admin privileges is available.
- The DPU Operator is installed.
Procedure
- Create a sample workload designed to run on the host-side worker node by using the following YAML. Save the file as workload-host.yaml:
  - spec.nodeSelector: The node selector schedules the pod on the node with the DPU resource. You can use any standard Kubernetes selector for this, such as kubernetes.io/hostname, to target a specific node as shown in the example YAML.
  Note: For flexible scheduling, the DPU Operator creates the label dpu.config.openshift.io/dpuside: "dpu-host". This label enables the default scheduler to place the workload on any host with a DPU. The workload automatically joins that DPU secondary network. When the label on the node is dpu.config.openshift.io/dpuside: "dpu", this signifies that the node is the DPU itself. The DPU Operator creates and manages the dpu.config.openshift.io/dpuside label.
- Create the workload by running the following command:
  $ oc apply -f workload-host.yaml
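As a rough illustration of the node selector described in the procedure, the following sketch writes a minimal host-side pod manifest. The pod name, namespace, container name, and the node name `worker-1` are hypothetical placeholders. Any DPU resource request or network annotation that a complete example needs is not recoverable from the text above and is deliberately left out, so treat this as a starting point rather than a working DPU workload.

```shell
# Sketch only: a minimal pod pinned to one host node by hostname.
cat > workload-host.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                          # hypothetical
  namespace: default                    # hypothetical
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1    # hypothetical host node with a DPU
    # Alternative for flexible scheduling on any host with a DPU:
    # dpu.config.openshift.io/dpuside: "dpu-host"
  containers:
  - name: appcntr1                      # hypothetical
    image: registry.access.redhat.com/ubi9/ubi:latest
    command: ["/bin/sh", "-c", "sleep infinity"]
EOF
```

Selecting by `kubernetes.io/hostname` targets one specific node, while the Operator-managed `dpuside` label lets the default scheduler choose among all hosts that have a DPU.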
11.1.5. Running a workload on the DPU
You can deploy network workloads, such as security functions or virtualized appliances, directly on the DPU to improve performance, enhance security isolation, and free host CPU resources.
Follow this procedure to deploy a simple pod directly onto the DPU.
Prerequisites
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
- Install the DPU Operator.
Procedure
- Save the following YAML file example as dpu-pod.yaml. This is an example of a simple pod that will be scheduled directly onto a DPU node by the Kubernetes default scheduler.
  - metadata.annotations.k8s.v1.cni.cncf.io/networks: The value dpunfcni-conf specifies the name of the NetworkAttachmentDefinition resource. The DPU Operator creates this resource during installation to configure the DPU networking.
  - spec.nodeSelector: The nodeSelector is the primary mechanism for scheduling this workload. The DPU Operator creates and maintains the label dpu.config.openshift.io/dpuside: "dpu". This label ensures the pod is scheduled directly onto the DPU's processing unit.
  - spec.containers.name: The name of the container.
  - spec.containers.image: The container image to pull and run.
- Create the pod by running the following command:
  $ oc apply -f dpu-pod.yaml
- Verify the pod status by running the following command:
  $ oc get pods -n openshift-dpu-operator
  Ensure the pod's status is Running.
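The field descriptions in the procedure above can be assembled into a manifest along the following lines. This is a sketch, not the original example: the annotation value `dpunfcni-conf`, the `dpuside` label, and the `openshift-dpu-operator` namespace are taken from the procedure, while the pod name, container name, image, and command are hypothetical placeholders.

```shell
# Sketch only: a simple pod scheduled onto a DPU node.
cat > dpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-network-function             # hypothetical
  namespace: openshift-dpu-operator     # namespace used by the verification step
  annotations:
    k8s.v1.cni.cncf.io/networks: dpunfcni-conf
spec:
  nodeSelector:
    dpu.config.openshift.io/dpuside: "dpu"
  containers:
  - name: my-network-function           # hypothetical
    image: registry.access.redhat.com/ubi9/ubi:latest   # placeholder image
    command: ["/bin/sh", "-c", "sleep infinity"]
EOF
```

In practice, you would replace the placeholder image and command with the network function that you want to run on the DPU.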
11.1.6. Monitoring the status of DPU
You can monitor the DPU infrastructure status to check the current state and health of your DPU devices across the cluster.
The oc get dpu command shows the current state of the DPU infrastructure. Follow this procedure to monitor the status of various cards.
Prerequisites
- The OpenShift CLI (oc) is installed.
- An account with cluster-admin privileges is available.
- The DPU Operator is installed.
Procedure
- Run the following command to check the overall health of your nodes:
  $ oc get nodes
  The example output provides a list of all nodes in the cluster along with their status. Ensure that all nodes are in the Ready state before proceeding.
  This output shows three master nodes, and three worker nodes identified by the worker-host prefix, for example, worker-host-ipu-219. Each worker node contains a DPU identified by the ocpcluster-dpu prefix, for example, ocpcluster-dpu-ipu-219.
- Run the following command to report on the status of the DPUs:
  $ oc get dpu
  The example output provides a list of detected DPUs.
  - DPU PRODUCT: Displays the vendor or type of DPU, for example, Intel or Marvell.
  - DPU SIDE: Indicates whether the DPU is operating on the host side (false) or the DPU side (true). Each physical DPU is represented twice.
  - MODE NAME: The name of the node where the DPU is located. This is the host worker node for false entries and the DPU node for true entries.
  - STATUS: Indicates whether the DPU is functioning correctly (True) or has issues (False).
  Note: Run oc get dpu -o yaml to get more details about the status.
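The node readiness check in the first step can be automated with a small filter. The following is a sketch under stated assumptions: `not_ready_nodes` is a hypothetical helper name, and the sample rows, including the ages and versions, are made up so that the block runs without a cluster. Only the node name prefixes follow the naming described in the procedure.

```shell
# not_ready_nodes (hypothetical helper) reads the default table output of
# oc get nodes on stdin and prints the name of every node whose STATUS
# column is not exactly "Ready". A node reported as
# "Ready,SchedulingDisabled" is therefore also listed.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Hypothetical sample output, used here instead of a live cluster.
sample='NAME                     STATUS     ROLES    AGE   VERSION
worker-host-ipu-219      Ready      worker   3d    v1.33.0
ocpcluster-dpu-ipu-219   NotReady   worker   3d    v1.33.0'

printf '%s\n' "$sample" | not_ready_nodes
```

On a live cluster, you would run `oc get nodes | not_ready_nodes`; empty output means that every node is in the Ready state.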
11.1.7. Uninstalling the DPU Operator
You can uninstall the DPU Operator from your cluster when you no longer need DPU device management. Before you uninstall the DPU Operator, you must delete any running DPU workloads. Follow this procedure to uninstall the DPU Operator.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have the DPU Operator installed.
Procedure
- Delete the DpuOperatorConfig CR that was created by running the following command:
  $ oc delete DpuOperatorConfig dpu-operator-config
- Delete the subscription that was used to install the DPU Operator by running the following command:
  $ oc delete Subscription openshift-dpu-operator-subscription -n openshift-dpu-operator
- Remove the OperatorGroup resource that was created by running the following command:
  $ oc delete OperatorGroup dpu-operators -n openshift-dpu-operator
- Uninstall the DPU Operator as follows:
  - Check the installed Operators by running the following command:
    $ oc get csv -n openshift-dpu-operator
    Example output
    NAME                                DISPLAY        VERSION               REPLACES   PHASE
    dpu-operator.v4.20.0-202503130333   DPU Operator   4.20.0-202503130333              Failed
  - Delete the DPU Operator by running the following command:
    $ oc delete csv dpu-operator.v4.20.0-202503130333 -n openshift-dpu-operator
- Delete the namespace that was created for the DPU Operator by running the following command:
  $ oc delete namespace openshift-dpu-operator
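The CSV name in the delete step includes a version string, so it differs between installations. The following sketch shows one way to extract the name from the `oc get csv` output instead of copying it by hand. `dpu_csv_name` is a hypothetical helper name, and the sample text is the example output from the procedure above.

```shell
# dpu_csv_name (hypothetical helper) reads the default table output of
# oc get csv on stdin and prints the first CSV name that starts with
# "dpu-operator".
dpu_csv_name() {
  awk 'NR > 1 && $1 ~ /^dpu-operator/ { print $1; exit }'
}

# Sample taken from the example output in the procedure.
sample='NAME                                DISPLAY        VERSION               REPLACES   PHASE
dpu-operator.v4.20.0-202503130333   DPU Operator   4.20.0-202503130333              Failed'

csv="$(printf '%s\n' "$sample" | dpu_csv_name)"

# Print the delete command for review instead of running it directly.
echo "oc delete csv ${csv} -n openshift-dpu-operator"
```

On a live cluster, you would pipe `oc get csv -n openshift-dpu-operator` into `dpu_csv_name` and run the printed command after checking it.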
Verification
- Verify that the DPU Operator is uninstalled by running the following command. An example of successful command output is No resources found in openshift-dpu-operator namespace.
  $ oc get csv -n openshift-dpu-operator