Chapter 10. SR-IOV Operator
10.1. Installing the SR-IOV Network Operator
You can install the Single Root I/O Virtualization (SR-IOV) Network Operator on your cluster to manage SR-IOV network devices and network attachments.
10.1.1. Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Single Root I/O Virtualization (SR-IOV) Network Operator by using the OpenShift Container Platform CLI or the web console.
10.1.1.1. CLI: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
- Create the openshift-sriov-network-operator namespace by entering the following command:
- Create an OperatorGroup custom resource (CR) by entering the following command:
- Create a Subscription CR for the SR-IOV Network Operator by entering the following command:
- Create a SriovOperatorConfig resource by entering the following command:
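The resources that these four steps create can be sketched as manifest files. This is a sketch under assumptions: the stable update channel and the redhat-operators catalog source in openshift-marketplace are assumed, so check what your catalog offers. The OperatorGroup name, the Subscription name, and the file names are illustrative. Instead of piping straight to oc, the script writes each resource to a file so that you can review it before you apply it.

```shell
#!/usr/bin/env bash
# Sketch: manifests for a CLI installation of the SR-IOV Network Operator.
# Assumptions: "stable" channel, "redhat-operators" catalog source in
# "openshift-marketplace". Resource and file names are illustrative.

# Step 1: the Operator namespace. The workload annotation is required on
# single-node OpenShift clusters.
cat <<'EOF' > sriov-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
EOF

# Step 2: an OperatorGroup that targets the Operator namespace.
cat <<'EOF' > sriov-operatorgroup.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF

# Step 3: the Subscription (channel and catalog source are assumptions).
cat <<'EOF' > sriov-subscription.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: stable
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

# Step 4: the SriovOperatorConfig. Its CRD is installed by the Operator,
# so apply this file only after the Subscription has installed the Operator.
cat <<'EOF' > sriov-operatorconfig.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true
  enableOperatorWebhook: true
  logLevel: 2
  disableDrain: false
EOF

# Apply in order, for example:
#   oc apply -f sriov-namespace.yaml
#   oc apply -f sriov-operatorgroup.yaml
#   oc apply -f sriov-subscription.yaml
#   oc apply -f sriov-operatorconfig.yaml
```

Writing the manifests to files keeps a reviewable record of what was applied. The trade-off is one extra oc apply per file compared with piping a heredoc to oc create.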
Verification
To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:
$ oc get csv -n openshift-sriov-network-operator \
  -o custom-columns=Name:.metadata.name,Phase:.status.phase
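If you script this verification, a small helper can evaluate the output for you. The following is a sketch: csv_all_succeeded is a hypothetical function name, and it assumes the Name/Phase output produced by the command above, with the phase in the last column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of
#   oc get csv -n openshift-sriov-network-operator \
#     -o custom-columns=Name:.metadata.name,Phase:.status.phase
# on standard input. Returns 0 only if at least one CSV is listed and every
# listed CSV reports the Succeeded phase; returns 1 otherwise.
csv_all_succeeded() {
  awk '
    NR == 1 { next }                              # skip the header row
    NF == 0 { next }                              # ignore blank lines
    { rows++; if ($NF != "Succeeded") bad++ }     # phase is the last column
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, pipe the output of the verification command into csv_all_succeeded and check its exit status. An empty list counts as a failure so that a missing CSV is not mistaken for success.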
10.1.1.2. Web console: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the web console.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
Install the SR-IOV Network Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select SR-IOV Network Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
- Click Install.
Verify that the SR-IOV Network Operator is installed successfully:
- Navigate to the Operators → Installed Operators page.
- Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.
Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, troubleshoot further:
- Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the openshift-sriov-network-operator project.
- Check the YAML of the openshift-sriov-network-operator namespace for the workload.openshift.io/allowed=management annotation. If the annotation is missing, you can add it to the Operator namespace with the following command:
$ oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management
Note: For single-node OpenShift clusters, the annotation workload.openshift.io/allowed=management is required for the namespace.
10.1.2. Next steps
10.2. Configuring the SR-IOV Network Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
10.2.1. Configuring the SR-IOV Network Operator
Create a SriovOperatorConfig custom resource (CR) to deploy all the SR-IOV Operator components:
Create a file named sriovOperatorConfig.yaml using the following YAML:
(1) The only valid name for the SriovOperatorConfig resource is default, and it must be in the namespace where the Operator is deployed.
(2) If the enableInjector field is not specified in the CR or is explicitly set to false, it defaults to false or <none>, which prevents any network-resources-injector pod from running in the namespace. The recommended setting is true.
(3) If the enableOperatorWebhook field is not specified in the CR or is explicitly set to false, it defaults to false or <none>, which prevents any operator-webhook pod from running in the namespace. The recommended setting is true.
Create the resource by running the following command:
$ oc apply -f sriovOperatorConfig.yaml
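A minimal sketch of the sriovOperatorConfig.yaml file for this procedure, created with a shell heredoc. It follows the callouts: the name default, the Operator namespace openshift-sriov-network-operator, and the recommended value true for enableInjector and enableOperatorWebhook. The logLevel value of 2 is an assumption; adjust or omit it.

```shell
#!/usr/bin/env bash
# Sketch: create sriovOperatorConfig.yaml for the oc apply step.
# (1) The name must be "default" and the namespace must be the one where the
#     Operator is deployed (openshift-sriov-network-operator in this chapter).
# (2)(3) enableInjector and enableOperatorWebhook use the recommended "true".
# logLevel: 2 is an assumed verbosity setting.
cat <<'EOF' > sriovOperatorConfig.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true
  enableOperatorWebhook: true
  logLevel: 2
EOF
```

The quoted heredoc delimiter ('EOF') prevents the shell from expanding anything inside the YAML.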
10.2.1.1. SR-IOV Network Operator config custom resource
The fields for the SriovOperatorConfig custom resource are described in the following table:
Field | Type | Description |
---|---|---|
metadata.name | string | Specifies the name of the SR-IOV Network Operator instance. The default value is default. Do not set a different value. |
metadata.namespace | string | Specifies the namespace of the SR-IOV Network Operator instance. The default value is openshift-sriov-network-operator, the namespace where the Operator is deployed. |
spec.configDaemonNodeSelector | map[string]string | Specifies the node selection to control scheduling the SR-IOV Network Config Daemon on selected nodes. By default, this field is not set and the Operator deploys the SR-IOV Network Config daemon set on worker nodes. |
spec.disableDrain | boolean | Specifies whether to disable the node draining process or enable the node draining process when you apply a new policy to configure the NIC on a node. Setting this field to true skips draining the node before the NIC is reconfigured. By default, this field is set to false. For single-node clusters, set this field to true. |
spec.enableInjector | boolean | Specifies whether to enable or disable the Network Resources Injector daemon set. |
spec.enableOperatorWebhook | boolean | Specifies whether to enable or disable the Operator Admission Controller webhook daemon set. |
spec.logLevel | integer | Specifies the log verbosity level of the Operator. By default, this field is set to 2. |
spec.featureGates | map[string]bool | Specifies whether to enable or disable optional features, such as metricsExporter. |
spec.featureGates.metricsExporter | boolean | Specifies whether to enable or disable the SR-IOV Network Operator metrics. By default, this field is set to false. |
spec.featureGates.mellanoxFirmwareReset | boolean | Specifies whether to reset the firmware on virtual function (VF) changes in the SR-IOV Network Operator. Some chipsets, such as the Intel C740 Series, do not completely power off the PCI-E devices, which is required to configure VFs on NVIDIA/Mellanox NICs. By default, this field is set to false. Important: The spec.featureGates.mellanoxFirmwareReset feature gate is a Technology Preview feature only. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope. |
10.2.1.2. About the Network Resources Injector
The Network Resources Injector is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:
- Mutation of resource requests and limits in a pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
- Mutation of a pod specification with a Downward API volume to expose pod annotations, labels, and huge pages requests and limits. Containers that run in the pod can access the exposed information as files under the /etc/podnetinfo path.
The Network Resources Injector is enabled by the SR-IOV Network Operator when the enableInjector field is set to true in the SriovOperatorConfig CR. The network-resources-injector pod runs as a daemon set on all control plane nodes. The following is an example of Network Resources Injector pods running in a cluster with three control plane nodes:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME READY STATUS RESTARTS AGE
network-resources-injector-5cz5p 1/1 Running 0 10m
network-resources-injector-dwqpx 1/1 Running 0 10m
network-resources-injector-lktz5 1/1 Running 0 10m
10.2.1.3. Disabling or enabling the Network Resources Injector
To disable or enable the Network Resources Injector, complete the following procedure.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
Procedure
Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.
$ oc patch sriovoperatorconfig default \
  --type=merge -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableInjector": <value> } }'
Tip: You can alternatively apply the following YAML to update the Operator:
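A sketch of the YAML that the tip refers to, written to a file by a shell heredoc. ENABLE_INJECTOR and the file name sriov-injector.yaml are hypothetical. Because oc apply merges against the last applied configuration, spec fields that an earlier oc apply of this object set, and that this file omits, might be removed; the oc patch command above changes only enableInjector.

```shell
#!/usr/bin/env bash
# Sketch: write the SriovOperatorConfig change for the Network Resources
# Injector as YAML. ENABLE_INJECTOR is a hypothetical variable:
# "true" enables the injector and "false" disables it (default: true).
ENABLE_INJECTOR="${ENABLE_INJECTOR:-true}"

case "$ENABLE_INJECTOR" in
  true|false) ;;
  *) echo "ENABLE_INJECTOR must be true or false" >&2; exit 1 ;;
esac

# The delimiter is unquoted so that the shell expands ${ENABLE_INJECTOR}.
cat <<EOF > sriov-injector.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: ${ENABLE_INJECTOR}
EOF

# Then apply it, for example: oc apply -f sriov-injector.yaml
```

Validating the value before writing the file keeps a typo such as "flase" from reaching the cluster, where the CRD would reject it anyway.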
10.2.1.4. About the SR-IOV Network Operator admission controller webhook
The SR-IOV Network Operator Admission Controller webhook is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:
- Validation of the SriovNetworkNodePolicy CR when it is created or updated.
- Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.
The SR-IOV Network Operator Admission Controller webhook is enabled by the Operator when the enableOperatorWebhook field is set to true in the SriovOperatorConfig CR. The operator-webhook pod runs as a daemon set on all control plane nodes.
Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices. For information about configuring unsupported devices, see Configuring the SR-IOV Network Operator to use an unsupported NIC.
The following is an example of the Operator Admission Controller webhook pods running in a cluster with three control plane nodes:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME READY STATUS RESTARTS AGE
operator-webhook-9jkw6 1/1 Running 0 16m
operator-webhook-kbr5p 1/1 Running 0 16m
operator-webhook-rpfrl 1/1 Running 0 16m
10.2.1.5. Disabling or enabling the SR-IOV Network Operator admission controller webhook
To disable or enable the admission controller webhook, complete the following procedure.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
Procedure
Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:
$ oc patch sriovoperatorconfig default --type=merge \
  -n openshift-sriov-network-operator \
  --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
Tip: You can alternatively apply the following YAML to update the Operator:
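A sketch of the YAML for this tip, written by a shell heredoc; the file name sriov-webhook.yaml is hypothetical. As with any partial manifest, oc apply might remove spec fields that an earlier oc apply of this object set and that this file omits, while the oc patch command above changes only enableOperatorWebhook.

```shell
#!/usr/bin/env bash
# Sketch: YAML for the admission controller webhook setting.
# Change true to false to disable the operator-webhook daemon set.
cat <<'EOF' > sriov-webhook.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableOperatorWebhook: true
EOF

# Then apply it, for example: oc apply -f sriov-webhook.yaml
```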
10.2.1.6. About custom node selectors
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.
10.2.1.7. Configuring a custom NodeSelector for the SR-IOV Network Config daemon
To specify the nodes where the SR-IOV Network Config daemon is deployed, complete the following procedure.
When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.
Procedure
To update the node selector for the Operator, enter the following command:
Replace <node_label> with a label to apply, as in the following example: "node-role.kubernetes.io/worker": "".
Tip: You can alternatively apply the following YAML to update the Operator:
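A sketch of the YAML for this tip, using the example label node-role.kubernetes.io/worker: "" from the step above; replace it with the label of the nodes that should run the SR-IOV Network Config daemon. The file name is hypothetical. As with any partial manifest, oc apply might remove spec fields that an earlier oc apply of this object set and that this file omits.

```shell
#!/usr/bin/env bash
# Sketch: YAML that sets spec.configDaemonNodeSelector on the default
# SriovOperatorConfig. The label below is the example label from the
# procedure; replace it with the label of your target nodes.
cat <<'EOF' > sriov-config-daemon-selector.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    node-role.kubernetes.io/worker: ""
EOF

# Then apply it, for example: oc apply -f sriov-config-daemon-selector.yaml
```

Remember that applying this change recreates the SR-IOV Network Config daemon on each selected node, so schedule it when no new policies or SR-IOV pods are expected.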
10.2.1.8. Configuring the SR-IOV Network Operator for single node installations
By default, the SR-IOV Network Operator drains workloads from a node before every policy change. The Operator performs this action to ensure that no workloads are using the virtual functions before the reconfiguration.
For installations on a single node, there are no other nodes to receive the workloads. As a result, the Operator must be configured not to drain the workloads from the single node.
After performing the following procedure to disable draining workloads, you must remove any workload that uses an SR-IOV network interface before you change any SR-IOV network node policy.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
Procedure
To set the disableDrain field to true and the configDaemonNodeSelector field to node-role.kubernetes.io/master: "", enter the following command:
$ oc patch sriovoperatorconfig default --type=merge -n openshift-sriov-network-operator --patch '{ "spec": { "disableDrain": true, "configDaemonNodeSelector": { "node-role.kubernetes.io/master": "" } } }'
Tip: You can alternatively apply the following YAML to update the Operator:
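A sketch of the YAML for this tip, matching the oc patch command above: disableDrain set to true and configDaemonNodeSelector set to node-role.kubernetes.io/master: "". The file name is hypothetical. As with any partial manifest, oc apply might remove spec fields that an earlier oc apply of this object set and that this file omits.

```shell
#!/usr/bin/env bash
# Sketch: single-node settings for the default SriovOperatorConfig.
# disableDrain: true stops the Operator from draining the only node;
# the node selector schedules the config daemon on the control plane node.
cat <<'EOF' > sriov-single-node.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  disableDrain: true
  configDaemonNodeSelector:
    node-role.kubernetes.io/master: ""
EOF

# Then apply it, for example: oc apply -f sriov-single-node.yaml
```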
10.2.1.9. Deploying the SR-IOV Operator for hosted control planes
After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
Prerequisites
You must configure and deploy the hosted cluster on AWS.
Procedure
Create a namespace and an Operator group:
Create a subscription to the SR-IOV Operator:
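The resources for these two steps can be sketched as manifest files. The stable channel, the redhat-operators catalog source, the resource names, and the worker nodeSelector in the Subscription's spec.config (which keeps the Operator pods on worker machines) are assumptions to check against your environment. Apply both files while logged in to the hosted cluster.

```shell
#!/usr/bin/env bash
# Sketch: manifests for installing the SR-IOV Operator on a hosted cluster.
# Assumptions: "stable" channel, "redhat-operators" catalog source, and a
# worker node selector so that the Operator pods run on worker machines.

# Namespace and OperatorGroup in one multi-document file.
cat <<'EOF' > sriov-hcp-namespace-operatorgroup.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF

# Subscription; spec.config.nodeSelector pins the Operator pods to workers.
cat <<'EOF' > sriov-hcp-subscription.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: stable
  name: sriov-network-operator
  config:
    nodeSelector:
      node-role.kubernetes.io/worker: ""
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

# Apply against the hosted cluster, for example:
#   oc apply -f sriov-hcp-namespace-operatorgroup.yaml
#   oc apply -f sriov-hcp-subscription.yaml
```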
Verification
To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:
$ oc get csv -n openshift-sriov-network-operator
Example output
NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
sriov-network-operator.4.18.0-202211021237   SR-IOV Network Operator   4.18.0-202211021237   sriov-network-operator.4.18.0-202210290517   Succeeded
To verify that the SR-IOV pods are deployed, run the following command:
$ oc get pods -n openshift-sriov-network-operator
10.2.2. About the SR-IOV network metrics exporter
The Single Root I/O Virtualization (SR-IOV) network metrics exporter reads the metrics for SR-IOV virtual functions (VFs) and exposes these VF metrics in Prometheus format. When the SR-IOV network metrics exporter is enabled, you can query the SR-IOV VF metrics by using the OpenShift Container Platform web console to monitor the networking activity of the SR-IOV pods.
When you query the SR-IOV VF metrics by using the web console, the SR-IOV network metrics exporter fetches and returns the VF network statistics along with the name and namespace of the pod that the VF is attached to.
The SR-IOV VF metrics that the metrics exporter reads and exposes in Prometheus format are described in the following table:
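Metric | Description | Example PromQL query to examine the VF metric |
---|---|---|
sriov_vf_rx_bytes | Received bytes per virtual function. | sriov_vf_rx_bytes * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_tx_bytes | Transmitted bytes per virtual function. | sriov_vf_tx_bytes * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_rx_packets | Received packets per virtual function. | sriov_vf_rx_packets * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_tx_packets | Transmitted packets per virtual function. | sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_rx_dropped | Dropped packets upon receipt per virtual function. | sriov_vf_rx_dropped * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_tx_dropped | Dropped packets during transmission per virtual function. | sriov_vf_tx_dropped * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_rx_multicast | Received multicast packets per virtual function. | sriov_vf_rx_multicast * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_vf_rx_broadcast | Received broadcast packets per virtual function. | sriov_vf_rx_broadcast * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice |
sriov_kubepoddevice | Virtual functions linked to active pods. | - |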
You can also combine these queries with the kube-state-metrics to get more information about the SR-IOV pods. For example, you can use the following query to get the VF network statistics along with the application name from the standard Kubernetes pod label:
(sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice) * on (pod,namespace) group_left (label_app_kubernetes_io_name) kube_pod_labels
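To reuse this join for other VF metrics, a small helper can substitute the metric name into the same query text. This is a hypothetical helper (sriov_vf_query_with_app) that only prints the query; it assumes that the metric you pass carries the pciAddr and node labels, as sriov_vf_tx_packets does in the query above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the PromQL join shown above with the given
# SR-IOV VF metric in place of sriov_vf_tx_packets. It does not query
# Prometheus; paste the output into the web console metrics query field.
sriov_vf_query_with_app() {
  local metric="$1"
  case "$metric" in
    sriov_*) ;;
    *) echo "expected an SR-IOV metric name such as sriov_vf_tx_packets" >&2; return 1 ;;
  esac
  printf '(%s * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice) * on (pod,namespace) group_left (label_app_kubernetes_io_name) kube_pod_labels\n' "$metric"
}
```

Calling sriov_vf_query_with_app sriov_vf_tx_packets reproduces the query above exactly.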
10.2.2.1. Enabling the SR-IOV network metrics exporter
The Single Root I/O Virtualization (SR-IOV) network metrics exporter is disabled by default. To enable the metrics exporter, you must set the spec.featureGates.metricsExporter field to true.
When the metrics exporter is enabled, the SR-IOV Network Operator deploys the metrics exporter only on nodes with SR-IOV capabilities.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed the SR-IOV Network Operator.
Procedure
- Enable cluster monitoring by running the following command:
$ oc label ns/openshift-sriov-network-operator openshift.io/cluster-monitoring=true
To enable cluster monitoring, you must add the openshift.io/cluster-monitoring=true label to the namespace where you have installed the SR-IOV Network Operator.
- Set the spec.featureGates.metricsExporter field to true by running the following command:
$ oc patch -n openshift-sriov-network-operator sriovoperatorconfig/default \
  --type='merge' -p='{"spec": {"featureGates": {"metricsExporter": true}}}'
Verification
- Check that the SR-IOV network metrics exporter is enabled by running the following command:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME                                     READY   STATUS    RESTARTS   AGE
operator-webhook-hzfg4                   1/1     Running   0          5d22h
sriov-network-config-daemon-tr54m        1/1     Running   0          5d22h
sriov-network-metrics-exporter-z5d7t     1/1     Running   0          10s
sriov-network-operator-cc6fd88bc-9bsmt   1/1     Running   0          5d22h
The sriov-network-metrics-exporter pod must be ready, as shown by 1/1 in the READY column.
- Optional: Examine the SR-IOV virtual function (VF) metrics by using the OpenShift Container Platform web console. For more information, see "Querying metrics".
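To script this check, a helper can read the oc get pods output and confirm that every pod with a given name prefix is Running with all of its containers ready. pods_ready is a hypothetical function name, and it assumes the default oc get pods column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "oc get pods" output on standard input and
# succeeds only if at least one pod whose name starts with the given prefix
# is listed and every such pod is Running with all containers ready.
pods_ready() {
  local prefix="$1"
  [ -n "$prefix" ] || return 2
  awk -v prefix="$prefix" '
    NR == 1 { next }                    # skip the header row
    index($1, prefix) != 1 { next }     # only pods with the prefix
    {
      found++
      split($2, ready, "/")             # READY column, for example 1/1
      if (ready[1] != ready[2] || $3 != "Running") bad++
    }
    END { if (found > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, pipe the output of oc get pods -n openshift-sriov-network-operator into pods_ready sriov-network-metrics-exporter. The same helper works for other daemon sets in this chapter, such as network-resources-injector or operator-webhook.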
10.2.3. Next steps
10.3. Uninstalling the SR-IOV Network Operator
To uninstall the SR-IOV Network Operator, you must delete any running SR-IOV workloads, uninstall the Operator, and delete the webhooks that the Operator used.
10.3.1. Uninstalling the SR-IOV Network Operator
As a cluster administrator, you can uninstall the SR-IOV Network Operator.
Prerequisites
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have the SR-IOV Network Operator installed.
Procedure
- Delete all SR-IOV custom resources (CRs):
$ oc delete sriovnetwork -n openshift-sriov-network-operator --all
$ oc delete sriovnetworknodepolicy -n openshift-sriov-network-operator --all
$ oc delete sriovibnetwork -n openshift-sriov-network-operator --all
$ oc delete sriovoperatorconfigs -n openshift-sriov-network-operator --all
- Follow the instructions in the "Deleting Operators from a cluster" section to remove the SR-IOV Network Operator from your cluster.
- Delete the SR-IOV custom resource definitions that remain in the cluster after the SR-IOV Network Operator is uninstalled:
$ oc delete crd sriovibnetworks.sriovnetwork.openshift.io
$ oc delete crd sriovnetworknodepolicies.sriovnetwork.openshift.io
$ oc delete crd sriovnetworknodestates.sriovnetwork.openshift.io
$ oc delete crd sriovnetworkpoolconfigs.sriovnetwork.openshift.io
$ oc delete crd sriovnetworks.sriovnetwork.openshift.io
$ oc delete crd sriovoperatorconfigs.sriovnetwork.openshift.io
- Delete the SR-IOV Network Operator namespace:
$ oc delete namespace openshift-sriov-network-operator
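The CRD deletions in this procedure can also be run as a loop. delete_sriov_crds is a hypothetical helper, and the OC variable is an assumption added so that you can preview the commands (OC=echo) before running them against the cluster. As the procedure describes, run it only after the Operator itself is uninstalled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deletes the SR-IOV CRDs listed in the uninstall
# procedure, one oc command per CRD. The OC variable (default: oc) selects
# the command to run; OC=echo prints the commands without running them.
delete_sriov_crds() {
  local oc_bin="${OC:-oc}"
  local crd
  for crd in sriovibnetworks sriovnetworknodepolicies sriovnetworknodestates \
             sriovnetworkpoolconfigs sriovnetworks sriovoperatorconfigs; do
    "$oc_bin" delete crd "${crd}.sriovnetwork.openshift.io" || return 1
  done
}
```

For example, run OC=echo delete_sriov_crds to review the six commands, then run delete_sriov_crds to delete the CRDs. The loop stops at the first failed deletion so that a partial cleanup is easy to spot.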