Chapter 8. Hardware networks
8.1. About Single Root I/O Virtualization (SR-IOV) hardware networks
The Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple pods.
SR-IOV enables you to segment a compliant network device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device. The SR-IOV device driver for the device determines how the VF is exposed in the container:
- netdevice driver: A regular kernel network device in the netns of the container
- vfio-pci driver: A character device mounted in the container
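The practical difference is visible from inside the Pod. The following shell sketch is illustrative only and assumes a VF attached as net1:

$ # netdevice driver: the VF appears as a kernel network interface in the Pod netns
$ ip link show net1

$ # vfio-pci driver: the VF is exposed as a character device under /dev/vfio
$ ls -l /dev/vfio/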
You can use SR-IOV network devices with additional networks on your OpenShift Container Platform cluster for applications that require high bandwidth or low latency.
8.1.1. Components that manage SR-IOV network devices
The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. It performs the following functions:
- Orchestrates discovery and management of SR-IOV network devices
- Generates NetworkAttachmentDefinition custom resources for the SR-IOV Container Network Interface (CNI)
- Creates and updates the configuration of the SR-IOV network device plug-in
- Creates node-specific SriovNetworkNodeState custom resources
- Updates the spec.interfaces field in each SriovNetworkNodeState custom resource
The Operator provisions the following components:
- SR-IOV network configuration daemon
- A DaemonSet that is deployed on worker nodes when the SR-IOV Operator starts. The daemon is responsible for discovering and initializing SR-IOV network devices in the cluster.
- SR-IOV Operator webhook
- A dynamic admission controller webhook that validates the Operator custom resource and sets appropriate default values for unset fields.
- SR-IOV Network resources injector
- A dynamic admission controller webhook that provides functionality for patching Kubernetes Pod specifications with requests and limits for custom network resources such as SR-IOV VFs.
- SR-IOV network device plug-in
- A device plug-in that discovers, advertises, and allocates SR-IOV network virtual function (VF) resources. Device plug-ins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plug-ins give the Kubernetes scheduler awareness of resource availability, so that the scheduler can schedule Pods on nodes with sufficient resources.
- SR-IOV CNI plug-in
- A CNI plug-in that attaches VF interfaces allocated from the SR-IOV device plug-in directly into a Pod.
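For illustration, the following is a hedged sketch of the effect of the resources injector and device plug-in on a Pod that references an SR-IOV network attachment. The resource name openshift.io/sriovnic and the network name sriov-net are hypothetical; the actual resource name comes from the resourceName in your SriovNetworkNodePolicy CR:

apiVersion: v1
kind: Pod
metadata:
  name: sriov-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net    # triggers the resources injector
spec:
  containers:
  - name: app
    image: <app_image>
    resources:                                # stanza added automatically by the injector
      requests:
        openshift.io/sriovnic: "1"            # device plug-in resource for one VF
      limits:
        openshift.io/sriovnic: "1"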
The SR-IOV Network resources injector and SR-IOV Network Operator webhook are enabled by default and can be disabled by editing the default SriovOperatorConfig CR.
8.1.1.1. Supported devices
OpenShift Container Platform supports the following Network Interface Card (NIC) models:
- Intel XXV710-DA2 25G card with vendor ID 0x8086 and device ID 0x158b
- Mellanox MT27710 Family [ConnectX-4 Lx] 25G card with vendor ID 0x15b3 and device ID 0x1015
- Mellanox MT27800 Family [ConnectX-5] 100G card with vendor ID 0x15b3 and device ID 0x1017
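To check which NICs a node has, you can match these IDs against lspci output from a debug shell on the node. A minimal sketch; the output line is illustrative, not captured from a real node:

$ lspci -nn | grep -i ethernet
18:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]

The bracketed pair at the end of the line is the vendor ID and device ID.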
8.1.1.2. Example use of a virtual function in a Pod
You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a Pod with an SR-IOV VF attached.
This example shows a Pod using a virtual function (VF) in RDMA mode:
Pod spec that uses RDMA mode
apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: ["sleep", "infinity"]
The following example shows a Pod with a VF in DPDK mode:
Pod spec that uses DPDK mode
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
An optional library, app-netutil, is available to help an application running in a container gather network information associated with a pod. See the library's source code in the app-netutil GitHub repo.

This library is intended to ease the integration of SR-IOV VFs in DPDK mode into the container. The library provides both a Go API and a C API, as well as examples of using both languages.

There is also a sample Docker image, dpdk-app-centos, which can run one of the following DPDK sample applications based on an environment variable in the pod spec: l2fwd, l3fwd, or testpmd. This Docker image provides an example of integrating app-netutil into the container image itself. The library can also be integrated into an init container, which collects the desired data and passes the data to an existing DPDK workload.
8.1.2. Next steps
- Installing the SR-IOV Network Operator
8.2. Installing the SR-IOV Network Operator
You can install the Single Root I/O Virtualization (SR-IOV) Network Operator on your cluster to manage SR-IOV network devices and network attachments.
8.2.1. Installing SR-IOV Network Operator
As a cluster administrator, you can install the SR-IOV Network Operator by using the OpenShift Container Platform CLI or the web console.
8.2.1.1. CLI: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
To create the openshift-sriov-network-operator namespace, enter the following command:

$ cat << EOF | oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  labels:
    openshift.io/run-level: "1"
EOF
To create an OperatorGroup CR, enter the following command:
$ cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF
Subscribe to the SR-IOV Network Operator.
Run the following command to get the OpenShift Container Platform major and minor version. It is required for the channel value in the next step.

$ OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
    grep -o '[0-9]*[.][0-9]*' | head -1)
To create a Subscription CR for the SR-IOV Network Operator, enter the following command:
$ cat << EOF | oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "${OC_VERSION}"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
To verify that the Operator is installed, enter the following command:
$ oc get csv -n openshift-sriov-network-operator \
    -o custom-columns=Name:.metadata.name,Phase:.status.phase
Example output
Name                                        Phase
sriov-network-operator.4.4.0-202006160135   Succeeded
8.2.1.2. Web console: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the web console.
You must create the OperatorGroup CR by using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
Procedure
Create a namespace for the SR-IOV Network Operator:
- In the OpenShift Container Platform web console, click Administration → Namespaces.
- Click Create Namespace.
- In the Name field, enter openshift-sriov-network-operator, and then click Create.
- In the Filter by name field, enter openshift-sriov-network-operator.
- From the list of results, click openshift-sriov-network-operator, and then click YAML.
- Update the namespace by adding the following stanza to the namespace definition:

  labels:
    openshift.io/run-level: "1"

- Click Save.
Install the SR-IOV Network Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select SR-IOV Network Operator from the list of available Operators, and then click Install.
- On the Create Operator Subscription page, under A specific namespace on the cluster, select openshift-sriov-network-operator.
- Click Subscribe.
Verify that the SR-IOV Network Operator is installed successfully:
- Navigate to the Operators → Installed Operators page.
- Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

  Note: During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, troubleshoot further:

- Inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for Pods in the openshift-sriov-network-operator project.
8.2.2. Next steps
- Optional: Configuring the SR-IOV Network Operator
8.3. Configuring the SR-IOV Network Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
8.3.1. Configuring the SR-IOV Network Operator
Modifying the SR-IOV Network Operator configuration is not normally necessary. The default configuration is recommended for most use cases. Complete the steps to modify the relevant configuration only if the default behavior of the Operator is not compatible with your use case.
The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource (CR) named default in the openshift-sriov-network-operator namespace.

The default CR contains the SR-IOV Network Operator configuration for your cluster. To change the Operator configuration, you must modify this CR.
The SriovOperatorConfig CR provides several fields for configuring the Operator:

- enableInjector allows project administrators to enable or disable the Network Resources Injector DaemonSet.
- enableOperatorWebhook allows project administrators to enable or disable the Operator admission controller webhook DaemonSet.
- configDaemonNodeSelector allows project administrators to schedule the SR-IOV Network Config daemon on selected nodes.
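For illustration, the following is a hedged sketch of the default SriovOperatorConfig CR with all three fields set; the field names come from this section, but treat the overall layout as an assumption rather than actual Operator output:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true                 # Network Resources Injector DaemonSet
  enableOperatorWebhook: true          # Operator admission controller webhook DaemonSet
  configDaemonNodeSelector:            # restrict the config daemon to labeled nodes
    node-role.kubernetes.io/worker: ""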
8.3.1.1. About the Network Resources Injector
The Network Resources Injector is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:
- Mutation of resource requests and limits in Pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
- Mutation of Pod specifications with a downward API volume to expose pod annotations and labels to the running container as files under the /etc/podnetinfo path.
By default, the Network Resources Injector is enabled by the SR-IOV Network Operator and runs as a DaemonSet on all master nodes. The following is an example of Network Resources Injector Pods running in a cluster with three master nodes:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME                               READY   STATUS    RESTARTS   AGE
network-resources-injector-5cz5p   1/1     Running   0          10m
network-resources-injector-dwqpx   1/1     Running   0          10m
network-resources-injector-lktz5   1/1     Running   0          10m
8.3.1.2. About the SR-IOV Operator admission controller webhook
The SR-IOV Operator admission controller webhook is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:

- Validation of the SriovNetworkNodePolicy CR when it is created or updated.
- Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.
By default, the SR-IOV Operator admission controller webhook is enabled by the Operator and runs as a DaemonSet on all master nodes. The following is an example of the Operator admission controller webhook Pods running in a cluster with three master nodes:
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME                     READY   STATUS    RESTARTS   AGE
operator-webhook-9jkw6   1/1     Running   0          16m
operator-webhook-kbr5p   1/1     Running   0          16m
operator-webhook-rpfrl   1/1     Running   0          16m
8.3.1.3. About custom node selectors
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.
8.3.1.4. Disabling or enabling the Network Resources Injector
To disable or enable the Network Resources Injector, which is enabled by default, complete the following procedure.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Operator.
Procedure
Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.

$ oc patch sriovoperatorconfig default \
    --type=merge -n openshift-sriov-network-operator \
    --patch '{ "spec": { "enableInjector": <value> } }'
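To confirm the change, you can check for the injector Pods; this verification step is an assumption based on the Pod names shown earlier in this section, not a documented command:

$ oc get pods -n openshift-sriov-network-operator | grep network-resources-injector

After disabling the injector, the command eventually returns no Pods; after enabling it, one injector Pod per master node appears.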
8.3.1.5. Disabling or enabling the SR-IOV Operator admission controller webhook
To disable or enable the admission controller webhook, which is enabled by default, complete the following procedure.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Operator.
Procedure
Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:

$ oc patch sriovoperatorconfig default --type=merge \
    -n openshift-sriov-network-operator \
    --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
8.3.1.6. Configuring a custom NodeSelector for the SR-IOV Network Config daemon
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.
To specify the nodes where the SR-IOV Network Config daemon is deployed, complete the following procedure.
When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.
Procedure
To update the node selector for the operator, enter the following command:
$ oc patch sriovoperatorconfig default --type=json \
    -n openshift-sriov-network-operator \
    --patch '[{
        "op": "replace",
        "path": "/spec/configDaemonNodeSelector",
        "value": {<node-label>}
      }]'
Replace <node-label> with a label to apply, as in the following example: "node-role.kubernetes.io/worker": "".
8.3.2. Next steps
- Configuring an SR-IOV network device
8.4. Configuring an SR-IOV network device
You can configure a Single Root I/O Virtualization (SR-IOV) device in your cluster.
8.4.1. Automated discovery of SR-IOV network devices
The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState custom resource (CR) for each worker node that provides a compatible SR-IOV network device.
The CR is assigned the same name as the worker node. The status.interfaces list provides information about the network devices on a node.
Do not modify a SriovNetworkNodeState CR. The Operator creates and manages these resources automatically.
8.4.1.1. Example SriovNetworkNodeState CR
The following YAML is an example of a SriovNetworkNodeState CR created by the SR-IOV Network Operator:
An SriovNetworkNodeState object
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 1
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: "39824"
status:
  interfaces: 2
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded

1. The name of the CR is the same as the name of the worker node.
2. The interfaces list includes an entry for each SR-IOV network device discovered on the node.
8.4.2. Configuring SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.
Prerequisites
- Install the OpenShift CLI (oc).
- An account with cluster-admin privileges.
- You must have installed the SR-IOV Operator.
Procedure
Create the following SriovNetworkNodePolicy CR, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  resourceName: <sriov_resource_name> 3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true" 4
  priority: <priority> 5
  mtu: <mtu> 6
  numVfs: <num> 7
  nicSelector: 8
    vendor: "<vendor_code>" 9
    deviceID: "<device_id>" 10
    pfNames: ["<pf_name>", ...] 11
    rootDevices: ["<pci_bus_id>", "..."] 12
  deviceType: <device_type> 13
  isRdma: false 14
1. Specify a name for the CR object.
2. Specify the namespace where the SR-IOV Operator is installed.
3. Specify the resource name of the SR-IOV device plug-in. You can create multiple SriovNetworkNodePolicy CRs for a resource name.
4. Specify the node selector to select which nodes are configured. Only SR-IOV network devices on selected nodes are configured. The SR-IOV Container Network Interface (CNI) plug-in and device plug-in are deployed only on selected nodes.
5. Optional: Specify an integer value between 0 and 99. A smaller number gets higher priority, so a priority of 10 is higher than a priority of 99. The default value is 99.
6. Optional: Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
7. Specify the number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel Network Interface Card (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
8. The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they point to an identical device.
9. Optional: Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
10. Optional: Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
11. Optional: The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
12. The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
13. Optional: Specify the driver type for the virtual functions. You can specify one of the following values: netdevice or vfio-pci. The default value is netdevice.

    Note: For a Mellanox card to work in Data Plane Development Kit (DPDK) mode on bare metal nodes, use the netdevice driver type and set isRdma to true. For a Mellanox card to work in DPDK mode with Container-Native Virtualization (CNV), use the vfio-pci driver type and set isRdma to false.
14. Optional: Specify whether to enable remote direct memory access (RDMA) mode. The default value is false. Only RDMA over Converged Ethernet (RoCE) mode is supported on Mellanox Ethernet adapters.

    Note: If the isRdma flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.
Create the SriovNetworkNodePolicy CR. Replace <name> with the name for this configuration.

$ oc create -f <name>-sriov-node-network.yaml

After applying the configuration update, all the Pods in the openshift-sriov-network-operator namespace transition to the Running status.

To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
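Based on the syncStatus field in the example SriovNetworkNodeState CR shown earlier, a successfully configured node is expected to produce the following output; treat this as an inferred example rather than captured output:

Succeeded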
8.4.3. Next steps
- Configuring an SR-IOV network attachment
8.5. Configuring an SR-IOV network attachment
You can configure a network attachment for a Single Root I/O Virtualization (SR-IOV) device in the cluster.
8.5.1. Configuring SR-IOV additional network
You can configure an additional network that uses SR-IOV hardware by creating a SriovNetwork custom resource (CR). When you create a SriovNetwork CR, the SR-IOV Operator automatically creates a NetworkAttachmentDefinition CR.
Do not modify or delete a SriovNetwork CR if it is attached to any Pods in the running state.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  networkNamespace: <target_namespace> 3
  ipam: <ipam> 4
  vlan: <vlan> 5
  resourceName: <sriov_resource_name> 6
  linkState: <link_state> 7
  maxTxRate: <max_tx_rate> 8
  minTxRate: <min_tx_rate> 9
  vlanQoS: <vlan_qos> 10
  spoofChk: "<spoof_check>" 11
  trust: "<trust_vf>" 12
  capabilities: <capabilities> 13
1. Replace <name> with a name for the CR. The SR-IOV Network Operator creates a NetworkAttachmentDefinition CR with the same name.
2. Specify the namespace where the SR-IOV Operator is installed.
3. Optional: Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created. The default value is openshift-sriov-network-operator.
4. Optional: Replace <ipam> with a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
5. Optional: Replace <vlan> with a Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
6. Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
7. Optional: Replace <link_state> with the link state of the virtual function (VF). The allowed values are enable, disable, and auto.
8. Optional: Replace <max_tx_rate> with a maximum transmission rate, in Mbps, for the VF.
9. Optional: Replace <min_tx_rate> with a minimum transmission rate, in Mbps, for the VF. This value should always be less than or equal to the maximum transmission rate.

   Note: Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.
10. Optional: Replace <vlan_qos> with an IEEE 802.1p priority level for the VF. The default value is 0.
11. Optional: Replace <spoof_check> with the spoof check mode of the VF. The allowed values are the strings "on" and "off".

    Important: You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.
12. Optional: Replace <trust_vf> with the trust mode of the VF. The allowed values are the strings "on" and "off".

    Important: You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.
13. Optional: Replace <capabilities> with the capabilities to configure for this network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
To create the CR object, enter the following command. Replace <name> with a name for this additional network.

$ oc create -f <name>-sriov-network.yaml
Optional: To confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists, enter the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

$ oc get net-attach-def -n <namespace>
8.5.1.1. Configuration for ipam CNI plug-in
The ipam Container Network Interface (CNI) plug-in provides IP address management (IPAM) for other CNI plug-ins. You can configure ipam for either static IP address assignment or dynamic IP address assignment by using DHCP. The DHCP server you specify must be reachable from the additional network.
The following JSON configuration object describes the parameters that you can set.
8.5.1.1.1. Static IP address assignment configuration
The following JSON describes the configuration for static IP address assignment:
Static assignment configuration
{ "ipam": { "type": "static", "addresses": [ 1 { "address": "<address>", 2 "gateway": "<gateway>" 3 } ], "routes": [ 4 { "dst": "<dst>" 5 "gw": "<gw>" 6 } ], "dns": { 7 "nameservers": ["<nameserver>"], 8 "domain": "<domain>", 9 "search": ["<search_domain>"] 10 } } }
1. An array describing IP addresses to assign to the virtual interface. Both IPv4 and IPv6 addresses are supported.
2. An IP address that you specify.
3. The default gateway to route egress network traffic to.
4. An array describing routes to configure inside the Pod.
5. The IP address range in CIDR format.
6. The gateway where network traffic is routed.
7. Optional: DNS configuration.
8. An array of one or more IP addresses to send DNS queries to.
9. The default domain to append to a host name. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.
10. An array of domain names to append to an unqualified host name, such as example-host, during a DNS lookup query.
8.5.1.1.2. Dynamic IP address assignment configuration
The following JSON describes the configuration for dynamic IP address assignment with DHCP.

A Pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

The SR-IOV Network Operator does not create a DHCP server deployment; the Cluster Network Operator is responsible for creating the minimal DHCP server deployment.
To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:
Example shim network attachment definition
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  ...
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "master": "ens5",
        "ipam": {
          "type": "dhcp"
        }
      }
DHCP assignment configuration
{ "ipam": { "type": "dhcp" } }
8.5.1.1.3. Static IP address assignment configuration example
You can configure ipam for static IP address assignment:
{ "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.7" } ] } }
8.5.1.1.4. Dynamic IP address assignment configuration example using DHCP
You can configure ipam for DHCP:
{ "ipam": { "type": "dhcp" } }
8.5.1.2. Configuring static MAC and IP addresses on additional SR-IOV networks
You can configure static MAC and IP addresses on an SR-IOV network by specifying Container Network Interface (CNI) runtimeConfig data in a Pod annotation.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges when creating the SriovNetwork CR.
Procedure
Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  networkNamespace: <target_namespace> 3
  ipam: '{ "type": "static" }' 4
  capabilities: '{ "mac": true, "ips": true }' 5
  resourceName: <sriov_resource_name> 6
1. Replace <name> with a name for the CR. The SR-IOV Network Operator creates a NetworkAttachmentDefinition CR with the same name.
2. Specify the namespace where the SR-IOV Network Operator is installed.
3. Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created.
4. Specify the static type for the ipam CNI plug-in as a YAML block scalar.
5. Set the mac and ips capabilities to true.
6. Replace <sriov_resource_name> with the value for the spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
Create the CR by running the following command:

$ oc create -f <filename> 1

1. Replace <filename> with the name of the file you created in the previous step.
Optional: Confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

$ oc get net-attach-def -n <namespace>
Do not modify or delete a SriovNetwork custom resource (CR) if it is attached to any Pods in the running state.
Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {
        "name": "<name>", 1
        "mac": "20:04:0f:f1:88:01", 2
        "ips": ["192.168.10.1/24", "2001::1/64"] 3
      }
    ]'
spec:
  containers:
  - name: sample-container
    image: <image>
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
1. Specify the name of the SR-IOV network attachment definition CR.
2. Specify the MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR.
3. Specify addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported.
Create the sample SR-IOV pod by running the following command:

$ oc create -f <filename> 1

1. Replace <filename> with the name of the file you created in the previous step.
Optional: Confirm that the mac and ips addresses are applied to the SR-IOV device by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

$ oc exec sample-pod -n <namespace> -- ip addr show
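Given the values in the sample Pod annotation, the output is expected to include lines similar to the following. The interface name net1 and the surrounding details are assumptions for illustration:

net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    link/ether 20:04:0f:f1:88:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.1/24 scope global net1
    inet6 2001::1/64 scope global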
8.5.2. Next steps
- Adding a Pod to an SR-IOV additional network
8.6. Adding a Pod to an SR-IOV additional network
You can add a Pod to an existing Single Root I/O Virtualization (SR-IOV) network.
8.6.1. Adding a Pod to an additional network
You can add a Pod to an additional network. The Pod continues to send normal cluster-related network traffic over the default network.
When a Pod is created, additional networks are attached to it. However, you cannot attach additional networks to a Pod that already exists.
If a NetworkAttachmentDefinition is managed by the SR-IOV Network Operator, the SR-IOV Network Resource Injector adds the resource field to the Pod object automatically.
When specifying an SR-IOV hardware network for a Deployment resource or a ReplicationController resource, you must specify the namespace of the NetworkAttachmentDefinition CR. For more information, see the following bugs: BZ#1846333 and BZ#1840962.
Prerequisites
- The Pod must be in the same namespace as the additional network.
- Install the OpenShift CLI (oc).
- You must log in to the cluster.
- You must have the SR-IOV Operator installed and a SriovNetwork CR defined.
Procedure
Add an annotation to the Pod object. Only one of the following annotation formats can be used:

To attach an additional network without any customization, add an annotation with the following format. Replace <network> with the name of the additional network to associate with the Pod:

metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] 1

1. To specify more than one additional network, separate each network with a comma. Do not include whitespace between the comma and the next network name. If you specify the same additional network multiple times, that Pod will have multiple network interfaces attached to that network, as shown in the sketch after the annotation formats.
To attach an additional network with customizations, add an annotation with the following format:

metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "<network>", 1
          "namespace": "<namespace>", 2
          "default-route": ["<default-route>"] 3
        }
      ]

1. Specify the name of the additional network.
2. Specify the namespace that contains the network attachment definition for the additional network.
3. Optional: Specify an override for the default route, such as a gateway IP address.
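As a hedged illustration of the repeated-network behavior described for the first format, the following annotation, using a hypothetical additional network named net1, attaches two interfaces to the same network:

metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: net1,net1   # the Pod gets two interfaces on the net1 network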
To create the Pod, enter the following command. Replace <name> with the name of the Pod.

$ oc create -f <name>.yaml

Optional: To confirm that the annotation exists in the Pod CR, enter the following command, replacing <name> with the name of the Pod.

$ oc get pod <name> -o yaml
In the following example, the example-pod Pod is attached to the net1 additional network:

$ oc get pod example-pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-bridge
    k8s.v1.cni.cncf.io/networks-status: |- 1
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.128.2.14"
          ],
          "default": true,
          "dns": {}
      },{
          "name": "macvlan-bridge",
          "interface": "net1",
          "ips": [
              "20.2.2.100"
          ],
          "mac": "22:2f:60:a5:f8:00",
          "dns": {}
      }]
  name: example-pod
  namespace: default
spec:
  ...
status:
  ...
1. The k8s.v1.cni.cncf.io/networks-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the Pod. The annotation value is stored as a plain text value.
8.6.2. Creating a non-uniform memory access (NUMA) aligned SR-IOV pod
You can create a NUMA aligned SR-IOV pod by restricting SR-IOV and the CPU resources allocated from the same NUMA node with the restricted or single-numa-node Topology Manager policies.
Prerequisites
- Install the OpenShift CLI (oc).
- Enable a LatencySensitive profile and configure the CPU Manager policy to static. A sketch of such a configuration follows this list.
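The following is a minimal sketch of a KubeletConfig that sets the static CPU Manager policy together with a Topology Manager policy. The CR name, the pool label, and the reconcile period are assumptions; adapt them to your cluster:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled                  # label on your worker MachineConfigPool
  kubeletConfig:
    cpuManagerPolicy: static                   # required for exclusive CPU allocation
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: single-numa-node    # or restricted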
Procedure
Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

The following example shows an SR-IOV pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: <name> 1
spec:
  containers:
  - name: sample-container
    image: <image> 2
    command: ["sleep", "infinity"]
    resources:
      limits:
        memory: "1Gi" 3
        cpu: "2" 4
      requests:
        memory: "1Gi"
        cpu: "2"
1. Replace <name> with the name of the SR-IOV network attachment definition CR.
2. Replace <image> with the name of the sample-pod image.
3. To create the SR-IOV pod with guaranteed QoS, set memory limits equal to memory requests.
4. To create the SR-IOV pod with guaranteed QoS, set cpu limits equal to cpu requests.
Create the sample SR-IOV pod by running the following command:

$ oc create -f <filename> 1

1. Replace <filename> with the name of the file you created in the previous step.
Confirm that the sample-pod is configured with guaranteed QoS.

$ oc describe pod sample-pod

Confirm that the sample-pod is allocated with exclusive CPUs.

$ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus

Confirm that the SR-IOV device and CPUs that are allocated for the sample-pod are on the same NUMA node.

$ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
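To compare the CPU set against the NUMA node of the SR-IOV device itself, you can read the device numa_node attribute from sysfs. This is a hedged sketch that assumes the VF is attached as net1 inside the Pod; the printed node number should match the NUMA node that hosts the CPUs listed by cpuset.cpus:

$ oc exec sample-pod -- cat /sys/class/net/net1/device/numa_node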
8.7. Using high performance multicast
You can use multicast on your Single Root I/O Virtualization (SR-IOV) hardware network.
8.7.1. Configuring high performance multicast
The OpenShift SDN default Container Network Interface (CNI) network provider supports multicast between Pods on the default network. This is best used for low-bandwidth coordination or service discovery, and not high-bandwidth applications. For applications such as streaming media, like Internet Protocol television (IPTV) and multipoint videoconferencing, you can utilize Single Root I/O Virtualization (SR-IOV) hardware to provide near-native performance.
When using additional SR-IOV interfaces for multicast:
- Multicast packets must be sent or received by a Pod through the additional SR-IOV interface.
- The physical network which connects the SR-IOV interfaces decides the multicast routing and topology, which is not controlled by OpenShift Container Platform.
8.7.2. Using an SR-IOV interface for multicast
The following procedure creates an example SR-IOV interface for multicast.
Prerequisites
- Install the OpenShift CLI (oc).
- You must log in to the cluster with a user that has the cluster-admin role.
Procedure
Create a SriovNetworkNodePolicy custom resource (CR):
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-example
  namespace: openshift-sriov-network-operator
spec:
  resourceName: example
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  nicSelector:
    vendor: "8086"
    pfNames: ['ens803f0']
    rootDevices: ['0000:86:00.0']
Create a SriovNetwork CR:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: net-example
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: default
  ipam: | 1
    {
      "type": "host-local", 2
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [
        {"dst": "224.0.0.0/5"},
        {"dst": "232.0.0.0/5"}
      ],
      "gateway": "10.56.217.1"
    }
  resourceName: example

1. Specify a configuration object for the ipam CNI plug-in as a YAML block scalar.
2. This example uses the host-local ipam CNI plug-in. Static IP addresses are assigned from the configured subnet, and routes for the multicast ranges 224.0.0.0/5 and 232.0.0.0/5 are configured inside the Pod.
Create a Pod with a multicast application:

apiVersion: v1
kind: Pod
metadata:
  name: testpmd
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: net-example
spec:
  containers:
  - name: example
    image: rhel7:latest
    securityContext:
      capabilities:
        add: ["NET_ADMIN"] 1
    command: ["sleep", "infinity"]
1. The NET_ADMIN capability is required only if your application needs to assign the multicast IP address to the SR-IOV interface. Otherwise, it can be omitted.
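For illustration, the operation that requires NET_ADMIN might look like the following inside the container. The interface name net1 and the multicast address are assumptions, and the autojoin flag asks the kernel to join the corresponding multicast group:

$ ip addr add 239.100.1.1/32 dev net1 autojoin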
8.8. Using virtual functions (VFs) with DPDK and RDMA modes
You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).
8.8.1. Examples of using virtual functions in DPDK and RDMA modes
The Data Plane Development Kit (DPDK) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
Remote Direct Memory Access (RDMA) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
8.8.2. Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
8.8.3. Example use of virtual function (VF) in DPDK mode with Intel NICs
Procedure
Create the following SriovNetworkNodePolicy CR, and then save the YAML in the intel-dpdk-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intelnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "8086"
    deviceID: "158b"
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: vfio-pci 1

1. Specify the driver type for the virtual functions as vfio-pci.
Note: See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy CR by running the following command:
$ oc create -f intel-dpdk-node-policy.yaml
Create the following SriovNetwork CR, and then save the YAML in the intel-dpdk-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: "{}" 1
  vlan: <vlan>
  resourceName: intelnics

1. Specify an empty object "{}" for the ipam CNI plug-in. DPDK works in userspace mode and does not require an IP address.
Note: See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork CR by running the following command:

$ oc create -f intel-dpdk-network.yaml
Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  namespace: <target_namespace> 1
  annotations:
    k8s.v1.cni.cncf.io/networks: intel-dpdk-network
spec:
  containers:
  - name: testpmd
    image: <DPDK_image> 2
    securityContext:
      capabilities:
        add: ["IPC_LOCK"] 3
    volumeMounts:
    - mountPath: /dev/hugepages 4
      name: hugepage
    resources:
      limits:
        openshift.io/intelnics: "1" 5
        memory: "1Gi"
        cpu: "4" 6
        hugepages-1Gi: "4Gi" 7
      requests:
        openshift.io/intelnics: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1. Specify the same target_namespace where the SriovNetwork CR intel-dpdk-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2. Specify the DPDK image, which includes your application and the DPDK library used by the application.
3. Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4. Mount a hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.
5. Optional: Specify the number of DPDK devices allocated to the DPDK Pod. This resource request and limit, if not explicitly specified, is automatically added by the SR-IOV network resources injector. The SR-IOV network resources injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
6. Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
7. Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 results in 16 1Gi hugepages being allocated during system boot.
Create the DPDK Pod by running the following command:
$ oc create -f intel-dpdk-pod.yaml
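Callout 7 notes that 1Gi hugepages require kernel arguments on the nodes. One hedged way to apply them is with a MachineConfig; the CR name is an assumption, and applying it reboots the nodes in the targeted pool:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-hugepages
  labels:
    machineconfiguration.openshift.io/role: worker   # target the worker pool
spec:
  kernelArguments:
  - default_hugepagesz=1GB
  - hugepagesz=1G
  - hugepages=16                                     # 16 x 1Gi hugepages at boot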
8.8.4. Example use of a virtual function in DPDK mode with Mellanox NICs
Procedure
Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-dpdk-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx-dpdk-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: mlxnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "15b3"
    deviceID: "1015" 1
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: netdevice 2
  isRdma: true 3

1. Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
2. Specify the driver type for the virtual functions as netdevice. A Mellanox SR-IOV VF can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.
3. Enable RDMA mode. This is required by Mellanox cards to work in DPDK mode.
Note: See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy CR by running the following command:
$ oc create -f mlx-dpdk-node-policy.yaml
Create the following SriovNetwork CR, and then save the YAML in the mlx-dpdk-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: |- 1
    ...
  vlan: <vlan>
  resourceName: mlxnics

1. Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
Note: See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork CR by running the following command:

$ oc create -f mlx-dpdk-network.yaml
Create the following Pod spec, and then save the YAML in the mlx-dpdk-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  namespace: <target_namespace> 1
  annotations:
    k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
spec:
  containers:
  - name: testpmd
    image: <DPDK_image> 2
    securityContext:
      capabilities:
        add: ["IPC_LOCK"] 3
    volumeMounts:
    - mountPath: /dev/hugepages 4
      name: hugepage
    resources:
      limits:
        openshift.io/mlxnics: "1" 5
        memory: "1Gi"
        cpu: "4" 6
        hugepages-1Gi: "4Gi" 7
      requests:
        openshift.io/mlxnics: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1. Specify the same target_namespace where the SriovNetwork CR mlx-dpdk-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2. Specify the DPDK image, which includes your application and the DPDK library used by the application.
3. Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4. Mount the hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.
5. Optional: Specify the number of DPDK devices allocated to the DPDK Pod. This resource request and limit, if not explicitly specified, is automatically added by the SR-IOV network resources injector. The SR-IOV network resources injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
6. Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
7. Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
Create the DPDK Pod by running the following command:
$ oc create -f mlx-dpdk-pod.yaml
8.8.5. Example of a virtual function in RDMA mode with Mellanox NICs
RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.
Procedure
Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-rdma-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx-rdma-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: mlxnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "15b3"
    deviceID: "1015" 1
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: netdevice 2
  isRdma: true 3

1. Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
2. Specify the driver type for the virtual functions as netdevice. The VF device appears as a kernel network interface inside a container.
3. Enable RDMA mode.
Note: See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy CR by running the following command:
$ oc create -f mlx-rdma-node-policy.yaml
Create the following SriovNetwork CR, and then save the YAML in the mlx-rdma-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-rdma-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: |- 1
    ...
  vlan: <vlan>
  resourceName: mlxnics

1. Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
Note: See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork CR by running the following command:

$ oc create -f mlx-rdma-network.yaml
Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  namespace: <target_namespace> 1
  annotations:
    k8s.v1.cni.cncf.io/networks: mlx-rdma-network
spec:
  containers:
  - name: testpmd
    image: <RDMA_image> 2
    securityContext:
      capabilities:
        add: ["IPC_LOCK"] 3
    volumeMounts:
    - mountPath: /dev/hugepages 4
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "4" 5
        hugepages-1Gi: "4Gi" 6
      requests:
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1. Specify the same target_namespace where the SriovNetwork CR mlx-rdma-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2. Specify the RDMA image, which includes your application and the RDMA library used by the application.
3. Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4. Mount the hugepage volume to the RDMA Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.
5. Specify the number of CPUs. The RDMA Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
6. Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the RDMA Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
Create the RDMA Pod by running the following command:
$ oc create -f mlx-rdma-pod.yaml