
Chapter 8. Hardware networks


8.1. About Single Root I/O Virtualization (SR-IOV) hardware networks

The Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple pods.

SR-IOV enables you to segment a compliant network device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device. The SR-IOV device driver for the device determines how the VF is exposed in the container:

  • netdevice driver: A regular kernel network device in the netns of the container
  • vfio-pci driver: A character device mounted in the container

You can use SR-IOV network devices with additional networks on your OpenShift Container Platform cluster for applications that require high bandwidth or low latency.
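
For illustration only, the following commands sketch how a PF is segmented into VFs at the host level through the sysfs interface. In OpenShift Container Platform the SR-IOV Network Operator performs this configuration for you; the interface name ens785f0 and the VF count shown here are example values.

$ cat /sys/class/net/ens785f0/device/sriov_totalvfs
8

$ echo 4 > /sys/class/net/ens785f0/device/sriov_numvfs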

8.1.1. Components that manage SR-IOV network devices

The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. It performs the following functions:

  • Orchestrates discovery and management of SR-IOV network devices
  • Generates NetworkAttachmentDefinition custom resources for the SR-IOV Container Network Interface (CNI)
  • Creates and updates the configuration of the SR-IOV network device plug-in
  • Creates node specific SriovNetworkNodeState custom resources
  • Updates the spec.interfaces field in each SriovNetworkNodeState custom resource

The Operator provisions the following components:

SR-IOV network configuration daemon
A DaemonSet that is deployed on worker nodes when the SR-IOV Operator starts. The daemon is responsible for discovering and initializing SR-IOV network devices in the cluster.
SR-IOV Operator webhook
A dynamic admission controller webhook that validates the Operator custom resource and sets appropriate default values for unset fields.
SR-IOV Network resources injector
A dynamic admission controller webhook that provides functionality for patching Kubernetes Pod specifications with requests and limits for custom network resources such as SR-IOV VFs.
SR-IOV network device plug-in
A device plug-in that discovers, advertises, and allocates SR-IOV network virtual function (VF) resources. Device plug-ins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plug-ins give the Kubernetes scheduler awareness of resource availability, so that the scheduler can schedule Pods on nodes with sufficient resources.
SR-IOV CNI plug-in
A CNI plug-in that attaches VF interfaces allocated from the SR-IOV device plug-in directly into a Pod.
Note

The SR-IOV Network resources injector and SR-IOV Network Operator webhook are enabled by default and can be disabled by editing the default SriovOperatorConfig CR.

8.1.1.1. Supported devices

OpenShift Container Platform supports the following Network Interface Card (NIC) models:

  • Intel XXV710-DA2 25G card with vendor ID 0x8086 and device ID 0x158b
  • Mellanox MT27710 Family [ConnectX-4 Lx] 25G card with vendor ID 0x15b3 and device ID 0x1015
  • Mellanox MT27800 Family [ConnectX-5] 100G card with vendor ID 0x15b3 and device ID 0x1017

8.1.1.2. Example use of a virtual function in a Pod

You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a Pod with an SR-IOV VF attached.

This example shows a Pod using a virtual function (VF) in RDMA mode:

Pod spec that uses RDMA mode

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
     capabilities:
        add: ["IPC_LOCK"]
    command: ["sleep", "infinity"]

The following example shows a Pod with a VF in DPDK mode:

Pod spec that uses DPDK mode

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
     capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

An optional library, app-netutil, is available to help an application running in a container gather network information associated with a pod. See the library's source code in the app-netutil GitHub repo.

This library is intended to ease the integration of SR-IOV VFs in DPDK mode into the container. The library provides both a Go API and a C API, as well as examples of using both languages.

There is also a sample Docker image, dpdk-app-centos, which can run one of the following DPDK sample applications based on an environment variable in the pod spec: l2fwd, l3fwd, or testpmd. This Docker image provides an example of integrating app-netutil into the container image itself. The library can also be integrated into an init container that collects the desired data and passes the data to an existing DPDK workload.

8.1.2. Next steps

8.2. Installing the SR-IOV Network Operator

You can install the Single Root I/O Virtualization (SR-IOV) Network Operator on your cluster to manage SR-IOV network devices and network attachments.

8.2.1. Installing SR-IOV Network Operator

As a cluster administrator, you can install the SR-IOV Network Operator by using the OpenShift Container Platform CLI or the web console.

8.2.1.1. CLI: Installing the SR-IOV Network Operator

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites

  • A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
  • Install the OpenShift CLI (oc).
  • An account with cluster-admin privileges.

Procedure

  1. To create the openshift-sriov-network-operator namespace, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
      labels:
        openshift.io/run-level: "1"
    EOF
  2. To create an OperatorGroup CR, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
    EOF
  3. Subscribe to the SR-IOV Network Operator.

    1. Run the following command to get the OpenShift Container Platform major and minor version. It is required for the channel value in the next step.

      $ OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
          grep -o '[0-9]*[.][0-9]*' | head -1)
    2. To create a Subscription CR for the SR-IOV Network Operator, enter the following command:

      $ cat << EOF| oc create -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: sriov-network-operator-subscription
        namespace: openshift-sriov-network-operator
      spec:
        channel: "${OC_VERSION}"
        name: sriov-network-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      EOF
  4. To verify that the Operator is installed, enter the following command:

    $ oc get csv -n openshift-sriov-network-operator \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                                        Phase
    sriov-network-operator.4.4.0-202006160135   Succeeded

8.2.1.2. Web console: Installing the SR-IOV Network Operator

As a cluster administrator, you can install the Operator using the web console.

Note

You must create the OperatorGroup CR by using the CLI.

Prerequisites

  • A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
  • Install the OpenShift CLI (oc).
  • An account with cluster-admin privileges.

Procedure

  1. Create a namespace for the SR-IOV Network Operator:

    1. In the OpenShift Container Platform web console, click Administration → Namespaces.
    2. Click Create Namespace.
    3. In the Name field, enter openshift-sriov-network-operator, and then click Create.
    4. In the Filter by name field, enter openshift-sriov-network-operator.
    5. From the list of results, click openshift-sriov-network-operator, and then click YAML.
    6. Update the namespace by adding the following stanza to the namespace definition:

        labels:
          openshift.io/run-level: "1"
    7. Click Save.
  2. Install the SR-IOV Network Operator:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
    2. Select SR-IOV Network Operator from the list of available Operators, and then click Install.
    3. On the Create Operator Subscription page, under A specific namespace on the cluster, select openshift-sriov-network-operator.
    4. Click Subscribe.
  3. Verify that the SR-IOV Network Operator is installed successfully:

    1. Navigate to the Operators → Installed Operators page.
    2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

      Note

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, troubleshoot further:

      • Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
      • Navigate to the Workloads → Pods page and check the logs for Pods in the openshift-sriov-network-operator project.

8.2.2. Next steps

8.3. Configuring the SR-IOV Network Operator

The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.

8.3.1. Configuring the SR-IOV Network Operator

Important

Modifying the SR-IOV Network Operator configuration is not normally necessary. The default configuration is recommended for most use cases. Complete the steps to modify the relevant configuration only if the default behavior of the Operator is not compatible with your use case.

The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io CustomResourceDefinition resource. The operator automatically creates a SriovOperatorConfig custom resource (CR) named default in the openshift-sriov-network-operator namespace.

Note

The default CR contains the SR-IOV Network Operator configuration for your cluster. To change the operator configuration, you must modify this CR.

The SriovOperatorConfig CR provides several fields for configuring the operator:

  • enableInjector allows project administrators to enable or disable the Network Resources Injector DaemonSet.
  • enableOperatorWebhook allows project administrators to enable or disable the Operator admission controller webhook DaemonSet.
  • configDaemonNodeSelector allows project administrators to schedule the SR-IOV Network Config Daemon on selected nodes.
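
The following example shows a SriovOperatorConfig CR with each of these fields set. The field values are illustrative only; modify the existing default CR in the openshift-sriov-network-operator namespace rather than creating a new one.

Example SriovOperatorConfig CR

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: true
  enableOperatorWebhook: true
  configDaemonNodeSelector:
    node-role.kubernetes.io/worker: ""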

8.3.1.1. About the Network Resources Injector

The Network Resources Injector is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:

  • Mutation of resource requests and limits in Pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
  • Mutation of Pod specifications with downward API volume to expose pod annotations and labels to the running container as files under the /etc/podnetinfo path.
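
For example, assuming a Pod requests an additional network backed by the SR-IOV resource name openshift.io/intelnics (an example value), the injector adds approximately the following requests and limits to the first container in the Pod specification:

Example of injected resource requests and limits

spec:
  containers:
  - name: sample-container
    resources:
      requests:
        openshift.io/intelnics: "1"
      limits:
        openshift.io/intelnics: "1"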

By default, the Network Resources Injector is enabled by the SR-IOV Network Operator and runs as a DaemonSet on all master nodes. The following is an example of Network Resources Injector Pods running in a cluster with three master nodes:

$ oc get pods -n openshift-sriov-network-operator

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-5cz5p          1/1     Running   0          10m
network-resources-injector-dwqpx          1/1     Running   0          10m
network-resources-injector-lktz5          1/1     Running   0          10m

8.3.1.2. About the SR-IOV Operator admission controller webhook

The SR-IOV Operator admission controller webhook is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:

  • Validation of the SriovNetworkNodePolicy CR when it is created or updated.
  • Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.

By default, the SR-IOV Operator admission controller webhook is enabled by the Operator and runs as a DaemonSet on all master nodes. The following is an example of the Operator admission controller webhook Pods running in a cluster with three master nodes:

$ oc get pods -n openshift-sriov-network-operator

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
operator-webhook-9jkw6                    1/1     Running   0          16m
operator-webhook-kbr5p                    1/1     Running   0          16m
operator-webhook-rpfrl                    1/1     Running   0          16m

8.3.1.3. About custom node selectors

The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.

8.3.1.4. Disabling or enabling the Network Resources Injector

To disable or enable the Network Resources Injector, which is enabled by default, complete the following procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Operator.

Procedure

  • Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.

    $ oc patch sriovoperatorconfig default \
      --type=merge -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableInjector": <value> } }'

8.3.1.5. Disabling or enabling the SR-IOV Operator admission controller webhook

To disable or enable the admission controller webhook, which is enabled by default, complete the following procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Operator.

Procedure

  • Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:

    $ oc patch sriovoperatorconfig default --type=merge \
      -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableOperatorWebhook": <value> } }'

8.3.1.6. Configuring a custom NodeSelector for the SR-IOV Network Config daemon

The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.

To specify the nodes where the SR-IOV Network Config daemon is deployed, complete the following procedure.

Important

When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.

Procedure

  • To update the node selector for the operator, enter the following command:

    $ oc patch sriovoperatorconfig default --type=json \
      -n openshift-sriov-network-operator \
      --patch '[{
          "op": "replace",
          "path": "/spec/configDaemonNodeSelector",
          "value": {<node-label>}
        }]'

    Replace <node-label> with a label to apply as in the following example: "node-role.kubernetes.io/worker": "".

8.3.2. Next steps

8.4. Configuring an SR-IOV network device

You can configure a Single Root I/O Virtualization (SR-IOV) device in your cluster.

8.4.1. Automated discovery of SR-IOV network devices

The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState custom resource (CR) for each worker node that provides a compatible SR-IOV network device.

The CR is assigned the same name as the worker node. The status.interfaces list provides information about the network devices on a node.

Important

Do not modify a SriovNetworkNodeState CR. The Operator creates and manages these resources automatically.

8.4.1.1. Example SriovNetworkNodeState CR

The following YAML is an example of a SriovNetworkNodeState CR created by the SR-IOV Network Operator:

An SriovNetworkNodeState object

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 1
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: "39824"
status:
  interfaces: 2
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded

1
The value of the name field is the same as the name of the worker node.
2
The interfaces stanza includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.

8.4.2. Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

Note

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

Prerequisites

  • Install the OpenShift CLI (oc).
  • An account with cluster-admin privileges.
  • You must have installed the SR-IOV Operator.

Procedure

  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  resourceName: <sriov_resource_name> 3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true" 4
  priority: <priority> 5
  mtu: <mtu> 6
  numVfs: <num> 7
  nicSelector: 8
    vendor: "<vendor_code>" 9
    deviceID: "<device_id>" 10
    pfNames: ["<pf_name>", ...] 11
    rootDevices: ["<pci_bus_id>", "..."] 12
  deviceType: <device_type> 13
  isRdma: false 14
1
Specify a name for the CR object.
2
Specify the namespace where the SR-IOV Operator is installed.
3
Specify the resource name of the SR-IOV device plug-in. You can create multiple SriovNetworkNodePolicy CRs for a resource name.
4
Specify the node selector to select which nodes are configured. Only SR-IOV network devices on selected nodes are configured. The SR-IOV Container Network Interface (CNI) plug-in and device plug-in are deployed only on selected nodes.
5
Optional: Specify an integer value between 0 and 99. A smaller number gets higher priority, so a priority of 10 is higher than a priority of 99. The default value is 99.
6
Optional: Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
7
Specify the number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel Network Interface Card (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
8
The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they point to an identical device.
9
Optional: Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
10
Optional: Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
11
Optional: The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
12
The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
13
Optional: Specify the driver type for the virtual functions. You can specify one of the following values: netdevice or vfio-pci. The default value is netdevice.
Note

For a Mellanox card to work in Data Plane Development Kit (DPDK) mode on bare metal nodes, use the netdevice driver type and set isRdma to true. For a Mellanox card to work in DPDK mode with Container-Native Virtualization (CNV), use the vfio-pci driver type and set isRdma to false.

14
Optional. Specify whether to enable remote direct memory access (RDMA) mode. The default value is false. Only RDMA over Converged Ethernet (RoCE) mode is supported on Mellanox Ethernet adapters.
Note

If the isRdma flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Create the SriovNetworkNodePolicy CR. Replace <name> with the name for this configuration.

    $ oc create -f <name>-sriov-node-network.yaml

    After applying the configuration update, all the Pods in the openshift-sriov-network-operator namespace transition to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
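
    If the SR-IOV network device is configured successfully, the command prints the value of the syncStatus field:

    Example output

    Succeeded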

8.4.3. Next steps

8.5. Configuring an SR-IOV network attachment

You can configure a network attachment for an Single Root I/O Virtualization (SR-IOV) device in the cluster.

8.5.1. Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating a SriovNetwork custom resource (CR). When you create a SriovNetwork CR, the SR-IOV Operator automatically creates a NetworkAttachmentDefinition CR.

Note

Do not modify or delete a SriovNetwork CR if it is attached to any Pods in the running state.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  networkNamespace: <target_namespace> 3
  ipam: <ipam> 4
  vlan: <vlan> 5
  resourceName: <sriov_resource_name> 6
  linkState: <link_state> 7
  maxTxRate: <max_tx_rate> 8
  minTxRate: <min_tx_rate> 9
  vlanQoS: <vlan_qos> 10
  spoofChk: "<spoof_check>" 11
  trust: "<trust_vf>" 12
  capabilities: <capabilities> 13
1
Replace <name> with a name for the CR. The SR-IOV Network Operator creates a NetworkAttachmentDefinition CR with the same name.
2
Specify the namespace where the SR-IOV Operator is installed.
3
Optional: Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created. The default value is openshift-sriov-network-operator.
4
Optional: Replace <ipam> with a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
5
Optional: Replace <vlan> with a Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
6
Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
7
Optional: Replace <link_state> with the link state of the virtual function (VF). Allowed values are enable, disable, and auto.
8
Optional: Replace <max_tx_rate> with a maximum transmission rate, in Mbps, for the VF.
9
Optional: Replace <min_tx_rate> with a minimum transmission rate, in Mbps, for the VF. This value should always be less than or equal to the maximum transmission rate.
Note

Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.

10
Optional: Replace <vlan_qos> with an IEEE 802.1p priority level for the VF. The default value is 0.
11
Optional: Replace <spoof_check> with the spoof check mode of the VF. The allowed values are the strings "on" and "off".
Important

You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.

12
Optional: Replace <trust_vf> with the trust mode of the VF. The allowed values are the strings "on" and "off".
Important

You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.

13
Optional: Replace <capabilities> with the capabilities to configure for this network. You can specify '{ "ips": true }' to enable IP address support or '{ "mac": true }' to enable MAC address support.
  2. To create the CR object, enter the following command. Replace <name> with a name for this additional network.

    $ oc create -f <name>-sriov-network.yaml
  3. Optional: To confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists, enter the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc get net-attach-def -n <namespace>

8.5.1.1. Configuration for ipam CNI plug-in

The ipam Container Network Interface (CNI) plug-in provides IP address management (IPAM) for other CNI plug-ins. You can configure ipam for either static IP address assignment or dynamic IP address assignment by using DHCP. The DHCP server you specify must be reachable from the additional network.

The following JSON configuration object describes the parameters that you can set.

8.5.1.1.1. Static IP address assignment configuration

The following JSON describes the configuration for static IP address assignment:

Static assignment configuration

{
  "ipam": {
    "type": "static",
    "addresses": [ 1
      {
        "address": "<address>", 2
        "gateway": "<gateway>" 3
      }
    ],
    "routes": [ 4
      {
        "dst": "<dst>" 5
        "gw": "<gw>" 6
      }
    ],
    "dns": { 7
      "nameservers": ["<nameserver>"], 8
      "domain": "<domain>", 9
      "search": ["<search_domain>"] 10
    }
  }
}

1
An array describing IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.
2
An IP address that you specify.
3
The default gateway to route egress network traffic to.
4
An array describing routes to configure inside the Pod.
5
The IP address range in CIDR format.
6
The gateway where network traffic is routed.
7
Optional: DNS configuration.
8
An array of one or more IP addresses to send DNS queries to.
9
The default domain to append to a host name. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.
10
An array of domain names to append to an unqualified host name, such as example-host, during a DNS lookup query.
8.5.1.1.2. Dynamic IP address assignment configuration

The following JSON describes the configuration for dynamic IP address assignment with DHCP.

Renewal of DHCP leases

A Pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

The SR-IOV Network Operator does not create a DHCP server deployment; the Cluster Network Operator is responsible for creating the minimal DHCP server deployment.

To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:

Example shim network attachment definition

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  ...
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "master": "ens5",
        "ipam": {
          "type": "dhcp"
        }
      }

DHCP assignment configuration

{
  "ipam": {
    "type": "dhcp"
  }
}

8.5.1.1.3. Static IP address assignment configuration example

You can configure ipam for static IP address assignment:

{
  "ipam": {
    "type": "static",
      "addresses": [
        {
          "address": "191.168.1.7"
        }
      ]
  }
}
8.5.1.1.4. Dynamic IP address assignment configuration example using DHCP

You can configure ipam for DHCP:

{
  "ipam": {
    "type": "dhcp"
  }
}

8.5.1.2. Configuring static MAC and IP addresses on additional SR-IOV networks

You can configure static MAC and IP addresses on an SR-IOV network by specifying Container Network Interface (CNI) runtimeConfig data in a Pod annotation.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges when creating the SriovNetwork CR.

Procedure

  1. Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: <name> 1
      namespace: openshift-sriov-network-operator 2
    spec:
      networkNamespace: <target_namespace> 3
      ipam: '{ "type": "static" }' 4
      capabilities: '{ "mac": true, "ips": true }' 5
      resourceName: <sriov_resource_name> 6
    1
    Replace <name> with a name for the CR. The SR-IOV Network Operator creates a NetworkAttachmentDefinition CR with the same name.
    2
    Specify the namespace where the SR-IOV Network Operator is installed.
    3
    Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created.
    4
    Specify static type for the ipam CNI plug-in as a YAML block scalar.
    5
    Specify mac and ips capabilities to true.
    6
    Replace <sriov_resource_name> with the value for the spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
  2. Create the CR by running the following command:

    $ oc create -f <filename> 1
    1
    Replace <filename> with the name of the file you created in the previous step.
  3. Optional: Confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc get net-attach-def -n <namespace>
Note

Do not modify or delete a SriovNetwork Custom Resource (CR) if it is attached to any Pods in the running state.

  4. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
    	{
    		"name": "<name>", 1
    		"mac": "20:04:0f:f1:88:01", 2
    		"ips": ["192.168.10.1/24", "2001::1/64"] 3
    	}
    ]'
    spec:
      containers:
      - name: sample-container
        image: <image>
        imagePullPolicy: IfNotPresent
        command: ["sleep", "infinity"]
    1
    Specify the name of the SR-IOV network attachment definition CR.
    2
    Specify the MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR.
    3
    Specify addresses for the SR-IOV device which is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported.
  5. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> 1
    1
    Replace <filename> with the name of the file you created in the previous step.
  6. Optional: Confirm that the MAC and IP addresses are applied to the SR-IOV device by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc exec sample-pod -n <namespace> -- ip addr show

8.5.2. Next steps

8.6. Adding a Pod to an SR-IOV additional network

You can add a Pod to an existing Single Root I/O Virtualization (SR-IOV) network.

8.6.1. Adding a Pod to an additional network

You can add a Pod to an additional network. The Pod continues to send normal cluster-related network traffic over the default network.

When a Pod is created, additional networks are attached to it. However, if a Pod already exists, you cannot attach additional networks to it.

Note

If a NetworkAttachmentDefinition is managed by the SR-IOV Network Operator, the SR-IOV Network Resource Injector adds the resource field to the Pod object automatically.

Important

When specifying an SR-IOV hardware network for a Deployment resource or a ReplicationController resource, you must specify the namespace of the NetworkAttachmentDefinition CR. For more information, see the following bugs: BZ#1846333 and BZ#1840962.
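
The following sketch shows a Deployment that uses the JSON annotation format to specify both the network name and its namespace in the Pod template. The names net1-deployment and sriov-net and the namespace sriov-tests are example values only.

Example Deployment with a namespaced SR-IOV network annotation

apiVersion: apps/v1
kind: Deployment
metadata:
  name: net1-deployment
  namespace: sriov-tests
spec:
  replicas: 1
  selector:
    matchLabels:
      app: net1
  template:
    metadata:
      labels:
        app: net1
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          {
            "name": "sriov-net",
            "namespace": "sriov-tests"
          }
        ]'
    spec:
      containers:
      - name: sample-container
        image: <image>
        command: ["sleep", "infinity"]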

Prerequisites

  • The Pod must be in the same namespace as the additional network.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster.
  • You must have the SR-IOV Operator installed and a SriovNetwork CR defined.

Procedure

  1. Add an annotation to the Pod object. Only one of the following annotation formats can be used:

    1. To attach an additional network without any customization, add an annotation with the following format. Replace <network> with the name of the additional network to associate with the Pod:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] 1
      1
      To specify more than one additional network, separate the networks with commas. Do not include whitespace between entries. If you specify the same additional network multiple times, that Pod will have multiple network interfaces attached to that network.
    2. To attach an additional network with customizations, add an annotation with the following format:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: |-
            [
              {
                "name": "<network>", 1
                "namespace": "<namespace>", 2
                "default-route": ["<default-route>"] 3
              }
            ]
      1
      Specify the name of the additional network defined by a NetworkAttachmentDefinition CR.
      2
      Specify the namespace where the NetworkAttachmentDefinition CR is defined.
      3
      Optional: Specify an override for the default route, such as 192.168.17.1.
  2. To create the Pod, enter the following command. Replace <name> with the name of the Pod.

    $ oc create -f <name>.yaml
  3. Optional: To confirm that the annotation exists in the Pod CR, enter the following command, replacing <name> with the name of the Pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod Pod is attached to the net1 additional network:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/networks-status: |- 1
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    1
    The k8s.v1.cni.cncf.io/networks-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the Pod. The annotation value is stored as a plain text value.

8.6.2. Creating a non-uniform memory access (NUMA) aligned SR-IOV pod

You can create a NUMA aligned SR-IOV pod by restricting SR-IOV and CPU resources so that they are allocated from the same NUMA node with the restricted or single-numa-node Topology Manager policies.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Enable a LatencySensitive profile and configure the CPU Manager policy to static; a minimal configuration sketch for the CPU Manager policy follows this list.
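
The following is a minimal sketch of a KubeletConfig object that sets the CPU Manager policy to static. The object name and the custom-kubelet label are example values; the machine config pool for your worker nodes must carry a matching label. See the CPU Manager documentation for the complete procedure.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: cpumanager-enabled
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: cpumanager-enabled
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s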

Procedure

  1. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    The following example shows an SR-IOV pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: <name> 1
    spec:
      containers:
      - name: sample-container
        image: <image> 2
        command: ["sleep", "infinity"]
        resources:
          limits:
            memory: "1Gi" 3
            cpu: "2" 4
          requests:
            memory: "1Gi"
            cpu: "2"
    1
    Replace <name> with the name of the SR-IOV network attachment definition CR.
    2
    Replace <image> with the name of the sample-pod image.
    3
    To create the SR-IOV pod with guaranteed QoS, set memory limits equal to memory requests.
    4
    To create the SR-IOV pod with guaranteed QoS, set cpu limits equal to cpu requests.
  2. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> 1
    1
    Replace <filename> with the name of the file you created in the previous step.
  3. Confirm that the sample-pod is configured with guaranteed QoS.

    $ oc describe pod sample-pod
  4. Confirm that the sample-pod is allocated with exclusive CPUs.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
  5. Confirm that the SR-IOV device and CPUs that are allocated for the sample-pod are on the same NUMA node.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus

8.7. Using high performance multicast

You can use multicast on your Single Root I/O Virtualization (SR-IOV) hardware network.

8.7.1. Configuring high performance multicast

The OpenShift SDN default Container Network Interface (CNI) network provider supports multicast between Pods on the default network. This is best used for low-bandwidth coordination or service discovery, and not high-bandwidth applications. For applications such as streaming media, like Internet Protocol television (IPTV) and multipoint videoconferencing, you can utilize Single Root I/O Virtualization (SR-IOV) hardware to provide near-native performance.

When using additional SR-IOV interfaces for multicast:

  • Multicast packets must be sent or received by a Pod through the additional SR-IOV interface.
  • The physical network that connects the SR-IOV interfaces determines the multicast routing and topology, which is not controlled by OpenShift Container Platform.

8.7.2. Using an SR-IOV interface for multicast

The following procedure creates an example SR-IOV interface for multicast.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  1. Create a SriovNetworkNodePolicy custom resource (CR):

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-example
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: example
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      numVfs: 4
      nicSelector:
        vendor: "8086"
        pfNames: ['ens803f0']
        rootDevices: ['0000:86:00.0']
  2. Create a SriovNetwork CR:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: net-example
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: default
      ipam: | 1
        {
          "type": "host-local", 2
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [
            {"dst": "224.0.0.0/5"},
            {"dst": "232.0.0.0/5"}
          ],
          "gateway": "10.56.217.1"
        }
      resourceName: example
    1 2
    If you choose to configure DHCP as IPAM, ensure that you provision the following default routes through your DHCP server: 224.0.0.0/5 and 232.0.0.0/5. This is to override the static multicast route set by the default network provider.
  3. Create a Pod with multicast application:

    apiVersion: v1
    kind: Pod
    metadata:
      name: testpmd
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: nic1
    spec:
      containers:
      - name: example
        image: rhel7:latest
        securityContext:
          capabilities:
            add: ["NET_ADMIN"] 1
        command: [ "sleep", "infinity"]
    1
    The NET_ADMIN capability is required only if your application needs to assign the multicast IP address to the SR-IOV interface. Otherwise, it can be omitted.
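
    For illustration, assuming the SR-IOV interface appears in the Pod as net1, an application with the NET_ADMIN capability can assign a multicast group address to the interface, for example by using the iproute2 autojoin option. The interface name and group address are example values only:

    $ oc exec testpmd -- ip addr add 232.43.211.234/32 dev net1 autojoin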

8.8. Using virtual functions (VFs) with DPDK and RDMA modes

You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).

8.8.1. Examples of using virtual functions in DPDK and RDMA modes

Important

The Data Plane Development Kit (DPDK) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Important

Remote Direct Memory Access (RDMA) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

8.8.2. Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

8.8.3. Example use of virtual function (VF) in DPDK mode with Intel NICs

Procedure

  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the intel-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: intel-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intelnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "8086"
        deviceID: "158b"
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: vfio-pci 1
    1
    Set the driver type for the virtual functions to vfio-pci.
    Note

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f intel-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the intel-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: intel-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: "{}" 1
      vlan: <vlan>
      resourceName: intelnics
    1
    Specify an empty object "{}" for the ipam CNI plug-in. DPDK works in userspace mode and does not require an IP address.
    Note

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f intel-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: intel-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> 2
        securityContext:
         capabilities:
            add: ["IPC_LOCK"] 3
        volumeMounts:
        - mountPath: /dev/hugepages 4
          name: hugepage
        resources:
          limits:
            openshift.io/intelnics: "1" 5
            memory: "1Gi"
            cpu: "4" 6
            hugepages-1Gi: "4Gi" 7
          requests:
            openshift.io/intelnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where the SriovNetwork CR intel-dpdk-network is created. To create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2
    Specify the DPDK image that includes your application and the DPDK library used by the application.
    3
    Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
    4
    Mount a hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Optional: Specify the number of DPDK devices allocated to the DPDK Pod. This resource request and limit, if not explicitly specified, are automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
    6
    Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
    7
    Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 allocates 16 1Gi hugepages during system boot. A sketch of a MachineConfig object that adds these kernel arguments follows this procedure.
  6. Create the DPDK Pod by running the following command:

    $ oc create -f intel-dpdk-pod.yaml
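
The following is a sketch of a MachineConfig object that adds the 1Gi hugepage kernel arguments mentioned in the previous procedure to worker nodes. The object name is an example value; adapt the role label to the machine config pool that contains your DPDK nodes.

Example MachineConfig for 1Gi hugepage kernel arguments

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-hugepages
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
  - default_hugepagesz=1GB
  - hugepagesz=1G
  - hugepages=16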

8.8.4. Example use of a virtual function in DPDK mode with Mellanox NICs

Procedure

  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" 1
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice 2
      isRdma: true 3
    1
    Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
    2
    Set the driver type for the virtual functions to netdevice. A Mellanox SR-IOV VF can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.
    3
    Enable RDMA mode. This is required for Mellanox cards to work in DPDK mode.
    Note

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- 1
        ...
      vlan: <vlan>
      resourceName: mlxnics
    1
    Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
    Note

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> 2
        securityContext:
         capabilities:
            add: ["IPC_LOCK"] 3
        volumeMounts:
        - mountPath: /dev/hugepages 4
          name: hugepage
        resources:
          limits:
            openshift.io/mlxnics: "1" 5
            memory: "1Gi"
            cpu: "4" 6
            hugepages-1Gi: "4Gi" 7
          requests:
            openshift.io/mlxnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where the SriovNetwork CR mlx-dpdk-network is created. To create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2
    Specify the DPDK image that includes your application and the DPDK library used by the application.
    3
    Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
    4
    Mount the hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Optional: Specify the number of DPDK devices allocated to the DPDK Pod. This resource request and limit, if not explicitly specified, are automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
    6
    Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
    7
    Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
  6. Create the DPDK Pod by running the following command:

    $ oc create -f mlx-dpdk-pod.yaml

8.8.5. Example of a virtual function in RDMA mode with Mellanox NICs

RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.

Procedure

  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-rdma-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-rdma-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" 1
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice 2
      isRdma: true 3
    1
    Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
    2
    Set the driver type for the virtual functions to netdevice.
    3
    Enable RDMA mode.
    Note

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-rdma-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-rdma-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-rdma-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- 1
        ...
      vlan: <vlan>
      resourceName: mlxnics
    1
    Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
    Note

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-rdma-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-rdma-network
    spec:
      containers:
      - name: testpmd
        image: <RDMA_image> 2
        securityContext:
         capabilities:
            add: ["IPC_LOCK"] 3
        volumeMounts:
        - mountPath: /dev/hugepages 4
          name: hugepage
        resources:
          limits:
            memory: "1Gi"
            cpu: "4" 5
            hugepages-1Gi: "4Gi" 6
          requests:
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where the SriovNetwork CR mlx-rdma-network is created. To create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2
    Specify the RDMA image that includes your application and the RDMA library used by the application.
    3
    Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
    4
    Mount the hugepage volume to the RDMA Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Specify the number of CPUs. The RDMA Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
    6
    Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the RDMA Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
  6. Create the RDMA Pod by running the following command:

    $ oc create -f mlx-rdma-pod.yaml