Chapter 6. Deploying SR-IOV technologies


You can achieve near bare metal performance with single root I/O virtualization (SR-IOV) by giving OpenStack instances direct access to a shared PCIe resource through virtual resources.

6.1. Prerequisites

Note

Do not manually edit values in /etc/tuned/cpu-partitioning-variables.conf that are modified by Director heat templates.

6.2. Configuring SR-IOV

Note

The CPU assignments, memory allocation and NIC configurations of the following examples might differ from your topology and use case.

  1. Generate the built-in ComputeSriov role to define nodes in the OpenStack cluster that run NeutronSriovAgent, NeutronSriovHostConfig, and default compute services.

    # openstack overcloud roles generate \
    -o /home/stack/templates/roles_data.yaml \
    Controller ComputeSriov
  2. To prepare the SR-IOV containers, include the neutron-sriov.yaml and roles_data.yaml files when you generate the overcloud_images.yaml file.

    SERVICES=\
    /usr/share/openstack-tripleo-heat-templates/environments/services
    
    openstack tripleo container image prepare \
    --namespace=registry.redhat.io/rhosp15-rhel8 \
    --push-destination=192.168.24.1:8787 \
    --prefix=openstack- \
    --tag-from-label {version}-{release} \
    -e ${SERVICES}/neutron-sriov.yaml \
    --roles-file /home/stack/templates/roles_data.yaml \
    --output-env-file=/home/stack/templates/overcloud_images.yaml \
    --output-images-file=/home/stack/local_registry_images.yaml
    Note

    The push-destination IP address is the address that you previously set with the local_ip parameter in the undercloud.conf configuration file.

    For more information on container image preparation, see Director Installation and Usage.

  3. To apply the KernelArgs and TunedProfile parameters, include the host-config-and-reboot.yaml file from /usr/share/openstack-tripleo-heat-templates/environments with your deployment script.

    openstack overcloud deploy --templates \
    … \
    -e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
    ...
  4. Configure the parameters for the SR-IOV nodes under parameter_defaults to suit your cluster and hardware configuration. These settings typically belong in the network-environment.yaml file.

      NeutronNetworkType: 'vlan'
      NeutronNetworkVLANRanges:
        - tenant:22:22
        - tenant:25:25
      NeutronTunnelTypes: ''
  5. In the same file, configure role specific parameters for SR-IOV compute nodes.

    Note

    The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that physical function (PF). In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.

      ComputeSriovParameters:
        IsolCpusList: "1-19,21-39"
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1-19,21-39"
        TunedProfileName: "cpu-partitioning"
        NeutronBridgeMappings:
          - tenant:br-link0
        NeutronPhysicalDevMappings:
          - tenant:p7p1
          - tenant:p7p2
        NeutronSriovNumVFs:
          - p7p1:5
          - p7p2:5
        NovaPCIPassthrough:
          - devname: "p7p1"
            physical_network: "tenant"
          - devname: "p7p2"
            physical_network: "tenant"
        NovaVcpuPinSet: '1-19,21-39'
        NovaReservedHostMemory: 4096
  6. Configure the SR-IOV-enabled interfaces in the compute.yaml network configuration template. To create SR-IOV virtual functions (VFs), configure the interfaces as standalone NICs:

                  - type: interface
                    name: p7p3
                    mtu: 9000
                    use_dhcp: false
                    defroute: false
                    nm_controlled: true
                    hotplug: true
    
                  - type: interface
                    name: p7p4
                    mtu: 9000
                    use_dhcp: false
                    defroute: false
                    nm_controlled: true
                    hotplug: true
  7. Ensure that the list of default filters includes the value AggregateInstanceExtraSpecsFilter.

    NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','AggregateInstanceExtraSpecsFilter']
  8. Deploy the overcloud.

    TEMPLATES_HOME="/usr/share/openstack-tripleo-heat-templates"
    CUSTOM_TEMPLATES="/home/stack/templates"

    openstack overcloud deploy --templates \
      -r ${CUSTOM_TEMPLATES}/roles_data.yaml \
      -e ${TEMPLATES_HOME}/environments/host-config-and-reboot.yaml \
      -e ${TEMPLATES_HOME}/environments/services/neutron-ovs.yaml \
      -e ${TEMPLATES_HOME}/environments/services/neutron-sriov.yaml \
      -e ${CUSTOM_TEMPLATES}/network-environment.yaml
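The IsolCpusList value, the isolcpus kernel argument, and NovaVcpuPinSet in step 5 all use the same range-list notation. As a rough aid for reviewing these values before deployment, the following sketch expands such a list into individual CPU IDs and compares two lists. The names expand_cpu_list and same_cpu_set are hypothetical helpers written for this illustration; they are not part of director or any OpenStack tool, and they handle only plain IDs and ranges.

```python
def expand_cpu_list(spec):
    """Expand a range list such as "1-19,21-39" into a sorted list of CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def same_cpu_set(first, second):
    """Return True when two range lists describe the same set of CPUs."""
    return expand_cpu_list(first) == expand_cpu_list(second)


# Values copied from the ComputeSriovParameters example in step 5.
isolated = expand_cpu_list("1-19,21-39")
print(len(isolated))                             # 38
print(same_cpu_set("1-19,21-39", "21-39,1-19"))  # True, order does not matter
```

Comparing the expanded sets rather than the raw strings avoids false alarms when two parameters list the same CPUs in a different order.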

6.3. NIC Partitioning (Technology Preview)

This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

You can configure single root I/O virtualization (SR-IOV) so that a Red Hat OpenStack Platform host can use virtual functions (VFs).

When you partition a single, high-speed NIC into multiple VFs, you can use the NIC for both control and data plane traffic. You can then apply a QoS (Quality of Service) priority value to VF interfaces as desired.

Procedure

Ensure that you complete the following steps when creating the templates for an overcloud deployment:

  1. Use the interface type sriov_pf in an os-net-config role file to configure a physical function (PF) that the host can use.

            - type: sriov_pf
              name: <interface name>
              use_dhcp: false
              numvfs: <number of vfs>
              promisc: <true/false> #optional (Defaults to true)
    Note

    The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that PF. In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.

  2. Use the interface type sriov_vf to configure VFs in a bond that the host can use.

                   - type: linux_bond
                     name: internal_bond
                     bonding_options: mode=active-backup
                     use_dhcp: false
                     members:
                     - type: sriov_vf
                       device: nic7
                       vfid: 1
                     - type: sriov_vf
                       device: nic8
                       vfid: 1

                   - type: vlan
                     vlan_id:
                       get_param: InternalApiNetworkVlanID
                     device: internal_bond
                     addresses:
                     - ip_netmask:
                         get_param: InternalApiIpSubnet
    • The VLAN tag must be unique across all VFs that belong to a common PF device. You must assign VLAN tags to one of the following interface types:

      • linux_bond
      • ovs_bridge
      • ovs_dpdk_port
    • The valid VF ID range starts at zero and ends at the total number of VFs minus one.
  3. To reserve VFs for VMs, use the NovaPCIPassthrough parameter. Assign a regex value to the address parameter to identify the VFs that you want to pass through to Nova for use by instances rather than by the host.

    You can obtain these values from lspci. You might need to pre-emptively boot a compute node into a Linux environment to obtain this information.

    The lspci command returns the address of each device in the format <bus>:<device>.<function>. The device number corresponds to the slot field. Enter these address values in the NovaPCIPassthrough parameter in the following format:

      NovaPCIPassthrough:
        - physical_network: "sriovnet2"
          address: {"domain": ".*", "bus": "06", "slot": "11", "function": "[5-7]"}
        - physical_network: "sriovnet2"
          address: {"domain": ".*", "bus": "06", "slot": "10", "function": "[5]"}
  4. Ensure that IOMMU is enabled on all nodes that require NIC partitioning. For example, if you want NIC Partitioning for compute nodes, enable IOMMU using the KernelArgs parameter for that role:

    parameter_defaults:
      ComputeParameters:
        KernelArgs: "intel_iommu=on iommu=pt"
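To illustrate how an lspci address maps onto the address fields of NovaPCIPassthrough, and how the VF ID range rule in this procedure can be checked, here is a minimal sketch. It assumes addresses in the [domain:]bus:device.function form that lspci prints. The helper names split_pci_address and valid_vf_ids are hypothetical and exist only for this illustration.

```python
import re

# lspci prints addresses as [domain:]bus:device.function, in hexadecimal.
_PCI_ADDRESS = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def split_pci_address(address):
    """Split an lspci address into the fields used by NovaPCIPassthrough."""
    match = _PCI_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {address!r}")
    fields = match.groupdict()
    # lspci omits the domain unless asked for it; match any domain in that case.
    if fields["domain"] is None:
        fields["domain"] = ".*"
    return fields


def valid_vf_ids(numvfs):
    """Return the VF IDs available on a PF configured with numvfs VFs."""
    if numvfs < 1:
        raise ValueError("numvfs must be at least 1")
    return list(range(numvfs))


print(split_pci_address("06:11.5"))
print(valid_vf_ids(5))
```

Each returned field is an exact string. To cover several functions with one entry, as in the "[5-7]" example, you would still write the regex for the function field by hand.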

Validation

  1. Check the number of VFs.

    [root@overcloud-compute-0 heat-admin]# cat /sys/class/net/p4p1/device/sriov_numvfs
    10
    [root@overcloud-compute-0 heat-admin]# cat /sys/class/net/p4p2/device/sriov_numvfs
    10
  2. Check Linux bonds.

    [root@overcloud-compute-0 heat-admin]# cat /proc/net/bonding/intapi_bond
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: p4p1_1
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    
    Slave Interface: p4p1_1
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 16:b4:4c:aa:f0:a8
    Slave queue ID: 0
    
    Slave Interface: p4p2_1
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: b6:be:82:ac:51:98
    Slave queue ID: 0
    [root@overcloud-compute-0 heat-admin]# cat /proc/net/bonding/st_bond
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: p4p1_3
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    
    Slave Interface: p4p1_3
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 9a:86:b7:cc:17:e4
    Slave queue ID: 0
    
    Slave Interface: p4p2_3
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: d6:07:f8:78:dd:5b
    Slave queue ID: 0
  3. List OVS bonds.

    [root@overcloud-compute-0 heat-admin]# ovs-appctl bond/show
    ---- bond_prov ----
    bond_mode: active-backup
    bond may use recirculation: no, Recirc-ID : -1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    lacp_status: off
    lacp_fallback_ab: false
    active slave mac: f2:ad:c7:00:f5:c7(dpdk2)
    
    slave dpdk2: enabled
      active slave
      may_enable: true
    
    slave dpdk3: enabled
      may_enable: true
    
    ---- bond_tnt ----
    bond_mode: active-backup
    bond may use recirculation: no, Recirc-ID : -1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    lacp_status: off
    lacp_fallback_ab: false
    active slave mac: b2:7e:b8:75:72:e8(dpdk0)
    
    slave dpdk0: enabled
      active slave
      may_enable: true
    
    slave dpdk1: enabled
      may_enable: true
  4. Show OVS connections.

    [root@overcloud-compute-0 heat-admin]# ovs-vsctl show
    cec12069-9d4c-4fa8-bfe4-decfdf258f49
        Manager "ptcp:6640:127.0.0.1"
            is_connected: true
        Bridge br-tenant
            fail_mode: standalone
            Port br-tenant
                Interface br-tenant
                    type: internal
            Port bond_tnt
                Interface "dpdk0"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:02.2"}
                Interface "dpdk1"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:04.2"}
        Bridge "sriov2"
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "phy-sriov2"
                Interface "phy-sriov2"
                    type: patch
                    options: {peer="int-sriov2"}
            Port "sriov2"
                Interface "sriov2"
                    type: internal
        Bridge br-int
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "int-sriov2"
                Interface "int-sriov2"
                    type: patch
                    options: {peer="phy-sriov2"}
            Port br-int
                Interface br-int
                    type: internal
            Port "vhu93164679-22"
                tag: 4
                Interface "vhu93164679-22"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/var/lib/vhost_sockets/vhu93164679-22"}
            Port "vhu5d6b9f5a-0d"
                tag: 3
                Interface "vhu5d6b9f5a-0d"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/var/lib/vhost_sockets/vhu5d6b9f5a-0d"}
            Port patch-tun
                Interface patch-tun
                    type: patch
                    options: {peer=patch-int}
            Port "int-sriov1"
                Interface "int-sriov1"
                    type: patch
                    options: {peer="phy-sriov1"}
            Port int-br-vfs
                Interface int-br-vfs
                    type: patch
                    options: {peer=phy-br-vfs}
        Bridge br-vfs
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port phy-br-vfs
                Interface phy-br-vfs
                    type: patch
                    options: {peer=int-br-vfs}
            Port bond_prov
                Interface "dpdk3"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:04.5"}
                Interface "dpdk2"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:02.5"}
            Port br-vfs
                Interface br-vfs
                    type: internal
        Bridge "sriov1"
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "sriov1"
                Interface "sriov1"
                    type: internal
            Port "phy-sriov1"
                Interface "phy-sriov1"
                    type: patch
                    options: {peer="int-sriov1"}
        Bridge br-tun
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port br-tun
                Interface br-tun
                    type: internal
            Port patch-int
                Interface patch-int
                    type: patch
                    options: {peer=patch-tun}
            Port "vxlan-0a0a7315"
                Interface "vxlan-0a0a7315"
                    type: vxlan
                    options: {df_default="true", in_key=flow, local_ip="10.10.115.10", out_key=flow, remote_ip="10.10.115.21"}
        ovs_version: "2.10.0"

If you used NovaPCIPassthrough to pass VFs to instances, test by deploying an SR-IOV instance.
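The first validation step reads sriov_numvfs from sysfs by hand. If you want to compare several interfaces against the values in your templates, a small script along the following lines might help. This is a sketch under assumptions: check_numvfs and read_numvfs are hypothetical helper names, and the sysfs_root argument exists only so that the logic can be exercised against a test directory.

```python
from pathlib import Path


def read_numvfs(interface, sysfs_root="/sys/class/net"):
    """Return the number of VFs currently enabled on a PF, read from sysfs."""
    path = Path(sysfs_root) / interface / "device" / "sriov_numvfs"
    return int(path.read_text().strip())


def check_numvfs(expected, sysfs_root="/sys/class/net"):
    """Compare expected VF counts with sysfs and return the mismatches.

    expected maps interface names to the numvfs value from the templates.
    The result maps each mismatching interface to (expected, actual), where
    actual is None if the sysfs entry cannot be read.
    """
    mismatches = {}
    for interface, want in expected.items():
        try:
            actual = read_numvfs(interface, sysfs_root)
        except (OSError, ValueError):
            actual = None
        if actual != want:
            mismatches[interface] = (want, actual)
    return mismatches


if __name__ == "__main__":
    # Interface names and counts taken from the validation output above.
    print(check_numvfs({"p4p1": 10, "p4p2": 10}))
```

An empty result means every listed interface reports the expected number of VFs.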

6.4. Configuring Hardware Offload (Technology Preview)

This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

Open vSwitch (OVS) hardware offload incorporates single root I/O virtualization (SR-IOV), and has some similar configuration steps.

6.4.1. Enabling OVS hardware offload

To enable OVS hardware offload, complete the following steps.

  1. Generate the ComputeSriov role:

    openstack overcloud roles generate -o roles_data.yaml Controller ComputeSriov
  2. Configure the physical_network parameter to match your environment.

    • For VLAN, set the physical_network parameter to the name of the network you create in neutron after deployment. This value should also be in NeutronBridgeMappings.
    • For VXLAN, set the physical_network parameter to the string value null.
    • Ensure the OvsHwOffload parameter under role specific parameters has a value of true.

      Example:

      parameter_defaults:
        ComputeSriovParameters:
          IsolCpusList: 2-9,21-29,11-19,31-39
          KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
          OvsHwOffload: true
          TunedProfileName: "cpu-partitioning"
          NeutronBridgeMappings:
            - tenant:br-tenant
          NeutronPhysicalDevMappings:
            - tenant:p7p1
            - tenant:p7p2
          NovaPCIPassthrough:
            - devname: "p7p1"
              physical_network: "null"
            - devname: "p7p2"
              physical_network: "null"
          NovaReservedHostMemory: 4096
          NovaVcpuPinSet: 1-9,21-29,11-19,31-39
  3. Ensure that the list of default filters includes the value NUMATopologyFilter:

      NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
  4. Configure one or more network interfaces intended for hardware offload in the compute-sriov.yaml configuration file:

    Note

    Do not use the NeutronSriovNumVFs parameter when configuring Open vSwitch hardware offload. Specify the number of virtual functions with the numvfs parameter in a network configuration file used by os-net-config.

      - type: ovs_bridge
        name: br-tenant
        mtu: 9000
        members:
        - type: sriov_pf
          name: p7p1
          numvfs: 5
          mtu: 9000
          primary: true
          promisc: true
          use_dhcp: false
          link_mode: switchdev
    Note

    Do not configure Mellanox network interfaces as a nic-config interface type ovs-vlan because this prevents tunnel endpoints such as VXLAN from passing traffic due to driver limitations.

  5. Include the following files during the deployment of the overcloud:

    • ovs-hw-offload.yaml
    • host-config-and-reboot.yaml

      TEMPLATES_HOME="/usr/share/openstack-tripleo-heat-templates"
      CUSTOM_TEMPLATES="/home/stack/templates"

      openstack overcloud deploy --templates \
        -r ${CUSTOM_TEMPLATES}/roles_data.yaml \
        -e ${TEMPLATES_HOME}/environments/ovs-hw-offload.yaml \
        -e ${TEMPLATES_HOME}/environments/host-config-and-reboot.yaml \
        -e ${CUSTOM_TEMPLATES}/network-environment.yaml \
        -e ${CUSTOM_TEMPLATES}/neutron-ovs.yaml
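Several procedures in this chapter depend on specific entries in NovaSchedulerDefaultFilters, for example PciPassthroughFilter, NUMATopologyFilter, and AggregateInstanceExtraSpecsFilter. The following sketch shows one way to check a filter list before deployment. The missing_filters function is a hypothetical helper, not an OpenStack API.

```python
def missing_filters(configured, required):
    """Return the required scheduler filters that are absent, in order."""
    present = set(configured)
    return [name for name in required if name not in present]


# Filter list copied from the NovaSchedulerDefaultFilters example in step 3.
configured = [
    "RetryFilter", "AvailabilityZoneFilter", "ComputeFilter",
    "ComputeCapabilitiesFilter", "ImagePropertiesFilter",
    "ServerGroupAntiAffinityFilter", "ServerGroupAffinityFilter",
    "PciPassthroughFilter", "NUMATopologyFilter",
]

print(missing_filters(configured, ["PciPassthroughFilter", "NUMATopologyFilter"]))
print(missing_filters(configured, ["AggregateInstanceExtraSpecsFilter"]))
```

The second call reports AggregateInstanceExtraSpecsFilter as missing, which matters if you also plan to schedule with host aggregates.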

6.4.2. Verifying OVS hardware offload

  1. Confirm that a PCI device has its mode configured as switchdev:

    # devlink dev eswitch show pci/0000:03:00.0
    pci/0000:03:00.0: mode switchdev inline-mode none encap enable
  2. Verify offload is enabled in OVS:

    # ovs-vsctl get Open_vSwitch . other_config:hw-offload
    "true"
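If you collect the output of these two commands from many nodes, the checks can be automated. The sketch below parses text only and assumes output shaped like the examples above, that is, a device name followed by key and value pairs for devlink, and a quoted string for ovs-vsctl. The function names are hypothetical.

```python
def parse_eswitch_show(line):
    """Parse one line of `devlink dev eswitch show` output into a dict."""
    device, _, rest = line.strip().partition(": ")
    tokens = rest.split()
    # The remainder is a flat sequence of "key value" pairs.
    if not device or len(tokens) % 2 != 0:
        raise ValueError(f"unexpected devlink output: {line!r}")
    settings = dict(zip(tokens[0::2], tokens[1::2]))
    settings["device"] = device
    return settings


def is_switchdev(line):
    """Return True when the eswitch mode in the output line is switchdev."""
    return parse_eswitch_show(line).get("mode") == "switchdev"


def hw_offload_enabled(ovs_value):
    """Interpret the value printed for other_config:hw-offload."""
    return ovs_value.strip().strip('"').lower() == "true"


sample = "pci/0000:03:00.0: mode switchdev inline-mode none encap enable"
print(is_switchdev(sample))
print(hw_offload_enabled('"true"'))
```

Both calls print True for the sample output shown in this section.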

6.5. Deploying an instance for SR-IOV

Red Hat recommends using host aggregates to separate high performance compute hosts. For information on creating host aggregates and associated flavors for scheduling, see Creating host aggregates.

Note

You should use host aggregates to separate CPU pinned instances from unpinned instances. Instances that do not use CPU pinning do not respect the resourcing requirements of instances that use CPU pinning.

To deploy an instance for single root I/O virtualization (SR-IOV), perform the following steps:

  1. Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
  2. Create the network.

    # openstack network create net1 --provider-physical-network tenant --provider-network-type vlan --provider-segment <VLAN-ID>
    # openstack subnet create subnet1 --network net1 --subnet-range 192.0.2.0/24 --dhcp
  3. Create the port.

    • Use vnic-type direct to create an SR-IOV virtual function (VF) port.

      # openstack port create --network net1 --vnic-type direct sriov_port
    • Use the following to create a virtual function with hardware offload.

      # openstack port create --network net1 --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' sriov_hwoffload_port
    • Use vnic-type direct-physical to create an SR-IOV PF port.

      # openstack port create --network net1 --vnic-type direct-physical sriov_port
  4. Deploy an instance.

    # openstack server create --flavor <flavor> --image <image> --nic port-id=<id> <instance name>
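The binding profile in the hardware offload variant of step 3 is a JSON string, and quoting it by hand is error prone. As a hedged illustration, the sketch below builds the port creation command with the JSON generated and quoted programmatically. The port_create_command function is a hypothetical helper; it only assembles the command line shown in this procedure and does not call the OpenStack API.

```python
import json
import shlex


def port_create_command(network, port_name, vnic_type="direct", switchdev=False):
    """Build an `openstack port create` command line for an SR-IOV port."""
    args = ["openstack", "port", "create",
            "--network", network,
            "--vnic-type", vnic_type]
    if switchdev:
        # Hardware offload ports carry the switchdev capability in the
        # binding profile, passed as a JSON string.
        profile = json.dumps({"capabilities": ["switchdev"]})
        args += ["--binding-profile", profile]
    args.append(port_name)
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(args)


print(port_create_command("net1", "sriov_port"))
print(port_create_command("net1", "sriov_hwoffload_port", switchdev=True))
```

Generating the JSON with json.dumps and quoting with shlex.join keeps the opening and closing quotes balanced, which is the usual failure when the command is typed manually.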

6.6. Creating host aggregates

For increased performance, Red Hat recommends deploying guests using CPU pinning and huge pages. You can schedule high performance instances on a subset of hosts by matching aggregate metadata with flavor metadata.

  1. Ensure that the AggregateInstanceExtraSpecsFilter value is included in the scheduler_default_filters parameter in the nova.conf configuration file. You can set this configuration through the heat parameter NovaSchedulerDefaultFilters under role-specific parameters before deployment.

      ComputeOvsDpdkSriovParameters:
        NovaSchedulerDefaultFilters: ['AggregateInstanceExtraSpecsFilter', 'RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']
    Note

    You can add this parameter to the heat templates and re-run the original deployment script to apply it to the configuration of an existing cluster.

  2. Create an aggregate group for single root I/O virtualization (SR-IOV) and add relevant hosts. Define metadata, for example, sriov=true, that matches defined flavor metadata.

    # openstack aggregate create sriov_group
    # openstack aggregate add host sriov_group compute-sriov-0.localdomain
    # openstack aggregate set sriov_group sriov=true
  3. Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
  4. Set additional flavor properties. Note that the defined metadata, sriov=true, matches the defined metadata on the SR-IOV aggregate.

    openstack flavor set --property sriov=true --property hw:cpu_policy=dedicated --property hw:mem_page_size=1GB <flavor>
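To make the matching between flavor metadata and aggregate metadata concrete, the following is a deliberately simplified model of the idea, not the actual AggregateInstanceExtraSpecsFilter code. It assumes plain string equality and compares only unscoped keys such as sriov. The real filter also understands keys under the aggregate_instance_extra_specs scope and comparison operators, which this sketch skips. The host_passes function name is hypothetical.

```python
def host_passes(flavor_extra_specs, aggregate_metadata):
    """Simplified model of matching flavor extra specs to aggregate metadata.

    Only unscoped keys (keys without a ":" such as "sriov") are compared, and
    only by string equality. Scoped keys such as "hw:cpu_policy" are skipped
    because other parts of Nova consume them.
    """
    for key, wanted in flavor_extra_specs.items():
        if ":" in key:
            continue
        if aggregate_metadata.get(key) != wanted:
            return False
    return True


# Properties from step 4 and aggregate metadata from step 2.
flavor = {"sriov": "true", "hw:cpu_policy": "dedicated", "hw:mem_page_size": "1GB"}
sriov_group = {"sriov": "true"}
other_hosts = {}

print(host_passes(flavor, sriov_group))  # True
print(host_passes(flavor, other_hosts))  # False
```

In this model, a host outside the sriov_group aggregate is rejected because it carries no sriov=true metadata, which is the behavior the procedure relies on.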