Chapter 6. Deploying SR-IOV technologies
In your Red Hat OpenStack Platform NFV deployment, you can achieve higher performance with single root I/O virtualization (SR-IOV) by configuring direct access from your instances to a shared PCIe resource through virtual resources.
6.1. Prerequisites
- For details on how to install and configure the undercloud before deploying the overcloud, see the Director Installation and Usage Guide.
Do not manually edit any values in /etc/tuned/cpu-partitioning-variables.conf that director heat templates modify.
6.2. Configuring SR-IOV
The following CPU assignments, memory allocation, and NIC configurations are examples, and might differ from your use case.
Generate the built-in ComputeSriov role to define nodes in the OpenStack cluster that run NeutronSriovAgent, NeutronSriovHostConfig, and the default compute services.

    # openstack overcloud roles generate \
      -o /home/stack/templates/roles_data.yaml \
      Controller ComputeSriov

To prepare the SR-IOV containers, include the neutron-sriov.yaml and roles_data.yaml files when you generate the overcloud_images.yaml file.

    sudo openstack tripleo container image prepare \
      --roles-file ~/templates/roles_data.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
      -e ~/containers-prepare-parameter.yaml \
      --output-env-file=/home/stack/templates/overcloud_images.yaml

For more information on container image preparation, see Director Installation and Usage.
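If you want to sanity-check the generated files before you continue, a quick grep is enough. This check is illustrative rather than part of the documented procedure, and the exact image parameter names vary by release:

    $ grep -n 'ComputeSriov' /home/stack/templates/roles_data.yaml
    $ grep -in 'sriov' /home/stack/templates/overcloud_images.yaml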
Configure the parameters for the SR-IOV nodes under parameter_defaults appropriately for your cluster and your hardware configuration. Typically, you add these settings to the network-environment.yaml file.

    NeutronNetworkType: 'vlan'
    NeutronNetworkVLANRanges:
      - tenant:22:22
      - tenant:25:25
    NeutronTunnelTypes: ''

In the same file, configure role-specific parameters for SR-IOV compute nodes.

Note: The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that physical function (PF). In this case, you must hard reboot these instances to make the SR-IOV PCI device available again. The NovaVcpuPinSet parameter is now deprecated, and is replaced by NovaComputeCpuDedicatedSet for dedicated, pinned workloads.

    ComputeSriovParameters:
      IsolCpusList: "1-19,21-39"
      KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1-19,21-39"
      TunedProfileName: "cpu-partitioning"
      NeutronBridgeMappings:
        - tenant:br-link0
      NeutronPhysicalDevMappings:
        - tenant:p7p1
        - tenant:p7p2
      NovaPCIPassthrough:
        - devname: "p7p1"
          physical_network: "tenant"
        - devname: "p7p2"
          physical_network: "tenant"
      NovaComputeCpuDedicatedSet: '1-19,21-39'
      NovaReservedHostMemory: 4096
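After you deploy the overcloud and the compute nodes reboot, you can spot-check that the tuned profile and kernel arguments from this step took effect on a node. This is an optional, illustrative verification:

    [root@overcloud-compute-0 ~]# tuned-adm active
    [root@overcloud-compute-0 ~]# cat /proc/cmdline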
Configure the SR-IOV enabled interfaces in the compute.yaml network configuration template. To create SR-IOV virtual functions (VFs), configure the interfaces as standalone NICs:

    - type: sriov_pf
      name: p7p3
      mtu: 9000
      numvfs: 10
      use_dhcp: false
      defroute: false
      nm_controlled: true
      hotplug: true
      promisc: false
    - type: sriov_pf
      name: p7p4
      mtu: 9000
      numvfs: 10
      use_dhcp: false
      defroute: false
      nm_controlled: true
      hotplug: true
      promisc: false

Ensure that the list of default filters includes the value AggregateInstanceExtraSpecsFilter.

    NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','AggregateInstanceExtraSpecsFilter']
Run the overcloud_deploy.sh script.
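This chapter does not reproduce the overcloud_deploy.sh script itself. A minimal sketch follows, assuming the roles file, images file, and environment files generated in the previous steps; adjust the paths and the list of -e environment files to match your deployment:

    #!/bin/bash
    # Illustrative deploy script, not the canonical version.
    openstack overcloud deploy --templates \
      -r /home/stack/templates/roles_data.yaml \
      -e /home/stack/templates/overcloud_images.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
      -e /home/stack/templates/network-environment.yaml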
6.3. NIC partitioning
This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.
You can configure single root I/O virtualization (SR-IOV) so that a Red Hat OpenStack Platform host can use virtual functions (VFs).
When you partition a single, high-speed NIC into multiple VFs, you can use the NIC for both control and data plane traffic. You can then apply a QoS (Quality of Service) priority value to VF interfaces as desired.
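For example, with the ip utility you can tag traffic from one VF and give it an 802.1p priority; the device name, VF number, VLAN ID, and priority value here are illustrative:

    # ip link set p4p1 vf 1 vlan 100 qos 5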
Procedure
Ensure that you complete the following steps when creating the templates for an overcloud deployment:
Use the interface type sriov_pf in an os-net-config role file to configure a physical function that the host can use.

    - type: sriov_pf
      name: <interface name>
      use_dhcp: false
      numvfs: <number of vfs>
      promisc: <true/false> # optional (defaults to true)

Note: The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that physical function (PF). In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.
Use the interface type sriov_vf to configure virtual functions in a bond that the host can use.

    - type: linux_bond
      name: internal_bond
      bonding_options: mode=active-backup
      use_dhcp: false
      members:
        - type: sriov_vf
          device: nic7
          vfid: 1
        - type: sriov_vf
          device: nic8
          vfid: 1
    - type: vlan
      vlan_id:
        get_param: InternalApiNetworkVlanID
      device: internal_bond
      addresses:
        - ip_netmask:
            get_param: InternalApiIpSubnet

The VLAN tag must be unique across all VFs that belong to a common PF device. You must assign VLAN tags to one of the following interface types:
- linux_bond
- ovs_bridge
- ovs_dpdk_port
The applicable VF ID range starts at zero and ends at the maximum number of VFs minus one. For example, with numvfs: 10, the valid vfid values are 0 through 9.
To reserve virtual functions for VMs, use the NovaPCIPassthrough parameter. You must assign a regex value to the address parameter to identify the VFs that you want to pass through to Nova, to be used by virtual instances and not by the host.

You can obtain these values from lspci, so, if necessary, boot a compute node into a Linux environment to obtain this information.

The lspci command returns the address of each device in the format <bus>:<slot>.<function>. Enter these address values in the NovaPCIPassthrough parameter in the following format:

    NovaPCIPassthrough:
      - physical_network: "sriovnet2"
        address: {"domain": ".*", "bus": "06", "slot": "11", "function": "[5-7]"}
      - physical_network: "sriovnet2"
        address: {"domain": ".*", "bus": "06", "slot": "10", "function": "[5]"}
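For example, to list candidate VF addresses on a compute node, you can filter lspci output for virtual functions; the "Virtual Function" string appears in the device description of common SR-IOV NICs, so treat this as an illustrative check:

    # lspci -nn | grep -i "Virtual Function"

An address such as 06:11.5 at the start of a line maps to bus "06", slot "11", and function "5" in NovaPCIPassthrough.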
Ensure that IOMMU is enabled on all nodes that require NIC partitioning. For example, if you want NIC partitioning for compute nodes, enable IOMMU using the KernelArgs parameter for that role:

    parameter_defaults:
      ComputeParameters:
        KernelArgs: "intel_iommu=on iommu=pt"
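After the nodes reboot with these kernel arguments, you can confirm that the IOMMU came up; the exact messages vary by hardware and kernel, so this check is illustrative:

    # cat /proc/cmdline
    # dmesg | grep -i -e DMAR -e IOMMU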
Validation
Check the number of VFs.
    [root@overcloud-compute-0 heat-admin]# cat /sys/class/net/p4p1/device/sriov_numvfs
    10
    [root@overcloud-compute-0 heat-admin]# cat /sys/class/net/p4p2/device/sriov_numvfs
    10

Check Linux bonds.
    [root@overcloud-compute-0 heat-admin]# cat /proc/net/bonding/intapi_bond
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: p4p1_1
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: p4p1_1
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 16:b4:4c:aa:f0:a8
    Slave queue ID: 0

    Slave Interface: p4p2_1
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: b6:be:82:ac:51:98
    Slave queue ID: 0

    [root@overcloud-compute-0 heat-admin]# cat /proc/net/bonding/st_bond
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: p4p1_3
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: p4p1_3
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 9a:86:b7:cc:17:e4
    Slave queue ID: 0

    Slave Interface: p4p2_3
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: d6:07:f8:78:dd:5b
    Slave queue ID: 0

List OVS bonds.
    [root@overcloud-compute-0 heat-admin]# ovs-appctl bond/show
    ---- bond_prov ----
    bond_mode: active-backup
    bond may use recirculation: no, Recirc-ID : -1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    lacp_status: off
    lacp_fallback_ab: false
    active slave mac: f2:ad:c7:00:f5:c7(dpdk2)

    slave dpdk2: enabled
      active slave
      may_enable: true

    slave dpdk3: enabled
      may_enable: true

    ---- bond_tnt ----
    bond_mode: active-backup
    bond may use recirculation: no, Recirc-ID : -1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    lacp_status: off
    lacp_fallback_ab: false
    active slave mac: b2:7e:b8:75:72:e8(dpdk0)

    slave dpdk0: enabled
      active slave
      may_enable: true

    slave dpdk1: enabled
      may_enable: true

Show OVS connections.
    [root@overcloud-compute-0 heat-admin]# ovs-vsctl show
    cec12069-9d4c-4fa8-bfe4-decfdf258f49
        Manager "ptcp:6640:127.0.0.1"
            is_connected: true
        Bridge br-tenant
            fail_mode: standalone
            Port br-tenant
                Interface br-tenant
                    type: internal
            Port bond_tnt
                Interface "dpdk0"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:02.2"}
                Interface "dpdk1"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:04.2"}
        Bridge "sriov2"
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "phy-sriov2"
                Interface "phy-sriov2"
                    type: patch
                    options: {peer="int-sriov2"}
            Port "sriov2"
                Interface "sriov2"
                    type: internal
        Bridge br-int
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "int-sriov2"
                Interface "int-sriov2"
                    type: patch
                    options: {peer="phy-sriov2"}
            Port br-int
                Interface br-int
                    type: internal
            Port "vhu93164679-22"
                tag: 4
                Interface "vhu93164679-22"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/var/lib/vhost_sockets/vhu93164679-22"}
            Port "vhu5d6b9f5a-0d"
                tag: 3
                Interface "vhu5d6b9f5a-0d"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/var/lib/vhost_sockets/vhu5d6b9f5a-0d"}
            Port patch-tun
                Interface patch-tun
                    type: patch
                    options: {peer=patch-int}
            Port "int-sriov1"
                Interface "int-sriov1"
                    type: patch
                    options: {peer="phy-sriov1"}
            Port int-br-vfs
                Interface int-br-vfs
                    type: patch
                    options: {peer=phy-br-vfs}
        Bridge br-vfs
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port phy-br-vfs
                Interface phy-br-vfs
                    type: patch
                    options: {peer=int-br-vfs}
            Port bond_prov
                Interface "dpdk3"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:04.5"}
                Interface "dpdk2"
                    type: dpdk
                    options: {dpdk-devargs="0000:82:02.5"}
            Port br-vfs
                Interface br-vfs
                    type: internal
        Bridge "sriov1"
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "sriov1"
                Interface "sriov1"
                    type: internal
            Port "phy-sriov1"
                Interface "phy-sriov1"
                    type: patch
                    options: {peer="int-sriov1"}
        Bridge br-tun
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port br-tun
                Interface br-tun
                    type: internal
            Port patch-int
                Interface patch-int
                    type: patch
                    options: {peer=patch-tun}
            Port "vxlan-0a0a7315"
                Interface "vxlan-0a0a7315"
                    type: vxlan
                    options: {df_default="true", in_key=flow, local_ip="10.10.115.10", out_key=flow, remote_ip="10.10.115.21"}
        ovs_version: "2.10.0"
If you used NovaPCIPassthrough to pass VFs to instances, test by deploying an SR-IOV instance.
The following bond modes are supported:
- balance-slb
- active-backup
6.4. Configuring OVS hardware offload
This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.
The procedure for OVS hardware offload configuration shares many of the same steps as configuring SR-IOV.
Procedure
Generate the ComputeSriov role:

    openstack overcloud roles generate -o roles_data.yaml Controller ComputeSriov

Configure the physical_network parameter to match your environment.
- For VLAN, set the physical_network parameter to the name of the network you create in neutron after deployment. This value should also be in NeutronBridgeMappings.
- For VXLAN, set the physical_network parameter to the string value null.
Ensure that the OvsHwOffload parameter under role-specific parameters has a value of true.

Example:

    parameter_defaults:
      ComputeSriovParameters:
        IsolCpusList: 2-9,21-29,11-19,31-39
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
        OvsHwOffload: true
        TunedProfileName: "cpu-partitioning"
        NeutronBridgeMappings:
          - tenant:br-tenant
        NeutronPhysicalDevMappings:
          - tenant:p7p1
          - tenant:p7p2
        NovaPCIPassthrough:
          - devname: "p7p1"
            physical_network: "null"
          - devname: "p7p2"
            physical_network: "null"
        NovaReservedHostMemory: 4096
        NovaComputeCpuDedicatedSet: 1-9,21-29,11-19,31-39
Ensure that the list of default filters includes NUMATopologyFilter:

    NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']

Configure one or more network interfaces intended for hardware offload in the compute-sriov.yaml configuration file:
Note: Do not use the NeutronSriovNumVFs parameter when configuring Open vSwitch hardware offload. The numvfs parameter specifies the number of VFs in a network configuration file used by os-net-config.

    - type: ovs_bridge
      name: br-tenant
      mtu: 9000
      members:
        - type: sriov_pf
          name: p7p1
          numvfs: 5
          mtu: 9000
          primary: true
          promisc: true
          use_dhcp: false
          link_mode: switchdev

Note: Do not configure Mellanox network interfaces as a nic-config interface type ovs-vlan, because this prevents tunnel endpoints such as VXLAN from passing traffic due to driver limitations.
Include the ovs-hw-offload.yaml file in the overcloud deploy command:

    TEMPLATES_HOME="/usr/share/openstack-tripleo-heat-templates"
    CUSTOM_TEMPLATES="/home/stack/templates"

    openstack overcloud deploy --templates \
      -r ${CUSTOM_TEMPLATES}/roles_data.yaml \
      -e ${TEMPLATES_HOME}/environments/ovs-hw-offload.yaml \
      -e ${CUSTOM_TEMPLATES}/network-environment.yaml \
      -e ${CUSTOM_TEMPLATES}/neutron-ovs.yaml
6.4.1. Verifying OVS hardware offload
Confirm that a PCI device is in switchdev mode:

    # devlink dev eswitch show pci/0000:03:00.0
    pci/0000:03:00.0: mode switchdev inline-mode none encap enable

Verify that offload is enabled in OVS:
    # ovs-vsctl get Open_vSwitch . other_config:hw-offload
    "true"
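Once instances are passing traffic, you can also confirm that flows are actually being offloaded by dumping the offloaded datapath flows; support for the type filter depends on your OVS version, so treat this as an illustrative check:

    # ovs-appctl dpctl/dump-flows type=offloaded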
6.5. Deploying an instance for SR-IOV
Use host aggregates to separate high performance compute hosts. For information on creating host aggregates and associated flavors for scheduling, see Creating host aggregates.
Pinned CPU instances can be located on the same Compute node as unpinned instances. For more information, see Configuring CPU pinning on the Compute node in the Instances and Images Guide.
Deploy an instance for single root I/O virtualization (SR-IOV) by performing the following steps:
Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>

Tip: You can specify the NUMA affinity policy for PCI passthrough devices and SR-IOV interfaces by adding the extra spec hw:pci_numa_affinity_policy to your flavor. For more information, see Update flavor metadata in the Instances and Images Guide.
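For example, the following commands create a hypothetical flavor and set the policy so that scheduling prefers, but does not require, placing the instance on the same NUMA node as the PCI device; the flavor name and sizes are illustrative:

    # openstack flavor create sriov.medium --ram 4096 --disk 20 --vcpus 4
    # openstack flavor set --property hw:pci_numa_affinity_policy=preferred sriov.medium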
Create the network.

    # openstack network create net1 --provider-physical-network tenant --provider-network-type vlan --provider-segment <VLAN-ID>
    # openstack subnet create subnet1 --network net1 --subnet-range 192.0.2.0/24 --dhcp

Create the port.
- Use vnic-type direct to create an SR-IOV virtual function (VF) port.

    # openstack port create --network net1 --vnic-type direct sriov_port

- Use the following command to create a virtual function with hardware offload.

    # openstack port create --network net1 --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' sriov_hwoffload_port

- Use vnic-type direct-physical to create an SR-IOV physical function (PF) port.

    # openstack port create --network net1 --vnic-type direct-physical sriov_port
Deploy an instance.

    # openstack server create --flavor <flavor> --image <image> --nic port-id=<id> <instance name>
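After the instance boots, you can confirm that the SR-IOV port bound successfully; this is an illustrative check, with field names as reported by the openstack client:

    # openstack port show sriov_port -c binding_vif_type -c status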
6.6. Creating host aggregates
For better performance, deploy guests that have CPU pinning and hugepages. You can schedule high performance instances on a subset of hosts by matching aggregate metadata with flavor metadata.
Procedure
Ensure that the AggregateInstanceExtraSpecsFilter value is included in the scheduler_default_filters parameter in the nova.conf file. This configuration can be set through the heat parameter NovaSchedulerDefaultFilters under role-specific parameters before deployment.

    ComputeOvsDpdkSriovParameters:
      NovaSchedulerDefaultFilters: ['AggregateInstanceExtraSpecsFilter','RetryFilter','AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']

Note: To add this parameter to the configuration of an existing cluster, you can add it to the heat templates and run the original deployment script again.
Create an aggregate group for SR-IOV, and add relevant hosts. Define metadata, for example, sriov=true, that matches the defined flavor metadata.

    # openstack aggregate create sriov_group
    # openstack aggregate add host sriov_group compute-sriov-0.localdomain
    # openstack aggregate set --property sriov=true sriov_group
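You can confirm the aggregate membership and metadata before creating the flavor; this check is illustrative:

    # openstack aggregate show sriov_group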
Create a flavor.

    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>

Set additional flavor properties. Note that the defined metadata, sriov=true, matches the defined metadata on the SR-IOV aggregate.

    # openstack flavor set --property aggregate_instance_extra_specs:sriov=true --property hw:cpu_policy=dedicated --property hw:mem_page_size=1GB <flavor>