Chapter 7. Deploying SR-IOV technologies
In your Red Hat OpenStack Platform NFV deployment, you can achieve higher performance with single root I/O virtualization (SR-IOV) by configuring direct access from your instances to a shared PCIe resource through virtual resources.
7.1. Configuring SR-IOV
To deploy Red Hat OpenStack Platform (RHOSP) with single root I/O virtualization (SR-IOV), configure the shared, SR-IOV-capable PCIe resources to which instances can request direct access.
The following CPU assignments, memory allocation, and NIC configurations are examples, and might be different from your use case.
Prerequisites
- For details on how to install and configure the undercloud before deploying the overcloud, see the Director Installation and Usage guide.

  Note: Do not manually edit any values in /etc/tuned/cpu-partitioning-variables.conf that director heat templates modify.
- Access to the undercloud host and credentials for the stack user.
Procedure
- Log in to the undercloud as the stack user.
- Source the stackrc file:

  [stack@director ~]$ source ~/stackrc
- Generate a new roles data file named roles_data_compute_sriov.yaml that includes the Controller and ComputeSriov roles:

  (undercloud)$ openstack overcloud roles \
    generate -o /home/stack/templates/roles_data_compute_sriov.yaml \
    Controller ComputeSriov

  ComputeSriov is a custom role provided with your RHOSP installation that includes the NeutronSriovAgent and NeutronSriovHostConfig services, in addition to the default compute services.
- To prepare the SR-IOV containers, include the neutron-sriov.yaml and roles_data_compute_sriov.yaml files when you generate the overcloud_images.yaml file:

  $ sudo openstack tripleo container image prepare \
    --roles-file ~/templates/roles_data_compute_sriov.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-sriov.yaml \
    -e ~/containers-prepare-parameter.yaml \
    --output-env-file=/home/stack/templates/overcloud_images.yaml

  For more information on container image preparation, see Preparing container images in the Director Installation and Usage guide.
- Create a copy of the /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml file in your environment file directory:

  $ cp /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml /home/stack/templates/network-environment-sriov.yaml
- Add the following parameters under parameter_defaults in your network-environment-sriov.yaml file to configure the SR-IOV nodes for your cluster and your hardware configuration:

  NeutronNetworkType: 'vlan'
  NeutronNetworkVLANRanges:
    - tenant:22:22
    - tenant:25:25
  NeutronTunnelTypes: ''
- To determine the vendor_id and product_id for each PCI device type, use one of the following commands on the physical server that has the PCI cards:

  - To return the vendor_id and product_id from a deployed overcloud, use the following command:

    # lspci -nn -s <pci_device_address>
    3b:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [<vendor_id>: <product_id>] (rev 02)
  - To return the vendor_id and product_id of a physical function (PF) if you have not yet deployed the overcloud, use the following command:

    (undercloud) [stack@undercloud-0 ~]$ openstack baremetal introspection data save <baremetal_node_name> | jq '.inventory.interfaces[] | .name, .vendor, .product'
 
- Configure role-specific parameters for SR-IOV compute nodes in your network-environment-sriov.yaml file.

  Note: The NovaVcpuPinSet parameter is now deprecated, and is replaced by NovaComputeCpuDedicatedSet for dedicated, pinned workloads.
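  A minimal sketch of role-specific parameters, assuming example CPU ranges, hugepage counts, and reserved memory values that you must adapt to your hardware:

    parameter_defaults:
      ComputeSriovParameters:
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1-19,21-39"
        TunedProfileName: "cpu-partitioning"
        IsolCpusList: "1-19,21-39"
        NovaComputeCpuDedicatedSet: ['1-19','21-39']
        NovaReservedHostMemory: 4096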
- Configure the PCI passthrough devices for the SR-IOV compute nodes in your network-environment-sriov.yaml file, as shown in the example that follows this list.
  - Replace <vendor_id> with the vendor ID of the PCI device.
  - Replace <product_id> with the product ID of the PCI device.
  - Replace <NIC_address> with the address of the PCI device. For information about how to configure the address parameter, see Guidelines for configuring NovaPCIPassthrough in the Configuring the Compute Service for Instance Creation guide.
  - Replace <physical_network> with the name of the physical network the PCI device is located on.

    Note: Do not use the devname parameter when you configure PCI passthrough because the device name of a NIC can change. To create a Networking service (neutron) port on a PF, specify the vendor_id, the product_id, and the PCI device address in NovaPCIPassthrough, and create the port with the --vnic-type direct-physical option. To create a Networking service port on a virtual function (VF), specify the vendor_id and product_id in NovaPCIPassthrough, and create the port with the --vnic-type direct option. The values of the vendor_id and product_id parameters might be different between physical function (PF) and VF contexts. For more information about how to configure NovaPCIPassthrough, see Guidelines for configuring NovaPCIPassthrough in the Configuring the Compute Service for Instance Creation guide.
 
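  The following is a minimal sketch that uses the placeholders described above; adapt the values to your devices:

    parameter_defaults:
      ComputeSriovParameters:
        NovaPCIPassthrough:
          - vendor_id: "<vendor_id>"
            product_id: "<product_id>"
            address: "<NIC_address>"
            physical_network: "<physical_network>"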
- Configure the SR-IOV enabled interfaces in the compute.yaml network configuration template. To create SR-IOV VFs, configure the interfaces as standalone NICs.

  Note: The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that PF. In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.
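  A minimal sketch of standalone SR-IOV PF entries, assuming the interface names p4p1 and p4p2 and a VF count of 10 as example values:

    - type: sriov_pf
      name: p4p1
      numvfs: 10
      use_dhcp: false
      promisc: true
    - type: sriov_pf
      name: p4p2
      numvfs: 10
      use_dhcp: false
      promisc: true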
- Ensure that the list of default filters includes the value AggregateInstanceExtraSpecsFilter:

  NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','AggregateInstanceExtraSpecsFilter']
- Run the overcloud_deploy.sh script.
7.2. Configuring NIC partitioning
You can reduce the number of NICs that you need for each host by configuring single root I/O virtualization (SR-IOV) virtual functions (VFs) for Red Hat OpenStack Platform (RHOSP) management networks and provider networks. When you partition a single, high-speed NIC into multiple VFs, you can use the NIC for both control and data plane traffic. This feature has been validated on Intel Fortville NICs, and Mellanox CX-5 NICs.
Procedure
- Open the NIC config file for your chosen role.
- Add an entry for the interface type sriov_pf to configure a physical function that the host can use:

    - type: sriov_pf
      name: <interface_name>
      use_dhcp: false
      numvfs: <number_of_vfs>
      promisc: <true/false>
  - Replace <interface_name> with the name of the interface.
  - Replace <number_of_vfs> with the number of VFs.
  - Optional: Replace <true/false> with true to set promiscuous mode, or false to disable promiscuous mode. The default value is true.

  Note: The numvfs parameter replaces the NeutronSriovNumVFs parameter in the network configuration templates. Red Hat does not support modification of the NeutronSriovNumVFs parameter or the numvfs parameter after deployment. If you modify either parameter after deployment, it might cause a disruption for the running instances that have an SR-IOV port on that physical function (PF). In this case, you must hard reboot these instances to make the SR-IOV PCI device available again.
- Add an entry for the interface type sriov_vf to configure virtual functions that the host can use, as shown in the template sketch that follows this list.
  - Replace <bond_type> with the required bond type, for example, linux_bond. You can apply VLAN tags on the bond for other bonds, such as ovs_bond.
  - Replace <bonding_option> with one of the following supported bond modes:
    - active-backup
    - balance-slb

    Note: LACP bonds are not supported.
  - Specify sriov_vf as the interface type to bond in the members section.

    Note: If you are using an OVS bridge as the interface type, you can configure only one OVS bridge on the sriov_vf of a sriov_pf device. More than one OVS bridge on a single sriov_pf device can result in packet duplication across VFs, and decreased performance.
  - Replace <pf_device_name> with the name of the PF device.
  - If you use a linux_bond, you must assign VLAN tags. If you set a VLAN tag, ensure that you set a unique tag for each VF associated with a single sriov_pf device. You cannot have two VFs from the same PF on the same VLAN.
  - Replace <vf_id> with the ID of the VF. The applicable VF ID range starts at zero, and ends at the maximum number of VFs minus one.
  - Disable spoof checking.
  - Apply VLAN tags on the sriov_vf for linux_bond over VFs.
 
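  A sketch of the structure, using the placeholders described above, a hypothetical bond name bond_api, and assuming the os-net-config attribute names device, vfid, vlan_id, and spoofcheck for sriov_vf members:

    - type: <bond_type>
      name: bond_api
      bonding_options: "mode=<bonding_option>"
      use_dhcp: false
      members:
        - type: sriov_vf
          device: <pf_device_name>
          vfid: <vf_id>
          vlan_id: <vlan_id>
          spoofcheck: false
        - type: sriov_vf
          device: <pf_device_name>
          vfid: <vf_id>
          vlan_id: <vlan_id>
          spoofcheck: false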
- To reserve VFs for instances, include the NovaPCIPassthrough parameter in an environment file.

  Director identifies the host VFs, and derives the PCI addresses of the VFs that are available to the instance.
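  A minimal sketch, assuming a hypothetical physical network name and PCI address filter; the address fields accept regular expressions, which you can use to exclude the VFs that the host itself uses:

    parameter_defaults:
      NovaPCIPassthrough:
        - physical_network: "tenant"
          address: {"domain": ".*", "bus": "06", "slot": "00", "function": "[5-9]"}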
- Enable IOMMU on all nodes that require NIC partitioning. For example, if you want NIC partitioning for Compute nodes, enable IOMMU using the KernelArgs parameter for that role:

  parameter_defaults:
    ComputeParameters:
      KernelArgs: "intel_iommu=on iommu=pt"

  Note: When you first add the KernelArgs parameter to the configuration of a role, the overcloud nodes are automatically rebooted. If required, you can disable the automatic rebooting of nodes and instead perform node reboots manually after each overcloud deployment. For more information, see Configuring manual node reboot to define KernelArgs in the Configuring the Compute Service for Instance Creation guide.
- Add your role file and environment files to the stack with your other environment files and deploy the overcloud:

  (undercloud)$ openstack overcloud deploy --templates \
    -r os-net-config.yaml -e [your environment files] \
    -e /home/stack/templates/<compute_environment_file>.yaml
Validation
- Log in to the overcloud Compute node as heat-admin and check the number of VFs:

  [heat-admin@overcloud-compute-0 heat-admin]$ sudo cat /sys/class/net/p4p1/device/sriov_numvfs
  10
  [heat-admin@overcloud-compute-0 heat-admin]$ sudo cat /sys/class/net/p4p2/device/sriov_numvfs
  10
- Show OVS connections.
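  For example, the standard Open vSwitch client lists the bridges and ports; the output depends on your configuration:

    [heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl show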
- Log in to your OVS-DPDK SR-IOV Compute node as heat-admin and check the Linux bonds.
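  A hedged example, assuming a bond named bond_api was created in your NIC partitioning templates:

    [heat-admin@overcloud-computesriov-0 ~]$ cat /proc/net/bonding/bond_api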
- List OVS bonds.
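  For example, if you configured ovs_bond interfaces, the following command summarizes them:

    [heat-admin@overcloud-computesriov-0 ~]$ sudo ovs-appctl bond/show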
				If you used NovaPCIPassthrough to pass VFs to instances, test by Deploying an instance for SR-IOV.
			
7.3. Example configurations for NIC partitions
Linux bond over VFs
					The following example configures a Linux bond over VFs, disables spoofcheck, and applies VLAN tags to sriov_vf:
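A sketch under the assumptions noted in the previous section: hypothetical PF names p4p1 and p4p2, VF ID 1, and VLAN tag 50.

  - type: linux_bond
    name: bond_api
    bonding_options: "mode=active-backup"
    use_dhcp: false
    members:
      - type: sriov_vf
        device: p4p1
        vfid: 1
        vlan_id: 50
        spoofcheck: false
      - type: sriov_vf
        device: p4p2
        vfid: 1
        vlan_id: 50
        spoofcheck: false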
				
OVS bridge on VFs
The following example configures an OVS bridge on VFs:
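A sketch, assuming a hypothetical bridge name br-bond and a VF on PF p4p1:

  - type: ovs_bridge
    name: br-bond
    use_dhcp: true
    members:
      - type: sriov_vf
        device: p4p1
        vfid: 2
        spoofcheck: false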
OVS user bridge on VFs
					The following example configures an OVS user bridge on VFs and applies VLAN tags to ovs_user_bridge:
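A sketch, assuming a hypothetical user bridge br-link0, a DPDK port backed by a VF on PF p4p1, and VLAN tag 70 applied to the bridge through ovs_extra:

  - type: ovs_user_bridge
    name: br-link0
    use_dhcp: false
    ovs_extra:
      - "set port br-link0 tag=70"
    members:
      - type: ovs_dpdk_port
        name: dpdk0
        members:
          - type: sriov_vf
            device: p4p1
            vfid: 3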
				
7.4. Configuring OVS hardware offload
The procedure for OVS hardware offload configuration shares many of the same steps as configuring SR-IOV.
Since Red Hat OpenStack Platform 16.2.3, to offload traffic from Compute nodes with OVS hardware offload and ML2/OVS, you must set the disable_packet_marking parameter to true in the openvswitch_agent.ini configuration file, and then restart the neutron_ovs_agent container.

  cat /var/lib/config-data/puppet-generated/neutron/\
  etc/neutron/plugins/ml2/openvswitch_agent.ini
  [ovs]
  disable_packet_marking=True

Procedure
- Generate an overcloud role for OVS hardware offload that is based on the Compute role:

  openstack overcloud roles generate -o roles_data.yaml \
    Controller Compute:ComputeOvsHwOffload
- Optional: Change the HostnameFormatDefault: '%stackname%-compute-%index%' name for the ComputeOvsHwOffload role.
- Add the OvsHwOffload parameter under role-specific parameters with a value of true.
- To configure neutron to use the iptables/hybrid firewall driver implementation, include the line NeutronOVSFirewallDriver: iptables_hybrid. For more information about NeutronOVSFirewallDriver, see Using the Open vSwitch Firewall in the Advanced Overcloud Customization Guide.
- Configure the physical_network parameter to match your environment, as shown in the example that follows this list.
  - For VLAN, set the physical_network parameter to the name of the network you create in neutron after deployment. This value should also be in NeutronBridgeMappings.
  - For VXLAN, set the physical_network parameter to null.
  - Replace <vendor-id> with the vendor ID of the physical NIC.
  - Replace <product-id> with the product ID of the NIC VF.
  - Replace <address> with the address of the physical NIC.

    For more information about how to configure NovaPCIPassthrough, see Guidelines for configuring NovaPCIPassthrough in the Configuring the Compute Service for Instance Creation guide.
 
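  A minimal sketch, assuming the role-specific parameter group ComputeOvsHwOffloadParameters for the ComputeOvsHwOffload role and a VLAN physical network named tenant:

    parameter_defaults:
      ComputeOvsHwOffloadParameters:
        OvsHwOffload: true
        NeutronBridgeMappings:
          - tenant:br-tenant
        NovaPCIPassthrough:
          - vendor_id: <vendor-id>
            product_id: <product-id>
            address: <address>
            physical_network: "tenant"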
- Ensure that the list of default filters includes NUMATopologyFilter.

  Note: Optional: For details on how to troubleshoot and configure OVS Hardware Offload issues in RHOSP 16.2 with Mellanox ConnectX5 NICs, see Troubleshooting Hardware Offload.
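  For example, reusing the filter list shown earlier in this chapter with NUMATopologyFilter appended:

    parameter_defaults:
      NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter','NUMATopologyFilter']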
- Configure one or more network interfaces intended for hardware offload in the compute-sriov.yaml configuration file, as shown in the example that follows these notes.

  Note:
  - Do not use the NeutronSriovNumVFs parameter when configuring Open vSwitch hardware offload. The number of virtual functions is specified using the numvfs parameter in a network configuration file used by os-net-config. Red Hat does not support modifying the numvfs setting during update or redeployment.
  - Do not configure Mellanox network interfaces as a nic-config interface type ovs-vlan because this prevents tunnel endpoints such as VXLAN from passing traffic due to driver limitations.
 
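  A minimal sketch, assuming an OVS bridge named br-tenant and a placeholder for the Mellanox PF name; the link_mode: switchdev setting requests the e-switch mode that hardware offload requires:

    - type: ovs_bridge
      name: br-tenant
      mtu: 9000
      members:
        - type: sriov_pf
          name: <interface_name>
          numvfs: 5
          mtu: 9000
          primary: true
          promisc: true
          use_dhcp: false
          link_mode: switchdev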
- Include the ovs-hw-offload.yaml file in the overcloud deploy command.
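  For example, assuming the environment file ships in the default tripleo-heat-templates location and that you generated the roles file and network environment file earlier in this procedure:

    (undercloud)$ openstack overcloud deploy --templates \
      -r /home/stack/templates/roles_data.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml \
      -e /home/stack/templates/network-environment.yaml \
      -e [your environment files]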
Verification
- Confirm that a PCI device is in switchdev mode:

  # devlink dev eswitch show pci/0000:03:00.0
  pci/0000:03:00.0: mode switchdev inline-mode none encap enable
- Verify if offload is enabled in OVS:

  # ovs-vsctl get Open_vSwitch . other_config:hw-offload
  "true"
7.5. Tuning examples for OVS hardware offload
For optimal performance you must complete additional configuration steps.
Adjusting the number of channels for each network interface to improve performance
					A channel includes an interrupt request (IRQ) and the set of queues that trigger the IRQ. When you set the mlx5_core driver to switchdev mode, the mlx5_core driver defaults to one combined channel, which might not deliver optimal performance.
				
Procedure
- On the PF representors, enter the following command to adjust the number of CPUs available to the host. Replace $(nproc) with the number of CPUs you want to make available:

  $ sudo ethtool -L enp3s0f0 combined $(nproc)
CPU pinning
To prevent performance degradation from cross-NUMA operations, locate NICs, their applications, the VF guest, and OVS in the same NUMA node. For more information, see Configuring CPU pinning on Compute nodes in the Configuring the Compute Service for Instance Creation guide.
7.6. Configuring components of OVS hardware offload
This section provides a reference for configuring and troubleshooting the components of OVS hardware offload with Mellanox smart NICs.
Nova
					Configure the Nova scheduler to use the NovaPCIPassthrough filter with the NUMATopologyFilter and DerivePciWhitelistEnabled parameters. When you enable OVS HW Offload, the Nova scheduler operates similarly to SR-IOV passthrough for instance spawning.
				
Neutron
					When you enable OVS HW Offload, use the devlink cli tool to set the NIC e-switch mode to switchdev. Switchdev mode establishes representor ports on the NIC that are mapped to the VFs.
				
Procedure
- To allocate a port from a switchdev-enabled NIC, log in as an admin user, create a neutron port with a binding-profile value of capabilities, and disable port security:

  $ openstack port create --network private --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' direct_port1 --disable-port-security
- Pass this port information when you create the instance.

  You associate the representor port with the instance VF interface and connect the representor port to the OVS bridge br-int for one-time OVS data path processing. A VF port representor functions like a software version of a physical "patch panel" front-end.

  For more information about new instance creation, see Deploying an instance for SR-IOV.
OVS
In an environment with hardware offload configured, the first packet transmitted traverses the OVS kernel path, and this packet journey establishes the ml2 OVS rules for incoming and outgoing traffic for the instance traffic. When the flows of the traffic stream are established, OVS uses the traffic control (TC) Flower utility to push these flows on the NIC hardware.
Procedure
- Use director to apply the following configuration on OVS:

  $ sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
- Restart OVS to enable hardware offload.
Traffic Control (TC) subsystems
					When you enable the hw-offload flag, OVS uses the TC data path. TC Flower is an iproute2 utility that writes data path flows on hardware. This ensures that the flow is programmed on both the hardware and software data paths, for redundancy.
				
Procedure
- Apply the following configuration. This is the default option if you do not explicitly configure tc-policy:

  $ sudo ovs-vsctl set Open_vSwitch . other_config:tc-policy=none
- Restart OVS.
NIC PF and VF drivers
The mlx5_core driver is the PF and VF driver for the Mellanox ConnectX-5 NIC. It performs the following tasks:
- Creates routing tables on hardware.
- Manages network flows.
- Configures the Ethernet switch device driver model, switchdev.
- Creates block devices.
Procedure
- Use the following devlink commands to set and query the mode of the PCI device:

  $ sudo devlink dev eswitch set pci/0000:03:00.0 mode switchdev
  $ sudo devlink dev eswitch show pci/0000:03:00.0
  pci/0000:03:00.0: mode switchdev inline-mode none encap enable
NIC firmware
The NIC firmware performs the following tasks:
- Maintains routing tables and rules.
- Fixes the pipelines of the tables.
- Manages hardware resources.
- Creates VFs.
The firmware works with the driver for optimal performance.
Although the NIC firmware is non-volatile and persists after you reboot, you can modify the configuration during run time.
Procedure
- Apply the following configuration on the interfaces, and the representor ports, to ensure that TC Flower pushes the flow programming at the port level:

  $ sudo ethtool -K enp3s0f0 hw-tc-offload on
Ensure that you keep the firmware updated. Yum or dnf updates might not complete the firmware update. For more information, see your vendor documentation.
				
7.7. Troubleshooting OVS hardware offload
Prerequisites
- Linux Kernel 4.13 or newer
- OVS 2.8 or newer
- RHOSP 12 or newer
- Iproute 4.12 or newer
- Mellanox NIC firmware, for example FW ConnectX-5 16.21.0338 or newer
For more information about supported prerequisites, see the Red Hat Knowledgebase solution Network Adapter Fast Datapath Feature Support Matrix.
Configuring the network in an OVS HW offload deployment
In a HW offload deployment, you can choose one of the following scenarios for your network configuration according to your requirements:
- You can base guest VMs on VXLAN and VLAN by using either the same set of interfaces attached to a bond, or a different set of NICs for each type.
- You can bond two ports of a Mellanox NIC by using Linux bond.
- You can host tenant VXLAN networks on VLAN interfaces on top of a Mellanox Linux bond.
Ensure that individual NICs and bonds are members of an ovs-bridge.
Refer to the example network configuration below.
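A sketch of one possible layout, assuming hypothetical interface names p5p1 and p5p2: a Linux bond of two Mellanox ports is a member of an OVS bridge, and a VLAN interface on the bond carries the tenant VXLAN traffic.

  - type: ovs_bridge
    name: br-offload
    use_dhcp: false
    members:
      - type: linux_bond
        name: bond-pf
        bonding_options: "mode=active-backup miimon=100"
        members:
          - type: sriov_pf
            name: p5p1
            numvfs: 3
            primary: true
            promisc: true
            use_dhcp: false
            link_mode: switchdev
          - type: sriov_pf
            name: p5p2
            numvfs: 3
            promisc: true
            use_dhcp: false
            link_mode: switchdev
  - type: vlan
    device: bond-pf
    vlan_id: 80
    addresses:
      - ip_netmask: 172.16.0.10/24    # hypothetical tenant network address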
The following bonding configurations are supported:
- active-backup - mode=1
- active-active or balance-xor - mode=2
- 802.3ad (LACP) - mode=4
The following bonding configuration is not supported:
- xmit_hash_policy=layer3+4
Verifying the interface configuration
Verify the interface configuration with the following procedure.
Procedure
- During deployment, use the host network configuration tool os-net-config to enable hw-tc-offload.
- Enable hw-tc-offload on the sriov_config service any time you reboot the Compute node.
- Set the hw-tc-offload parameter to on for the NICs that are attached to the bond:

  [root@overcloud-computesriov-0 ~]# ethtool -k ens1f0 | grep tc-offload
  hw-tc-offload: on
Verifying the interface mode
Verify the interface mode with the following procedure.
Procedure
- Set the eswitch mode to switchdev for the interfaces you use for HW offload.
- Use the host network configuration tool os-net-config to enable eswitch during deployment.
- Enable eswitch on the sriov_config service any time you reboot the Compute node:

  [root@overcloud-computesriov-0 ~]# devlink dev eswitch show pci/$(ethtool -i ens1f0 | grep bus-info | cut -d ':' -f 2,3,4 | awk '{$1=$1};1')
					The driver of the PF interface is set to "mlx5e_rep", to show that it is a representor of the e-switch uplink port. This does not affect the functionality.
				
Verifying the offload state in OVS
Verify the offload state in OVS with the following procedure.
- Check that hardware offload is enabled in OVS on the Compute node:

  [root@overcloud-computesriov-0 ~]# ovs-vsctl get Open_vSwitch . other_config:hw-offload
  "true"
Verifying the name of the VF representor port
					To ensure consistent naming of VF representor ports, os-net-config uses udev rules to rename the ports in the <PF-name>_<VF_id> format.
				
Procedure
- After deployment, verify that the VF representor ports are named correctly.
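  For example, assuming a PF named ens1f0, the representor ports appear as ens1f0_<VF_id>:

    [root@overcloud-computesriov-0 ~]# ip -br link show | grep "ens1f0_"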
Examining network traffic flow
HW offloaded network flow functions in a similar way to physical switches or routers with application-specific integrated circuit (ASIC) chips. You can access the ASIC shell of a switch or router to examine the routing table and for other debugging. The following procedure uses a Broadcom chipset from a Cumulus Linux switch as an example. Replace the values that are appropriate to your environment.
Procedure
- To get Broadcom chip table content, use the bcmcmd command:

  root@dni-7448-26:~# cl-bcmcmd l2 show
  mac=00:02:00:00:00:08 vlan=2000 GPORT=0x2 modid=0 port=2/xe1
  mac=00:02:00:00:00:09 vlan=2000 GPORT=0x2 modid=0 port=2/xe1 Hit
- Inspect the Traffic Control (TC) Layer.
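  A hedged example, assuming the representor port is named p6p1_5; the -s option includes the statistics and in_hw flags referenced in the next step:

    # tc -s filter show dev p6p1_5 ingress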
- Examine the in_hw flags and the statistics in this output. The word hardware indicates that the hardware processes the network traffic. If you use tc-policy=none, you can check this output or a tcpdump to investigate when hardware or software handles the packets. You can see a corresponding log message in dmesg or in ovs-vswitch.log when the driver is unable to offload packets.
- For Mellanox, as an example, the log entries resemble syndrome messages in dmesg:

  [13232.860484] mlx5_core 0000:3b:00.0: mlx5_cmd_check:756:(pid 131368): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6b1266)

  In this example, the error code (0x6b1266) represents the following behavior:

  0x6B1266 | set_flow_table_entry: pop vlan and forward to uplink is not allowed
Validating systems
Validate your system with the following procedure.
Procedure
- Ensure SR-IOV and VT-d are enabled on the system.
- Enable IOMMU in Linux by adding intel_iommu=on to the kernel parameters, for example, using GRUB.
Limitations
You cannot use the OVS firewall driver with HW offload because the connection tracking properties of the flows are unsupported in the offload path in OVS 2.11.
7.8. Debugging hardware offload flow
				You can use the following procedure if you encounter the following message in the ovs-vswitch.log file:
			
2020-01-31T06:22:11.257Z|00473|dpif_netlink(handler402)|ERR|failed to offload flow: Operation not supported: p6p1_5

Procedure
- To enable logging on the offload modules and to get additional log information for this failure, use the following commands on the Compute node:

  ovs-appctl vlog/set dpif_netlink:file:dbg
  # Module name changed recently (check based on the version used)
  ovs-appctl vlog/set netdev_tc_offloads:file:dbg [OR] ovs-appctl vlog/set netdev_offload_tc:file:dbg
  ovs-appctl vlog/set tc:file:dbg
- Inspect the ovs-vswitchd logs again to see additional details about the issue.

  In the following example logs, the offload failed because of an unsupported attribute mark.

  2020-01-31T06:22:11.218Z|00471|dpif_netlink(handler402)|DBG|system@ovs-system: put[create] ufid:61bd016e-eb89-44fc-a17e-958bc8e45fda recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(7),skb_mark(0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:d2:f5:f3,dst=fa:16:3e:c4:a3:eb),eth_type(0x0800),ipv4(src=10.1.1.8/0.0.0.0,dst=10.1.1.31/0.0.0.0,proto=1/0,tos=0/0x3,ttl=64/0,frag=no),icmp(type=0/0,code=0/0), actions:set(tunnel(tun_id=0x3d,src=10.10.141.107,dst=10.10.141.124,ttl=64,tp_dst=4789,flags(df|key))),6
  2020-01-31T06:22:11.253Z|00472|netdev_tc_offloads(handler402)|DBG|offloading attribute pkt_mark isn't supported
  2020-01-31T06:22:11.257Z|00473|dpif_netlink(handler402)|ERR|failed to offload flow: Operation not supported: p6p1_5
Debugging Mellanox NICs
Mellanox has provided a system information script, similar to a Red Hat SOS report.
https://github.com/Mellanox/linux-sysinfo-snapshot/blob/master/sysinfo-snapshot.py
When you run this command, you create a zip file of the relevant log information, which is useful for support cases.
Procedure
- You can run this system information script with the following command:

  # ./sysinfo-snapshot.py --asap --asap_tc --ibdiagnet --openstack
You can also install Mellanox Firmware Tools (MFT), mlxconfig, mlxlink and the OpenFabrics Enterprise Distribution (OFED) drivers.
Useful CLI commands
					Use the ethtool utility with the following options to gather diagnostic information:
				
- ethtool -l <uplink representor>: View the number of channels
- ethtool -S <uplink/VFs>: Check statistics
- ethtool -i <uplink rep>: View driver information
- ethtool -g <uplink rep>: Check ring sizes
- ethtool -k <uplink/VFs>: View enabled features
				Use the tcpdump utility at the representor and PF ports to similarly check traffic flow.
			
- Any changes you make to the link state of the representor port also affect the VF link state.
- Representor port statistics also present the VF statistics.
Use the following commands to get useful diagnostic information.
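For example, the following commands are commonly useful in this context; this is a sketch, so adjust the representor port name to your environment:

  # Dump flows offloaded to hardware
  $ sudo ovs-appctl dpctl/dump-flows type=offloaded

  # Dump flows handled by the OVS software datapath
  $ sudo ovs-appctl dpctl/dump-flows type=ovs

  # Show the TC filters programmed on a representor port
  $ sudo tc -s filter show dev <representor_port> ingress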
7.9. Deploying an instance for SR-IOV
Use host aggregates to separate high performance compute hosts. For information on creating host aggregates and associated flavors for scheduling, see Creating host aggregates.
Pinned CPU instances can be located on the same Compute node as unpinned instances. For more information, see Configuring CPU pinning on Compute nodes in the Configuring the Compute Service for Instance Creation guide.
Deploy an instance for single root I/O virtualization (SR-IOV) by performing the following steps:
Procedure
- Create a flavor:

  $ openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>

  Tip: You can specify the NUMA affinity policy for PCI passthrough devices and SR-IOV interfaces by adding the extra spec hw:pci_numa_affinity_policy to your flavor. For more information, see Flavor metadata in the Configuring the Compute Service for Instance Creation guide.
- Create the network:

  $ openstack network create net1 --provider-physical-network tenant --provider-network-type vlan --provider-segment <VLAN-ID>
  $ openstack subnet create subnet1 --network net1 --subnet-range 192.0.2.0/24 --dhcp
- Create the port.

  - Use vnic-type direct to create an SR-IOV virtual function (VF) port:

    $ openstack port create --network net1 --vnic-type direct sriov_port
  - Use the following command to create a virtual function with hardware offload. You must be an admin user to set --binding-profile:

    $ openstack port create --network net1 --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' sriov_hwoffload_port
  - Use vnic-type direct-physical to create an SR-IOV physical function (PF) port that is dedicated to a single instance. This PF port is a Networking service (neutron) port but is not controlled by the Networking service, and is not visible as a network adapter because it is a PCI device that is passed through to the instance.

    $ openstack port create --network net1 --vnic-type direct-physical sriov_port
 
- Deploy an instance:

  $ openstack server create --flavor <flavor> --image <image> --nic port-id=<id> <instance name>
7.10. Creating host aggregates
For better performance, deploy guests that have CPU pinning and huge pages. You can schedule high performance instances on a subset of hosts by matching aggregate metadata with flavor metadata.
Procedure
- You can configure the AggregateInstanceExtraSpecsFilter value, and other necessary filters, through the heat parameter NovaSchedulerEnabledFilters under parameter_defaults in your deployment templates.

  Note: To add this parameter to the configuration of an existing cluster, you can add it to the heat templates, and run the original deployment script again.
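  A minimal sketch, assuming the same filter list used earlier in this chapter with AggregateInstanceExtraSpecsFilter appended:

    parameter_defaults:
      NovaSchedulerEnabledFilters:
        - AvailabilityZoneFilter
        - ComputeFilter
        - ComputeCapabilitiesFilter
        - ImagePropertiesFilter
        - ServerGroupAntiAffinityFilter
        - ServerGroupAffinityFilter
        - PciPassthroughFilter
        - AggregateInstanceExtraSpecsFilter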
- Create an aggregate group for SR-IOV, and add relevant hosts. Define metadata, for example, sriov=true, that matches defined flavor metadata:

  # openstack aggregate create sriov_group
  # openstack aggregate add host sriov_group compute-sriov-0.localdomain
  # openstack aggregate set --property sriov=true sriov_group
- Create a flavor:

  # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
- Set additional flavor properties. Note that the defined metadata, sriov=true, matches the defined metadata on the SR-IOV aggregate:

  # openstack flavor set --property sriov=true --property hw:cpu_policy=dedicated --property hw:mem_page_size=1GB <flavor>