Chapter 4. Configure DPDK Accelerated Open vSwitch (OVS) for Networking
This chapter covers DPDK with Open vSwitch installation and tuning within the Red Hat OpenStack Platform environment.
See Planning Your OVS-DPDK Deployment to understand the parameters used to configure OVS-DPDK.
This guide provides examples for CPU assignments, memory allocation, and NIC configurations that may vary from your topology and use case. See the Network Functions Virtualization Product Guide and the Network Functions Virtualization Planning Guide to understand the hardware and configuration options.
Do not edit or change isolated_cores or other values in /etc/tuned/cpu-partitioning-variables.conf that are modified by these director heat templates.
In the following procedures, you need to:
- Update the appropriate network-environment.yaml file to include parameters for kernel arguments and DPDK arguments.
- Update the compute.yaml file to include the bridge for DPDK interface parameters.
- Update the controller.yaml file to include the same bridge details for DPDK interface parameters.
- Run the overcloud_deploy.sh script to deploy the overcloud with the DPDK parameters.
For deployments that use hugepages, you also need to configure reserved_huge_pages. See How to set reserved_huge_pages in /etc/nova/nova.conf in Red Hat OpenStack Platform 10 for details.
Before you begin the procedure, ensure that you have the following:
- Red Hat OpenStack Platform 10 with Red Hat Enterprise Linux 7.5
- OVS-DPDK 2.9
- Tested NIC. For a list of tested NICs for NFV, see Tested NICs.
Red Hat OpenStack Platform 10 with OVS 2.9 operates in OVS client mode for OVS-DPDK deployments.
4.1. Naming Conventions
We recommend that you follow a consistent naming convention when you use custom roles in your OpenStack deployment, especially with multiple nodes. This naming convention can assist you when creating the following files and configurations:
- instackenv.json - To differentiate between nodes with different hardware or NIC capabilities.
    "name":"computeovsdpdk-0"
- roles_data.yaml - To differentiate between compute-based roles that support DPDK.
    ComputeOvsDpdk
- network-environment.yaml - To ensure that you match the custom role to the correct flavor name.
    OvercloudComputeOvsDpdkFlavor: computeovsdpdk
- nic-config file names - To differentiate NIC yaml files for compute nodes that support DPDK interfaces.
- Flavor creation - To help you match a flavor and capabilities:profile value to the appropriate bare metal node and custom role.
    # openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 4 computeovsdpdk
    # openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="computeovsdpdk" computeovsdpdk
- Bare metal node - To ensure that you match the bare metal node with the appropriate hardware and capabilities:profile value.
    # openstack baremetal node update computeovsdpdk-0 add properties/capabilities='profile:computeovsdpdk,boot_option:local'
The flavor name does not have to match the capabilities:profile value for the flavor, but the flavor capabilities:profile value must match the profile value in the bare metal node properties/capabilities. All three use computeovsdpdk in this example.
Ensure that all your nodes used for a custom role and profile have the same CPU, RAM, and PCI hardware topology.
4.2. Configure Two-Port OVS-DPDK Data Plane Bonding with VLAN Tunnelling
This section covers the procedures to configure and deploy OVS-DPDK with two data plane ports in an OVS-DPDK bond, with control plane Linux bonding for your OpenStack environment.
4.2.1. Modify first-boot.yaml
Modify the first-boot.yaml file to set up OVS and DPDK parameters and to configure tuned for CPU affinity.
If you have included the following lines in the first-boot.yaml file in a previous deployment, remove these lines for Red Hat OpenStack Platform 10 with Open vSwitch 2.9.
ovs_service_path="/usr/lib/systemd/system/ovs-vswitchd.service" grep -q "RuntimeDirectoryMode=.*" $ovs_service_path if [ "$?" -eq 0 ]; then sed -i 's/RuntimeDirectoryMode=.*/RuntimeDirectoryMode=0775/' $ovs_service_path else echo "RuntimeDirectoryMode=0775" >> $ovs_service_path fi grep -Fxq "Group=qemu" $ovs_service_path if [ ! "$?" -eq 0 ]; then echo "Group=qemu" >> $ovs_service_path fi grep -Fxq "UMask=0002" $ovs_service_path if [ ! "$?" -eq 0 ]; then echo "UMask=0002" >> $ovs_service_path fi ovs_ctl_path='/usr/share/openvswitch/scripts/ovs-ctl' grep -q "umask 0002 \&\& start_daemon \"\$OVS_VSWITCHD_PRIORITY\"" $ovs_ctl_path if [ ! "$?" -eq 0 ]; then sed -i 's/start_daemon \"\$OVS_VSWITCHD_PRIORITY.*/umask 0002 \&\& start_daemon \"$OVS_VSWITCHD_PRIORITY\" \"$OVS_VSWITCHD_WRAPPER\" \"$@\"/' $ovs_ctl_path fi
Add additional resources.
    resources:
      userdata:
        type: OS::Heat::MultipartMime
        properties:
          parts:
          - config: {get_resource: set_dpdk_params}
          - config: {get_resource: install_tuned}
          - config: {get_resource: compute_kernel_args}
Set the DPDK parameters.
set_dpdk_params: type: OS::Heat::SoftwareConfig properties: config: str_replace: template: | #!/bin/bash set -x get_mask() { local list=$1 local mask=0 declare -a bm max_idx=0 for core in $(echo $list | sed 's/,/ /g') do index=$(($core/32)) bm[$index]=0 if [ $max_idx -lt $index ]; then max_idx=$(($index)) fi done for ((i=$max_idx;i>=0;i--)); do bm[$i]=0 done for core in $(echo $list | sed 's/,/ /g') do index=$(($core/32)) temp=$((1<<$(($core % 32)))) bm[$index]=$((${bm[$index]} | $temp)) done printf -v mask "%x" "${bm[$max_idx]}" for ((i=$max_idx-1;i>=0;i--)); do printf -v hex "%08x" "${bm[$i]}" mask+=$hex done printf "%s" "$mask" } FORMAT=$COMPUTE_HOSTNAME_FORMAT if [[ -z $FORMAT ]] ; then FORMAT="compute" ; else # Assumption: only %index% and %stackname% are the variables in Host name format FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ; fi if [[ $(hostname) == *$FORMAT* ]] ; then # 42477 is the kolla hugetlbfs gid value. getent group hugetlbfs >/dev/null || \ groupadd hugetlbfs -g 42477 && groupmod -g 42477 hugetlbfs pmd_cpu_mask=$( get_mask $PMD_CORES ) host_cpu_mask=$( get_mask $LCORE_LIST ) socket_mem=$(echo $SOCKET_MEMORY | sed s/\'//g ) ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask fi params: $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat} $LCORE_LIST: {get_param: HostCpusList} $PMD_CORES: {get_param: NeutronDpdkCoreList} $SOCKET_MEMORY: {get_param: NeutronDpdkSocketMemory}
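The get_mask function in the template above turns a comma-separated core list into the hexadecimal bitmask that pmd-cpu-mask and dpdk-lcore-mask expect. The following is a simplified sketch of the same idea, not the template's code. The name cores_to_mask is hypothetical, and the sketch assumes core IDs below 63 and a list without ranges, so that a single shell integer can hold the mask.

```shell
#!/bin/bash
# Hypothetical helper: convert a comma-separated core list to a hex bitmask.
# Assumes every core ID is below 63 and the list contains no ranges.
cores_to_mask() {
    local list=$1 mask=0 core
    for core in ${list//,/ }; do
        # Set the bit whose position equals the core ID.
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

cores_to_mask 2,22,3,23   # NeutronDpdkCoreList example, prints c0000c
cores_to_mask 0,20,1,21   # HostCpusList example, prints 300003
```

Bits 2, 3, 22, and 23 are set for the PMD cores, and bits 0, 1, 20, and 21 for the host cores, which is why the two masks do not share any set bit.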
Set the tuned configuration to provide CPU affinity.
    install_tuned:
      type: OS::Heat::SoftwareConfig
      properties:
        config:
          str_replace:
            template: |
              #!/bin/bash
              FORMAT=$COMPUTE_HOSTNAME_FORMAT
              if [[ -z $FORMAT ]] ; then
                FORMAT="compute" ;
              else
                # Assumption: only %index% and %stackname% are the variables in Host name format
                FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ;
              fi
              if [[ $(hostname) == *$FORMAT* ]] ; then
                # Install the tuned package
                yum install -y tuned-profiles-cpu-partitioning

                tuned_conf_path="/etc/tuned/cpu-partitioning-variables.conf"
                if [ -n "$TUNED_CORES" ]; then
                  grep -q "^isolated_cores" $tuned_conf_path
                  if [ "$?" -eq 0 ]; then
                    sed -i 's/^isolated_cores=.*/isolated_cores=$TUNED_CORES/' $tuned_conf_path
                  else
                    echo "isolated_cores=$TUNED_CORES" >> $tuned_conf_path
                  fi
                  tuned-adm profile cpu-partitioning
                fi
              fi
            params:
              $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
              $TUNED_CORES: {get_param: HostIsolatedCoreList}
Set the kernel arguments.
    compute_kernel_args:
      type: OS::Heat::SoftwareConfig
      properties:
        config:
          str_replace:
            template: |
              #!/bin/bash
              FORMAT=$COMPUTE_HOSTNAME_FORMAT
              if [[ -z $FORMAT ]] ; then
                FORMAT="compute" ;
              else
                # Assumption: only %index% and %stackname% are the variables in Host name format
                FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ;
              fi
              if [[ $(hostname) == *$FORMAT* ]] ; then
                sed 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 $KERNEL_ARGS isolcpus=$TUNED_CORES"/g' -i /etc/default/grub ;
                grub2-mkconfig -o /etc/grub2.cfg
                reboot
              fi
            params:
              $KERNEL_ARGS: {get_param: ComputeKernelArgs}
              $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
              $TUNED_CORES: {get_param: HostIsolatedCoreList}
4.2.2. Modify network-environment.yaml
Add the custom resources for OVS-DPDK under resource_registry.
    resource_registry:
      # Specify the relative/absolute path to the config files you want to use to override the default.
      OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
      OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
      OS::TripleO::NodeUserData: first-boot.yaml
Under parameter_defaults, disable the tunnel type (set the value to ""), and set the network type to vlan.
    NeutronTunnelTypes: ''
    NeutronNetworkType: 'vlan'
Under parameter_defaults, map the physical network to the virtual bridge.
    NeutronBridgeMappings: 'tenant:br-link0'
Under parameter_defaults, set the OpenStack Networking ML2 and Open vSwitch VLAN mapping range.
    NeutronNetworkVLANRanges: 'tenant:22:22,tenant:25:25'
This example sets the VLAN ranges on the physical network.
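Each entry in NeutronNetworkVLANRanges has the form <physical_network>:<first_vlan>:<last_vlan>. As a minimal sketch, the following expands such a value into the individual VLAN IDs it allows. The helper name list_vlan_ids is hypothetical and is not part of director.

```shell
#!/bin/bash
# Hypothetical helper: print "<physnet> <vlan_id>" for every VLAN ID
# covered by a NeutronNetworkVLANRanges-style string.
list_vlan_ids() {
    local ranges=$1 entry physnet min max id
    for entry in ${ranges//,/ }; do
        # Split "physnet:min:max" on the colons.
        IFS=: read -r physnet min max <<< "$entry"
        for (( id = min; id <= max; id++ )); do
            printf '%s %d\n' "$physnet" "$id"
        done
    done
}

list_vlan_ids 'tenant:22:22,tenant:25:25'
```

For the value used in this procedure, the output lists only VLAN 22 and VLAN 25 on the tenant physical network.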
Under parameter_defaults, set the OVS-DPDK configuration parameters.
Note: NeutronDpdkCoreList and NeutronDpdkMemoryChannels are the required settings for this procedure. Attempting to deploy DPDK without appropriate values causes the deployment to fail or leads to unstable deployments.
Provide a list of cores that can be used as DPDK poll mode drivers (PMDs) in the format [allowed_pattern: "'[0-9,-]+'"].
    NeutronDpdkCoreList: "'2,22,3,23'"
Note: You must assign at least one CPU (with sibling thread) on each NUMA node, with or without DPDK NICs present, for DPDK PMD to avoid failures in creating guest instances.
To optimize OVS-DPDK performance, consider the following options:
- Select CPUs associated with the NUMA node of the DPDK interface. Use cat /sys/class/net/<interface>/device/numa_node to list the NUMA node associated with an interface and use lscpu to list the CPUs associated with that NUMA node.
- Group CPU siblings together (in case of hyper-threading). Use cat /sys/devices/system/cpu/<cpu>/topology/thread_siblings_list to find the sibling of a CPU.
- Reserve CPU 0 for the host process.
- Isolate CPUs assigned to PMD so that the host process does not use these CPUs.
- Use NovaVcpuPinSet to exclude CPUs assigned to PMD from Compute scheduling.
Provide the number of memory channels in the format [allowed_pattern: "[0-9]+"].
    NeutronDpdkMemoryChannels: "4"
Set the memory pre-allocated from the hugepage pool for each socket.
NeutronDpdkSocketMemory: "'3072,1024'"
This is a comma-separated string, in ascending order of the CPU socket. This example assumes a 2 NUMA node configuration and sets socket 0 to pre-allocate 3072 MB of huge pages, and sets socket 1 to pre-allocate 1024 MB. If you have a single NUMA node system, set this value to 3072,0.
Set the DPDK driver type for OVS bridges.
NeutronDpdkDriverType: "vfio-pci"
Under parameter_defaults, set the vhost-user socket directory for OVS.
    NeutronVhostuserSocketDir: "/var/lib/vhost_sockets"
Under parameter_defaults, reserve the RAM for the host processes.
    NovaReservedHostMemory: 4096
Under parameter_defaults, set a comma-separated list or range of physical CPU cores to reserve for virtual machine processes.
    NovaVcpuPinSet: "4-19,24-39"
Under parameter_defaults, list the applicable filters. Nova scheduler applies these filters in the order they are listed. List the most restrictive filters first to make the filtering process for the nodes more efficient.
    NovaSchedulerDefaultFilters:
      - "RetryFilter"
      - "AvailabilityZoneFilter"
      - "RamFilter"
      - "ComputeFilter"
      - "ComputeCapabilitiesFilter"
      - "ImagePropertiesFilter"
      - "ServerGroupAntiAffinityFilter"
      - "ServerGroupAffinityFilter"
      - "PciPassthroughFilter"
      - "NUMATopologyFilter"
      - "AggregateInstanceExtraSpecsFilter"
Under parameter_defaults, add the ComputeKernelArgs parameters to add these parameters to the default grub file at first boot.
    ComputeKernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on"
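The huge pages reserved by these kernel arguments are shared between OVS-DPDK and the instances. As a rough sketch of that budget, the following subtracts the NeutronDpdkSocketMemory allocation from the hugepages= boot value. The helper vm_hugepages is hypothetical, it assumes 1 GB pages as in this example, and it ignores how the kernel spreads the pool across NUMA nodes.

```shell
#!/bin/bash
# Hypothetical helper: estimate the 1 GB huge pages left for instances.
#   $1 = value of the hugepages= boot argument (count of 1 GB pages)
#   $2 = NeutronDpdkSocketMemory value (MB per socket), quotes allowed
vm_hugepages() {
    local total_pages=$1 socket_mem=${2//[^0-9,]/} mb dpdk_pages=0
    for mb in ${socket_mem//,/ }; do
        # Round each socket's share up to whole 1024 MB pages.
        dpdk_pages=$(( dpdk_pages + (mb + 1023) / 1024 ))
    done
    echo $(( total_pages - dpdk_pages ))
}

vm_hugepages 32 "'3072,1024'"   # 32 - (3 + 1), prints 28
```

With hugepages=32 and a socket memory of 3072,1024, OVS-DPDK takes four pages and about 28 remain for instances.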
Note: These huge pages are consumed by the virtual machines, and also by OVS-DPDK using the NeutronDpdkSocketMemory parameter as shown in this procedure. The number of huge pages available for the virtual machines is the boot parameter minus the NeutronDpdkSocketMemory.
You need to add hw:mem_page_size=1GB to the flavor you associate with the DPDK instance. If you do not do this, the instance does not get a DHCP allocation.
Under parameter_defaults, set a list or range of physical CPU cores to be tuned. The given argument is appended to the tuned cpu-partitioning profile.
    HostIsolatedCoreList: "2-19,22-39"
Under parameter_defaults, set the logical OVS-DPDK cores list. These cores must be mutually exclusive from the list of cores in NeutronDpdkCoreList and NovaVcpuPinSet.
    HostCpusList: "'0,20,1,21'"
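Because HostCpusList must not share cores with NeutronDpdkCoreList or NovaVcpuPinSet, it can help to check the lists before you deploy. The following is a minimal sketch of such a check. The function names expand_cpus and cpu_overlap are hypothetical, and the sketch assumes Bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Hypothetical helper: expand a CPU list such as "'2-19,22-39'" into one
# CPU ID per line. Embedded quotes are dropped.
expand_cpus() {
    local list=${1//[^0-9,-]/} part lo hi i
    for part in ${list//,/ }; do
        lo=${part%-*}   # for a single CPU, lo and hi are both that CPU
        hi=${part#*-}
        for (( i = lo; i <= hi; i++ )); do
            echo "$i"
        done
    done
}

# Hypothetical helper: print the CPUs present in both lists, comma-separated.
# Empty output means the lists are mutually exclusive.
cpu_overlap() {
    local -A seen=()
    local cpu out=()
    for cpu in $(expand_cpus "$1"); do seen[$cpu]=1; done
    for cpu in $(expand_cpus "$2"); do
        [[ -n ${seen[$cpu]:-} ]] && out+=("$cpu")
    done
    local IFS=,
    echo "${out[*]}"
}

# Check the HostCpusList example against the other two parameters.
for other in "'2,22,3,23'" "4-19,24-39"; do
    clash=$(cpu_overlap "'0,20,1,21'" "$other")
    if [[ -n $clash ]]; then
        echo "overlap with $other: $clash"
    else
        echo "no overlap with $other"
    fi
done
```

For the values in this procedure, both checks report no overlap.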
4.2.3. Modify controller.yaml
Create a separate provisioning interface.
network_config: - type: interface name: nic1 use_dhcp: false defroute: false - type: interface name: nic2 addresses: - ip_netmask: list_join: - '/' - - {get_param: ControlPlaneIp} - {get_param: ControlPlaneSubnetCidr} routes: - ip_netmask: 169.254.169.254/32 next_hop: {get_param: EC2MetadataIp}
Create the control plane Linux bond for an isolated network.
    - type: linux_bond
      name: bond_api
      bonding_options: "mode=active-backup"
      use_dhcp: false
      dns_servers: {get_param: DnsServers}
      members:
      - type: interface
        name: nic3
        primary: true
      - type: interface
        name: nic4
Assign VLANs to this Linux bond.
- type: vlan vlan_id: {get_param: InternalApiNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: InternalApiIpSubnet} - type: vlan vlan_id: {get_param: TenantNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: TenantIpSubnet} - type: vlan vlan_id: {get_param: StorageNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageIpSubnet} - type: vlan vlan_id: {get_param: StorageMgmtNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageMgmtIpSubnet} - type: vlan vlan_id: {get_param: ExternalNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: ExternalIpSubnet} routes: - default: true next_hop: {get_param: ExternalInterfaceDefaultRoute}
Create the OVS bridge for access to neutron-dhcp-agent and neutron-metadata-agent services.
- type: ovs_bridge name: br-link0 use_dhcp: false mtu: 9000 members: - type: ovs_bond name: bond0 use_dhcp: true members: - type: interface name: nic7 mtu: 9000 - type: interface name: nic8 mtu: 9000
4.2.4. Modify compute.yaml
Modify the default compute.yaml file and make the following changes:
Create a separate provisioning interface.
network_config: - type: interface name: nic1 use_dhcp: false defroute: false - type: interface name: nic2 use_dhcp: false addresses: - ip_netmask: list_join: - '/' - - {get_param: ControlPlaneIp} - {get_param: ControlPlaneSubnetCidr} routes: - ip_netmask: 169.254.169.254/32 next_hop: {get_param: EC2MetadataIp} - default: true next_hop: {get_param: ControlPlaneDefaultRoute}
Create the control plane Linux bond for an isolated network.
- type: linux_bond name: bond_api bonding_options: "mode=active-backup" use_dhcp: false dns_servers: {get_param: DnsServers} members: - type: interface name: nic3 primary: true - type: interface name: nic4
Assign VLANs to this Linux bond.
- type: vlan vlan_id: {get_param: InternalApiNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: InternalApiIpSubnet} - type: vlan vlan_id: {get_param: TenantNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: TenantIpSubnet} - type: vlan vlan_id: {get_param: StorageNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageIpSubnet}
Set a bridge with two DPDK ports in an OVS-DPDK data plane bond to link to the controller.
    - type: ovs_user_bridge
      name: br-link0
      use_dhcp: false
      members:
      - type: ovs_dpdk_bond
        name: dpdkbond0
        mtu: 9000
        ovs_extra:
        - set interface dpdk0 mtu_request=$MTU
        - set interface dpdk1 mtu_request=$MTU
        - set interface dpdk0 options:n_rxq=2
        - set interface dpdk1 options:n_rxq=2
        members:
        - type: ovs_dpdk_port
          name: dpdk0
          members:
          - type: interface
            name: nic7
        - type: ovs_dpdk_port
          name: dpdk1
          members:
          - type: interface
            name: nic8
Note: To include multiple DPDK devices, repeat the type code section for each DPDK device you want to add.
Note: When using OVS-DPDK, all bridges on the same Compute node should be of type ovs_user_bridge. The director may accept the configuration, but Red Hat OpenStack Platform does not support mixing ovs_bridge and ovs_user_bridge on the same node.
4.2.5. Run the overcloud_deploy.sh Script
The following example defines the openstack overcloud deploy command for the OVS-DPDK environment within a Bash script:
    #!/bin/bash
    openstack overcloud deploy \
    --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
    -e /home/stack/ospd-10-vlan-dpdk-two-ports-ctlplane-dataplane-bonding/network-environment.yaml
- /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml is the location of the default neutron-ovs-dpdk.yaml file, which enables the OVS-DPDK parameters for the Compute role.
- /home/stack/<relative-directory>/network-environment.yaml is the path for the network-environment.yaml file. Use this file to overwrite the default values from the neutron-ovs-dpdk.yaml file.
This configuration of OVS-DPDK does not support security groups and live migrations.
4.3. Configure Single-Port OVS-DPDK with VXLAN Tunnelling
This section covers the procedures to configure single-port OVS-DPDK with control plane Linux bonding and VXLAN tunnelling for your OpenStack environment.
4.3.1. Modify first-boot.yaml
Modify the first-boot.yaml file to set up OVS and DPDK parameters and to configure tuned for CPU affinity.
If you have included the following lines in the first-boot.yaml file in a previous deployment, remove these lines for Red Hat OpenStack Platform 10 with Open vSwitch 2.9.
ovs_service_path="/usr/lib/systemd/system/ovs-vswitchd.service" grep -q "RuntimeDirectoryMode=.*" $ovs_service_path if [ "$?" -eq 0 ]; then sed -i 's/RuntimeDirectoryMode=.*/RuntimeDirectoryMode=0775/' $ovs_service_path else echo "RuntimeDirectoryMode=0775" >> $ovs_service_path fi grep -Fxq "Group=qemu" $ovs_service_path if [ ! "$?" -eq 0 ]; then echo "Group=qemu" >> $ovs_service_path fi grep -Fxq "UMask=0002" $ovs_service_path if [ ! "$?" -eq 0 ]; then echo "UMask=0002" >> $ovs_service_path fi ovs_ctl_path='/usr/share/openvswitch/scripts/ovs-ctl' grep -q "umask 0002 \&\& start_daemon \"\$OVS_VSWITCHD_PRIORITY\"" $ovs_ctl_path if [ ! "$?" -eq 0 ]; then sed -i 's/start_daemon \"\$OVS_VSWITCHD_PRIORITY.*/umask 0002 \&\& start_daemon \"$OVS_VSWITCHD_PRIORITY\" \"$OVS_VSWITCHD_WRAPPER\" \"$@\"/' $ovs_ctl_path fi
Add additional resources.
    resources:
      userdata:
        type: OS::Heat::MultipartMime
        properties:
          parts:
          - config: {get_resource: set_dpdk_params}
          - config: {get_resource: install_tuned}
          - config: {get_resource: compute_kernel_args}
Set the DPDK parameters.
set_dpdk_params: type: OS::Heat::SoftwareConfig properties: config: str_replace: template: | #!/bin/bash set -x get_mask() { local list=$1 local mask=0 declare -a bm max_idx=0 for core in $(echo $list | sed 's/,/ /g') do index=$(($core/32)) bm[$index]=0 if [ $max_idx -lt $index ]; then max_idx=$(($index)) fi done for ((i=$max_idx;i>=0;i--)); do bm[$i]=0 done for core in $(echo $list | sed 's/,/ /g') do index=$(($core/32)) temp=$((1<<$(($core % 32)))) bm[$index]=$((${bm[$index]} | $temp)) done printf -v mask "%x" "${bm[$max_idx]}" for ((i=$max_idx-1;i>=0;i--)); do printf -v hex "%08x" "${bm[$i]}" mask+=$hex done printf "%s" "$mask" } FORMAT=$COMPUTE_HOSTNAME_FORMAT if [[ -z $FORMAT ]] ; then FORMAT="compute" ; else # Assumption: only %index% and %stackname% are the variables in Host name format FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ; fi if [[ $(hostname) == *$FORMAT* ]] ; then # 42477 is the kolla hugetlbfs gid value. getent group hugetlbfs >/dev/null || \ groupadd hugetlbfs -g 42477 && groupmod -g 42477 hugetlbfs pmd_cpu_mask=$( get_mask $PMD_CORES ) host_cpu_mask=$( get_mask $LCORE_LIST ) socket_mem=$(echo $SOCKET_MEMORY | sed s/\'//g ) ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask fi params: $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat} $LCORE_LIST: {get_param: HostCpusList} $PMD_CORES: {get_param: NeutronDpdkCoreList} $SOCKET_MEMORY: {get_param: NeutronDpdkSocketMemory}
Set the tuned configuration to provide CPU affinity.
    install_tuned:
      type: OS::Heat::SoftwareConfig
      properties:
        config:
          str_replace:
            template: |
              #!/bin/bash
              FORMAT=$COMPUTE_HOSTNAME_FORMAT
              if [[ -z $FORMAT ]] ; then
                FORMAT="compute" ;
              else
                # Assumption: only %index% and %stackname% are the variables in Host name format
                FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ;
              fi
              if [[ $(hostname) == *$FORMAT* ]] ; then
                # Install the tuned package
                yum install -y tuned-profiles-cpu-partitioning

                tuned_conf_path="/etc/tuned/cpu-partitioning-variables.conf"
                if [ -n "$TUNED_CORES" ]; then
                  grep -q "^isolated_cores" $tuned_conf_path
                  if [ "$?" -eq 0 ]; then
                    sed -i 's/^isolated_cores=.*/isolated_cores=$TUNED_CORES/' $tuned_conf_path
                  else
                    echo "isolated_cores=$TUNED_CORES" >> $tuned_conf_path
                  fi
                  tuned-adm profile cpu-partitioning
                fi
              fi
            params:
              $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
              $TUNED_CORES: {get_param: HostIsolatedCoreList}
Set the kernel arguments.
    compute_kernel_args:
      type: OS::Heat::SoftwareConfig
      properties:
        config:
          str_replace:
            template: |
              #!/bin/bash
              FORMAT=$COMPUTE_HOSTNAME_FORMAT
              if [[ -z $FORMAT ]] ; then
                FORMAT="compute" ;
              else
                # Assumption: only %index% and %stackname% are the variables in Host name format
                FORMAT=$(echo $FORMAT | sed 's/\%index\%//g' | sed 's/\%stackname\%//g') ;
              fi
              if [[ $(hostname) == *$FORMAT* ]] ; then
                sed 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 $KERNEL_ARGS isolcpus=$TUNED_CORES"/g' -i /etc/default/grub ;
                grub2-mkconfig -o /etc/grub2.cfg
                reboot
              fi
            params:
              $KERNEL_ARGS: {get_param: ComputeKernelArgs}
              $COMPUTE_HOSTNAME_FORMAT: {get_param: ComputeHostnameFormat}
              $TUNED_CORES: {get_param: HostIsolatedCoreList}
4.3.2. Modify network-environment.yaml
Add the custom resources for OVS-DPDK under resource_registry.
    resource_registry:
      # Specify the relative/absolute path to the config files you want to use to override the default.
      OS::TripleO::Compute::Net::SoftwareConfig: nic-configs/compute.yaml
      OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml
      OS::TripleO::NodeUserData: first-boot.yaml
Under parameter_defaults, set the tunnel type and the tenant type to vxlan.
    NeutronTunnelTypes: 'vxlan'
    NeutronNetworkType: 'vxlan'
Under parameter_defaults, set the OVS-DPDK configuration parameters.
Note: NeutronDpdkCoreList and NeutronDpdkMemoryChannels are the required settings for this procedure. Attempting to deploy DPDK without appropriate values causes the deployment to fail or leads to unstable deployments.
Provide a list of cores that can be used as DPDK poll mode drivers (PMDs) in the format [allowed_pattern: "'[0-9,-]+'"].
    NeutronDpdkCoreList: "'2,22,3,23'"
Note: You must assign at least one CPU (with sibling thread) on each NUMA node, with or without DPDK NICs present, for DPDK PMD to avoid failures in creating guest instances.
To optimize OVS-DPDK performance, consider the following options:
- Select CPUs associated with the NUMA node of the DPDK interface. Use cat /sys/class/net/<interface>/device/numa_node to list the NUMA node associated with an interface and use lscpu to list the CPUs associated with that NUMA node.
- Group CPU siblings together (in case of hyper-threading). Use cat /sys/devices/system/cpu/<cpu>/topology/thread_siblings_list to find the sibling of a CPU.
- Reserve CPU 0 for the host process.
- Isolate CPUs assigned to PMD so that the host process does not use these CPUs.
- Use NovaVcpuPinSet to exclude CPUs assigned to PMD from Compute scheduling.
Provide the number of memory channels in the format [allowed_pattern: "[0-9]+"].
    NeutronDpdkMemoryChannels: "4"
Set the memory pre-allocated from the hugepage pool for each socket.
NeutronDpdkSocketMemory: "'3072,1024'"
This is a comma-separated string, in ascending order of the CPU socket. If you have a single NUMA node system, set this value to 3072,0.
Set the DPDK driver type for OVS bridges.
NeutronDpdkDriverType: "vfio-pci"
Under parameter_defaults, set the vhost-user socket directory for OVS.
    NeutronVhostuserSocketDir: "/var/lib/vhost_sockets"
Under parameter_defaults, reserve the RAM for the host processes.
    NovaReservedHostMemory: 4096
Under parameter_defaults, set a comma-separated list or range of physical CPU cores to reserve for virtual machine processes.
    NovaVcpuPinSet: "4-19,24-39"
Under parameter_defaults, list the applicable filters. Nova scheduler applies these filters in the order they are listed. List the most restrictive filters first to make the filtering process for the nodes more efficient.
    NovaSchedulerDefaultFilters:
      - "RetryFilter"
      - "AvailabilityZoneFilter"
      - "RamFilter"
      - "ComputeFilter"
      - "ComputeCapabilitiesFilter"
      - "ImagePropertiesFilter"
      - "ServerGroupAntiAffinityFilter"
      - "ServerGroupAffinityFilter"
      - "PciPassthroughFilter"
      - "NUMATopologyFilter"
      - "AggregateInstanceExtraSpecsFilter"
Under parameter_defaults, add the ComputeKernelArgs parameters to add these parameters to the default grub file at first boot.
    ComputeKernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on"
Note: These huge pages are consumed by the virtual machines, and also by OVS-DPDK using the NeutronDpdkSocketMemory parameter as shown in this procedure. The number of huge pages available for the virtual machines is the boot parameter minus the NeutronDpdkSocketMemory.
You need to add hw:mem_page_size=1GB to the flavor you associate with the DPDK instance. If you do not do this, the instance does not get a DHCP allocation.
Under parameter_defaults, set a list or range of physical CPU cores to be tuned. The given argument is appended to the tuned cpu-partitioning profile.
    HostIsolatedCoreList: "2-19,22-39"
Under parameter_defaults, set the logical OVS-DPDK cores list. These cores must be mutually exclusive from the list of cores in NeutronDpdkCoreList and NovaVcpuPinSet.
    HostCpusList: "'0,20,1,21'"
4.3.3. Modify controller.yaml
Create a separate provisioning interface.
network_config: - type: interface name: nic1 use_dhcp: false defroute: false - type: interface name: nic2 addresses: - ip_netmask: list_join: - '/' - - {get_param: ControlPlaneIp} - {get_param: ControlPlaneSubnetCidr} routes: - ip_netmask: 169.254.169.254/32 next_hop: {get_param: EC2MetadataIp}
Create the control plane Linux bond for an isolated network.
- type: linux_bond name: bond_api bonding_options: "mode=active-backup" use_dhcp: false dns_servers: {get_param: DnsServers} members: - type: interface name: nic3 primary: true - type: interface name: nic4
Assign VLANs to this Linux bond.
- type: vlan vlan_id: {get_param: InternalApiNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: InternalApiIpSubnet} - type: vlan vlan_id: {get_param: StorageNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageIpSubnet} - type: vlan vlan_id: {get_param: StorageMgmtNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageMgmtIpSubnet} - type: vlan vlan_id: {get_param: ExternalNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: ExternalIpSubnet} routes: - default: true next_hop: {get_param: ExternalInterfaceDefaultRoute}
Create the OVS bridge for access to neutron-dhcp-agent and neutron-metadata-agent services.
- type: ovs_bridge name: br-link0 use_dhcp: false mtu: 9000 members: - type: ovs_bond name: bond0 use_dhcp: true members: - type: interface name: nic7 mtu: 9000 - type: interface name: nic8 mtu: 9000 - type: vlan vlan_id: {get_param: TenantNetworkVlanID} device: bond0 mtu: 9000 addresses: - ip_netmask: {get_param: TenantIpSubnet}
4.3.4. Modify compute.yaml
Create the compute-ovs-dpdk.yaml file from the default compute.yaml file and make the following changes:
Create a separate provisioning interface.
network_config: - type: interface name: nic1 use_dhcp: false defroute: false - type: interface name: nic2 use_dhcp: false addresses: - ip_netmask: list_join: - '/' - - {get_param: ControlPlaneIp} - {get_param: ControlPlaneSubnetCidr} routes: - ip_netmask: 169.254.169.254/32 next_hop: {get_param: EC2MetadataIp} - default: true next_hop: {get_param: ControlPlaneDefaultRoute}
Create the control plane Linux bond for an isolated network.
- type: linux_bond name: bond_api bonding_options: "mode=active-backup" use_dhcp: false dns_servers: {get_param: DnsServers} members: - type: interface name: nic3 primary: true - type: interface name: nic4
Assign VLANs to this Linux bond.
- type: vlan vlan_id: {get_param: InternalApiNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: InternalApiIpSubnet} - type: vlan vlan_id: {get_param: StorageNetworkVlanID} device: bond_api addresses: - ip_netmask: {get_param: StorageIpSubnet}
Set a bridge with a DPDK port to link to the controller.
    - type: ovs_user_bridge
      name: br-link0
      use_dhcp: false
      ovs_extra:
      - str_replace:
          template: set port br-link0 tag=_VLAN_TAG_
          params:
            _VLAN_TAG_: {get_param: TenantNetworkVlanID}
      addresses:
      - ip_netmask: {get_param: TenantIpSubnet}
      members:
      - type: ovs_dpdk_bond
        name: dpdkbond0
        mtu: 9000
        ovs_extra:
        - set interface dpdk0 mtu_request=$MTU
        - set interface dpdk1 mtu_request=$MTU
        - set interface dpdk0 options:n_rxq=2
        - set interface dpdk1 options:n_rxq=2
        members:
        - type: ovs_dpdk_port
          name: dpdk0
          members:
          - type: interface
            name: nic7
        - type: ovs_dpdk_port
          name: dpdk1
          members:
          - type: interface
            name: nic8
Note: To include multiple DPDK devices, repeat the type code section for each DPDK device you want to add.
Note: When using OVS-DPDK, all bridges on the same Compute node should be of type ovs_user_bridge. The director may accept the configuration, but Red Hat OpenStack Platform does not support mixing ovs_bridge and ovs_user_bridge on the same node.
4.3.5. Run the overcloud_deploy.sh Script
The following example defines the openstack overcloud deploy command for the OVS-DPDK environment within a Bash script:
    #!/bin/bash
    openstack overcloud deploy \
    --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
    -e /home/stack/ospd-10-vxlan-dpdk-single-port-ctlplane-bonding/network-environment.yaml
- /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml is the location of the default neutron-ovs-dpdk.yaml file, which enables the OVS-DPDK parameters for the Compute role.
- /home/stack/<relative-directory>/network-environment.yaml is the path for the network-environment.yaml file. Use this file to overwrite the default values from the neutron-ovs-dpdk.yaml file.
This configuration of OVS-DPDK does not support security groups and live migrations.
4.4. Set the MTU Value for OVS-DPDK Interfaces
Red Hat OpenStack Platform supports jumbo frames for OVS-DPDK. To set the MTU value for jumbo frames you must:
- Set the global MTU value for networking in the network-environment.yaml file.
- Set the physical DPDK port MTU value in the compute.yaml file. This value is also used by the vhost user interface.
- Set the MTU value within any guest instances on the Compute node to ensure that you have a comparable MTU value from end to end in your configuration.
VXLAN packets include an extra 50 bytes in the header. Calculate your MTU requirements based on these additional header bytes. For example, an MTU value of 9000 means the VXLAN tunnel MTU value is 8950 to account for these extra bytes.
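The subtraction described above can be sketched as a small shell helper. The function name vxlan_tunnel_mtu is hypothetical and is not a director or OVS command; the 50-byte overhead is the figure stated in the text.

```shell
#!/bin/sh
# Sketch only: derive the VXLAN tunnel MTU from the physical network MTU.
# VXLAN_OVERHEAD is the 50-byte header overhead stated in this section.
VXLAN_OVERHEAD=50

# vxlan_tunnel_mtu is a hypothetical helper name used for illustration.
vxlan_tunnel_mtu() (
    physnet_mtu=$1
    echo $(( ${physnet_mtu} - ${VXLAN_OVERHEAD} ))
)

vxlan_tunnel_mtu 9000   # prints 8950
```

Running the same helper with the value you plan to use for NeutronGlobalPhysnetMtu gives the MTU to expect on the tunnel.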
You do not need any special configuration for the physical NIC since the NIC is controlled by the DPDK PMD and has the same MTU value set by the compute.yaml file. You cannot set an MTU value larger than the maximum value supported by the physical NIC.
To set the MTU value for OVS-DPDK interfaces:
Set the NeutronGlobalPhysnetMtu parameter in the network-environment.yaml file.

    parameter_defaults:
      # Global MTU configuration on Neutron
      NeutronGlobalPhysnetMtu: 9000

Note: Ensure that the NeutronDpdkSocketMemory value in the network-environment.yaml file is large enough to support jumbo frames. See Memory Parameters for details.

Set the MTU value on the bridge to the Compute node in the controller.yaml file.

    - type: ovs_bridge
      name: br-link0
      use_dhcp: false
      mtu: 9000
      members:
        - type: ovs_bond
          name: bond0
          use_dhcp: true
          members:
            - type: interface
              name: nic7
              mtu: 9000
            - type: interface
              name: nic8
              mtu: 9000
        - type: vlan
          vlan_id: {get_param: TenantNetworkVlanID}
          device: bond0
          mtu: 9000
          addresses:
            - ip_netmask: {get_param: TenantIpSubnet}
To set the MTU values for the OVS-DPDK interfaces and bonds in the compute.yaml file:

    - type: ovs_user_bridge
      name: br-link0
      use_dhcp: false
      ovs_extra:
        - str_replace:
            template: set port br-link0 tag=VLAN_TAG
            params:
              VLAN_TAG: {get_param: TenantNetworkVlanID}
      addresses:
        - ip_netmask: {get_param: TenantIpSubnet}
      members:
        - type: ovs_dpdk_bond
          name: dpdkbond0
          mtu: 9000
          ovs_extra:
            - set interface dpdk0 mtu_request=$MTU
            - set interface dpdk1 mtu_request=$MTU
            - set interface dpdk0 options:n_rxq=2
            - set interface dpdk1 options:n_rxq=2
          members:
            - type: ovs_dpdk_port
              name: dpdk0
              members:
                - type: interface
                  name: nic7
            - type: ovs_dpdk_port
              name: dpdk1
              members:
                - type: interface
                  name: nic8
4.5. Set Multiqueue for OVS-DPDK Interfaces
To set the number of queues for an OVS-DPDK port on the Compute node, set options:n_rxq for each DPDK interface in the compute.yaml file as follows:

    - type: ovs_user_bridge
      name: br-link0
      use_dhcp: false
      ovs_extra:
        - str_replace:
            template: set port br-link0 tag=VLAN_TAG
            params:
              VLAN_TAG: {get_param: TenantNetworkVlanID}
      addresses:
        - ip_netmask: {get_param: TenantIpSubnet}
      members:
        - type: ovs_dpdk_bond
          name: dpdkbond0
          mtu: 9000
          ovs_extra:
            - set interface dpdk0 mtu_request=$MTU
            - set interface dpdk1 mtu_request=$MTU
            - set interface dpdk0 options:n_rxq=2
            - set interface dpdk1 options:n_rxq=2
          members:
            - type: ovs_dpdk_port
              name: dpdk0
              members:
                - type: interface
                  name: nic7
            - type: ovs_dpdk_port
              name: dpdk1
              members:
                - type: interface
                  name: nic8
4.6. Known Limitations
There are certain limitations when configuring OVS-DPDK with Red Hat OpenStack Platform 10 for the NFV use case:
- Use Linux bonds for control plane networks. Ensure both PCI devices used in the bond are on the same NUMA node for optimum performance. Neutron Linux bridge configuration is not supported by Red Hat.
- Huge pages are required for every instance running on the hosts with OVS-DPDK. If huge pages are not present in the guest, the interface will appear but not function.
- There is a performance degradation of services that use tap devices, because these devices do not support DPDK. For example, services such as DVR, FWaaS, and LBaaS use tap devices.
  - With OVS-DPDK, you can enable DVR with netdev datapath, but this has poor performance and is not suitable for a production environment. DVR uses kernel namespace and tap devices to perform the routing.
  - To ensure the DVR routing performs well with OVS-DPDK, you need to use a controller such as ODL which implements routing as OpenFlow rules. With OVS-DPDK, OpenFlow routing removes the bottleneck introduced by the Linux kernel interfaces so that the full performance of datapath is maintained.
- When using OVS-DPDK, all bridges should be of type ovs_user_bridge on the Compute node. The director may accept the configuration, but Red Hat OpenStack Platform does not support mixing ovs_bridge and ovs_user_bridge.
4.7. Create a Flavor and Deploy an Instance for OVS-DPDK
After you have completed configuring OVS-DPDK for your Red Hat OpenStack Platform deployment with NFV, you can create a flavor and deploy an instance with the following steps:
Create an aggregate group and add a host to it for OVS-DPDK. Define metadata, for example, "aggregate_instance_extra_specs:dpdk"="true", that matches flavor metadata.

    # openstack aggregate create dpdk_group
    # openstack aggregate set --property \
      "aggregate_instance_extra_specs:dpdk"="true" dpdk_group
    # openstack aggregate add host dpdk_group compute-ovs-dpdk-0.localdomain
Create a flavor.
    # openstack flavor create <flavor> --ram <MB> --disk <GB> --vcpus <#>
Set additional flavor properties. Note that the defined metadata, "aggregate_instance_extra_specs:dpdk"="true", matches the defined metadata on the DPDK aggregate.

    # openstack flavor set --property "aggregate_instance_extra_specs:dpdk"="true" \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=large <flavor>
Create the network.
# openstack network create net1 --provider-physical-network tenant --provider-network-type vlan --provider-segment <VLAN-ID>
Create the subnet.
# openstack subnet create subnet1 --network net1 --subnet-range 192.0.2.0/24 --dhcp
Deploy an instance.
# openstack server create --flavor <flavor> --image <glance_image> --nic net-id=net1 <name>
You have now deployed an instance for the OVS-DPDK with NFV use case.
4.7.1. Optimizing Performance with Emulator Thread Pinning
To improve performance, you can pin the QEMU emulator thread to an alternate core.
Determine which cores are used as vCPUs for your instance.

    # virsh dumpxml dpdk_vm | grep cpuset
        <vcpupin vcpu='0' cpuset='2'/>
        <vcpupin vcpu='1' cpuset='18'/>
        <vcpupin vcpu='2' cpuset='1'/>
        <vcpupin vcpu='3' cpuset='17'/>
        <emulatorpin cpuset='1-2,17-18'/>
Select the core you want to pin the emulator thread to. Ensure the selected core is from the NovaVcpuPinSet.

    # virsh emulatorpin <vm-name> --cpulist 2

Note: The pCPU associated with the emulator pin thread consumes one vCPU (two threads if hyperthreading is enabled) from the NovaVcpuPinSet.
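Before you choose a core, it can help to see the vCPU-to-pCPU pinning in a compact form. The following is a minimal sketch that reads virsh dumpxml output on standard input; list_vcpu_pins is a hypothetical helper name, and the sketch assumes the vcpupin elements use the single-quoted attribute layout shown in this section.

```shell
#!/bin/sh
# Sketch only: summarize vCPU pinning from `virsh dumpxml <vm-name>` output.
# list_vcpu_pins is a hypothetical helper, not a libvirt command.
# It ignores the <emulatorpin> element and prints one line per <vcpupin>.
list_vcpu_pins() {
    grep -o "<vcpupin vcpu='[0-9]*' cpuset='[^']*'" |
        sed "s/<vcpupin vcpu='\([0-9]*\)' cpuset='\([^']*\)'/vcpu \1 -> pcpu \2/"
}

# Usage on a Compute node (not run here):
#   virsh dumpxml dpdk_vm | list_vcpu_pins
```

For the instance shown in this section, the helper would list pCPUs 2, 18, 1, and 17, which are the cores already used by vCPUs.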
4.8. Troubleshooting the Configuration
This section describes the steps to troubleshoot the OVS-DPDK configuration.
Review the bridge configuration and confirm that the bridge was created with the datapath_type=netdev. For example:

    # ovs-vsctl list bridge br0
    _uuid               : bdce0825-e263-4d15-b256-f01222df96f3
    auto_attach         : []
    controller          : []
    datapath_id         : "00002608cebd154d"
    datapath_type       : netdev
    datapath_version    : "<built-in>"
    external_ids        : {}
    fail_mode           : []
    flood_vlans         : []
    flow_tables         : {}
    ipfix               : []
    mcast_snooping_enable: false
    mirrors             : []
    name                : "br0"
    netflow             : []
    other_config        : {}
    ports               : [52725b91-de7f-41e7-bb49-3b7e50354138]
    protocols           : []
    rstp_enable         : false
    rstp_status         : {}
    sflow               : []
    status              : {}
    stp_enable          : false
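To check only the relevant field instead of reading the full listing, you can filter the output. The following is a minimal sketch that reads ovs-vsctl list bridge output on standard input; bridge_datapath_type is a hypothetical helper name, and the sketch assumes the "column : value" layout shown above.

```shell
#!/bin/sh
# Sketch only: extract the datapath_type value from `ovs-vsctl list bridge` output.
# bridge_datapath_type is a hypothetical helper, not an OVS command.
bridge_datapath_type() {
    # Split each line on the colon and surrounding spaces, then print the
    # value of the line whose column name is exactly "datapath_type".
    awk -F' *: *' '$1 == "datapath_type" { print $2 }'
}

# Usage on a Compute node (not run here):
#   ovs-vsctl list bridge br0 | bridge_datapath_type
```

A result of netdev indicates the userspace datapath that OVS-DPDK requires. The ovs-vsctl get Bridge br0 datapath_type command returns the same single value directly.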
Review the OVS service by confirming that the neutron-openvswitch-agent service is enabled to start automatically and is running.

    # systemctl status neutron-openvswitch-agent.service
    neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
       Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
       Active: active (running) since Mon 2015-11-23 14:49:31 AEST; 25min ago
If the service is having trouble starting, you can view any related messages.
    # journalctl -u neutron-openvswitch-agent.service
Confirm that the PMD threads of ovs-dpdk are pinned to the CPUs in the PMD CPU mask. If hyperthreading (HT) is enabled, use sibling CPUs. For example, to find the sibling of CPU4:

    # cat /sys/devices/system/cpu/cpu4/topology/thread_siblings_list
    4,20
So, using CPU 4 and 20:
# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x100010
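The mask value follows from the CPU IDs: each CPU sets the bit at its own index, so CPUs 4 and 20 give 0x10 | 0x100000 = 0x100010. The following is a minimal sketch of that arithmetic; cpus_to_mask is a hypothetical helper name and is not part of OVS.

```shell
#!/bin/sh
# Sketch only: build a hexadecimal CPU bitmask from a list of CPU IDs.
# cpus_to_mask is a hypothetical helper, not an OVS command.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        # Set the bit whose position equals the CPU ID.
        mask=$(( $mask | (1 << $cpu) ))
    done
    printf '0x%x\n' "$mask"
)

cpus_to_mask 4 20   # prints 0x100010

# Assumed usage with the sibling list shown earlier, when the file holds a
# comma-separated list such as "4,20" (a range such as "4-5" is not handled):
#   cpus_to_mask $(tr ',' ' ' < /sys/devices/system/cpu/cpu4/topology/thread_siblings_list)
```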
Display the status. The pmd threads are the ones to check; their affinity should match the CPUs in the mask.

    # tuna -t ovs-vswitchd -CP
                          thread       ctxt_switches
        pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
       3161  OTHER     0        6    765023          614    ovs-vswitchd
       3219  OTHER     0        6         1            0       handler24
       3220  OTHER     0        6         1            0       handler21
       3221  OTHER     0        6         1            0       handler22
       3222  OTHER     0        6         1            0       handler23
       3223  OTHER     0        6         1            0       handler25
       3224  OTHER     0        6         1            0       handler26
       3225  OTHER     0        6         1            0       handler27
       3226  OTHER     0        6         1            0       handler28
       3227  OTHER     0        6         2            0       handler31
       3228  OTHER     0        6         2            4       handler30
       3229  OTHER     0        6         2            5       handler32
       3230  OTHER     0        6    953538          431   revalidator29
       3231  OTHER     0        6   1424258          976   revalidator33
       3232  OTHER     0        6   1424693          836   revalidator34
       3233  OTHER     0        6    951678          503   revalidator36
       3234  OTHER     0        6   1425128          498   revalidator35
       3235  OTHER     0        4    151123           51           pmd37
       3236  OTHER     0       20    298967           48           pmd38
       3164  OTHER     0        6     47575            0  dpdk_watchdog3
       3165  OTHER     0        6    237634            0   vhost_thread1
       3166  OTHER     0        6      3665            0           urcu2
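To pick the PMD threads out of a long listing, you can filter the tuna output. The following is a minimal sketch that reads the output on standard input; pmd_affinity is a hypothetical helper name, and the sketch assumes the column order shown above, with affinity in the fourth column and the thread name in the last.

```shell
#!/bin/sh
# Sketch only: print the name and CPU affinity of each PMD thread from
# `tuna -t ovs-vswitchd -CP` output. pmd_affinity is a hypothetical helper.
pmd_affinity() {
    # Drop any asterisks used to highlight rows, then keep rows whose last
    # column (the thread name) starts with "pmd".
    awk '{ gsub(/\*/, "") } $NF ~ /^pmd/ { print $NF, $4 }'
}

# Usage on a Compute node (not run here):
#   tuna -t ovs-vswitchd -CP | pmd_affinity
```

For the listing in this section, the two PMD threads report affinity 4 and 20, which matches the pmd-cpu-mask value of 0x100010.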