Chapter 9. Tuning a Red Hat OpenStack Platform environment

9.1. Pinning emulator threads

Emulator threads handle interrupt requests and non-blocking processes for virtual machine hardware emulation. These threads float across the CPUs that the guest uses for processing. If threads used for the poll mode driver (PMD) or real-time processing run on these guest CPUs, you can experience packet loss or missed deadlines.

You can separate emulator threads from VM processing tasks by pinning the threads to their own guest CPUs, increasing performance as a result.

9.1.1. Configuring CPUs to host emulator threads

To improve performance, reserve a subset of host CPUs for hosting emulator threads.

Procedure

Deploy an overcloud with NovaComputeCpuSharedSet defined for a given role. The value of NovaComputeCpuSharedSet applies to the cpu_shared_set parameter in the nova.conf file for hosts within that role.
```
parameter_defaults:
    ComputeOvsDpdkParameters:
        NovaComputeCpuSharedSet: "0-1,16-17"
        NovaComputeCpuDedicatedSet: "2-15,18-31"
```
Create a flavor to build instances with emulator threads separated into a shared pool.
```
openstack flavor create --ram <size_mb> --disk <size_gb> --vcpus <vcpus> <flavor>
```
Add the hw:emulator_threads_policy extra specification, and set the value to share. Instances created with this flavor run their emulator threads on the host CPUs defined in the cpu_shared_set parameter in the nova.conf file.
```
openstack flavor set <flavor> --property hw:emulator_threads_policy=share
```

Note

You must set the cpu_shared_set parameter in the nova.conf file to enable the share policy for this extra specification. Set this parameter with heat where possible, because manual edits to nova.conf might not persist across redeployments.
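Before you deploy, it can help to confirm that the shared and dedicated CPU sets do not overlap. The following is a minimal Python sketch of that check, using the values from the example environment file in this section. The helper name parse_cpu_set is hypothetical, and it handles only the comma-and-range syntax shown here.

```python
def parse_cpu_set(spec: str) -> set[int]:
    """Expand a CPU list such as "0-1,16-17" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the example parameter_defaults in this section.
shared = parse_cpu_set("0-1,16-17")       # NovaComputeCpuSharedSet
dedicated = parse_cpu_set("2-15,18-31")   # NovaComputeCpuDedicatedSet

# An empty overlap means no CPU is both shared and dedicated.
overlap = shared & dedicated
print("shared CPUs:", sorted(shared))
print("overlapping CPUs:", sorted(overlap))
```

If the overlap is not empty, revisit the two parameters before you deploy the role.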

9.1.2. Verifying the emulator thread pinning

Procedure

Identify the host and name for a given instance.
```
openstack server show <instance_id>
```

Use SSH to log on to the identified host as heat-admin.

ssh heat-admin@compute-1
[compute-1]$ sudo virsh dumpxml instance-00001 | grep 'emulatorpin cpuset'
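If you prefer to inspect the domain XML programmatically instead of with grep, the emulatorpin element can be read with a standard XML parser. The following Python sketch assumes that you saved the virsh dumpxml output as a string. The sample XML is a hypothetical, trimmed-down illustration and not output from a real host.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, trimmed-down domain XML; real output contains many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00001</name>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='0-1,16-17'/>
  </cputune>
</domain>
"""


def emulator_cpuset(domain_xml: str) -> Optional[str]:
    """Return the cpuset attribute of <emulatorpin>, or None if it is absent."""
    root = ET.fromstring(domain_xml)
    pin = root.find("./cputune/emulatorpin")
    return None if pin is None else pin.get("cpuset")


print(emulator_cpuset(SAMPLE_XML))
```

A return value of None indicates that the domain has no emulatorpin element, which suggests that the emulator threads are not pinned.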

9.2. Enabling RT-KVM for NFV Workloads

To facilitate installing and configuring Red Hat Enterprise Linux 8.2 Real Time KVM (RT-KVM), Red Hat OpenStack Platform provides the following features:

A real-time Compute node role that provisions Red Hat Enterprise Linux for real-time.
The additional RT-KVM kernel module.
Automatic configuration of the Compute node.

9.2.1. Planning for your RT-KVM Compute nodes

You must use Red Hat certified servers for your RT-KVM Compute nodes. For more information, see: Red Hat Enterprise Linux for Real Time 7 certified servers.

For details on how to enable the rhel-8-for-x86_64-nfv-rpms repository for RT-KVM, and to ensure that your system is up to date, see: Registering and updating your undercloud.

Note

You need a separate subscription to a Red Hat OpenStack Platform for Real Time SKU before you can access this repository.

Building the real-time image

Install the libguestfs-tools package on the undercloud to get the virt-customize tool:
```
(undercloud) [stack@undercloud-0 ~]$ sudo dnf install libguestfs-tools
```
Important
If you install the libguestfs-tools package on the undercloud, disable iscsid.socket to avoid port conflicts with the tripleo_iscsid service on the undercloud:
```
$ sudo systemctl disable --now iscsid.socket
```

Extract the images:

(undercloud) [stack@undercloud-0 ~]$ tar -xf /usr/share/rhosp-director-images/overcloud-full.tar
(undercloud) [stack@undercloud-0 ~]$ tar -xf /usr/share/rhosp-director-images/ironic-python-agent.tar

Copy the default image:

(undercloud) [stack@undercloud-0 ~]$ cp overcloud-full.qcow2 overcloud-realtime-compute.qcow2

Register your image to enable Red Hat repositories relevant to your customizations. Replace [username] and [password] with valid credentials in the following example.
```
virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
'subscription-manager register --username=[username] --password=[password]' \
--run-command 'subscription-manager release --set 8.2'
```
Note
For security, you can remove credentials from the history file if they are used on the command prompt. You can delete individual lines in history using the history -d command followed by the line number.

Find a list of pool IDs from your account’s subscriptions, and attach the appropriate pool ID to your image.

sudo subscription-manager list --all --available | less
...
virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
'subscription-manager attach --pool [pool-ID]'

Add the repositories necessary for Red Hat OpenStack Platform with NFV.

virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
'sudo subscription-manager repos --enable=rhel-8-for-x86_64-baseos-eus-rpms \
--enable=rhel-8-for-x86_64-appstream-eus-rpms \
--enable=rhel-8-for-x86_64-highavailability-eus-rpms \
--enable=ansible-2.9-for-rhel-8-x86_64-rpms \
--enable=openstack-16.1-for-rhel-8-x86_64-rpms \
--enable=rhel-8-for-x86_64-nfv-rpms \
--enable=advanced-virt-for-rhel-8-x86_64-rpms \
--enable=fast-datapath-for-rhel-8-x86_64-rpms'

Create a script to configure real-time capabilities on the image.

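(undercloud) [stack@undercloud-0 ~]$ cat <<'EOF' > rt.sh
#!/bin/bash

set -eux

# Remove the stock kernel, then install the real-time kernel, its KVM module, and the NFV tuned profile.
dnf -v -y --setopt=protected_packages= erase kernel.$(uname -m)
dnf -v -y install kernel-rt kernel-rt-kvm tuned-profiles-nfv-host
# Boot the real-time kernel by default.
grubby --set-default /boot/vmlinuz*rt*
EOF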

Run the script to configure the real-time image:
```
(undercloud) [stack@undercloud-0 ~]$ virt-customize -a overcloud-realtime-compute.qcow2 -v --run rt.sh 2>&1 | tee virt-customize.log
```
Note
If you see the following line in the rt.sh script output, "grubby fatal error: unable to find a suitable template", you can ignore this error.

Examine the virt-customize.log file that the previous command created, to check that the rt.sh script installed the packages correctly.

(undercloud) [stack@undercloud-0 ~]$ cat virt-customize.log | grep Verifying

  Verifying  : kernel-3.10.0-957.el7.x86_64                                 1/1
  Verifying  : 10:qemu-kvm-tools-rhev-2.12.0-18.el7_6.1.x86_64              1/8
  Verifying  : tuned-profiles-realtime-2.10.0-6.el7_6.3.noarch              2/8
  Verifying  : linux-firmware-20180911-69.git85c5d90.el7.noarch             3/8
  Verifying  : tuned-profiles-nfv-host-2.10.0-6.el7_6.3.noarch              4/8
  Verifying  : kernel-rt-kvm-3.10.0-957.10.1.rt56.921.el7.x86_64            5/8
  Verifying  : tuna-0.13-6.el7.noarch                                       6/8
  Verifying  : kernel-rt-3.10.0-957.10.1.rt56.921.el7.x86_64                7/8
  Verifying  : rt-setup-2.0-6.el7.x86_64                                    8/8

Relabel SELinux:

(undercloud) [stack@undercloud-0 ~]$ virt-customize -a overcloud-realtime-compute.qcow2 --selinux-relabel

Extract vmlinuz and initrd:

(undercloud) [stack@undercloud-0 ~]$ mkdir image
(undercloud) [stack@undercloud-0 ~]$ guestmount -a overcloud-realtime-compute.qcow2 -i --ro image
(undercloud) [stack@undercloud-0 ~]$ cp image/boot/vmlinuz-3.10.0-862.rt56.804.el7.x86_64 ./overcloud-realtime-compute.vmlinuz
(undercloud) [stack@undercloud-0 ~]$ cp image/boot/initramfs-3.10.0-862.rt56.804.el7.x86_64.img ./overcloud-realtime-compute.initrd
(undercloud) [stack@undercloud-0 ~]$ guestunmount image

Note

The software version in the vmlinuz and initramfs filenames varies with the kernel version.

Upload the image:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud image upload --update-existing --os-image-name overcloud-realtime-compute.qcow2

You now have a real-time image you can use with the ComputeOvsDpdkRT composable role on your selected Compute nodes.

Modifying BIOS settings on RT-KVM Compute nodes

To reduce latency on your RT-KVM Compute nodes, disable all options for the following parameters in your Compute node BIOS settings:

Power Management
Hyper-Threading
CPU sleep states
Logical processors

See Setting BIOS parameters for descriptions of these settings and the impact of disabling them. See your hardware manufacturer documentation for complete details on how to change BIOS settings.

9.2.2. Configuring OVS-DPDK with RT-KVM

Note

You must determine the best values for the OVS-DPDK parameters that you set in the network-environment.yaml file to optimize your OpenStack network for OVS-DPDK. For more details, see Section 8.1, “Deriving DPDK parameters with workflows”.

9.2.2.1. Generating the ComputeOvsDpdkRT composable role

Use the ComputeOvsDpdkRT role to specify Compute nodes for the real-time compute image.

Generate roles_data.yaml for the ComputeOvsDpdkRT role.

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud roles generate -o roles_data.yaml Controller ComputeOvsDpdkRT

9.2.2.2. Configuring the OVS-DPDK parameters

Important

Determine the best values for the OVS-DPDK parameters in the network-environment.yaml file to optimize your deployment. For more information, see Section 8.1, “Deriving DPDK parameters with workflows”.

Add the NIC configuration for the OVS-DPDK role you use under resource_registry:

resource_registry:
  # Specify the relative or absolute path to the config files that you want to use to override the defaults.
  OS::TripleO::ComputeOvsDpdkRT::Net::SoftwareConfig: nic-configs/compute-ovs-dpdk.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: nic-configs/controller.yaml

Under parameter_defaults, set the OVS-DPDK and RT-KVM parameters:

  # DPDK compute node.
  ComputeOvsDpdkRTParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=1-7,17-23,9-15,25-31"
    TunedProfileName: "realtime-virtual-host"
    IsolCpusList: "1,2,3,4,5,6,7,9,10,17,18,19,20,21,22,23,11,12,13,14,15,25,26,27,28,29,30,31"
    NovaComputeCpuDedicatedSet: ['2,3,4,5,6,7,18,19,20,21,22,23,10,11,12,13,14,15,26,27,28,29,30,31']
    NovaReservedHostMemory: 4096
    OvsDpdkSocketMemory: "1024,1024"
    OvsDpdkMemoryChannels: "4"
    OvsPmdCoreList: "1,17,9,25"
    VhostuserSocketGroup: "hugetlbfs"
  ComputeOvsDpdkRTImage: "overcloud-realtime-compute"
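In this example, the isolcpus value in KernelArgs and the IsolCpusList value describe the same CPUs in two different notations. The following Python sketch shows one way to confirm that the two stay in sync when you edit them. The helper parse_cpu_set is a hypothetical name, and the check itself is an illustration, not part of director.

```python
import re


def parse_cpu_set(spec: str) -> set[int]:
    """Expand a CPU list such as "1-7,17-23" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the ComputeOvsDpdkRTParameters example in this section.
kernel_args = (
    "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt "
    "intel_iommu=on isolcpus=1-7,17-23,9-15,25-31"
)
isol_cpus_list = (
    "1,2,3,4,5,6,7,9,10,17,18,19,20,21,22,23,"
    "11,12,13,14,15,25,26,27,28,29,30,31"
)

# Pull the isolcpus= argument out of the kernel command line.
match = re.search(r"\bisolcpus=(\S+)", kernel_args)
kernel_isolated = parse_cpu_set(match.group(1)) if match else set()
listed_isolated = parse_cpu_set(isol_cpus_list)

print("isolcpus matches IsolCpusList:", kernel_isolated == listed_isolated)
```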

9.2.2.3. Deploying the overcloud

Deploy the overcloud for ML2-OVS:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy \
--templates \
-r /home/stack/ospd-16-vlan-dpdk-ctlplane-bonding-rt/roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs-dpdk.yaml \
-e /home/stack/ospd-16-vxlan-dpdk-data-bonding-rt-hybrid/containers-prepare-parameter.yaml \
-e /home/stack/ospd-16-vxlan-dpdk-data-bonding-rt-hybrid/network-environment.yaml

9.2.3. Launching an RT-KVM instance

Perform the following steps to launch an RT-KVM instance on a real-time enabled Compute node:

Create an RT-KVM flavor on the overcloud:

# openstack flavor create r1.small --id 99 --ram 4096 --disk 20 --vcpus 4
# openstack flavor set --property hw:cpu_policy=dedicated 99
# openstack flavor set --property hw:cpu_realtime=yes 99
# openstack flavor set --property hw:mem_page_size=1GB 99
# openstack flavor set --property hw:cpu_realtime_mask="^0-1" 99
# openstack flavor set --property hw:emulator_threads_policy=isolate 99

Launch an RT-KVM instance:

# openstack server create  --image <rhel> --flavor r1.small --nic net-id=<dpdk-net> test-rt

To verify that the instance uses the assigned emulator threads, run the following command:

# virsh dumpxml <instance-id> | grep vcpu -A1
<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='5'/>
  <vcpupin vcpu='3' cpuset='7'/>
  <emulatorpin cpuset='0-1'/>
  <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
</cputune>
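The hw:cpu_realtime_mask value determines which vCPUs receive the real-time scheduling policy. The following Python sketch gives a simplified reading of the mask syntax, in which entries prefixed with ^ are excluded from the real-time set. The function name realtime_vcpus is hypothetical, and the parsing is an approximation for illustration, not the Compute service implementation.

```python
def expand(spec: str) -> set[int]:
    """Expand "2" or "2-3" into a set of vCPU indexes."""
    start, _, end = spec.partition("-")
    return set(range(int(start), int(end or start) + 1))


def realtime_vcpus(vcpu_count: int, mask: str) -> set[int]:
    """Return the vCPUs treated as real-time for a given mask.

    Simplified rule: if the mask lists explicit vCPUs, start from those;
    otherwise start from all vCPUs. Then remove every ^-prefixed entry.
    """
    included: set[int] = set()
    excluded: set[int] = set()
    for item in (i.strip() for i in mask.split(",")):
        if not item:
            continue
        if item.startswith("^"):
            excluded |= expand(item[1:])
        else:
            included |= expand(item)
    base = included if included else set(range(vcpu_count))
    return base - excluded


# The flavor in this section has 4 vCPUs and the mask "^0-1".
print(sorted(realtime_vcpus(4, "^0-1")))
```

For the flavor in this section, the result is vCPUs 2 and 3, which corresponds to the vcpusched element with vcpus='2-3' in the sample output.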

9.3. Trusted Virtual Functions

You can configure trust between physical functions (PFs) and virtual functions (VFs), so that VFs can perform privileged actions, such as enabling promiscuous mode, or modifying a hardware address.

9.3.1. Configuring trust between virtual and physical functions

Prerequisites

An operational installation of Red Hat OpenStack Platform including director

Procedure

Complete the following steps to configure and deploy the overcloud with trust between physical and virtual functions:

Add the NeutronPhysicalDevMappings parameter in the parameter_defaults section to map the logical network name to the physical interface.
```
parameter_defaults:
  NeutronPhysicalDevMappings:
    - sriov2:p5p2
```

Add the new property, trusted, to the SR-IOV parameters.

parameter_defaults:
  NeutronPhysicalDevMappings:
    - sriov2:p5p2
  NovaPCIPassthrough:
    - vendor_id: "8086"
      product_id: "1572"
      physical_network: "sriov2"
      trusted: "true"

Note

You must include double quotation marks around the value "true".

9.3.2. Utilizing trusted VF networks

Create a network of type vlan.

openstack network create trusted_vf_network  --provider-network-type vlan \
 --provider-segment 111 --provider-physical-network sriov2 \
 --external --disable-port-security

Create a subnet.

openstack subnet create --network trusted_vf_network \
  --ip-version 4 --subnet-range 192.168.111.0/24 --no-dhcp \
 subnet-trusted_vf_network

Create a port. Set the vnic-type option to direct, and set the binding-profile option to trusted=true.

openstack port create --network trusted_vf_network \
--vnic-type direct --binding-profile trusted=true \
trusted_vf_network_port_trusted

Create an instance, and bind it to the previously-created trusted port.

openstack server create --image rhel --flavor dpdk  --network internal --port trusted_vf_network_port_trusted --config-drive True --wait rhel-dpdk-sriov_trusted

Verify the trusted VF configuration on the hypervisor

On the Compute node on which you created the instance, enter the following command:

# ip link
7: p5p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b4:96:91:1c:40:fa brd ff:ff:ff:ff:ff:ff
    vf 6 MAC fa:16:3e:b8:91:c2, vlan 111, spoof checking off, link-state auto, trust on, query_rss off
    vf 7 MAC fa:16:3e:84:cf:c8, vlan 111, spoof checking off, link-state auto, trust off, query_rss off

Verify that the trust status of the VF is trust on. The example output contains details of an environment that contains two ports. Note that vf 6 contains the text trust on.
You can disable spoof checking if you set port_security_enabled: false in the Networking service (neutron) network, or if you include the argument --disable-port-security when you run the openstack port create command.

9.4. Configuring RX/TX queue size

You can experience packet loss at high packet rates above 3.5 million packets per second (mpps) for many reasons, such as:

a network interrupt
a system management interrupt (SMI)
packet processing latency in the Virtual Network Function

To prevent packet loss, increase the queue size from the default of 512 to a maximum of 1024.

Prerequisites

To configure RX, ensure that you have libvirt v2.3 and QEMU v2.7.
To configure TX, ensure that you have libvirt v3.7 and QEMU v2.10.

Procedure

To increase the RX and TX queue size, add the following lines to the parameter_defaults section of a relevant director role. The following example uses the ComputeOvsDpdk role:
```
parameter_defaults:
  ComputeOvsDpdkParameters:
    NovaLibvirtRxQueueSize: 1024
    NovaLibvirtTxQueueSize: 1024
```

Testing

You can observe the values for RX queue size and TX queue size in the nova.conf file:
```
[libvirt]
rx_queue_size=1024
tx_queue_size=1024
```

You can check the values for RX queue size and TX queue size in the VM instance XML file generated by libvirt on the compute host.

<devices>
   <interface type='vhostuser'>
     <mac address='56:48:4f:4d:5e:6f'/>
     <source type='unix' path='/tmp/vhost-user1' mode='server'/>
     <model type='virtio'/>
     <driver name='vhost' rx_queue_size='1024'   tx_queue_size='1024' />
     <address type='pci' domain='0x0000' bus='0x00' slot='0x10' function='0x0'/>
   </interface>
</devices>

To verify the values for RX queue size and TX queue size, use the following command on a KVM host:

$ virsh dumpxml <vm name> | grep queue_size
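As an alternative to grep, you can read the queue sizes from the interface XML with a parser, which is convenient when an instance has several interfaces. The following Python sketch uses the sample XML from this section. The function name queue_sizes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample <devices> fragment copied from this section.
DEVICES_XML = """
<devices>
   <interface type='vhostuser'>
     <mac address='56:48:4f:4d:5e:6f'/>
     <source type='unix' path='/tmp/vhost-user1' mode='server'/>
     <model type='virtio'/>
     <driver name='vhost' rx_queue_size='1024'   tx_queue_size='1024' />
     <address type='pci' domain='0x0000' bus='0x00' slot='0x10' function='0x0'/>
   </interface>
</devices>
"""


def queue_sizes(xml_text: str) -> list[tuple[str, str]]:
    """Return (rx_queue_size, tx_queue_size) for each interface with a driver."""
    root = ET.fromstring(xml_text)
    sizes = []
    for iface in root.iter("interface"):
        driver = iface.find("driver")
        if driver is not None:
            sizes.append((driver.get("rx_queue_size"), driver.get("tx_queue_size")))
    return sizes


print(queue_sizes(DEVICES_XML))
```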

You can check for improved performance, such as 3.8 mpps/core at 0 frame loss.

9.5. Configuring a NUMA-aware vSwitch

Important

This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.

Before you implement a NUMA-aware vSwitch, examine the following components of your hardware configuration:

The number of physical networks.
The placement of PCI cards.
The physical architecture of the servers.

Memory-mapped I/O (MMIO) devices, such as PCIe NICs, are associated with specific NUMA nodes. When a VM and the NIC are on different NUMA nodes, there is a significant decrease in performance. To increase performance, align PCIe NIC placement and instance processing on the same NUMA node.

Use this feature to ensure that instances that share a physical network are located on the same NUMA node. To optimize utilization of datacenter hardware, you must use multiple physnets.

Warning

To configure NUMA-aware networks for optimal server utilization, you must understand the mapping of the PCIe slot and the NUMA node. For detailed information on your specific hardware, refer to your vendor’s documentation. If you fail to plan or implement your NUMA-aware vSwitch correctly, you can cause the servers to use only a single NUMA node.

To prevent a cross-NUMA configuration, place the VM on the correct NUMA node, by providing the location of the NIC to Nova.

Prerequisites

You have enabled the NUMATopologyFilter filter.

Procedure

Set a new NeutronPhysnetNUMANodesMapping parameter to map the physical network to the NUMA node that you associate with the physical network.

If you use tunnels, such as VXLAN or GRE, you must also set the NeutronTunnelNUMANodes parameter.

parameter_defaults:
  NeutronPhysnetNUMANodesMapping: {<physnet_name>: [<NUMA_NODE>]}
  NeutronTunnelNUMANodes: <NUMA_NODE>,<NUMA_NODE>

Here is an example with two physical networks tunneled to NUMA node 0:

one project network associated with NUMA node 0

one management network without any affinity

parameter_defaults:
  NeutronBridgeMappings:
    - tenant:br-link0
  NeutronPhysnetNUMANodesMapping: {tenant: [1], mgmt: [0,1]}
  NeutronTunnelNUMANodes: 0

In the following example, the device named eno2 is on NUMA node 0, so assign its physnet to NUMA node 0.

# ethtool -i eno2
bus-info: 0000:18:00.1

# cat /sys/devices/pci0000:16/0000:16:02.0/0000:18:00.1/numa_node
0
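The same NUMA lookup can be scripted. The following Python sketch reads the numa_node attribute through the /sys/class/net/<interface>/device path, which avoids the need to look up the PCI address first. The function name nic_numa_node and the sysfs_root parameter are hypothetical and exist only to make the example easy to test.

```python
from pathlib import Path


def nic_numa_node(interface: str, sysfs_root: str = "/sys") -> int:
    """Return the NUMA node that the kernel reports for a network interface.

    A value of -1 means that the kernel reports no NUMA affinity for the device.
    """
    path = Path(sysfs_root) / "class" / "net" / interface / "device" / "numa_node"
    return int(path.read_text().strip())


if __name__ == "__main__":
    # eno2 is the interface name used in the example in this section.
    try:
        print("eno2 is on NUMA node", nic_numa_node("eno2"))
    except FileNotFoundError:
        print("eno2 was not found on this host")
```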

Observe the physnet settings in the following example heat template.

NeutronBridgeMappings: 'physnet1:br-physnet1'
NeutronPhysnetNUMANodesMapping: {physnet1: [0] }

- type: ovs_user_bridge
  name: br-physnet1
  mtu: 9000
  members:
    - type: ovs_dpdk_port
      name: dpdk2
      members:
        - type: interface
          name: eno2

Testing NUMA-aware vSwitch

Observe the configuration in the file /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf:
```
[neutron_physnet_tenant]
numa_nodes=1
[neutron_tunnel]
numa_nodes=1
```
Confirm the new configuration with the lscpu command:
```
$ lscpu
```
Launch a VM with the NIC attached to the appropriate network.

Known Limitations

You cannot start a VM that has two NICs connected to physnets on different NUMA nodes, if you did not specify a two-node guest NUMA topology.
You cannot start a VM that has one NIC connected to a physnet and another NIC connected to a tunneled network on different NUMA nodes, if you did not specify a two-node guest NUMA topology.
You cannot start a VM that has one vhost port and one VF on different NUMA nodes, if you did not specify a two-node guest NUMA topology.

NUMA-aware vSwitch parameters are specific to overcloud roles. For example, Compute node 1 and Compute node 2 can have different NUMA topologies.
If the interfaces of a VM have NUMA affinity, ensure that the affinity is for a single NUMA node only. You can locate any interface without NUMA affinity on any NUMA node.
Configure NUMA affinity for data plane networks, not management networks.
NUMA affinity for tunneled networks is a global setting that applies to all VMs.

9.6. Configuring Quality of Service (QoS) in an NFVi environment

For details on configuring QoS, see Configuring Quality of Service (QoS) policies. Support is limited to the following QoS rule types:

minimum bandwidth on SR-IOV, if supported by vendor.
bandwidth limit on SR-IOV and OVS-DPDK egress interfaces.

9.7. Deploying an overcloud with HCI and DPDK

You can deploy your NFV infrastructure with hyper-converged nodes, by co-locating and configuring Compute and Ceph Storage services for optimized resource usage.

For more information about hyper-converged infrastructure (HCI), see: Hyper Converged Infrastructure Guide

Prerequisites

Red Hat OpenStack Platform 16.1.
The latest version of Red Hat Ceph Storage 4.
The latest version of ceph-ansible 4, as provided by the rhceph-4-tools-for-rhel-8-x86_64-rpms repository.

Procedure

Install ceph-ansible on the undercloud.
```
$ sudo yum install ceph-ansible -y
```

Generate the roles_data.yaml file for the ComputeHCIOvsDpdk role.

$ openstack overcloud roles generate -o ~/<templates>/roles_data.yaml Controller \
 ComputeHCIOvsDpdk

Create and configure a new flavor with the openstack flavor create and openstack flavor set commands. For more information about creating a flavor, see Creating a new role in the Advanced Overcloud Customization Guide.

Deploy the overcloud with the custom roles_data.yaml file that you generated.

# time openstack overcloud deploy --templates \
 --timeout 360 \
 -r ~/<templates>/roles_data.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs-dpdk.yaml \
 -e ~/<templates>/<custom environment file>

9.7.1. Example NUMA node configuration

For increased performance, place the tenant network and the Ceph object storage daemons (OSDs) in one NUMA node, such as NUMA-0, and the VNF and any non-NFV VMs in another NUMA node, such as NUMA-1.

CPU allocation:

NUMA-0	NUMA-1
Number of Ceph OSDs * 4 HT	Guest vCPU for the VNF and non-NFV VMs
DPDK lcore - 2 HT	DPDK lcore - 2 HT
DPDK PMD - 2 HT	DPDK PMD - 2 HT

Example of CPU allocation:

	NUMA-0	NUMA-1
Ceph OSD	32,34,36,38,40,42,76,78,80,82,84,86
DPDK-lcore	0,44	1,45
DPDK-pmd	2,46	3,47
nova		5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87

9.7.2. Example ceph configuration file

parameter_defaults:
  CephPoolDefaultSize: 3
  CephPoolDefaultPgNum: 64
  CephPools:
    - {"name": backups, "pg_num": 128, "pgp_num": 128, "application": "rbd"}
    - {"name": volumes, "pg_num": 256, "pgp_num": 256, "application": "rbd"}
    - {"name": vms, "pg_num": 64, "pgp_num": 64, "application": "rbd"}
    - {"name": images, "pg_num": 32, "pgp_num": 32, "application": "rbd"}
  CephConfigOverrides:
    osd_recovery_op_priority: 3
    osd_recovery_max_active: 3
    osd_max_backfills: 1
  CephAnsibleExtraConfig:
    nb_retry_wait_osd_up: 60
    delay_wait_osd_up: 20
    is_hci: true
    # 3 OSDs * 4 vCPUs per SSD = 12 vCPUs (list below not used for VNF)
    ceph_osd_docker_cpuset_cpus: "32,34,36,38,40,42,76,78,80,82,84,86" # 1
    # cpu_limit 0 means no limit as we are limiting CPUs with cpuset above
    ceph_osd_docker_cpu_limit: 0                                       # 2
    # numactl preferred to cross the numa boundary if we have to
    # but try to only use memory from numa node0
    # cpuset-mems would not let it cross numa boundary
    # lots of memory so NUMA boundary crossing unlikely
    ceph_osd_numactl_opts: "-N 0 --preferred=0"                        # 3
  CephAnsibleDisksConfig:
    osds_per_device: 1
    osd_scenario: lvm
    osd_objectstore: bluestore
    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/sdc

Assign CPU resources for ceph OSD processes with the following parameters. Adjust the values based on the workload and hardware in this hyperconverged environment.

1: ceph_osd_docker_cpuset_cpus: Allocate 4 CPU threads for each OSD for SSD disks, or 1 CPU for each OSD for HDD disks. Include the list of cores and sibling threads from the NUMA node associated with Ceph, and use only CPUs that are not in the NovaComputeCpuDedicatedSet or OvsPmdCoreList lists.
2: ceph_osd_docker_cpu_limit: Set this value to 0, to pin the ceph OSDs to the CPU list from ceph_osd_docker_cpuset_cpus.
3: ceph_osd_numactl_opts: Set this value to preferred for cross-NUMA operations, as a precaution.

9.7.3. Example DPDK configuration file

parameter_defaults:
  ComputeHCIParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=240 intel_iommu=on iommu=pt
      isolcpus=2,46,3,47,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87"   # 1
    TunedProfileName: "cpu-partitioning"
    IsolCpusList: "2,46,3,47,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87"   # 2
    VhostuserSocketGroup: hugetlbfs
    OvsDpdkSocketMemory: "4096,4096"                            # 3
    OvsDpdkMemoryChannels: "4"

    OvsPmdCoreList: "2,46,3,47"                                 # 4
    NumDpdkInterfaceRxQueues: 1

1: KernelArgs: To calculate hugepages, subtract the value of the NovaReservedHostMemory parameter from total memory.
2: IsolCpusList: Assign a set of CPU cores that you want to isolate from the host processes with this parameter. Add the value of the OvsPmdCoreList parameter to the value of the NovaComputeCpuDedicatedSet parameter to calculate the value for the IsolCpusList parameter.
3: OvsDpdkSocketMemory: Specify the amount of memory in MB to pre-allocate from the hugepage pool per NUMA node with the OvsDpdkSocketMemory parameter. For more information about calculating OVS-DPDK parameters, see: ovsdpdk parameters
4: OvsPmdCoreList: Specify the CPU cores that are used for the DPDK poll mode drivers (PMD) with this parameter. Choose CPU cores that are associated with the local NUMA nodes of the DPDK interfaces. Allocate 2 HT sibling threads for each NUMA node to calculate the value for the OvsPmdCoreList parameter.
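The rule for IsolCpusList, which is the OvsPmdCoreList CPUs plus the NovaComputeCpuDedicatedSet CPUs, can be checked with a few lines of Python. The following sketch uses the values from the example configuration files in this chapter and also confirms that the Ceph OSD CPUs stay outside the isolated set. The helper name parse_cpu_list is hypothetical.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Turn a comma-separated CPU list into a set of CPU IDs."""
    return {int(cpu) for cpu in spec.split(",") if cpu.strip()}


# Values copied from the example configuration files in this chapter.
ovs_pmd_core_list = parse_cpu_list("2,46,3,47")
nova_dedicated = parse_cpu_list(
    "5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,"
    "49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87"
)
isol_cpus_list = parse_cpu_list(
    "2,46,3,47,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,"
    "49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87"
)
ceph_osd_cpus = parse_cpu_list("32,34,36,38,40,42,76,78,80,82,84,86")

# IsolCpusList should be exactly the PMD cores plus the dedicated Nova CPUs.
expected_isolated = ovs_pmd_core_list | nova_dedicated
print("IsolCpusList is consistent:", isol_cpus_list == expected_isolated)

# The Ceph OSD CPUs must not appear in the isolated set.
print("Ceph CPUs in isolated set:", sorted(ceph_osd_cpus & isol_cpus_list))
```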

9.7.4. Example nova configuration file

parameter_defaults:
  ComputeHCIExtraConfig:
    nova::cpu_allocation_ratio: 16
    NovaReservedHugePages:                                         # 1
        - node:0,size:1GB,count:4
        - node:1,size:1GB,count:4
  NovaReservedHostMemory: 123904                                   # 2
  # All left over cpus from NUMA-1
  NovaComputeCpuDedicatedSet:                                                  # 3
    ['5','7','9','11','13','15','17','19','21','23','25','27','29','31','33','35','37','39','41','43','49','51','53','55','57','59','61','63','65','67','69','71','73','75','77','79','81','83','85','87']

1: NovaReservedHugePages: Pre-allocate memory in MB from the hugepage pool with the NovaReservedHugePages parameter. It is the same memory total as the value for the OvsDpdkSocketMemory parameter.
2: NovaReservedHostMemory: Reserve memory in MB for tasks on the host with the NovaReservedHostMemory parameter. Use the following guidelines to calculate the amount of memory that you must reserve:

5 GB for each OSD.
0.5 GB overhead for each VM.
4 GB for general host processing. Ensure that you allocate sufficient memory to prevent potential performance degradation caused by cross-NUMA OSD operation.

3: NovaComputeCpuDedicatedSet: List the CPUs that are not in OvsPmdCoreList or ceph_osd_docker_cpuset_cpus with the NovaComputeCpuDedicatedSet parameter. The CPUs must be in the same NUMA node as the DPDK NICs.
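The NovaReservedHostMemory guidelines above translate into simple arithmetic. The following Python sketch applies them to a hypothetical node with 3 OSDs and 10 VMs. The function name and the VM count are assumptions for illustration only and are not taken from the example configuration.

```python
def reserved_host_memory_mb(
    osd_count: int,
    vm_count: int,
    per_osd_gb: float = 5.0,   # 5 GB for each OSD
    per_vm_gb: float = 0.5,    # 0.5 GB overhead for each VM
    host_gb: float = 4.0,      # 4 GB for general host processing
) -> int:
    """Estimate NovaReservedHostMemory in MB from the guidelines in this section."""
    total_gb = osd_count * per_osd_gb + vm_count * per_vm_gb + host_gb
    return int(total_gb * 1024)


# Hypothetical node: 3 OSDs and 10 VMs.
# 3 * 5 GB + 10 * 0.5 GB + 4 GB = 24 GB
print(reserved_host_memory_mb(osd_count=3, vm_count=10))
```

Treat the result as a starting point and adjust it for your workload and hardware.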

9.7.5. Recommended configuration for HCI-DPDK deployments

Table 9.1. Tunable parameters for HCI deployments
Block device type	Memory per OSD	OSDs per device	vCPUs per device
NVMe	5 GB	4	3
SSD	5 GB	1	4
HDD	5 GB	1	1
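To see how the values in Table 9.1 scale with the number of devices, you can encode the table and multiply it out. The following Python sketch does this. The function name ceph_requirements and the dictionary layout are hypothetical.

```python
# Values from Table 9.1: OSDs and vCPUs per device for each block device type.
TUNABLES = {
    "NVMe": {"osds_per_device": 4, "vcpus_per_device": 3},
    "SSD": {"osds_per_device": 1, "vcpus_per_device": 4},
    "HDD": {"osds_per_device": 1, "vcpus_per_device": 1},
}
MEMORY_GB_PER_OSD = 5  # The table lists 5 GB per OSD for every device type.


def ceph_requirements(device_type: str, device_count: int) -> dict[str, int]:
    """Return the OSD count, memory in GB, and vCPUs for a set of devices."""
    tunable = TUNABLES[device_type]
    osds = device_count * tunable["osds_per_device"]
    return {
        "osds": osds,
        "memory_gb": osds * MEMORY_GB_PER_OSD,
        "vcpus": device_count * tunable["vcpus_per_device"],
    }


# Three SSDs, as in the example Ceph configuration file in this chapter.
print(ceph_requirements("SSD", 3))
```

For three SSDs, the result is 12 vCPUs, which matches the 12 CPUs listed in ceph_osd_docker_cpuset_cpus in the example Ceph configuration file.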

Use the same NUMA node for the following functions:

Disk controller
Storage networks
Storage CPU and memory

Allocate another NUMA node for the following functions of the DPDK provider network:

NIC
PMD CPUs
Socket memory
