You experience packet loss in a Red Hat OpenStack Platform {vernum} NFV environment and have not configured emulator thread pinning.
Note
In Red Hat OpenStack Platform 10, you need a support exception to pin emulator threads. However, Red Hat strongly recommends pinning emulator threads in almost all NFV cases, because changing the default emulator thread configuration can significantly increase performance. Open a ticket with Red Hat support and request a support exception if needed.
Emulator threads handle interrupt requests and non-blocking processes for virtual machine hardware emulation. Within a qemu-kvm process, all threads that do not run vCPUs are emulator threads.
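For example, you can list the threads of a process with ps. The command below runs against the current shell (`$$`) only to illustrate the output format; on a compute node, substitute the PID of the qemu-kvm process. In the real output, threads named `CPU <n>/KVM` run vCPUs, and all others (qemu-kvm workers, vnc_worker, and so on) are emulator threads.

```shell
# List all threads (-T) of a process; replace $$ with the qemu-kvm PID on a
# compute node. Threads whose CMD is not "CPU <n>/KVM" are emulator threads.
ps -T -p $$
```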
Due to the Linux CFS (Completely Fair Scheduler), emulator threads normally move periodically from one pCPU to another, within the set of pCPUs defined in libvirt's emulator pin set.
In NFV contexts, you can run into issues in either case. If you use the isolcpus kernel parameter, CFS scheduling is disabled on those CPUs, so emulator threads are not rebalanced across them. If you do not use the isolcpus parameter, you can experience packet loss when the emulator threads interrupt CPUs that are processing packets.
Examples of emulator threads include:
qemu-kvm threads
vnc_worker threads
vhost-<qemu-kvm PID> kernel threads (when virtio-net, that is, kernel networking on the hypervisor, is used)
By default, nova configures an emulator thread pin set that spans the pCPUs assigned to all vCPUs. If you are not using the isolcpus parameter, then emulator threads can be scheduled on any of these pCPUs, and periodically move from one pCPU to another.
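You can verify the pin sets that nova configured with the query forms of virsh vcpupin and virsh emulatorpin. The wrapper function below is only a sketch, and the instance name instance-0000001d is an example from this article; on a compute node, run the two virsh commands directly against your own instance.

```shell
# Show the pin sets nova configured for an instance.
# Find the instance name with "virsh list".
show_pins() {
    virsh vcpupin "$1"       # pCPU set assigned to each vCPU
    virsh emulatorpin "$1"   # pin set of the emulator threads
}
# Only attempt this where libvirt is installed:
command -v virsh >/dev/null && show_pins instance-0000001d
```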
In Red Hat OpenStack Platform 10, there is no officially supported way to pin emulator threads. As a temporary measure, you can move emulator threads to a set of pCPUs by using virsh emulatorpin (…) --live, as shown in the following example.
# to pin emulator threads of instance instance-0000001d to CPU 34
virsh emulatorpin instance-0000001d 34 --live
# to pin emulator threads of instance instance-0000001d to CPUs 32,34
virsh emulatorpin instance-0000001d 32,34 --live
These changes last only for the runtime of the instance. Permanent modification requires an external mechanism, such as a cron job, bash script, or Ansible task, and is itself subject to a support exception.
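As a sketch of such an external mechanism (subject to the support exception noted above), a bash script could re-apply the emulator pin to every running domain, for example from a cron job after a reboot. The cpuset value below is an example; choose pCPUs that fit your CPU layout.

```shell
#!/bin/bash
# Hypothetical helper: re-pin the emulator threads of all running domains.
# EMULATOR_CPUSET is an example value; adapt it to your compute node.
EMULATOR_CPUSET="32,34"

pin_all_emulator_threads() {
    local dom
    for dom in $(virsh list --name); do
        virsh emulatorpin "$dom" "$EMULATOR_CPUSET" --live
    done
}

# Only run where libvirt is available:
command -v virsh >/dev/null && pin_all_emulator_threads
```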
When isolcpus is used, the CFS scheduler is disabled on the isolated CPUs and all emulator threads run on the first available, lowest-indexed pCPU. As a consequence, without intervention or further configuration, one vCPU of the instance runs a high risk of resource contention with the emulator threads.
If DPDK runs within the instance, packet processing is done entirely in user space. Do not schedule PMDs on vCPU 0; it should remain reserved for the operating system and interrupt handling. Because the PMD vCPUs within the instance run an active loop and need 100% of a CPU, they must not be preempted; packet loss can occur if one of these vCPUs is preempted. Therefore, configure the emulatorpin cpuset so that it does not overlap with the pCPUs that handle vCPUs 1 and above.
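The non-overlap rule can be sketched as follows. The vCPU-to-pCPU map below uses illustrative values taken from the taskset output later in this article; on a real node, read the map from virsh vcpupin. The emulatorpin cpuset may contain the pCPU of vCPU 0 but none of the others.

```shell
# Hypothetical vCPU-to-pCPU map; on a real node read it from
# "virsh vcpupin <instance>". Index = vCPU number, value = pinned pCPU.
pcpu_of_vcpu=(34 14 10 30)

# Emulator threads may share the pCPU of vCPU 0, but must not overlap the
# pCPUs of vCPUs 1 and above, which run the DPDK PMD active loops.
emulatorpin_cpuset="${pcpu_of_vcpu[0]}"
echo "virsh emulatorpin instance-0000001d ${emulatorpin_cpuset} --live"
```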
With DPDK networking within the instance, the optimal location for emulator threads is either the pCPU that is handling vCPU 0 or a dedicated physical CPU that is not handling any virtual CPUs at all.
If OVS-DPDK is used on the hypervisor and DPDK within the instance, place the emulator threads on the pCPU that is handling vCPU 0.
If kernel space networking is used on the hypervisor, then packet processing on the hypervisor is executed within the kernel.
With DPDK networking within the instance, the optimal location for emulator threads is either the pCPU that is handling vCPU 0, or a dedicated physical CPU that is not handling any virtual CPUs.
Note that in this scenario, packet processing for the vNIC queues is executed within vhost-<qemu-kvm PID> kernel threads of the hypervisor. Under high traffic, these kernel threads can generate a significant CPU load. The optimal location of the emulator threads needs to be determined on a case-by-case basis.
[root@overcloud-compute-0 ~]# ps aux | grep vhost-
root 364948 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
root 364949 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
root 364950 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
With kernel networking within the instance, there are two options:
Optimize the interrupt distribution, for example, softirqs within the instance. In such a case, you do not have to allocate an additional pCPU for emulator threads and can assign the emulator threads to a pCPU that is not handling any network interrupts.
Use a dedicated pCPU on the same NUMA node for emulator threads.
Due to the complexity of the first option, the second option is recommended.
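To pick such a dedicated pCPU, you first need to know which pCPUs share a NUMA node with the instance. A minimal sketch using lscpu follows; CPU 0 is used as the reference here only so that the commands work anywhere, and on a compute node you would use one of the instance's pCPUs instead, for example 34.

```shell
# Find the NUMA node of a reference pCPU, then list all pCPUs on that node;
# pick an unused one from this list for the emulator threads.
REF_PCPU=0   # example value; use one of the instance's pCPUs on a real node
NODE=$(lscpu -p=cpu,node | awk -F, -v c="$REF_PCPU" '$1 !~ /^#/ && $1 == c {print $2}')
echo "pCPU ${REF_PCPU} is on NUMA node ${NODE}; pCPUs on that node:"
lscpu -p=cpu,node | awk -F, -v n="$NODE" '$1 !~ /^#/ && $2 == n {print $1}'
```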
The emulator threads can be moved by using virsh emulatorpin:
virsh emulatorpin instance-0000001d 34
Note that the affinity of all non-vCPU threads changes.
[root@overcloud-compute-0 ~]# ps -T -p 73517
PID SPID TTY TIME CMD
73517 73517 ? 00:00:00 qemu-kvm
73517 73527 ? 00:00:00 qemu-kvm
73517 73535 ? 00:00:06 CPU 0/KVM
73517 73536 ? 00:00:02 CPU 1/KVM
73517 73537 ? 00:00:03 CPU 2/KVM
73517 73538 ? 00:00:02 CPU 3/KVM
73517 73540 ? 00:00:00 vnc_worker
[root@overcloud-compute-0 ~]# taskset -apc 73517
pid 73517's current affinity list: 34
pid 73527's current affinity list: 34
pid 73535's current affinity list: 34
pid 73536's current affinity list: 14
pid 73537's current affinity list: 10
pid 73538's current affinity list: 30
pid 73540's current affinity list: 34
To verify, check the number of switches in the historic data in /proc/sched_debug. In this example, PID 73517 already moved to cpu#34. The other emulator workers have not run since the last output, and therefore still show on cpu#10.
Note how thread 73517 moves to cpu#34. If you now interact with a VNC session, you can see that /proc/sched_debug shows the vnc_worker threads on cpu#34 as well.