Chapter 13. Using virsh emulatorpin in virtual environments with NFV
Use this procedure to determine the impact of using virsh emulatorpin in Red Hat OpenStack Platform with NFV.
13.1. Symptom
You experience packet loss in a Red Hat OpenStack Platform 10 NFV environment and have not configured emulator thread pinning.
In Red Hat OpenStack Platform 10, customers need a support exception to pin emulator threads. However, Red Hat strongly recommends pinning emulator threads in almost all NFV cases, because changing the default emulator thread configuration can significantly increase performance. Open a ticket with Red Hat support and request a support exception if needed.
13.2. Solution
Use this section to investigate and configure emulator thread pinning.
13.2.1. qemu-kvm Emulator Threads
Emulator threads handle interrupt requests and non-blocking processes for virtual machine hardware emulation. Any thread of the qemu-kvm process that does not run a vCPU is an emulator thread. See the following example.
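For example, assuming a single qemu-kvm process on the compute node, you can list its threads as follows. Threads that do not run a vCPU (vCPU threads typically carry names such as CPU 0/KVM when thread naming is enabled) are the emulator threads.
# list all threads of the qemu-kvm process; threads that are not running vCPUs are emulator threads
ps -T -p $(pidof qemu-kvm)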
Due to the Linux CFS (Completely Fair Scheduler), emulator threads normally move periodically from one pCPU to another, within the pCPUs defined in libvirt's emulator pin set.
In NFV contexts, you might experience issues if you configure emulator threads when using the isolcpus parameter, because this kernel configuration disables CFS scheduling on those CPUs. If you are not using the isolcpus parameter, you can experience packet loss when the emulator threads interrupt CPUs that are processing packets.
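To check whether the isolcpus parameter is in effect on a compute node, inspect the kernel command line, for example:
# print the isolcpus setting of the running kernel, if any
grep -o 'isolcpus=[^ ]*' /proc/cmdline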
Examples of emulator threads include:
- qemu-kvm threads
- vnc_worker threads
- vhost-<qemu-kvm PID> kernel threads, when virtio-net is used (kernel networking on the hypervisor)
13.2.2. Default Behavior for Emulator Thread Pinning
By default, nova configures an emulator thread pin set that spans the pCPUs assigned to all vCPUs. If you are not using the isolcpus parameter, emulator threads can be scheduled on any pCPU in that set and will periodically move from one pCPU to another. Therefore, any of these pCPUs can be preempted by qemu's emulator threads, risking packet drops.
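You can query the pin sets that nova configured directly from libvirt, shown here for the instance instance-0000001d that is used later in this chapter:
# show the current emulator thread pin set of the instance
virsh emulatorpin instance-0000001d
# show the vCPU-to-pCPU pinning for comparison
virsh vcpupin instance-0000001d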
13.2.3. The Current Implementation for Emulator Thread Pinning in OpenStack nova (OpenStack Platform 10)
In Red Hat OpenStack Platform 10, there is no officially supported way to pin emulator threads. Temporarily, you can move emulator threads to a set of pCPUs by using virsh emulatorpin (…) --live, as shown in the following example.
# to pin emulator threads of instance instance-0000001d to CPU 34
virsh emulatorpin instance-0000001d 34 --live
# to pin emulator threads of instance instance-0000001d to CPUs 32,34
virsh emulatorpin instance-0000001d 32,34 --live
These changes last only for the runtime of the instance. Permanent modifications require an external mechanism, such as a cron job, bash script, or Ansible task, and must be covered by a support exception.
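The following is a minimal sketch of such an external mechanism, assuming that pCPU 34 is the intended emulator pin set for every running instance on the compute node; adapt the pin set and the domain selection to your deployment before using anything like this under a support exception.
#!/bin/bash
# re-apply the emulator thread pin set to all running domains,
# for example from a cron job
for dom in $(virsh list --name); do
    virsh emulatorpin "$dom" 34 --live
done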
13.2.4. About the Impact of isolcpus on Emulator Thread Scheduling
When isolcpus is used, the CFS scheduler is disabled and all emulator threads run on the first available, lowest-indexed pCPU of the emulator pin set. As a consequence, without intervention or further configuration, one vCPU of the instance runs a high risk of resource contention with the emulator threads.
Further details can be found at Kernel.org Bugzilla – Bug 116701.
Use the following algorithm to determine which vCPU the emulator threads are using:
PCPU=MIN([EMULATORPINSET])
VCPU=REVERSE_CPUSET(PCPU)
REVERSE_CPUSET := SELECT pcpu from `virsh dumpxml <instance name> | grep "cpuset=$PCPU"`
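As a minimal shell sketch of this algorithm, assuming the example instance instance-0000001d and that its domain XML contains an <emulatorpin> element:
# take the lowest pCPU from the emulator pin set in the domain XML
EMUPIN=$(virsh dumpxml instance-0000001d | sed -n "s/.*<emulatorpin cpuset='\([^']*\)'.*/\1/p")
PCPU=$(echo "$EMUPIN" | tr ',-' '\n\n' | sort -n | head -1)
# find the vCPU whose <vcpupin> entry references that pCPU
virsh dumpxml instance-0000001d | grep "cpuset='$PCPU'"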
For example, in this instance, all emulator threads and their children inherit the affinity 1-3 from the default emulator pin set.
In combination with isolcpus, all emulator threads and the vhost-* threads execute on pCPU 1 and are never rescheduled.
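You can verify the affinity on the compute node, for example by checking every thread of the qemu-kvm process (assuming a single qemu-kvm process); the vhost kernel threads can be checked in the same way using their PIDs.
# show the CPU affinity of every thread of the qemu-kvm process
grep Cpus_allowed_list /proc/$(pidof qemu-kvm)/task/*/status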
13.2.5. Optimal Location of Emulator Threads
This section describes the optimal placement of emulator threads in the following networking scenarios:
- DPDK networking within the instance and netdev datapath in Open vSwitch
- DPDK networking within the instance, system datapath in Open vSwitch and kernel space networking on the hypervisor
- Kernel networking within the instance and netdev datapath in Open vSwitch
13.2.5.1. Optimal Placement of Emulator Threads with DPDK Networking Within the Instance and netdev datapath in Open vSwitch
If DPDK runs within the instance, packet processing is done entirely in user space. Do not schedule PMDs on vCPU 0, because that vCPU should remain available for the operating system and interrupt handling. Because the PMD CPUs within the instance run an active loop and need 100% of the CPU, they must not be preempted; packet loss can occur if one of these vCPUs is preempted. Therefore, configure the emulatorpin cpuset so that it does not overlap with the physical CPUs that handle virtual CPUs 1 and above.
With DPDK networking within the instance, the optimal location for emulator threads is either the pCPU that is handling vCPU 0 or a dedicated physical CPU that is not handling any virtual CPUs at all.
If OVS-DPDK is used on the hypervisor and DPDK within the instance, place the emulator threads on the pCPU that handles vCPU 0.
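As a minimal sketch, assuming the example instance instance-0000001d and that nova wrote per-vCPU <vcpupin> entries into the domain XML, you can pin the emulator threads to the pCPU that handles vCPU 0 as follows:
# determine the pCPU that handles vCPU 0 of the instance
PCPU0=$(virsh dumpxml instance-0000001d | sed -n "s/.*<vcpupin vcpu='0' cpuset='\([^']*\)'.*/\1/p")
# pin the emulator threads to that pCPU for the runtime of the instance
virsh emulatorpin instance-0000001d "$PCPU0" --live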
13.2.5.2. Optimal Placement of Emulator Threads with DPDK Networking Within the Instance and System datapath in Open vSwitch
If kernel space networking is used on the hypervisor, then packet processing on the hypervisor is executed within the kernel.
With DPDK networking within the instance, the optimal location for emulator threads is either the pCPU that is handling vCPU 0, or a dedicated physical CPU that is not handling any virtual CPUs.
Note that in this scenario, packet processing for the vNIC queues is executed within the vhost-<qemu-kvm PID> kernel threads of the hypervisor. Under high traffic, these kernel threads can generate a significant CPU load. The optimal location of the emulator threads needs to be determined on a case-by-case basis.
[root@overcloud-compute-0 ~]# ps aux | grep vhost-
root 364948 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
root 364949 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
root 364950 0.0 0.0 0 0 ? S 20:32 0:00 [vhost-364936]
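To gauge the load and placement of these threads under traffic, you can, for example, list the CPU each vhost kernel thread currently runs on together with its CPU usage:
# show the current CPU (PSR) and CPU usage of the vhost kernel threads
ps -eLo pid,psr,pcpu,comm | grep vhost-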
13.2.5.3. Optimal Placement of Emulator Threads with Kernel Networking within the Instance and netdev datapath in Open vSwitch
With kernel networking within the instance, there are two options:
- Optimize the interrupt distribution, for example, softirqs within the instance. In such a case, you do not have to allocate an additional pCPU for emulator threads and can assign the emulator threads to a pCPU that is not handling any network interrupts.
- Use a dedicated pCPU on the same NUMA node for emulator threads.
Due to the complexity of the first option, the second option is recommended.
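As a minimal sketch for the second option, assuming the example instance instance-0000001d and that pCPU 35 is a free pCPU on the same NUMA node as the instance vCPUs (adjust both to your environment):
# check on which pCPUs, and therefore on which NUMA node, the vCPUs run
virsh vcpuinfo instance-0000001d
# show the pCPU-to-NUMA-node mapping of the host
lscpu | grep 'NUMA node'
# pin the emulator threads to a dedicated pCPU on the same NUMA node
virsh emulatorpin instance-0000001d 35 --live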
13.3. Diagnosis
13.3.1. The Demonstration Environment
The demonstration environment runs one instance: instance-0000001d. Its associated qemu-kvm process has the following PID:
[root@overcloud-compute-0 ~]# pidof qemu-kvm
73517
13.3.2. How Emulatorpin works
By default, a Red Hat OpenStack Platform deployment uses an emulator pin set that spans the pCPUs of all vCPUs, as described in Section 13.2.2.
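You can inspect the vCPU and emulator thread pinning that nova wrote into the domain XML, for example:
# show the vcpupin and emulatorpin settings of the instance
virsh dumpxml instance-0000001d | grep -E 'vcpupin|emulatorpin'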
This leads to an unpredictable allocation of the emulator threads, such as qemu-kvm, vnc_worker, and so on.
The emulator threads can be moved by using virsh emulatorpin:
virsh emulatorpin instance-0000001d 34
Note that the affinity of all non-vCPU threads changes.
Note the number of switches in the historic data in /proc/sched_debug. In the following example, PID 73517 already moved to cpu#34. The other emulator workers did not run since the last output, and therefore still show on cpu#10.
Note how thread 73517 moves to cpu#34. If you now interact with a VNC session, you can see that /proc/sched_debug shows the vnc_worker threads on cpu#34 as well.
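To confirm the placement without reading /proc/sched_debug, you can, for example, list the current CPU (PSR column) of every thread of the qemu-kvm process:
# show the thread ID, current CPU, and name of each qemu-kvm thread
ps -T -o spid,psr,comm -p 73517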