Chapter 17. Optimizing virtual machine performance
Virtual machines (VMs) always experience some degree of performance deterioration in comparison to the host. However, you can use a variety of methods to minimize the performance impact of virtualization in RHEL 10, so that your hardware infrastructure resources can be used as efficiently as possible.
17.1. What influences virtual machine performance
Virtual machines (VMs) run as user-space processes on the host. The hypervisor needs to convert the host’s system resources so that the VMs can use them. As a consequence, a portion of the resources is consumed by the conversion, and the VM cannot achieve the same performance efficiency as the host.
17.1.1. The impact of virtualization on system performance
More specific reasons for VM performance loss include:
- Virtual CPUs (vCPUs) are implemented as threads on the host, handled by the Linux scheduler.
- VMs do not automatically inherit optimization features, such as NUMA or huge pages, from the host kernel.
- Disk and network I/O settings of the host might have a significant performance impact on the VM.
- Network traffic typically travels to a VM through a software-based bridge.
- Depending on the host devices and their models, there might be significant overhead due to emulation of particular hardware.
The severity of the virtualization impact on the VM performance is influenced by a variety of factors, which include:
- The number of concurrently running VMs.
- The amount of virtual devices used by each VM.
- The device types used by the VMs.
17.1.2. Reducing VM performance loss
RHEL 10 provides a number of features you can use to reduce the negative performance effects of virtualization. Notably:
- The TuneD service can automatically optimize the resource distribution and performance of your VMs.
- Block I/O tuning can improve the performance of the VM's block devices, such as disks.
- NUMA tuning can increase vCPU performance.
- Virtual networking can be optimized in various ways.
Tuning VM performance can have negative effects on other virtualization functions. For example, it can make migrating the modified VM more difficult.
17.2. Optimizing virtual machine performance by using TuneD
For an automated method of optimizing the performance of your virtual machines (VMs), you can use the TuneD utility.
TuneD is a tuning profile delivery mechanism that adapts RHEL for certain workload characteristics, such as requirements for CPU-intensive tasks or storage-network throughput responsiveness. It provides a number of tuning profiles that are pre-configured to enhance performance and reduce power consumption in a number of specific use cases. You can edit these profiles or create new profiles to create performance solutions tailored to your environment, including virtualized environments.
To optimize RHEL 10 for virtualization, use the following profiles:
- For RHEL 10 virtual machines, use the virtual-guest profile. It is based on the generally applicable throughput-performance profile, but also decreases the swappiness of virtual memory.
- For RHEL 10 virtualization hosts, use the virtual-host profile. This enables more aggressive writeback of dirty memory pages, which benefits the host performance.
Prerequisites
- The TuneD service is installed and enabled.
Procedure
List the available TuneD profiles.

# tuned-adm list
Available profiles:
- balanced             - General non-specialized TuneD profile
- desktop              - Optimize for the desktop use-case
[...]
- virtual-guest        - Optimize for running inside a virtual guest
- virtual-host         - Optimize for running KVM guests
Current active profile: balanced

Optional: Create a new TuneD profile or edit an existing TuneD profile. For more information, see Managing TuneD profiles.

Activate a TuneD profile.

# tuned-adm profile selected-profile

To optimize a virtualization host, use the virtual-host profile.

# tuned-adm profile virtual-host

On a RHEL guest operating system, use the virtual-guest profile.

# tuned-adm profile virtual-guest
Verification
Display the active profile for TuneD.

# tuned-adm active
Current active profile: virtual-host

Ensure that the TuneD profile settings have been applied on your system.

# tuned-adm verify
Verification succeeded, current system settings match the preset profile. See tuned log file ('/var/log/tuned/tuned.log') for details.
17.3. Virtual machine performance optimization for specific workloads
Virtual machines (VMs) are frequently dedicated to perform a specific workload. You can improve the performance of your VMs by optimizing their configuration for the intended workload.
| Use case | IOThread | vCPU pinning | vNUMA pinning | huge pages | multi-queue |
|---|---|---|---|---|---|
| Database | For database disks | Yes* | Yes* | Yes* | Yes, see: multi-queue virtio-blk, virtio-scsi |
| Virtualized Network Function (VNF) | No | Yes | Yes | Yes | Yes, see: multi-queue virtio-net |
| High Performance Computing (HPC) | No | Yes | Yes | Yes | No |
| Backup Server | For backup disks | No | No | No | Yes, see: multi-queue virtio-blk, virtio-scsi |
| VM with many CPUs (Usually more than 32) | No | Yes* | Yes* | No | No |
| VM with large RAM (Usually more than 128 GB) | No | No | Yes* | Yes | No |
* If the VM has enough CPUs and RAM to use more than one NUMA node.
A VM can fit in more than one category of use cases. In this situation, you should apply all of the suggested configurations.
17.4. Configuring virtual machine memory
To improve the performance of a virtual machine (VM), you can assign additional host RAM to the VM. Similarly, you can decrease the amount of memory allocated to a VM so the host memory can be allocated to other VMs or tasks.
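As a brief, hedged illustration of the basic commands involved (the VM name testguest and the sizes are placeholders, and the detailed procedures follow in this chapter), you can adjust VM memory from the host with virsh. The --config option applies the change to the VM's next boot:

# virsh setmaxmem testguest 8G --config
# virsh setmem testguest 4G --config

The first command raises the maximum memory limit of the VM, and the second sets the actual memory allocation within that limit.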
17.4.1. Memory overcommitment
To ensure optimal use of memory resources available to your host, memory overcommitment is enabled by default in RHEL. By using memory overcommit, you can assign more memory to virtual machines (VMs) than is available to your host. The RHEL kernel then automatically assigns memory to VMs that require it.
This is because VMs running on the KVM hypervisor do not have dedicated blocks of physical RAM assigned to them. Instead, each VM functions as a Linux process where the host’s Linux kernel allocates memory only when requested.
In addition, the host’s memory manager can move the VM’s memory between its own physical memory and swap space. When memory overcommitment is enabled, the kernel can decide to allocate less physical memory than is requested by a VM, because often the requested amount of memory is not fully used by the VM’s process.
Note, however, that with frequent overcommitment for memory-intensive workloads, the system can still become unstable.
Memory overcommitment requires you to allocate sufficient swap space on the host physical machine to accommodate all VMs, as well as enough memory for the host physical machine’s processes. For instructions on the basic recommended swap space size, see What is the recommended swap size for Red Hat platforms? (Red Hat Knowledgebase).
Possible methods to deal with memory shortages on the host include the following:
- Allocate less memory per VM.
- Add more physical memory to the host.
- Use larger swap space.
A VM will run slower if it is swapped frequently. In addition, overcommitting can cause the system to run out of memory (OOM), which may lead to the Linux kernel shutting down important system processes.
Memory overcommit is not supported with device assignment. This is because when device assignment is in use, all virtual machine memory must be statically pre-allocated to enable direct memory access (DMA) with the assigned device.
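One hedged way to gauge how far you are overcommitting is to compare the memory assigned to all defined VMs with the RAM and swap available on the host. The following loop is only a sketch and assumes all VMs are managed by the local libvirt instance:

for vm in $(virsh list --all --name); do
  virsh dominfo "$vm" | awk '/^Max memory/ {print $3}'
done | awk '{sum += $1} END {printf "Total VM max memory: %.1f GiB\n", sum/1024/1024}'
free -h

If the total assigned VM memory significantly exceeds the host's RAM plus swap, consider one of the methods listed above.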
17.4.2. Adding and removing virtual machine memory by using virtio-mem
RHEL 10 provides the virtio-mem paravirtualized memory device. This device makes it possible to dynamically add or remove host memory in virtual machines (VMs).
17.4.2.1. Overview of virtio-mem
virtio-mem is a paravirtualized memory device that can be used to dynamically add or remove host memory in virtual machines (VMs). For example, you can use this device to move memory resources between running VMs or to resize VM memory in cloud setups based on your current requirements.
By using virtio-mem, you can increase the memory of a VM beyond its initial size, and shrink it back to its original size, in units that can have the size of 2 to several hundred mebibytes (MiBs), depending on the used memory backing and the operating system running inside the VM. Note, however, that virtio-mem also relies on a specific guest operating system configuration, especially to reliably unplug memory.
virtio-mem feature limitations
virtio-mem is currently not compatible with the following features:
- Using memory locking for real-time applications on the host
- Using encrypted virtualization on the host
- Combining virtio-mem with memballoon inflation and deflation on the host
- Unloading or reloading the virtio_mem driver in a VM
- Using vhost-user devices, with the exception of virtiofs
17.4.2.2. Configuring memory onlining in virtual machines
Before using virtio-mem to attach memory to a running virtual machine (also known as memory hot-plugging), you must configure the virtual machine (VM) operating system to automatically set the hot-plugged memory to an online state. Otherwise, the guest operating system is not able to use the additional memory.
By default in RHEL, memory onlining is configured with udev rules. However, when using virtio-mem, configure memory onlining directly in the kernel.
Prerequisites
- The host uses the Intel 64, AMD64, ARM 64, or IBM Z CPU architecture.
- The host uses RHEL 10 as the operating system.
VMs running on the host use one of the following operating system versions:
- On Intel 64 and AMD64 hosts: RHEL 8.10, RHEL 9.4 or later, RHEL 10.0 or later, or a supported 64-bit version of Windows
Important: Unplugging memory from a running VM is disabled by default in RHEL 8.10 VMs.
For a list of supported Windows versions, see: Certified Guest Operating Systems
- On ARM 64 hosts: RHEL 9.6 or later or RHEL 10.0 or later
- On IBM Z hosts: RHEL 9.7 or later or RHEL 10.1 or later
You have chosen the optimal configuration for memory onlining:
- online_movable
- online_kernel
- auto-movable
To learn about differences between these configurations, see: Comparison of memory onlining configurations
Procedure
To set memory onlining to use the online_movable configuration in the VM:

Set the memhp_default_state kernel command line parameter to online_movable:

# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online_movable

- Reboot the VM.

To set memory onlining to use the online_kernel configuration in the VM:

Set the memhp_default_state kernel command line parameter to online_kernel:

# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online_kernel

- Reboot the VM.

To use the auto-movable memory onlining policy in the VM:

Set the memhp_default_state kernel command line parameter to online:

# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online

Set the memory_hotplug.online_policy kernel command line parameter to auto-movable:

# grubby --update-kernel=ALL --remove-args="memory_hotplug.online_policy" --args=memory_hotplug.online_policy=auto-movable

Optional: To further tune the auto-movable onlining policy, change the memory_hotplug.auto_movable_ratio and memory_hotplug.auto_movable_numa_aware parameters:

# grubby --update-kernel=ALL --remove-args="memory_hotplug.auto_movable_ratio" --args=memory_hotplug.auto_movable_ratio=<percentage>
# grubby --update-kernel=ALL --remove-args="memory_hotplug.auto_movable_numa_aware" --args=memory_hotplug.auto_movable_numa_aware=<y/n>

The memory_hotplug.auto_movable_ratio parameter sets the maximum ratio of memory available only for movable allocations compared to memory available for any allocations. The ratio is expressed as a percentage, and the default value is 301 (%), which is a 3:1 ratio.

The memory_hotplug.auto_movable_numa_aware parameter controls whether the memory_hotplug.auto_movable_ratio parameter applies to memory across all available NUMA nodes or only to memory within a single NUMA node. The default value is y (yes).

For example, if the maximum ratio is set to 301% and memory_hotplug.auto_movable_numa_aware is set to y (yes), then the 3:1 ratio is applied even within the NUMA node with the attached virtio-mem device. If the parameter is set to n (no), the maximum 3:1 ratio is applied only to all the NUMA nodes as a whole.

Additionally, if the ratio is not exceeded, the newly hot-plugged memory will be available only for movable allocations. Otherwise, the newly hot-plugged memory will be available for both movable and unmovable allocations.

- Reboot the VM.
Verification
To see if the online_movable configuration has been set correctly, check the current value of the memhp_default_state kernel parameter:

# cat /sys/devices/system/memory/auto_online_blocks
online_movable

To see if the online_kernel configuration has been set correctly, check the current value of the memhp_default_state kernel parameter:

# cat /sys/devices/system/memory/auto_online_blocks
online_kernel

To see if the auto-movable configuration has been set correctly, check the following kernel parameters:

memhp_default_state:

# cat /sys/devices/system/memory/auto_online_blocks
online

memory_hotplug.online_policy:

# cat /sys/module/memory_hotplug/parameters/online_policy
auto-movable

memory_hotplug.auto_movable_ratio:

# cat /sys/module/memory_hotplug/parameters/auto_movable_ratio
301

memory_hotplug.auto_movable_numa_aware:

# cat /sys/module/memory_hotplug/parameters/auto_movable_numa_aware
y
17.4.2.3. Attaching a virtio-mem device to virtual machines
To attach additional memory to a running virtual machine (also known as memory hot-plugging) and afterwards be able to resize the hot-plugged memory, you can use a virtio-mem device.
Specifically, you can use libvirt XML configuration files and virsh commands to define and attach virtio-mem devices to virtual machines (VMs).
Prerequisites
- The host uses the Intel 64, AMD64, ARM 64, or IBM Z CPU architecture.
- The host uses RHEL 10 as the operating system.
VMs running on the host use one of the following operating system versions:
- On Intel 64 and AMD64 hosts: RHEL 8.10, RHEL 9.4 or later, RHEL 10.0 or later, or a supported 64-bit version of Windows
Important: Unplugging memory from a running VM is disabled by default in RHEL 8.10 VMs.
For a list of supported Windows versions, see: Certified Guest Operating Systems
- On ARM 64 hosts: RHEL 9.6 or later or RHEL 10.0 or later
- On IBM Z hosts: RHEL 9.7 or later or RHEL 10.1 or later
- The VM has memory onlining configured. For instructions, see: Configuring memory onlining in virtual machines
Procedure
Ensure that the XML configuration of the target VM includes the maxMemory parameter:

# virsh edit testguest1

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <maxMemory unit='GiB'>128</maxMemory>
  ...
</domain>

In this example, the XML configuration of the testguest1 VM defines a maxMemory parameter with a 128 gibibyte (GiB) size. The maxMemory size specifies the maximum memory the VM can use, which includes both initial and hot-plugged memory.

Create and open an XML file to define virtio-mem devices on the host, for example:

# vim virtio-mem-device.xml

Add XML definitions of virtio-mem devices to the file and save it:

<memory model='virtio-mem'>
  <target>
    <size unit='GiB'>48</size>
    <node>0</node>
    <block unit='MiB'>2</block>
    <requested unit='GiB'>16</requested>
    <current unit='GiB'>16</current>
  </target>
  <alias name='ua-virtiomem0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</memory>
<memory model='virtio-mem'>
  <target>
    <size unit='GiB'>48</size>
    <node>1</node>
    <block unit='MiB'>2</block>
    <requested unit='GiB'>0</requested>
    <current unit='GiB'>0</current>
  </target>
  <alias name='ua-virtiomem1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memory>

In this example, two virtio-mem devices are defined with the following parameters:

- size: This is the maximum size of the device. In the example, it is 48 GiB. The size must be a multiple of the block size.
- node: This is the assigned vNUMA node for the virtio-mem device.
- block: This is the block size of the device. It must be at least the size of a Transparent Huge Page (THP), which is 2 MiB on the Intel 64 and AMD64 CPU architectures. On the ARM 64 architecture, the THP size can be 2 MiB or 512 MiB depending on the base page size. The 2 MiB block size on Intel 64 or AMD64 architecture is usually a good default choice. When using virtio-mem with Virtual Function I/O (VFIO) or mediated devices (mdev), the total number of blocks across all virtio-mem devices must not be larger than 32768, otherwise the plugging of RAM might fail.
- requested: This is the amount of memory you attach to the VM with the virtio-mem device. However, it is just a request towards the VM and it might not be resolved successfully, for example if the VM is not properly configured. The requested size must be a multiple of the block size and cannot exceed the maximum defined size.
- current: This represents the current size the virtio-mem device provides to the VM. The current size can differ from the requested size, for example when requests cannot be completed or when rebooting the VM.
- alias: This is an optional user-defined alias that you can use to specify the intended virtio-mem device, for example when editing the device with libvirt commands. All user-defined aliases in libvirt must start with the "ua-" prefix.

Apart from these specific parameters, libvirt handles the virtio-mem device like any other PCI device. For more information about managing PCI devices attached to VMs, see: Managing virtual devices
Use the XML file to attach the defined virtio-mem devices to a VM. For example, to permanently attach the two devices defined in virtio-mem-device.xml to the running testguest1 VM:

# virsh attach-device testguest1 virtio-mem-device.xml --live --config

The --live option attaches the device to a running VM only, without persistence between boots. The --config option makes the configuration changes persistent. You can also attach the device to a shut-down VM without the --live option.

Optional: To dynamically change the requested size of a virtio-mem device attached to a running VM, use the virsh update-memory-device command:

# virsh update-memory-device testguest1 --alias ua-virtiomem0 --requested-size 4GiB

In this example:

- testguest1 is the VM you want to update.
- --alias ua-virtiomem0 is the virtio-mem device specified by a previously defined alias.
requestedsize might be unreliable. Whether this process succeeds depends on various factors, such as the memory onlining policy that is used.In some cases, the guest operating system cannot complete the request successfully, because changing the amount of hot-plugged memory is not possible at that time.
Additionally, unplugging memory from a running VM is disabled by default in RHEL 8.10 VMs.
-
Optional: To unplug a virtio-mem device from a shut-down VM, use the virsh detach-device command:

# virsh detach-device testguest1 virtio-mem-device.xml

Optional: To unplug a virtio-mem device from a running VM:

Change the requested size of the virtio-mem device to 0, otherwise the attempt to unplug a virtio-mem device from a running VM will fail.

# virsh update-memory-device testguest1 --alias ua-virtiomem0 --requested-size 0

Unplug the virtio-mem device from the running VM:

# virsh detach-device testguest1 virtio-mem-device.xml --config
Verification
In the VM, check the available RAM and see if the total amount now includes the hot-plugged memory:
# free -h
        total    used    free   shared  buff/cache  available
Mem:     31Gi   5.5Gi    14Gi    1.3Gi        11Gi       23Gi
Swap:   8.0Gi      0B   8.0Gi

# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 29564 MB
node 0 free: 13351 MB
node distances:
node   0
  0:  10

The current amount of plugged-in RAM can also be viewed on the host by displaying the XML configuration of the running VM:

# virsh dumpxml testguest1

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <currentMemory unit='GiB'>31</currentMemory>
  ...
  <memory model='virtio-mem'>
    <target>
      <size unit='GiB'>48</size>
      <node>0</node>
      <block unit='MiB'>2</block>
      <requested unit='GiB'>16</requested>
      <current unit='GiB'>16</current>
    </target>
    <alias name='ua-virtiomem0'/>
    <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
  ...
</domain>

In this example:

- <currentMemory unit='GiB'>31</currentMemory> represents the total RAM available in the VM from all sources.
- <current unit='GiB'>16</current> represents the current size of the plugged-in RAM provided by the virtio-mem device.
17.4.2.4. Comparison of memory onlining configurations
When attaching memory to a running RHEL virtual machine (also known as memory hot-plugging), you must set the hot-plugged memory to an online state in the virtual machine (VM) operating system. Otherwise, the system will not be able to use the memory.
To do this, you can use several different configurations. The following table summarizes the main considerations when choosing between the available memory onlining configurations.
| Configuration name | Unplugging memory from a VM | A risk of creating a memory zone imbalance | A potential use case | Memory requirements of the intended workload |
|---|---|---|---|---|
| online_movable | Hot-plugged memory can be reliably unplugged. | Yes | Hot-plugging a comparatively small amount of memory | Mostly user-space memory |
| auto-movable | Movable portions of hot-plugged memory can be reliably unplugged. | Minimal | Hot-plugging a large amount of memory | Mostly user-space memory |
| online_kernel | Hot-plugged memory cannot be reliably unplugged. | No | Unreliable memory unplugging is acceptable. | User-space or kernel-space memory |
A zone imbalance is a lack of available memory pages in one of the Linux memory zones. A zone imbalance can negatively impact the system performance. For example, the kernel might crash if it runs out of free memory for unmovable allocations. Usually, movable allocations contain mostly user-space memory pages and unmovable allocations contain mostly kernel-space memory pages.
17.4.3. Configuring virtual machines to use huge pages
In certain use cases, you can improve memory allocation for your virtual machines (VMs) by using huge pages instead of the default 4 KiB memory pages. For example, huge pages can improve performance for VMs with high memory utilization, such as database servers.
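Before you change the VM configuration, it can help to confirm that the host actually has huge pages reserved. The following is a minimal check on the host; the values depend on your system, and the 1048576 kB directory corresponds to 1 GiB pages:

# grep Huge /proc/meminfo
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages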
Prerequisites
- The host is configured to use huge pages in memory allocation. For instructions, see Configuring HugeTLB at boot time
- The selected VM is shut down.
Procedure
Open the XML configuration of the selected VM. For example, to edit a testguest VM, run the following command:

# virsh edit testguest

Adjust the huge page configuration of the VM. For example, to configure the VM to use 1 GiB huge pages, add the following lines to the <memoryBacking> section in the configuration:

<memoryBacking>
  <hugepages>
    <page size='1' unit='GiB'/>
  </hugepages>
</memoryBacking>
Verification
- Start the VM.
Confirm that the host has successfully allocated huge pages for the running VM. On the host, run the following command:
# cat /proc/meminfo | grep Huge
HugePages_Total:       4
HugePages_Free:        2
HugePages_Rsvd:        1
Hugepagesize:    1048576 kB

When you add together the number of free and reserved huge pages (HugePages_Free + HugePages_Rsvd), the result should be less than the total number of huge pages (HugePages_Total). The difference is the number of huge pages that are used by the running VM.
17.5. Optimizing virtual machine I/O performance
The input and output (I/O) capabilities of a virtual machine (VM) can significantly limit the VM’s overall efficiency. To address this, you can optimize a VM’s I/O by configuring block I/O parameters.
17.5.1. Tuning block I/O in virtual machines
When multiple block devices, such as storage drives, are being used by one or more virtual machines (VMs), you can adjust the I/O priority of specific virtual devices by modifying their I/O weights.
Increasing the I/O weight of a device increases its priority for I/O bandwidth, and as a result, it provides the device with more host resources. Similarly, reducing a device's weight makes the device consume fewer host resources.
Each device’s weight value must be within the 100 to 1000 range. Alternatively, the value can be 0, which removes that device from per-device listings.
Procedure
Display the current <blkio> parameters for a VM:

# virsh dumpxml VM-name

<domain>
  [...]
  <blkiotune>
    <weight>800</weight>
    <device>
      <path>/dev/sda</path>
      <weight>1000</weight>
    </device>
    <device>
      <path>/dev/sdb</path>
      <weight>500</weight>
    </device>
  </blkiotune>
  [...]
</domain>

Edit the I/O weight of a specified device:

# virsh blkiotune VM-name --device-weights device,I/O-weight

For example, the following changes the weight of the /dev/sda device in the testguest1 VM to 500.

# virsh blkiotune testguest1 --device-weights /dev/sda,500
Verification
Check that the VM’s block I/O parameters have been configured correctly.
# virsh blkiotune testguest1
Block I/O tuning parameters for domain testguest1:
 weight         : 800
 device_weight  : [
                    {"sda": 500},
                  ]
...

Important: Certain kernels do not support setting I/O weights for specific devices. If the previous step does not display the weights as expected, it is likely that this feature is not compatible with your host kernel.
17.5.2. Configuring disk I/O throttling in virtual machines
When several virtual machines (VMs) are running simultaneously, they can interfere with system performance by using excessive disk I/O. To prevent this, you can use disk I/O throttling.
With disk I/O throttling, you can set a limit on disk I/O requests sent from the VMs to the host machine. This can prevent a VM from over-utilizing shared resources and impacting the performance of other VMs.
Disk I/O throttling can be useful in various situations, for example when VMs that belong to different customers are running on the same host, or when quality of service guarantees are given for different VMs. Disk I/O throttling can also be used to simulate slower disks.
To enable disk I/O throttling, set a limit on disk I/O requests sent from each block device attached to VMs to the host machine.
I/O throttling can be applied independently to each block device attached to a VM, and supports limits on throughput and I/O operations.
Procedure
Use the virsh domblklist command to list the names of all the disk devices on a specified VM.

# virsh domblklist testguest1
 Target   Source
------------------------------------------------
 vda      /var/lib/libvirt/images/testguest1.qcow2
 sda      -
 sdb      /home/sample-disk.iso

Find the host block device where the virtual disk that you want to throttle is mounted.

For example, if you want to throttle the sdb virtual disk from the previous step, the following output shows that the disk is mounted on the /dev/nvme0n1p3 partition.

$ lsblk
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
zram0                                         252:0    0     4G  0 disk  [SWAP]
nvme0n1                                       259:0    0 238.5G  0 disk
├─nvme0n1p1                                   259:1    0   600M  0 part  /boot/efi
├─nvme0n1p2                                   259:2    0     1G  0 part  /boot
└─nvme0n1p3                                   259:3    0 236.9G  0 part
  └─luks-a1123911-6f37-463c-b4eb-fxzy1ac12fea 253:0    0 236.9G  0 crypt /home

Set I/O limits for the block device by using the virsh blkiotune command.

# virsh blkiotune VM-name --parameter device,limit

The following example throttles the sdb disk on the testguest1 VM to 1000 read and write I/O operations per second and to 50 MB per second read and write throughput.

# virsh blkiotune testguest1 --device-read-iops-sec /dev/nvme0n1p3,1000 --device-write-iops-sec /dev/nvme0n1p3,1000 --device-write-bytes-sec /dev/nvme0n1p3,52428800 --device-read-bytes-sec /dev/nvme0n1p3,52428800

Important: Red Hat does not support using the virsh blkdeviotune command to configure I/O throttling in VMs. For more information about unsupported features when using RHEL 10 as a VM host, see Unsupported features in RHEL 10 virtualization.
17.5.3. Enabling multi-queue on storage devices
When using virtio-blk or virtio-scsi storage devices in your virtual machines (VMs), you can improve your storage performance and scalability by using the multi-queue feature. With multi-queue, each virtual CPU (vCPU) can have a separate queue and interrupt to use without affecting other vCPUs.
Note that the multi-queue feature is enabled by default for the Q35 machine type, but on the i440fx machine type, you must enable it manually. You can tune the number of queues to be optimal for your workload, but the optimal number differs for each type of workload, and you must test which number of queues works best in your case.
Procedure
To enable multi-queue on a storage device, edit the XML configuration of the VM.
# virsh edit <example_vm>

In the XML configuration, find the intended storage device and change the queues parameter to use multiple I/O queues. Replace N with the number of vCPUs in the VM, up to 16.

A virtio-blk example:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' queues='N'/>
  <source dev='/dev/sda'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

A virtio-scsi example:

<controller type='scsi' index='0' model='virtio-scsi'>
  <driver queues='N' />
</controller>
- Restart the VM for the changes to take effect.
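As an optional, hedged check from inside the guest, you can list the I/O queues that the kernel created for a virtio-blk disk. The device name vda is an assumption and depends on your configuration:

# ls /sys/block/vda/mq/

Each numbered directory in the output corresponds to one queue, so the count should match the queues value you configured.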
17.5.4. Configuring dedicated IOThreads
To improve the I/O performance of a disk on your virtual machine (VM), you can configure a dedicated IOThread that is used to manage the I/O operations of the VM’s disk.
Normally, the I/O operations of a disk are a part of the main QEMU thread, which can decrease the responsiveness of the VM as a whole during intensive I/O workloads. By separating the I/O operations to a dedicated IOThread, you can significantly increase the responsiveness and performance of your VM.
Procedure
- Shut down the selected VM if it is running.
On the host, add or edit the <iothreads> tag in the XML configuration of the VM. For example, to create a single IOThread for a testguest1 VM:

# virsh edit <testguest1>

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <vcpu placement='static'>8</vcpu>
  <iothreads>1</iothreads>
  ...
</domain>

Note: For optimal results, use only 1-2 IOThreads per CPU on the host.

Assign a dedicated IOThread to a VM disk. For example, to assign an IOThread with an ID of 1 to a disk on the testguest1 VM:

# virsh edit <testguest1>

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source file='/var/lib/libvirt/images/test-disk.raw'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    ...
  </devices>
  ...
</domain>

Note: IOThread IDs start from 1 and you must dedicate only a single IOThread to a disk. Usually, one dedicated IOThread per VM is sufficient for optimal performance.

When using virtio-scsi storage devices, assign a dedicated IOThread to the virtio-scsi controller. For example, to assign an IOThread with an ID of 1 to a controller on the testguest1 VM:

# virsh edit <testguest1>

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </controller>
    ...
  </devices>
  ...
</domain>
Verification
- Evaluate the impact of your changes on your VM performance. For details, see: Virtual machine performance monitoring tools
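In addition, one quick, hedged way to confirm that the IOThread configuration was saved is to inspect the VM's XML from the host; testguest1 is the example VM name used above:

# virsh dumpxml testguest1 | grep -i iothread

The output should show the <iothreads> element and the iothread attribute on the disk driver or the virtio-scsi controller.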
17.5.5. Configuring virtual disk caching
For intensive I/O workloads, selecting the optimal caching mode can significantly increase the virtual machine (VM) performance.
The KVM hypervisor provides several virtual disk caching modes:
- writethrough: Host page cache is used for reading only. Writes are reported as completed only when the data has been committed to the storage device. The sustained I/O performance is decreased, but this mode has good write guarantees.
- writeback: Host page cache is used for both reading and writing. Writes are reported as complete when data reaches the host's memory cache, not physical storage. This mode has faster I/O performance than writethrough, but it is possible to lose data on host failure.
- none: Host page cache is bypassed entirely. This mode relies directly on the write queue of the physical disk, so it has predictable sustained I/O performance and offers good write guarantees on a stable guest. It is also a safe cache mode for VM live migration.
Procedure
- Shut down the selected VM if it is running.
Edit the XML configuration of the selected VM.
# virsh edit <vm_name>

Find the disk device and edit the cache option in the driver tag.

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source file='/var/lib/libvirt/images/test-disk.raw'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    ...
  </devices>
  ...
</domain>
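Alternatively, if you prefer not to edit the XML by hand, the virt-xml utility used elsewhere in this chapter can set the cache mode in one step. This is only a sketch and assumes the disk is addressed by its target name vda:

# virt-xml testguest1 --edit target=vda --disk cache=none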
17.6. Optimizing virtual machine CPU performance
Much like physical CPUs in host machines, virtual CPUs (vCPUs) are critical to virtual machine (VM) performance. Optimizing vCPUs can have a significant impact on the resource efficiency of your VMs.
The general steps to optimize your vCPU include the following:
- Adjust how many host CPUs are assigned to the VM. You can do this by using the CLI or the web console.
Ensure that the vCPU model is aligned with the CPU model of the host. For example, to set the testguest1 VM to use the CPU model of the host:
# virt-xml testguest1 --edit --cpu host-model

On an ARM 64 system, use --cpu host-passthrough.
If your host machine uses Non-Uniform Memory Access (NUMA), you can also configure NUMA for its VMs. This maps the host’s CPU and memory processes onto the CPU and memory processes of the VM as closely as possible. In effect, NUMA tuning provides the vCPU with a more streamlined access to the system memory allocated to the VM, which can improve the vCPU processing effectiveness.
For details, see Configuring NUMA in a virtual machine and Virtual machine performance optimization for specific workloads.
17.6.1. vCPU overcommitment
By using virtual CPU (vCPU) overcommitment, you can have a setup where the sum of all vCPUs in virtual machines (VMs) running on a host exceeds the number of physical CPUs on the host.
However, you might experience performance deterioration when simultaneously running more cores in your VMs than are physically available on the host.
Best practices for vCPU overcommitment include the following:
- Assign the minimum number of vCPUs required by the VM's workloads for best performance.
- Avoid overcommitting vCPUs in production without extensive testing.
- If overcommitting vCPUs, the safe ratio is typically 5 vCPUs to 1 physical CPU for loads under 100%.
- It is not recommended to have more than 10 total allocated vCPUs per physical processor core.
- Monitor CPU usage to prevent performance degradation under heavy loads.
Applications that use 100% of memory or processing resources may become unstable in overcommitted environments. Because the CPU overcommit ratio is workload-dependent, do not overcommit memory or CPUs in a production environment without extensive testing.
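To keep track of the overcommitment ratio, you can compare the vCPUs of running VMs with the host's physical CPUs. The following is only a sketch and assumes all VMs run under the local libvirt instance:

total=0
for vm in $(virsh list --name); do
  vcpus=$(virsh vcpucount "$vm" | awk '/current *live/ {print $3}')
  total=$((total + vcpus))
done
echo "Running vCPUs: $total, physical CPUs: $(nproc)"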
17.6.2. Adding and removing virtual CPUs by using the command line
To increase or optimize the CPU performance of a virtual machine (VM), you can add or remove virtual CPUs (vCPUs) assigned to the VM.
When performed on a running VM, this is also referred to as vCPU hot plugging and hot unplugging. However, note that vCPU hot unplug is not supported in RHEL 10, and Red Hat highly discourages its use.
Procedure
Optional: View the current state of the vCPUs in the selected VM. For example, to display the number of vCPUs on the testguest VM:
# virsh vcpucount testguest
maximum      config         4
maximum      live           2
current      config         2
current      live           1

This output indicates that testguest is currently using 1 vCPU, and 1 more vCPU can be hot plugged to it to increase the VM's performance. However, after reboot, the number of vCPUs testguest uses will change to 2, and it will be possible to hot plug 2 more vCPUs.
Adjust the maximum number of vCPUs that can be attached to the VM, which takes effect on the VM’s next boot.
For example, to increase the maximum vCPU count for the testguest VM to 8:
# virsh setvcpus testguest 8 --maximum --config

Note that the maximum may be limited by the CPU topology, host hardware, the hypervisor, and other factors.
Adjust the current number of vCPUs attached to the VM, up to the maximum configured in the previous step. For example:
To increase the number of vCPUs attached to the running testguest VM to 4:
# virsh setvcpus testguest 4 --live

This increases the VM's performance and host load footprint of testguest until the VM's next boot.
To permanently decrease the number of vCPUs attached to the testguest VM to 1:
# virsh setvcpus testguest 1 --config

This decreases the VM's performance and host load footprint of testguest after the VM's next boot. However, if needed, additional vCPUs can be hot plugged to the VM to temporarily increase its performance.
Verification
Confirm that the current state of vCPU for the VM reflects your changes.
# virsh vcpucount testguest
maximum      config         8
maximum      live           4
current      config         1
current      live           4
17.6.3. Managing virtual CPUs by using the web console
To review and configure virtual CPUs used by virtual machines (VMs) on your host, you can use the RHEL 10 web console.
Prerequisites
You have installed the RHEL 10 web console.
For instructions, see Installing and enabling the web console.
- The web console VM plug-in is installed on your system.
Procedure
- Log in to the RHEL 10 web console.
In the interface, click the VM whose information you want to see.
A new page opens with an Overview section with basic information about the selected VM and a Console section to access the VM’s graphical interface.
Click edit next to the number of vCPUs in the Overview pane.
The vCPU details dialog appears.
Configure the virtual CPUs for the selected VM.
vCPU Count - The number of vCPUs currently in use.
NoteThe vCPU count cannot be greater than the vCPU Maximum.
- vCPU Maximum - The maximum number of virtual CPUs that can be configured for the VM. If this value is higher than the vCPU Count, additional vCPUs can be attached to the VM.
- Sockets - The number of sockets to expose to the VM.
- Cores per socket - The number of cores for each socket to expose to the VM.
Threads per core - The number of threads for each core to expose to the VM.
ImportantNote that the Sockets, Cores per socket, and Threads per core options adjust the CPU topology of the VM. This may be beneficial for vCPU performance and may impact the functionality of certain software in the guest OS. If a different setting is not required by your deployment, keep the default values.
Click Apply.
The virtual CPUs for the VM are configured.
- If the VM is running, restart it for the changes to virtual CPU settings to take effect.
17.6.4. Configuring NUMA in a virtual machine
To configure Non-Uniform Memory Access (NUMA) settings of a virtual machine (VM), you can either use automated utilities or manual setup. By configuring NUMA settings, you can improve VM performance by aligning vCPUs with memory resources on NUMA-compatible hosts.
For ease of use, you can set up a VM’s NUMA configuration by using automated utilities and services. However, manual NUMA setup is more likely to yield a significant performance improvement.
Prerequisites
The host is a NUMA-compatible machine. To detect whether this is the case, use the virsh nodeinfo command and see the NUMA cell(s) line:

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              48
CPU frequency:       1200 MHz
CPU socket(s):       1
Core(s) per socket:  12
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         67012964 KiB

If the value of the line is 2 or greater, the host is NUMA-compatible.
Optional: You have the numactl package installed on the host.

# dnf install numactl
Procedure
Set the VM's NUMA policy to Preferred. For example, to configure the testguest5 VM:

# virt-xml testguest5 --edit --vcpus placement=auto
# virt-xml testguest5 --edit --numatune mode=preferred

Enable automatic NUMA balancing, so that the kernel automatically aligns the VM's CPU and memory resources:

# echo 1 > /proc/sys/kernel/numa_balancing

Start the numad service:

# systemctl start numad

Optional: Tune NUMA settings manually. Specify which host NUMA nodes will be assigned specifically to a certain VM. This can improve the host memory usage by the VM's vCPUs.

Use the numactl command to view the NUMA topology on the host:

# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 18156 MB
node 0 free: 9053 MB
node 1 size: 18180 MB
node 1 free: 6853 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Edit the XML configuration of a VM to assign CPU and memory resources to specific NUMA nodes. For example, the following configuration sets testguest6 to use vCPUs 0-7 on NUMA node 0 and vCPUs 8-15 on NUMA node 1. Both nodes are also assigned 16 GiB of the VM's memory.

# virsh edit <testguest6>

<domain type='kvm'>
  <name>testguest6</name>
  ...
  <vcpu placement='static'>16</vcpu>
  ...
  <cpu ...>
    <numa>
      <cell id='0' cpus='0-7' memory='16' unit='GiB'/>
      <cell id='1' cpus='8-15' memory='16' unit='GiB'/>
    </numa>
  ...
</domain>

Note: For best performance results, it is a good practice to respect the maximum memory size for each NUMA node on the host.
- If the VM is running, restart it to apply the configuration.
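As a quick, hedged check of the resulting memory policy, you can query the NUMA tuning of the VM from the host; testguest6 is the example VM used above:

# virsh numatune testguest6

The command reports the numa_mode and numa_nodeset values currently applied to the VM, which should correspond to the policy and nodes you configured.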
Troubleshooting
17.6.5. Configuring virtual CPU pinning
To improve the CPU performance of a virtual machine (VM), you can pin a virtual CPU (vCPU) to a specific physical CPU thread on the host. This ensures that the vCPU will have its own dedicated physical CPU thread, which can significantly improve the vCPU performance.
To further optimize the CPU performance, you can also pin QEMU process threads associated with a specified VM to a specific host CPU.
Procedure
Check the CPU topology on the host:
# lscpu -p=node,cpu
Node,CPU
0,0
0,1
0,2
0,3
0,4
0,5
0,6
0,7
1,0
1,1
1,2
1,3
1,4
1,5
1,6
1,7

In this example, the output contains NUMA nodes and the available physical CPU threads on the host.
Check the number of vCPU threads inside the VM:
# lscpu -p=node,cpu
Node,CPU
0,0
0,1
0,2
0,3

In this example, the output contains NUMA nodes and the available vCPU threads inside the VM.
Pin specific vCPU threads from a VM to a specific host CPU or range of CPUs. This is suggested as a safe method of vCPU performance improvement.
For example, the following commands pin vCPU threads 0 to 3 of the testguest6 VM to host CPUs 1, 3, 5, 7, respectively:
# virsh vcpupin testguest6 0 1
# virsh vcpupin testguest6 1 3
# virsh vcpupin testguest6 2 5
# virsh vcpupin testguest6 3 7

Optional: Verify whether the vCPU threads are successfully pinned to CPUs.
# virsh vcpupin testguest6
 VCPU   CPU Affinity
----------------------
 0      1
 1      3
 2      5
 3      7

Optional: After pinning vCPU threads, you can also pin QEMU process threads associated with a specified VM to a specific host CPU or range of CPUs. This can further help the QEMU process to run more efficiently on the physical CPU.
For example, the following commands pin the QEMU process thread of testguest6 to CPUs 2 and 4, and verify this was successful:
# virsh emulatorpin testguest6 2,4
# virsh emulatorpin testguest6
 emulator: CPU Affinity
----------------------------------
 *: 2,4
17.6.6. Configuring virtual CPU capping
To limit the amount of CPU resources a virtual machine (VM) can use, you can set up virtual CPU (vCPU) capping. vCPU capping can improve the overall performance by preventing excessive use of host’s CPU resources by a single VM and by making it easier for the hypervisor to manage CPU scheduling.
Procedure
View the current vCPU scheduling configuration on the host.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0

To configure an absolute vCPU cap for a VM, set the vcpu_period and vcpu_quota parameters. Both parameters use a numerical value that represents a time duration in microseconds.

Set the vcpu_period parameter by using the virsh schedinfo command. For example:

# virsh schedinfo <vm_name> --set vcpu_period=100000

In this example, the vcpu_period is set to 100,000 microseconds, which means the scheduler enforces vCPU capping during this time interval.

You can also use the --live --config options to configure a running VM without restarting it.

Set the vcpu_quota parameter by using the virsh schedinfo command. For example:

# virsh schedinfo <vm_name> --set vcpu_quota=50000

In this example, the vcpu_quota is set to 50,000 microseconds, which specifies the maximum amount of CPU time that the VM can use during the vcpu_period time interval. In this case, vcpu_quota is set to half of vcpu_period, so the VM can use up to 50% of the CPU time during that interval.

You can also use the --live --config options to configure a running VM without restarting it.
Verification
Check that the vCPU scheduling parameters have the correct values.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 2048
vcpu_period    : 100000
vcpu_quota     : 50000
...
17.6.7. Tuning CPU weights
To improve the CPU performance of a specific virtual machine (VM), you can prioritize CPU time allocation for that VM. To do this, use the CPU weight setting.
This setting controls how much CPU time a virtual machine (VM) receives compared to other running VMs. When you increase the CPU weight of a specific VM, this VM receives more CPU time relative to other VMs.
To configure the CPU weight of a VM, adjust its cpu_shares parameter. The possible CPU weight values range from 0 to 262144 and the default value for a new VM is 1024.
Procedure
Check the current CPU weight of a VM.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 1024
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0

Adjust the CPU weight to a preferred value.

# virsh schedinfo <vm_name> --set cpu_shares=2048
Scheduler      : posix
cpu_shares     : 2048
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0

In this example, cpu_shares is set to 2048. This means that if all other VMs have the value set to 1024, this VM receives approximately twice the amount of CPU time.

You can also use the --live --config options to configure a running VM without restarting it.
17.6.8. Enabling and disabling kernel same-page merging
To optimize CPU efficiency of your virtual machines (VMs), you can enable or disable kernel same-page merging (KSM).
KSM improves memory density by sharing identical memory pages between VMs. Therefore, enabling KSM might improve memory efficiency of your VM deployment. However, enabling KSM also increases CPU utilization, and might negatively affect overall performance depending on the workload.
In RHEL 10, KSM is disabled by default. To enable KSM and test its impact on your VM performance, see the following instructions.
Prerequisites
- Root access to your host system.
Procedure
Enable KSM:
Warning: Enabling KSM increases CPU utilization and affects overall CPU performance.

Install the ksmtuned service:

# dnf install ksmtuned

Start the service:

To enable KSM for a single session, use the systemctl utility to start the ksm and ksmtuned services.

# systemctl start ksm
# systemctl start ksmtuned

To enable KSM persistently, use the systemctl utility to enable the ksm and ksmtuned services.

# systemctl enable ksm
Created symlink /etc/systemd/system/multi-user.target.wants/ksm.service → /usr/lib/systemd/system/ksm.service
# systemctl enable ksmtuned
Created symlink /etc/systemd/system/multi-user.target.wants/ksmtuned.service → /usr/lib/systemd/system/ksmtuned.service
- Monitor the performance and resource consumption of VMs on your host to evaluate the benefits of activating KSM; a sketch for checking the KSM sharing counters follows this procedure. Specifically, ensure that the additional CPU usage by KSM does not offset the memory improvements and does not cause additional performance issues. In latency-sensitive workloads, also pay attention to cross-NUMA page merges.
Optional: If KSM has not improved your VM performance, disable it:
To disable KSM for a single session, use the systemctl utility to stop the ksm and ksmtuned services.

# systemctl stop ksm
# systemctl stop ksmtuned

To disable KSM persistently, use the systemctl utility to disable the ksm and ksmtuned services.

# systemctl disable ksm
Removed /etc/systemd/system/multi-user.target.wants/ksm.service.
# systemctl disable ksmtuned
Removed /etc/systemd/system/multi-user.target.wants/ksmtuned.service.

Note: Memory pages shared between VMs before deactivating KSM will remain shared. To stop sharing, delete all the PageKSM pages in the system by using the following command:

# echo 2 > /sys/kernel/mm/ksm/run

However, this command increases memory usage, and might cause performance problems on your host or your VMs.
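The monitoring sketch referenced above: KSM exposes its counters in sysfs, so you can check how many pages are actually being shared before deciding whether to keep it enabled. These paths are standard kernel interfaces:

# grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing

A high pages_sharing value relative to pages_shared indicates effective deduplication; values close to zero suggest that KSM is adding CPU load without saving much memory.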
17.7. Optimizing virtual machine network performance
Due to the virtual nature of a VM’s network interface controller (NIC), the VM loses a portion of its allocated host network bandwidth. This can reduce the overall workload efficiency of the VM. To minimize the negative impact of virtualization on the virtual NIC (vNIC) throughput, you can use a variety of methods.
Procedure
Use any of the following methods and observe if it has a beneficial effect on your VM network performance:
Enable the vhost_net module
On the host, ensure the vhost_net kernel feature is enabled:

# lsmod | grep vhost
vhost_net              32768  1
vhost                  53248  1 vhost_net
tap                    24576  1 vhost_net
tun                    57344  6 vhost_net

If the output of this command is blank, enable the vhost_net kernel module:

# modprobe vhost_net
Set up multi-queue virtio-net
To set up the multi-queue virtio-net feature for a VM, use the virsh edit command to edit the XML configuration of the VM. In the XML, add the following to the <devices> section, and replace N with the number of vCPUs in the VM, up to 16:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='N'/>
</interface>

If the VM is running, restart it for the changes to take effect.
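To confirm inside the guest that the queues are active, you can list the NIC's channel configuration. This is a hedged example, and eth0 must be replaced with the guest's actual interface name:

# ethtool -l eth0

The Combined value in the current hardware settings should match the number of queues you configured.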
Batching network packets
In Linux VM configurations with a long transmission path, batching packets before submitting them to the kernel may improve cache utilization. To set up packet batching, use the following command on the host, and replace tap0 with the name of the network interface that the VMs use:
# ethtool -C tap0 rx-frames 64

SR-IOV
If your host NIC supports SR-IOV, use SR-IOV device assignment for your vNICs.
For more information, see Managing SR-IOV devices.
17.8. Virtual machine performance monitoring tools
To identify what consumes the most virtual machine (VM) resources and which aspect of VM performance needs optimization, you can use a variety of performance diagnostic tools.
Default OS performance monitoring tools
For standard performance evaluation, you can use the utilities provided by default by your host and guest operating systems:
On your RHEL 10 host, as root, use the top utility or the system monitor application, and look for qemu and virt in the output. This shows how much host system resources your VMs are consuming.
If the monitoring tool displays that any of the
qemuorvirtprocesses consume a large portion of the host CPU or memory capacity, use theperfutility to investigate. For details, see the following section. -
In addition, if a
vhost_netthread process, named for example vhost_net-1234, is displayed as consuming an excessive amount of host CPU capacity, consider using virtual network optimization features, such asmulti-queue virtio-net.
-
If the monitoring tool displays that any of the
On the guest operating system, use performance utilities and applications available on the system to evaluate which processes consume the most system resources.
-
On Linux systems, you can use the
toputility. - On Windows systems, you can use the Task Manager application.
-
On Linux systems, you can use the
perf kvm
You can use the perf utility to collect and analyze virtualization-specific statistics about the performance of your RHEL 10 host. To do so:
On the host, install the perf package:
# dnf install perfUse one of the
perf kvm statcommands to display perf statistics for your virtualization host:-
For real-time monitoring of your hypervisor, use the
perf kvm stat livecommand. -
To log the perf data of your hypervisor over a period of time, activate the logging by using the
perf kvm stat recordcommand. After the command is canceled or interrupted, the data is saved in theperf.data.guestfile, which can be analyzed by using theperf kvm stat reportcommand.
-
For real-time monitoring of your hypervisor, use the
Analyze the
perfoutput for types ofVM-EXITevents and their distribution. For example, thePAUSE_INSTRUCTIONevents should be infrequent, but in the following output, the high occurrence of this event suggests that the host CPUs are not handling the running vCPUs well. In such a scenario, consider shutting down some of your active VMs, removing vCPUs from these VMs, or tuning the performance of the vCPUs.# perf kvm stat report Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 365634 31.59% 18.04% 0.42us 58780.59us 204.08us ( +- 0.99% ) MSR_WRITE 293428 25.35% 0.13% 0.59us 17873.02us 1.80us ( +- 4.63% ) PREEMPTION_TIMER 276162 23.86% 0.23% 0.51us 21396.03us 3.38us ( +- 5.19% ) PAUSE_INSTRUCTION 189375 16.36% 11.75% 0.72us 29655.25us 256.77us ( +- 0.70% ) HLT 20440 1.77% 69.83% 0.62us 79319.41us 14134.56us ( +- 0.79% ) VMCALL 12426 1.07% 0.03% 1.02us 5416.25us 8.77us ( +- 7.36% ) EXCEPTION_NMI 27 0.00% 0.00% 0.69us 1.34us 0.98us ( +- 3.50% ) EPT_MISCONFIG 5 0.00% 0.00% 5.15us 10.85us 7.88us ( +- 11.67% ) Total Samples:1157497, Total events handled time:413728274.66us.Other event types that can signal problems in the output of
perf kvm statinclude:-
INSN_EMULATION- suggests suboptimal VM I/O configuration.
-
For more information about using perf to monitor virtualization performance, see the perf-kvm(1) man page on your system.
numastat
To see the current NUMA configuration of your system, you can use the numastat utility, which is provided by installing the numactl package.
The following shows a host with 4 running VMs, each obtaining memory from multiple NUMA nodes. This is not optimal for vCPU performance, and warrants adjusting:
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm) 68 16 357 6936 2 3 147 598 8128
51747 (qemu-kvm) 245 11 5 18 5172 2532 1 92 8076
53736 (qemu-kvm) 62 432 1661 506 4851 136 22 445 8116
53773 (qemu-kvm) 1393 3 1 2 12 0 0 6702 8114
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
Total 1769 463 2024 7462 10037 2672 169 7837 32434
In contrast, the following shows memory being provided to each VM by a single node, which is significantly more efficient.
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
51747 (qemu-kvm) 0 0 7 0 8072 0 1 0 8080
53736 (qemu-kvm) 0 0 7 0 0 0 8113 0 8120
53773 (qemu-kvm) 0 0 7 0 0 0 1 8110 8118
59065 (qemu-kvm) 0 0 8050 0 0 0 0 0 8051
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
Total 0 0 8072 0 8072 0 8114 8110 32368