Chapter 17. Optimizing virtual machine performance
Virtual machines (VMs) always experience some degree of performance deterioration in comparison to the host. The following sections explain the reasons for this deterioration and provide instructions on how to minimize the performance impact of virtualization in RHEL 8, so that your hardware infrastructure resources can be used as efficiently as possible.
17.1. What influences virtual machine performance
VMs are run as user-space processes on the host. The hypervisor therefore needs to convert the host’s system resources so that the VMs can use them. As a consequence, a portion of the resources is consumed by the conversion, and the VM therefore cannot achieve the same performance efficiency as the host.
The impact of virtualization on system performance
More specific reasons for VM performance loss include:
- Virtual CPUs (vCPUs) are implemented as threads on the host, handled by the Linux scheduler.
- VMs do not automatically inherit optimization features, such as NUMA or huge pages, from the host kernel.
- Disk and network I/O settings of the host might have a significant performance impact on the VM.
- Network traffic typically travels to a VM through a software-based bridge.
- Depending on the host devices and their models, there might be significant overhead due to emulation of particular hardware.
The severity of the virtualization impact on the VM performance is influenced by a variety of factors, which include:
- The number of concurrently running VMs.
- The number of virtual devices used by each VM.
- The device types used by the VMs.
Reducing VM performance loss
RHEL 8 provides a number of features you can use to reduce the negative performance effects of virtualization. Notably:
- The TuneD service can automatically optimize the resource distribution and performance of your VMs.
- Block I/O tuning can improve the performance of the VM’s block devices, such as disks.
- NUMA tuning can increase vCPU performance.
- Virtual networking can be optimized in various ways.
Tuning VM performance can have negative effects on other virtualization functions. For example, it can make migrating the modified VM more difficult.
17.2. Optimizing virtual machine performance by using TuneD
The TuneD utility is a tuning profile delivery mechanism that adapts RHEL for certain workload characteristics, such as requirements for CPU-intensive tasks or storage and network throughput responsiveness. It provides a number of tuning profiles that are pre-configured to enhance performance and reduce power consumption in a number of specific use cases. You can edit these profiles or create new profiles to create performance solutions tailored to your environment, including virtualized environments.
To optimize RHEL 8 for virtualization, use the following profiles:
- For RHEL 8 virtual machines, use the virtual-guest profile. It is based on the generally applicable throughput-performance profile, but also decreases the swappiness of virtual memory.
- For RHEL 8 virtualization hosts, use the virtual-host profile. This enables more aggressive writeback of dirty memory pages, which benefits the host performance.
Prerequisites
- The TuneD service is installed and enabled.
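If TuneD is not yet present on the system, you can typically install and enable it with the following commands; the package and service are both named tuned in RHEL 8.

# yum install tuned
# systemctl enable --now tuned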
Procedure
To enable a specific TuneD profile:

List the available TuneD profiles.

# tuned-adm list
Available profiles:
- balanced             - General non-specialized TuneD profile
- desktop              - Optimize for the desktop use-case
[...]
- virtual-guest        - Optimize for running inside a virtual guest
- virtual-host         - Optimize for running KVM guests
Current active profile: balanced
Optional: Create a new TuneD profile or edit an existing TuneD profile. For more information, see Customizing TuneD profiles.
Activate a TuneD profile.

# tuned-adm profile selected-profile
To optimize a virtualization host, use the virtual-host profile.
# tuned-adm profile virtual-host
On a RHEL guest operating system, use the virtual-guest profile.
# tuned-adm profile virtual-guest
Verification
Display the active profile for TuneD.

# tuned-adm active
Current active profile: virtual-host
Ensure that the TuneD profile settings have been applied on your system.

# tuned-adm verify
Verification succeeded, current system settings match the preset profile.
See tuned log file ('/var/log/tuned/tuned.log') for details.
Additional resources
17.3. Virtual machine performance optimization for specific workloads
Virtual machines (VMs) are frequently dedicated to perform a specific workload. You can improve the performance of your VMs by optimizing their configuration for the intended workload.
Use case | IOThread | vCPU pinning | vNUMA pinning | huge pages | multi-queue |
---|---|---|---|---|---|
Database | For database disks | Yes* | Yes* | Yes* | Yes, see: multi-queue virtio-blk, virtio-scsi |
Virtualized Network Function (VNF) | No | Yes | Yes | Yes | Yes, see: multi-queue virtio-net |
High Performance Computing (HPC) | No | Yes | Yes | Yes | No |
Backup Server | For backup disks | No | No | No | Yes, see: multi-queue virtio-blk, virtio-scsi |
VM with many CPUs (Usually more than 32) | No | Yes* | Yes* | No | No |
VM with large RAM (Usually more than 128 GB) | No | No | Yes* | Yes | No |
* If the VM has enough CPUs and RAM to use more than one NUMA node.
A VM can fit in more than one category of use cases. In this situation, you should apply all of the recommended configurations.
17.4. Configuring virtual machine memory
To improve the performance of a virtual machine (VM), you can assign additional host RAM to the VM. Similarly, you can decrease the amount of memory allocated to a VM so the host memory can be allocated to other VMs or tasks.
To perform these actions, you can use the web console or the command line.
17.4.1. Memory overcommitment
Virtual machines (VMs) running on a KVM hypervisor do not have dedicated blocks of physical RAM assigned to them. Instead, each VM functions as a Linux process where the host’s Linux kernel allocates memory only when requested. In addition, the host’s memory manager can move the VM’s memory between its own physical memory and swap space. If memory overcommitment is enabled, the kernel can decide to allocate less physical memory than is requested by a VM, because often the requested amount of memory is not fully used by the VM’s process.
By default, memory overcommitment is enabled in the Linux kernel and the kernel estimates the safe amount of memory overcommitment for VM’s requests. However, the system can still become unstable with frequent overcommitment for memory-intensive workloads.
Memory overcommitment requires you to allocate sufficient swap space on the host physical machine to accommodate all VMs as well as enough memory for the host physical machine’s processes. For instructions on the basic recommended swap space size, see: What is the recommended swap size for Red Hat platforms?
Recommended methods to deal with memory shortages on the host:
- Allocate less memory per VM.
- Add more physical memory to the host.
- Use larger swap space.
A VM will run slower if it is swapped frequently. In addition, overcommitting can cause the system to run out of memory (OOM), which may lead to the Linux kernel shutting down important system processes.
Memory overcommit is not supported with device assignment. This is because when device assignment is in use, all virtual machine memory must be statically pre-allocated to enable direct memory access (DMA) with the assigned device.
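Before relying on overcommitment, it can help to review the host's current memory, swap, and overcommit policy. The following commands are standard Linux tools and only display information; vm.overcommit_memory is 0 (heuristic overcommit) by default.

# free -h
# swapon --show
# cat /proc/sys/vm/overcommit_memory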
Additional resources
17.4.2. Adding and removing virtual machine memory by using the web console
To improve the performance of a virtual machine (VM) or to free up the host resources it is using, you can use the web console to adjust the amount of memory allocated to the VM.
Prerequisites
- You have installed the RHEL 8 web console.
- You have enabled the cockpit service.
- Your user account is allowed to log in to the web console. For instructions, see Installing and enabling the web console.
- The guest OS is running the memory balloon drivers. To verify this is the case:

Ensure the VM’s configuration includes the memballoon device:

# virsh dumpxml testguest | grep memballoon
<memballoon model='virtio'>
</memballoon>

If this command displays any output and the model is not set to none, the memballoon device is present.

Ensure the balloon drivers are running in the guest OS.

- In Windows guests, the drivers are installed as a part of the virtio-win driver package. For instructions, see Installing KVM paravirtualized drivers for Windows virtual machines.
- In Linux guests, the drivers are generally included by default and activate when the memballoon device is present.
- The web console VM plug-in is installed on your system.
Procedure
Optional: Obtain the information about the maximum memory and currently used memory for a VM. This will serve as a baseline for your changes, and also for verification.
# virsh dominfo testguest
Max memory:     2097152 KiB
Used memory:    2097152 KiB
Log in to the RHEL 8 web console.
For details, see Logging in to the web console.
In the Virtual machines interface, click the VM whose information you want to see.

A new page opens with an Overview section with basic information about the selected VM and a Console section to access the VM’s graphical interface.

Click edit next to the Memory line in the Overview pane.

The Memory Adjustment dialog appears.

Configure the virtual memory for the selected VM.
Maximum allocation - Sets the maximum amount of host memory that the VM can use for its processes. You can specify the maximum memory when creating the VM or increase it later. You can specify memory as multiples of MiB or GiB.
Adjusting maximum memory allocation is only possible on a shut-off VM.
Current allocation - Sets the actual amount of memory allocated to the VM. This value can be less than the Maximum allocation but cannot exceed it. You can adjust the value to regulate the memory available to the VM for its processes. You can specify memory as multiples of MiB or GiB.
If you do not specify this value, the default allocation is the Maximum allocation value.
Click Save.

The memory allocation of the VM is adjusted.
17.4.3. Adding and removing virtual machine memory by using the command line
To improve the performance of a virtual machine (VM) or to free up the host resources it is using, you can use the CLI to adjust the amount of memory allocated to the VM.
Prerequisites
The guest OS is running the memory balloon drivers. To verify this is the case:

Ensure the VM’s configuration includes the memballoon device:

# virsh dumpxml testguest | grep memballoon
<memballoon model='virtio'>
</memballoon>

If this command displays any output and the model is not set to none, the memballoon device is present.

Ensure the balloon drivers are running in the guest OS.

- In Windows guests, the drivers are installed as a part of the virtio-win driver package. For instructions, see Installing KVM paravirtualized drivers for Windows virtual machines.
- In Linux guests, the drivers are generally included by default and activate when the memballoon device is present, as shown in the sketch after this list.
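As a minimal check inside a Linux guest, you can verify that the virtio_balloon kernel module is loaded; if the module is listed, the balloon driver is active. This assumes a virtio memballoon device, as in the example above.

# lsmod | grep virtio_balloon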
Procedure
Optional: Obtain the information about the maximum memory and currently used memory for a VM. This will serve as a baseline for your changes, and also for verification.
# virsh dominfo testguest
Max memory:     2097152 KiB
Used memory:    2097152 KiB
Adjust the maximum memory allocated to a VM. Increasing this value improves the performance potential of the VM, and reducing the value lowers the performance footprint the VM has on your host. Note that this change can only be performed on a shut-off VM, so adjusting a running VM requires a reboot to take effect.
For example, to change the maximum memory that the testguest VM can use to 4096 MiB:
# virt-xml testguest --edit --memory memory=4096,currentMemory=4096
Domain 'testguest' defined successfully.
Changes will take effect after the domain is fully powered off.
To increase the maximum memory of a running VM, you can attach a memory device to the VM. This is also referred to as memory hot plug. For details, see Attaching devices to virtual machines.
Warning: Removing memory devices from a running VM (also referred to as a memory hot unplug) is not supported, and highly discouraged by Red Hat.
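The following is a minimal sketch of such a memory hot plug. It assumes the VM's XML already defines a <maxMemory> limit with free slots and a guest NUMA topology, both of which memory devices require; the file name memory-device.xml and the 1 GiB size are illustrative.

# cat memory-device.xml
<memory model='dimm'>
  <target>
    <size unit='GiB'>1</size>
    <node>0</node>
  </target>
</memory>

# virsh attach-device testguest memory-device.xml --live --config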
Optional: You can also adjust the memory currently used by the VM, up to the maximum allocation. This regulates the memory load that the VM has on the host until the next reboot, without changing the maximum VM allocation.
# virsh setmem testguest --current 2048
Verification
Confirm that the memory used by the VM has been updated:
# virsh dominfo testguest
Max memory:     4194304 KiB
Used memory:    2097152 KiB
Optional: If you adjusted the current VM memory, you can obtain the memory balloon statistics of the VM to evaluate how effectively it regulates its memory use.
# virsh domstats --balloon testguest
Domain: 'testguest'
  balloon.current=365624
  balloon.maximum=4194304
  balloon.swap_in=0
  balloon.swap_out=0
  balloon.major_fault=306
  balloon.minor_fault=156117
  balloon.unused=3834448
  balloon.available=4035008
  balloon.usable=3746340
  balloon.last-update=1587971682
  balloon.disk_caches=75444
  balloon.hugetlb_pgalloc=0
  balloon.hugetlb_pgfail=0
  balloon.rss=1005456
17.4.4. Configuring virtual machines to use huge pages
In certain use cases, you can improve memory allocation for your virtual machines (VMs) by using huge pages instead of the default 4 KiB memory pages. For example, huge pages can improve performance for VMs with high memory utilization, such as database servers.
Prerequisites
- The host is configured to use huge pages in memory allocation. For instructions, see: Configuring HugeTLB at boot time
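One common way to reserve 1 GiB huge pages on the host is through kernel boot parameters, for example with grubby; the page count of 4 is illustrative and a host reboot is required for the allocation to take effect.

# grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=4"
# reboot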
Procedure
- Shut down the selected VM if it is running.
To configure a VM to use 1 GiB huge pages, open the XML definition of a VM for editing. For example, to edit a testguest VM, run the following command:

# virsh edit testguest
Add the following <memoryBacking> lines to the XML definition:

<memoryBacking>
  <hugepages>
    <page size='1' unit='GiB'/>
  </hugepages>
</memoryBacking>
Verification
- Start the VM.
Confirm that the host has successfully allocated huge pages for the running VM. On the host, run the following command:
# cat /proc/meminfo | grep Huge
HugePages_Total:       4
HugePages_Free:        2
HugePages_Rsvd:        1
Hugepagesize:    1048576 kB
When you add together the number of free and reserved huge pages (HugePages_Free + HugePages_Rsvd), the result should be less than the total number of huge pages (HugePages_Total). The difference is the number of huge pages used by the running VM.
Additional resources
17.4.5. Additional resources
17.5. Optimizing virtual machine I/O performance
The input and output (I/O) capabilities of a virtual machine (VM) can significantly limit the VM’s overall efficiency. To address this, you can optimize a VM’s I/O by configuring block I/O parameters.
17.5.1. Tuning block I/O in virtual machines
When multiple block devices are being used by one or more VMs, it might be important to adjust the I/O priority of specific virtual devices by modifying their I/O weights.
Increasing the I/O weight of a device increases its priority for I/O bandwidth, and therefore provides it with more host resources. Similarly, reducing a device’s weight makes it consume less host resources.
Each device’s weight value must be within the 100 to 1000 range. Alternatively, the value can be 0, which removes that device from per-device listings.
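You can also set a default weight that applies to all of a VM's block devices; in the following sketch, the VM-name placeholder and the value 600 are illustrative.

# virsh blkiotune VM-name --weight 600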
Procedure
To display and set a VM’s block I/O parameters:
Display the current <blkio> parameters for a VM:

# virsh dumpxml VM-name

<domain>
  [...]
  <blkiotune>
    <weight>800</weight>
    <device>
      <path>/dev/sda</path>
      <weight>1000</weight>
    </device>
    <device>
      <path>/dev/sdb</path>
      <weight>500</weight>
    </device>
  </blkiotune>
  [...]
</domain>
Edit the I/O weight of a specified device:
# virsh blkiotune VM-name --device-weights device,I/O-weight
For example, the following changes the weight of the /dev/sda device in the testguest1 VM to 500.
# virsh blkiotune testguest1 --device-weights /dev/sda,500
Verification
Check that the VM’s block I/O parameters have been configured correctly.
# virsh blkiotune testguest1
Block I/O tuning parameters for domain testguest1:
weight         : 800
device_weight  : [ {"sda": 500}, ]
...
Important: Certain kernels do not support setting I/O weights for specific devices. If the previous step does not display the weights as expected, it is likely that this feature is not compatible with your host kernel.
17.5.2. Disk I/O throttling in virtual machines
When several VMs are running simultaneously, they can interfere with system performance by using excessive disk I/O. Disk I/O throttling in KVM virtualization provides the ability to set a limit on disk I/O requests sent from the VMs to the host machine. This can prevent a VM from over-utilizing shared resources and impacting the performance of other VMs.
To enable disk I/O throttling, set a limit on disk I/O requests sent from each block device attached to VMs to the host machine.
Procedure
Use the virsh domblklist command to list the names of all the disk devices on a specified VM.

# virsh domblklist rollin-coal
Target   Source
------------------------------------------------
vda      /var/lib/libvirt/images/rollin-coal.qcow2
sda      -
sdb      /home/horridly-demanding-processes.iso
Find the host block device where the virtual disk that you want to throttle is mounted.
For example, if you want to throttle the sdb virtual disk from the previous step, the following output shows that the disk is mounted on the /dev/nvme0n1p3 partition.

$ lsblk
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
zram0                                         252:0    0     4G  0 disk  [SWAP]
nvme0n1                                       259:0    0 238.5G  0 disk
├─nvme0n1p1                                   259:1    0   600M  0 part  /boot/efi
├─nvme0n1p2                                   259:2    0     1G  0 part  /boot
└─nvme0n1p3                                   259:3    0 236.9G  0 part
  └─luks-a1123911-6f37-463c-b4eb-fxzy1ac12fea 253:0    0 236.9G  0 crypt /home
Set I/O limits for the block device by using the virsh blkiotune command.

# virsh blkiotune VM-name --parameter device,limit
The following example throttles the sdb disk on the rollin-coal VM to 1000 read and write I/O operations per second and to 50 MB per second read and write throughput.

# virsh blkiotune rollin-coal --device-read-iops-sec /dev/nvme0n1p3,1000 --device-write-iops-sec /dev/nvme0n1p3,1000 --device-write-bytes-sec /dev/nvme0n1p3,52428800 --device-read-bytes-sec /dev/nvme0n1p3,52428800
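To confirm that the limits are in place, you can display the VM's current block I/O tuning parameters; the exact fields shown depend on your libvirt version.

# virsh blkiotune rollin-coal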
Additional information
- Disk I/O throttling can be useful in various situations, for example when VMs belonging to different customers are running on the same host, or when quality of service guarantees are given for different VMs. Disk I/O throttling can also be used to simulate slower disks.
- I/O throttling can be applied independently to each block device attached to a VM and supports limits on throughput and I/O operations.
- Red Hat does not support using the virsh blkdeviotune command to configure I/O throttling in VMs. For more information about unsupported features when using RHEL 8 as a VM host, see Unsupported features in RHEL 8 virtualization.
17.5.3. Enabling multi-queue on storage devices
When using virtio-blk or virtio-scsi storage devices in your virtual machines (VMs), the multi-queue feature provides improved storage performance and scalability. It enables each virtual CPU (vCPU) to have a separate queue and interrupt to use without affecting other vCPUs.
The multi-queue feature is enabled by default for the Q35 machine type; however, you must enable it manually on the i440FX machine type. You can tune the number of queues to be optimal for your workload, but the optimal number differs for each type of workload, so you must test which number of queues works best in your case.
Procedure
To enable multi-queue on a storage device, edit the XML configuration of the VM.

# virsh edit <example_vm>
In the XML configuration, find the intended storage device and change the queues parameter to use multiple I/O queues. Replace N with the number of vCPUs in the VM, up to 16.

A virtio-blk example:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' queues='N'/>
  <source dev='/dev/sda'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
A virtio-scsi example:

<controller type='scsi' index='0' model='virtio-scsi'>
  <driver queues='N' />
</controller>
- Restart the VM for the changes to take effect.
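After the VM restarts, one way to check the result on a Linux guest is to count the block multi-queue entries in sysfs; the device name vda is an assumption based on the virtio-blk example above.

# ls /sys/block/vda/mq/ | wc -l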
17.5.4. Configuring dedicated IOThreads
To improve the Input/Output (IO) performance of a disk on your virtual machine (VM), you can configure a dedicated IOThread that is used to manage the IO operations of the VM’s disk.

Normally, the IO operations of a disk are a part of the main QEMU thread, which can decrease the responsiveness of the VM as a whole during intensive IO workloads. By separating the IO operations to a dedicated IOThread, you can significantly increase the responsiveness and performance of your VM.
Procedure
- Shut down the selected VM if it is running.
On the host, add or edit the <iothreads> tag in the XML configuration of the VM. For example, to create a single IOThread for a testguest1 VM:

# virsh edit testguest1

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <vcpu placement='static'>8</vcpu>
  <iothreads>1</iothreads>
  ...
</domain>
Note: For optimal results, use only 1-2 IOThreads per CPU on the host.

Assign a dedicated IOThread to a VM disk. For example, to assign an IOThread with ID of 1 to a disk on the testguest1 VM:

# virsh edit testguest1

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source file='/var/lib/libvirt/images/test-disk.raw'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    ...
  </devices>
  ...
</domain>
Note: IOThread IDs start from 1 and you must dedicate only a single IOThread to a disk. Usually, one dedicated IOThread per VM is sufficient for optimal performance.

When using virtio-scsi storage devices, assign a dedicated IOThread to the virtio-scsi controller. For example, to assign an IOThread with ID of 1 to a controller on the testguest1 VM:

# virsh edit testguest1

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </controller>
    ...
  </devices>
  ...
</domain>
Verification
- Evaluate the impact of your changes on your VM performance. For details, see: Virtual machine performance monitoring tools
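You can also confirm that the IOThread exists and see which host CPUs it is allowed to run on; testguest1 follows the example above, and the output format may vary with your libvirt version.

# virsh iothreadinfo testguest1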
17.5.5. Configuring virtual disk caching
KVM provides several virtual disk caching modes. For intensive Input/Output (IO) workloads, selecting the optimal caching mode can significantly increase the virtual machine (VM) performance.
Virtual disk cache modes overview

writethrough - Host page cache is used for reading only. Writes are reported as completed only when the data has been committed to the storage device. The sustained IO performance is decreased but this mode has good write guarantees.

writeback - Host page cache is used for both reading and writing. Writes are reported as complete when data reaches the host’s memory cache, not physical storage. This mode has faster IO performance than writethrough but it is possible to lose data on host failure.

none - Host page cache is bypassed entirely. This mode relies directly on the write queue of the physical disk, so it has a predictable sustained IO performance and offers good write guarantees on a stable guest. It is also a safe cache mode for VM live migration.
Procedure
- Shut down the selected VM if it is running.
Edit the XML configuration of the selected VM.
# virsh edit <vm_name>
Find the disk device and edit the cache option in the driver tag.

<domain type='kvm'>
  <name>testguest1</name>
  ...
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <source file='/var/lib/libvirt/images/test-disk.raw'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    ...
  </devices>
  ...
</domain>
17.6. Optimizing virtual machine CPU performance
Much like physical CPUs in host machines, vCPUs are critical to virtual machine (VM) performance. As a result, optimizing vCPUs can have a significant impact on the resource efficiency of your VMs. To optimize your vCPU:
- Adjust how many host CPUs are assigned to the VM. You can do this using the CLI or the web console.
Ensure that the vCPU model is aligned with the CPU model of the host. For example, to set the testguest1 VM to use the CPU model of the host:
# virt-xml testguest1 --edit --cpu host-model
- Deactivate kernel same-page merging (KSM).
If your host machine uses Non-Uniform Memory Access (NUMA), you can also configure NUMA for its VMs. This maps the host’s CPU and memory processes onto the CPU and memory processes of the VM as closely as possible. In effect, NUMA tuning provides the vCPU with a more streamlined access to the system memory allocated to the VM, which can improve the vCPU processing effectiveness.
For details, see Configuring NUMA in a virtual machine and Virtual machine performance optimization for specific workloads.
17.6.1. vCPU overcommitment
vCPU overcommitment allows you to have a setup where the sum of all vCPUs in virtual machines (VMs) running on a host exceeds the number of physical CPUs on the host. However, you might experience performance deterioration when simultaneously running more cores in your VMs than are physically available on the host.
For best performance, assign VMs with only as many vCPUs as are required to run the intended workloads in each VM.
vCPU overcommitment recommendations:
- Assign the minimum number of vCPUs required by the VM’s workloads for best performance.
- Avoid overcommitting vCPUs in production without extensive testing.
- If overcommitting vCPUs, the safe ratio is typically 5 vCPUs to 1 physical CPU for loads under 100%.
- It is not recommended to have more than 10 total allocated vCPUs per physical processor core.
- Monitor CPU usage to prevent performance degradation under heavy loads.
Applications that use 100% of memory or processing resources may become unstable in overcommitted environments. Do not overcommit memory or CPUs in a production environment without extensive testing, as the CPU overcommit ratio is workload-dependent.
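As a rough way to compare the vCPUs allocated to running VMs against the physical CPUs on the host, you can use a sketch such as the following; it assumes all VMs are managed by the same libvirt instance.

# nproc
# for vm in $(virsh list --name); do virsh vcpucount "$vm" --active --live; done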
17.6.2. Adding and removing virtual CPUs by using the command line
To increase or optimize the CPU performance of a virtual machine (VM), you can add or remove virtual CPUs (vCPUs) assigned to the VM.
When performed on a running VM, this is also referred to as vCPU hot plugging and hot unplugging. However, note that vCPU hot unplug is not supported in RHEL 8, and Red Hat highly discourages its use.
Prerequisites
Optional: View the current state of the vCPUs in the targeted VM. For example, to display the number of vCPUs on the testguest VM:
# virsh vcpucount testguest
maximum      config         4
maximum      live           2
current      config         2
current      live           1
This output indicates that testguest is currently using 1 vCPU, and 1 more vCPU can be hot plugged to it to increase the VM’s performance. However, after reboot, the number of vCPUs testguest uses will change to 2, and it will be possible to hot plug 2 more vCPUs.
Procedure
Adjust the maximum number of vCPUs that can be attached to a VM, which takes effect on the VM’s next boot.
For example, to increase the maximum vCPU count for the testguest VM to 8:
# virsh setvcpus testguest 8 --maximum --config
Note that the maximum may be limited by the CPU topology, host hardware, the hypervisor, and other factors.
Adjust the current number of vCPUs attached to a VM, up to the maximum configured in the previous step. For example:
To increase the number of vCPUs attached to the running testguest VM to 4:
# virsh setvcpus testguest 4 --live
This increases the VM’s performance and host load footprint of testguest until the VM’s next boot.
To permanently decrease the number of vCPUs attached to the testguest VM to 1:
# virsh setvcpus testguest 1 --config
This decreases the VM’s performance and host load footprint of testguest after the VM’s next boot. However, if needed, additional vCPUs can be hot plugged to the VM to temporarily increase its performance.
Verification
Confirm that the current state of vCPU for the VM reflects your changes.
# virsh vcpucount testguest
maximum      config         8
maximum      live           4
current      config         1
current      live           4
Additional resources
17.6.3. Managing virtual CPUs by using the web console
By using the RHEL 8 web console, you can review and configure virtual CPUs used by virtual machines (VMs) to which the web console is connected.
Prerequisites
- You have installed the RHEL 8 web console.
- You have enabled the cockpit service.
- Your user account is allowed to log in to the web console. For instructions, see Installing and enabling the web console.
- The web console VM plug-in is installed on your system.
Procedure
Log in to the RHEL 8 web console.
For details, see Logging in to the web console.
In the Virtual machines interface, click the VM whose information you want to see.

A new page opens with an Overview section with basic information about the selected VM and a Console section to access the VM’s graphical interface.

Click edit next to the number of vCPUs in the Overview pane.

The vCPU details dialog appears.
Configure the virtual CPUs for the selected VM.
- vCPU Count - The number of vCPUs currently in use.

Note: The vCPU count cannot be greater than the vCPU Maximum.
- vCPU Maximum - The maximum number of virtual CPUs that can be configured for the VM. If this value is higher than the vCPU Count, additional vCPUs can be attached to the VM.
- Sockets - The number of sockets to expose to the VM.
- Cores per socket - The number of cores for each socket to expose to the VM.
- Threads per core - The number of threads for each core to expose to the VM.
Note that the Sockets, Cores per socket, and Threads per core options adjust the CPU topology of the VM. This may be beneficial for vCPU performance and may impact the functionality of certain software in the guest OS. If a different setting is not required by your deployment, keep the default values.
Click Apply.

The virtual CPUs for the VM are configured.

Note: Changes to virtual CPU settings only take effect after the VM is restarted.
Additional resources
17.6.4. Configuring NUMA in a virtual machine
The following methods can be used to configure Non-Uniform Memory Access (NUMA) settings of a virtual machine (VM) on a RHEL 8 host.
For ease of use, you can set up a VM’s NUMA configuration by using automated utilities and services. However, manual NUMA setup is more likely to yield a significant performance improvement.
Prerequisites
The host is a NUMA-compatible machine. To detect whether this is the case, use the virsh nodeinfo command and see the NUMA cell(s) line:

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              48
CPU frequency:       1200 MHz
CPU socket(s):       1
Core(s) per socket:  12
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         67012964 KiB
If the value of the line is 2 or greater, the host is NUMA-compatible.
Optional: You have the numactl package installed on the host.

# yum install numactl
Procedure
Automatic methods
Set the VM’s NUMA policy to Preferred. For example, to configure the testguest5 VM:

# virt-xml testguest5 --edit --vcpus placement=auto
# virt-xml testguest5 --edit --numatune mode=preferred
Enable automatic NUMA balancing on the host:

# echo 1 > /proc/sys/kernel/numa_balancing
Start the numad service to automatically align the VM CPU with memory resources.

# systemctl start numad
Manual methods
To manually tune NUMA settings, you can specify which host NUMA nodes will be assigned specifically to a certain VM. This can improve the host memory usage by the VM’s vCPU.
Optional: Use the numactl command to view the NUMA topology on the host:

# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 18156 MB
node 0 free: 9053 MB
node 1 size: 18180 MB
node 1 free: 6853 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
Edit the XML configuration of a VM to assign CPU and memory resources to specific NUMA nodes. For example, the following configuration sets testguest6 to use vCPUs 0-7 on NUMA node 0 and vCPUs 8-15 on NUMA node 1. Both nodes are also assigned 16 GiB of the VM’s memory.

# virsh edit testguest6

<domain type='kvm'>
  <name>testguest6</name>
  ...
  <vcpu placement='static'>16</vcpu>
  ...
  <cpu ...>
    <numa>
      <cell id='0' cpus='0-7' memory='16' unit='GiB'/>
      <cell id='1' cpus='8-15' memory='16' unit='GiB'/>
    </numa>
  ...
</domain>
- If the VM is running, restart it to apply the configuration.
For best performance results, it is recommended to respect the maximum memory size for each NUMA node on the host.
Additional resources
17.6.5. Configuring virtual CPU pinning
To improve the CPU performance of a virtual machine (VM), you can pin a virtual CPU (vCPU) to a specific physical CPU thread on the host. This ensures that the vCPU will have its own dedicated physical CPU thread, which can significantly improve the vCPU performance.
To further optimize the CPU performance, you can also pin QEMU process threads associated with a specified VM to a specific host CPU.
Procedure
Check the CPU topology on the host:
# lscpu -p=node,cpu
Node,CPU
0,0
0,1
0,2
0,3
0,4
0,5
0,6
0,7
1,0
1,1
1,2
1,3
1,4
1,5
1,6
1,7
In this example, the output contains NUMA nodes and the available physical CPU threads on the host.
Check the number of vCPU threads inside the VM:
# lscpu -p=node,cpu
Node,CPU
0,0
0,1
0,2
0,3
In this example, the output contains NUMA nodes and the available vCPU threads inside the VM.
Pin specific vCPU threads from a VM to a specific host CPU or range of CPUs. This is recommended as a safe method of vCPU performance improvement.
For example, the following commands pin vCPU threads 0 to 3 of the testguest6 VM to host CPUs 1, 3, 5, 7, respectively:
# virsh vcpupin testguest6 0 1
# virsh vcpupin testguest6 1 3
# virsh vcpupin testguest6 2 5
# virsh vcpupin testguest6 3 7
Optional: Verify whether the vCPU threads are successfully pinned to CPUs.
# virsh vcpupin testguest6
 VCPU   CPU Affinity
----------------------
 0      1
 1      3
 2      5
 3      7
After pinning vCPU threads, you can also pin QEMU process threads associated with a specified VM to a specific host CPU or range of CPUs. This can further help the QEMU process to run more efficiently on the physical CPU.
For example, the following commands pin the QEMU process thread of testguest6 to CPUs 2 and 4, and verify this was successful:
# virsh emulatorpin testguest6 2,4
# virsh emulatorpin testguest6
emulator: CPU Affinity
----------------------------------
       *: 2,4
17.6.6. Configuring virtual CPU capping
You can use virtual CPU (vCPU) capping to limit the amount of CPU resources a virtual machine (VM) can use. vCPU capping can improve the overall performance by preventing excessive use of host’s CPU resources by a single VM and by making it easier for the hypervisor to manage CPU scheduling.
Procedure
View the current vCPU scheduling configuration on the host.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0
To configure an absolute vCPU cap for a VM, set the vcpu_period and vcpu_quota parameters. Both parameters use a numerical value that represents a time duration in microseconds.

Set the vcpu_period parameter by using the virsh schedinfo command. For example:

# virsh schedinfo <vm_name> --set vcpu_period=100000
In this example, the vcpu_period is set to 100,000 microseconds, which means the scheduler enforces vCPU capping during this time interval.

You can also use the --live --config options to configure a running VM without restarting it.

Set the vcpu_quota parameter by using the virsh schedinfo command. For example:

# virsh schedinfo <vm_name> --set vcpu_quota=50000
In this example, the vcpu_quota is set to 50,000 microseconds, which specifies the maximum amount of CPU time that the VM can use during the vcpu_period time interval. In this case, vcpu_quota is set to half of vcpu_period, so the VM can use up to 50% of the CPU time during that interval.

You can also use the --live --config options to configure a running VM without restarting it.
Verification
Check that the vCPU scheduling parameters have the correct values.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 2048
vcpu_period    : 100000
vcpu_quota     : 50000
...
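If you later want to remove the cap, setting the quota to -1 typically disables the limit; verify the behavior on your libvirt version.

# virsh schedinfo <vm_name> --set vcpu_quota=-1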
17.6.7. Tuning CPU weights
The CPU weight (or CPU shares) setting controls how much CPU time a virtual machine (VM) receives compared to other running VMs. By increasing the CPU weight of a specific VM, you can ensure that this VM gets more CPU time relative to other VMs. To prioritize CPU time allocation between multiple VMs, set the cpu_shares parameter.

The possible CPU weight values range from 0 to 262144 and the default value for a new KVM VM is 1024.
Procedure
Check the current CPU weight of a VM.
# virsh schedinfo <vm_name>
Scheduler      : posix
cpu_shares     : 1024
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0
Adjust the CPU weight to a preferred value.
# virsh schedinfo <vm_name> --set cpu_shares=2048
Scheduler      : posix
cpu_shares     : 2048
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
global_period  : 0
global_quota   : 0
iothread_period: 0
iothread_quota : 0
In this example, cpu_shares is set to 2048. This means that if all other VMs have the value set to 1024, this VM gets approximately twice the amount of CPU time.

You can also use the --live --config options to configure a running VM without restarting it.
17.6.8. Disabling kernel same-page merging
Kernel Same-Page Merging (KSM) improves memory density by sharing identical memory pages between virtual machines (VMs).
However, using KSM increases CPU utilization, and might negatively affect overall performance depending on the workload.
In RHEL 8, KSM is enabled by default. Therefore, if the CPU performance in your VM deployment is sub-optimal, you can improve this by disabling KSM.
Prerequisites
- Root access to your host system.
Procedure
- Monitor the performance and resource consumption of VMs on your host to evaluate the benefits of KSM. Specifically, ensure that the additional CPU usage by KSM does not offset the memory improvements and does not cause additional performance issues. In latency-sensitive workloads, also pay attention to cross-NUMA page merges.
Optional: If KSM has not improved your VM performance, disable it:
To disable KSM for a single session, use the systemctl utility to stop the ksm and ksmtuned services.

# systemctl stop ksm
# systemctl stop ksmtuned
To disable KSM persistently, use the systemctl utility to disable the ksm and ksmtuned services.

# systemctl disable ksm
Removed /etc/systemd/system/multi-user.target.wants/ksm.service.
# systemctl disable ksmtuned
Removed /etc/systemd/system/multi-user.target.wants/ksmtuned.service.
Note: Memory pages shared between VMs before deactivating KSM will remain shared. To stop sharing, delete all the PageKSM pages in the system by using the following command:

# echo 2 > /sys/kernel/mm/ksm/run
However, this command increases memory usage, and might cause performance problems on your host or your VMs.
Verification
- Monitor the performance and resource consumption of VMs on your host to evaluate the benefits of deactivating KSM. For instructions, see Virtual machine performance monitoring tools.
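As part of this evaluation, you can also inspect the KSM counters in sysfs; once KSM is disabled, the pages_sharing value should stop growing.

# cat /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing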
17.7. Optimizing virtual machine network performance
Due to the virtual nature of a VM’s network interface controller (NIC), the VM loses a portion of its allocated host network bandwidth, which can reduce the overall workload efficiency of the VM. The following tips can minimize the negative impact of virtualization on the virtual NIC (vNIC) throughput.
Procedure
Use any of the following methods and observe if it has a beneficial effect on your VM network performance:
- Enable the vhost_net module
On the host, ensure the vhost_net kernel feature is enabled:

# lsmod | grep vhost
vhost_net              32768  1
vhost                  53248  1 vhost_net
tap                    24576  1 vhost_net
tun                    57344  6 vhost_net
If the output of this command is blank, enable the vhost_net kernel module:

# modprobe vhost_net
- Set up multi-queue virtio-net
To set up the multi-queue virtio-net feature for a VM, use the virsh edit command to edit the XML configuration of the VM. In the XML, add the following to the <devices> section, and replace N with the number of vCPUs in the VM, up to 16:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='N'/>
</interface>
If the VM is running, restart it for the changes to take effect. To check the resulting queue count inside the guest, see the sketch after this list.
- Batching network packets
In Linux VM configurations with a long transmission path, batching packets before submitting them to the kernel may improve cache utilization. To set up packet batching, use the following command on the host, and replace tap0 with the name of the network interface that the VMs use:
# ethtool -C tap0 rx-frames 64
- SR-IOV
- If your host NIC supports SR-IOV, use SR-IOV device assignment for your vNICs. For more information, see Managing SR-IOV devices.
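To confirm the multi-queue virtio-net change from the list above inside a Linux guest, one option is to inspect the NIC's channel configuration; the interface name eth0 is an assumption and may differ in your guest.

# ethtool -l eth0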
Additional resources
17.8. Virtual machine performance monitoring tools
To identify what consumes the most VM resources and which aspect of VM performance needs optimization, performance diagnostic tools, both general and VM-specific, can be used.
Default OS performance monitoring tools
For standard performance evaluation, you can use the utilities provided by default by your host and guest operating systems:
On your RHEL 8 host, as root, use the top utility or the system monitor application, and look for qemu and virt in the output. This shows how much host system resources your VMs are consuming.

- If the monitoring tool displays that any of the qemu or virt processes consume a large portion of the host CPU or memory capacity, use the perf utility to investigate. For details, see below.
- In addition, if a vhost_net thread process, named for example vhost_net-1234, is displayed as consuming an excessive amount of host CPU capacity, consider using virtual network optimization features, such as multi-queue virtio-net.
On the guest operating system, use performance utilities and applications available on the system to evaluate which processes consume the most system resources.
- On Linux systems, you can use the top utility.
- On Windows systems, you can use the Task Manager application.
perf kvm
You can use the perf utility to collect and analyze virtualization-specific statistics about the performance of your RHEL 8 host. To do so:
On the host, install the perf package:
# yum install perf
Use one of the perf kvm stat commands to display perf statistics for your virtualization host:

- For real-time monitoring of your hypervisor, use the perf kvm stat live command.
- To log the perf data of your hypervisor over a period of time, activate the logging by using the perf kvm stat record command. After the command is canceled or interrupted, the data is saved in the perf.data.guest file, which can be analyzed by using the perf kvm stat report command.
Analyze the perf output for types of VM-EXIT events and their distribution. For example, the PAUSE_INSTRUCTION events should be infrequent, but in the following output, the high occurrence of this event suggests that the host CPUs are not handling the running vCPUs well. In such a scenario, consider shutting down some of your active VMs, removing vCPUs from these VMs, or tuning the performance of the vCPUs.

# perf kvm stat report

Analyze events for all VMs, all VCPUs:

             VM-EXIT     Samples   Samples%     Time%    Min Time    Max Time         Avg time

  EXTERNAL_INTERRUPT      365634     31.59%    18.04%      0.42us  58780.59us    204.08us ( +-   0.99% )
           MSR_WRITE      293428     25.35%     0.13%      0.59us  17873.02us      1.80us ( +-   4.63% )
    PREEMPTION_TIMER      276162     23.86%     0.23%      0.51us  21396.03us      3.38us ( +-   5.19% )
   PAUSE_INSTRUCTION      189375     16.36%    11.75%      0.72us  29655.25us    256.77us ( +-   0.70% )
                 HLT       20440      1.77%    69.83%      0.62us  79319.41us  14134.56us ( +-   0.79% )
              VMCALL       12426      1.07%     0.03%      1.02us   5416.25us      8.77us ( +-   7.36% )
       EXCEPTION_NMI          27      0.00%     0.00%      0.69us      1.34us      0.98us ( +-   3.50% )
       EPT_MISCONFIG           5      0.00%     0.00%      5.15us     10.85us      7.88us ( +-  11.67% )

Total Samples:1157497, Total events handled time:413728274.66us.
Other event types that can signal problems in the output of perf kvm stat include:

- INSN_EMULATION - suggests suboptimal VM I/O configuration.
For more information about using perf to monitor virtualization performance, see the perf-kvm man page on your system.
numastat
To see the current NUMA configuration of your system, you can use the numastat utility, which is provided by installing the numactl package.
The following shows a host with 4 running VMs, each obtaining memory from multiple NUMA nodes. This is not optimal for vCPU performance, and warrants adjusting:
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm) 68 16 357 6936 2 3 147 598 8128
51747 (qemu-kvm) 245 11 5 18 5172 2532 1 92 8076
53736 (qemu-kvm) 62 432 1661 506 4851 136 22 445 8116
53773 (qemu-kvm) 1393 3 1 2 12 0 0 6702 8114
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
Total 1769 463 2024 7462 10037 2672 169 7837 32434
In contrast, the following shows memory being provided to each VM by a single node, which is significantly more efficient.
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
51747 (qemu-kvm) 0 0 7 0 8072 0 1 0 8080
53736 (qemu-kvm) 0 0 7 0 0 0 8113 0 8120
53773 (qemu-kvm) 0 0 7 0 0 0 1 8110 8118
59065 (qemu-kvm) 0 0 8050 0 0 0 0 0 8051
--------------- ------ ------ ------ ------ ------ ------ ------ ------ -----
Total 0 0 8072 0 8072 0 8114 8110 32368