17.6. Optimizing virtual machine CPU performance

Much like physical CPUs in host machines, vCPUs are critical to virtual machine (VM) performance. As a result, optimizing vCPUs can have a significant impact on the resource efficiency of your VMs. To optimize your vCPU:

  1. Adjust how many host CPUs are assigned to the VM. You can do this using the CLI or the web console.
  2. Ensure that the vCPU model is aligned with the CPU model of the host. For example, to set the testguest1 VM to use the CPU model of the host:

    # virt-xml testguest1 --edit --cpu host-model
  3. Manage kernel same-page merging (KSM).
  4. If your host machine uses Non-Uniform Memory Access (NUMA), you can also configure NUMA for its VMs. This maps the host’s CPU and memory processes onto the CPU and memory processes of the VM as closely as possible. In effect, NUMA tuning provides the vCPU with a more streamlined access to the system memory allocated to the VM, which can improve the vCPU processing effectiveness.

    For details, see Configuring NUMA in a virtual machine and Sample vCPU performance tuning scenario.
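
For example, after aligning the CPU model as in step 2, a quick way to review the vCPU count and CPU configuration of the testguest1 VM is to filter its XML definition:

    # virsh dumpxml testguest1 | grep -E -A3 "<vcpu|<cpu"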

17.6.1. Adding and removing virtual CPUs using the command-line interface

To increase or optimize the CPU performance of a virtual machine (VM), you can add or remove virtual CPUs (vCPUs) assigned to the VM.

When performed on a running VM, this is also referred to as vCPU hot plugging and hot unplugging. However, note that vCPU hot unplug is not supported in RHEL 9, and Red Hat highly discourages its use.

Prerequisites

  • Optional: View the current state of the vCPUs in the targeted VM. For example, to display the number of vCPUs on the testguest VM:

    # virsh vcpucount testguest
    maximum      config         4
    maximum      live           2
    current      config         2
    current      live           1

    This output indicates that testguest is currently using 1 vCPU, and 1 more vCPU can be hot plugged to it to increase the VM’s performance. However, after a reboot, the number of vCPUs testguest uses changes to 2, and it becomes possible to hot plug 2 more vCPUs.

Procedure

  1. Adjust the maximum number of vCPUs that can be attached to a VM, which takes effect on the VM’s next boot.

    For example, to increase the maximum vCPU count for the testguest VM to 8:

    # virsh setvcpus testguest 8 --maximum --config

    Note that the maximum may be limited by the CPU topology, host hardware, the hypervisor, and other factors.

  2. Adjust the current number of vCPUs attached to a VM, up to the maximum configured in the previous step. For example:

    • To increase the number of vCPUs attached to the running testguest VM to 4:

      # virsh setvcpus testguest 4 --live

      This increases the performance and host load footprint of testguest until the VM’s next boot.

    • To permanently decrease the number of vCPUs attached to the testguest VM to 1:

      # virsh setvcpus testguest 1 --config

      This decreases the performance and host load footprint of testguest after the VM’s next boot. However, if needed, additional vCPUs can be hot plugged to the VM to temporarily increase its performance.
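
      If you need a change to apply to the running VM and also persist across reboots, you can combine both flags in a single command. For example, to set testguest to use 4 vCPUs both live and in its persistent configuration:

      # virsh setvcpus testguest 4 --live --config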

Verification

  • Confirm that the current state of the vCPUs for the VM reflects your changes.

    # virsh vcpucount testguest
    maximum      config         8
    maximum      live           4
    current      config         1
    current      live           4

17.6.2. Managing virtual CPUs using the web console

Using the RHEL 9 web console, you can review and configure virtual CPUs used by virtual machines (VMs) to which the web console is connected.

Procedure

  1. In the Virtual Machines interface, click the VM whose information you want to see.

    A new page opens with an Overview section with basic information about the selected VM and a Console section to access the VM’s graphical interface.

  2. Click edit next to the number of vCPUs in the Overview pane.

    The vCPU details dialog appears.

  3. Configure the virtual CPUs for the selected VM.

    • vCPU Count - The number of vCPUs currently in use.

      Note

      The vCPU count cannot be greater than the vCPU Maximum.

    • vCPU Maximum - The maximum number of virtual CPUs that can be configured for the VM. If this value is higher than the vCPU Count, additional vCPUs can be attached to the VM.
    • Sockets - The number of sockets to expose to the VM.
    • Cores per socket - The number of cores for each socket to expose to the VM.
    • Threads per core - The number of threads for each core to expose to the VM.

      Note that the Sockets, Cores per socket, and Threads per core options adjust the CPU topology of the VM. This may be beneficial for vCPU performance and may impact the functionality of certain software in the guest OS. If a different setting is not required by your deployment, keep the default values.

  4. Click Apply.

    The virtual CPUs for the VM are configured.

    Note

    Changes to virtual CPU settings only take effect after the VM is restarted.
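
    For reference, these web console settings correspond to elements in the VM’s libvirt XML configuration. As an illustrative sketch, a VM with a vCPU Maximum of 8, a vCPU Count of 4, and a topology of 2 sockets, 2 cores per socket, and 2 threads per core is described by XML similar to the following:

    <vcpu placement='static' current='4'>8</vcpu>
    <cpu>
      <topology sockets='2' cores='2' threads='2'/>
    </cpu>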

17.6.3. Configuring NUMA in a virtual machine

The following methods can be used to configure Non-Uniform Memory Access (NUMA) settings of a virtual machine (VM) on a RHEL 9 host.

Prerequisites

  • The host is a NUMA-compatible machine. To detect whether this is the case, use the virsh nodeinfo command and see the NUMA cell(s) line:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              48
    CPU frequency:       1200 MHz
    CPU socket(s):       1
    Core(s) per socket:  12
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         67012964 KiB

    If the value of the line is 2 or greater, the host is NUMA-compatible.
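
    Alternatively, you can display the NUMA layout of the host with the lscpu utility:

    # lscpu | grep -i numa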

Procedure

For ease of use, you can set up a VM’s NUMA configuration using automated utilities and services. However, manual NUMA setup is more likely to yield a significant performance improvement.

Automatic methods

  • Set the VM’s NUMA policy to Preferred. For example, to do so for the testguest5 VM:

    # virt-xml testguest5 --edit --vcpus placement=auto
    # virt-xml testguest5 --edit --numatune mode=preferred
  • Enable automatic NUMA balancing on the host. The following command enables it only until the next reboot; a persistent variant is sketched after this list:

    # echo 1 > /proc/sys/kernel/numa_balancing
  • Use the numad command to automatically align the VM CPU with memory resources.

    # numad
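
    To make the automatic NUMA balancing setting persist across reboots, you can use a sysctl drop-in file. The file name below is only an example:

    # echo "kernel.numa_balancing = 1" > /etc/sysctl.d/90-numa-balancing.conf
    # sysctl -p /etc/sysctl.d/90-numa-balancing.conf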

Manual methods

  1. Pin specific vCPU threads to a specific host CPU or range of CPUs. This is also possible on non-NUMA hosts and VMs, and is recommended as a safe method of vCPU performance improvement.

    For example, the following commands pin vCPU threads 0 to 5 of the testguest6 VM to host CPUs 1, 3, 5, 7, 9, and 11, respectively:

    # virsh vcpupin testguest6 0 1
    # virsh vcpupin testguest6 1 3
    # virsh vcpupin testguest6 2 5
    # virsh vcpupin testguest6 3 7
    # virsh vcpupin testguest6 4 9
    # virsh vcpupin testguest6 5 11

    Afterwards, you can verify whether this was successful:

    # virsh vcpupin testguest6
    VCPU   CPU Affinity
    ----------------------
    0      1
    1      3
    2      5
    3      7
    4      9
    5      11
  2. After pinning vCPU threads, you can also pin QEMU process threads associated with a specified VM to a specific host CPU or range of CPUs. For example, the following commands pin the QEMU process thread of testguest6 to CPUs 13 and 15, and verify this was successful:

    # virsh emulatorpin testguest6 13,15
    # virsh emulatorpin testguest6
    emulator: CPU Affinity
    ----------------------------------
           *: 13,15
  3. Finally, you can also specify which host NUMA nodes will be assigned specifically to a certain VM. This can improve the host memory usage by the VM’s vCPU. For example, the following commands set testguest6 to use host NUMA nodes 3 to 5, and verify this was successful:

    # virsh numatune testguest6 --nodeset 3-5
    # virsh numatune testguest6
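
  4. Optional: When applied to a running VM, the commands above change only its current configuration. To also record the pinning and NUMA settings in the VM’s persistent configuration, so that they survive a reboot, you can append the --config option, for example:

    # virsh vcpupin testguest6 0 1 --config
    # virsh emulatorpin testguest6 13,15 --config
    # virsh numatune testguest6 --nodeset 3-5 --config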
Note

For best performance results, it is recommended to use all of the manual tuning methods listed above.

17.6.4. Sample vCPU performance tuning scenario

To obtain the best vCPU performance possible, Red Hat recommends using manual vcpupin, emulatorpin, and numatune settings together, as in the following scenario.

Starting scenario

  • Your host has the following hardware specifics:

    • 2 NUMA nodes
    • 3 CPU cores on each node
    • 2 threads on each core

    The output of virsh nodeinfo of such a machine would look similar to:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              12
    CPU frequency:       3661 MHz
    CPU socket(s):       2
    Core(s) per socket:  3
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         31248692 KiB
  • You intend to modify an existing VM to have 8 vCPUs, which means that it will not fit in a single NUMA node.

    Therefore, you should distribute 4 vCPUs on each NUMA node and make the vCPU topology resemble the host topology as closely as possible. This means that vCPUs that run as sibling threads of a given physical CPU should be pinned to host threads on the same core. For details, see the Solution below:

Solution

  1. Obtain the information on the host topology:

    # virsh capabilities

    The output should include a section that looks similar to the following:

    <topology>
      <cells num="2">
        <cell id="0">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="10" />
            <sibling id="1" value="21" />
          </distances>
          <cpus num="6">
            <cpu id="0" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="1" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="2" socket_id="0" core_id="2" siblings="2,5" />
            <cpu id="3" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="4" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="5" socket_id="0" core_id="2" siblings="2,5" />
          </cpus>
        </cell>
        <cell id="1">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="21" />
            <sibling id="1" value="10" />
          </distances>
          <cpus num="6">
            <cpu id="6" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="7" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="8" socket_id="1" core_id="5" siblings="8,11" />
            <cpu id="9" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="10" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="11" socket_id="1" core_id="5" siblings="8,11" />
          </cpus>
        </cell>
      </cells>
    </topology>
  2. Optional: Test the performance of the VM using the applicable tools and utilities.
  3. Set up and mount 1 GiB huge pages on the host:

    1. Add the following line to the host’s kernel command line:

      default_hugepagesz=1G hugepagesz=1G
    2. Create the /etc/systemd/system/hugetlb-gigantic-pages.service file with the following content:

      [Unit]
      Description=HugeTLB Gigantic Pages Reservation
      DefaultDependencies=no
      Before=dev-hugepages.mount
      ConditionPathExists=/sys/devices/system/node
      ConditionKernelCommandLine=hugepagesz=1G
      
      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/systemd/hugetlb-reserve-pages.sh
      
      [Install]
      WantedBy=sysinit.target
    3. Create the /etc/systemd/hugetlb-reserve-pages.sh file with the following content:

      #!/bin/sh
      
      nodes_path=/sys/devices/system/node/
      if [ ! -d $nodes_path ]; then
      	echo "ERROR: $nodes_path does not exist"
      	exit 1
      fi
      
      reserve_pages()
      {
      	echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
      }
      
      reserve_pages 4 node0
      reserve_pages 4 node1

      This reserves four 1 GiB huge pages from node0 and four 1 GiB huge pages from node1, which corresponds to the two NUMA nodes of the host in this scenario.

    4. Make the script created in the previous step executable:

      # chmod +x /etc/systemd/hugetlb-reserve-pages.sh
    5. Enable huge page reservation on boot:

      # systemctl enable hugetlb-gigantic-pages
  4. Use the virsh edit command to edit the XML configuration of the VM that you want to optimize, in this example super-vm:

    # virsh edit super-vm
  5. Adjust the XML configuration of the VM in the following way:

    1. Set the VM to use 8 static vCPUs. Use the <vcpu/> element to do this.
    2. Pin each of the vCPU threads to the corresponding host CPU threads that it mirrors in the topology. To do so, use the <vcpupin/> elements in the <cputune> section.

      Note that, as shown by the virsh capabilities utility above, host CPU threads are not ordered sequentially in their respective cores. In addition, the vCPU threads should be pinned to the highest available set of host cores on the same NUMA node. For a table illustration, see the Sample topology section below.

      The XML configuration for the previous two steps can look similar to the following:

      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
    3. Set the VM to use 1 GiB huge pages:

      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
    4. Configure the VM’s NUMA nodes to use memory from the corresponding NUMA nodes on the host. To do so, use the <memnode/> elements in the <numatune/> section:

      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
    5. Ensure the CPU mode is set to host-passthrough, and that the CPU uses cache in passthrough mode:

      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
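
  6. Optional: If the VM is currently running, restart it so that the new configuration takes effect. Changes made with virsh edit are applied on the VM’s next start:

    # virsh shutdown super-vm
    # virsh start super-vm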

Verification

  1. Confirm that the resulting XML configuration of the VM includes a section similar to the following:

    [...]
      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
      <vcpu placement='static'>8</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
        <numa>
          <cell id="0" cpus="0-3" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="10"/>
              <sibling id="1" value="21"/>
            </distances>
          </cell>
          <cell id="1" cpus="4-7" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="21"/>
              <sibling id="1" value="10"/>
            </distances>
          </cell>
        </numa>
      </cpu>
    </domain>
  2. Optional: Test the performance of the VM using the applicable tools and utilities to evaluate the impact of the VM’s optimization.
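  3. Optional: Confirm from the host that the vCPU pinning, emulator pinning, and NUMA tuning of the running VM match the configuration:

    # virsh vcpupin super-vm
    # virsh emulatorpin super-vm
    # virsh numatune super-vm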

Sample topology

  • The following tables illustrate the connections between the vCPUs and the host CPUs they should be pinned to:

    Table 17.1. Host topology

    CPU threads   0   3   1   4   2   5   6   9   7   10  8   11
    Cores         0       1       2       3       4       5
    Sockets       0                       1
    NUMA nodes    0                       1

    Table 17.2. VM topology

    vCPU threads  0   1   2   3   4   5   6   7
    Cores         0       1       2       3
    Sockets       0               1
    NUMA nodes    0               1

    Table 17.3. Combined host and VM topology

    vCPU threads              0   1   2   3           4   5   6   7
    Host CPU threads  0   3   1   4   2   5   6   9   7   10  8   11
    Cores             0       1       2       3       4       5
    Sockets           0                       1
    NUMA nodes        0                       1

    In this scenario, there are 2 NUMA nodes and 8 vCPUs. Therefore, 4 vCPU threads should be pinned to each node.

    In addition, Red Hat recommends leaving at least a single CPU thread available on each node for host system operations.

    Because each NUMA node in this example houses 3 cores, each with 2 host CPU threads, the set for node 0 translates as follows:

    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='5'/>
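
    The remaining four vCPU threads are pinned in the same way to the threads of the two highest available cores on NUMA node 1:

    <vcpupin vcpu='4' cpuset='7'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='11'/>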

17.6.5. Managing kernel same-page merging

Kernel Same-Page Merging (KSM) improves memory density by sharing identical memory pages between virtual machines (VMs). However, enabling KSM increases CPU utilization, and might adversely affect overall performance depending on the workload.

Depending on your requirements, you can either enable or disable KSM for a single session or persistently.

Note

In RHEL 9 and later, KSM is disabled by default.

Prerequisites

  • Root access to your host system.

Procedure

  • Disable KSM:

    • To deactivate KSM for a single session, use the systemctl utility to stop ksm and ksmtuned services.

      # systemctl stop ksm
      
      # systemctl stop ksmtuned
    • To deactivate KSM persistently, use the systemctl utility to disable ksm and ksmtuned services.

      # systemctl disable ksm
      Removed /etc/systemd/system/multi-user.target.wants/ksm.service.
      # systemctl disable ksmtuned
      Removed /etc/systemd/system/multi-user.target.wants/ksmtuned.service.
Note

Memory pages shared between VMs before deactivating KSM will remain shared. To stop sharing, delete all the PageKSM pages in the system using the following command:

# echo 2 > /sys/kernel/mm/ksm/run

After anonymous pages replace the KSM pages, the khugepaged kernel service will rebuild transparent hugepages on the VM’s physical memory.
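
To verify that KSM is no longer running, you can check its state in sysfs. A value of 0 indicates that KSM is stopped:

# cat /sys/kernel/mm/ksm/run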

  • Enable KSM:
Warning

Enabling KSM increases CPU utilization and affects overall CPU performance.

  1. Install the ksmtuned service:

    # yum install ksmtuned
  2. Start the service:

    • To enable KSM for a single session, use the systemctl utility to start the ksm and ksmtuned services.

      # systemctl start ksm
      # systemctl start ksmtuned
    • To enable KSM persistently, use the systemctl utility to enable the ksm and ksmtuned services.

      # systemctl enable ksm
      Created symlink /etc/systemd/system/multi-user.target.wants/ksm.service → /usr/lib/systemd/system/ksm.service.
      
      # systemctl enable ksmtuned
      Created symlink /etc/systemd/system/multi-user.target.wants/ksmtuned.service → /usr/lib/systemd/system/ksmtuned.service.
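
  3. Optional: To check whether KSM is actively merging memory, inspect the KSM counters in sysfs. For example, a pages_shared value greater than 0 indicates that KSM is currently sharing memory pages:

    # cat /sys/kernel/mm/ksm/pages_shared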