17.6. Optimizing virtual machine CPU performance

Much like physical CPUs in host machines, vCPUs are critical to virtual machine (VM) performance. As a result, optimizing vCPUs can have a significant impact on the resource efficiency of your VMs. To optimize your vCPU:

  1. Adjust how many host CPUs are assigned to the VM. You can do this using the CLI or the web console.
  2. Ensure that the vCPU model is aligned with the CPU model of the host. For example, to set the testguest1 VM to use the CPU model of the host:

    # virt-xml testguest1 --edit --cpu host-model
  3. Manage kernel same-page merging (KSM).
  4. If your host machine uses Non-Uniform Memory Access (NUMA), you can also configure NUMA for its VMs. This maps the host’s CPU and memory processes onto the CPU and memory processes of the VM as closely as possible. In effect, NUMA tuning provides the vCPU with a more streamlined access to the system memory allocated to the VM, which can improve the vCPU processing effectiveness.

    For details, see Configuring NUMA in a virtual machine and Sample vCPU performance tuning scenario.
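
For example, after aligning the CPU model as in step 2, a quick way to review the vCPU count and CPU configuration of the testguest1 VM is to filter its XML definition:

    # virsh dumpxml testguest1 | grep -E -A3 "<vcpu|<cpu"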

17.6.1. Adding and removing virtual CPUs using the command-line interface

To increase or optimize the CPU performance of a virtual machine (VM), you can add or remove virtual CPUs (vCPUs) assigned to the VM.

When performed on a running VM, this is also referred to as vCPU hot plugging and hot unplugging. However, note that vCPU hot unplug is not supported in RHEL 9, and Red Hat highly discourages its use.

Prerequisites

  • Optional: View the current state of the vCPUs in the targeted VM. For example, to display the number of vCPUs on the testguest VM:

    # virsh vcpucount testguest
    maximum      config         4
    maximum      live           2
    current      config         2
    current      live           1

    This output indicates that testguest is currently using 1 vCPU, and 1 more vCPU can be hot plugged to it to increase the VM’s performance. However, after a reboot, the number of vCPUs testguest uses changes to 2, and it becomes possible to hot plug 2 more vCPUs.

Procedure

  1. Adjust the maximum number of vCPUs that can be attached to a VM, which takes effect on the VM’s next boot.

    For example, to increase the maximum vCPU count for the testguest VM to 8:

    # virsh setvcpus testguest 8 --maximum --config

    Note that the maximum may be limited by the CPU topology, host hardware, the hypervisor, and other factors.

  2. Adjust the current number of vCPUs attached to a VM, up to the maximum configured in the previous step. For example:

    • To increase the number of vCPUs attached to the running testguest VM to 4:

      # virsh setvcpus testguest 4 --live

      This increases the performance and host load footprint of testguest until the VM’s next boot.

    • To permanently decrease the number of vCPUs attached to the testguest VM to 1:

      # virsh setvcpus testguest 1 --config

      This decreases the performance and host load footprint of testguest after the VM’s next boot. However, if needed, additional vCPUs can be hot plugged to the VM to temporarily increase its performance.
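
      If you need a change to apply to the running VM and also persist across reboots, you can combine both flags in a single command. For example, to set testguest to use 4 vCPUs both live and in its persistent configuration:

      # virsh setvcpus testguest 4 --live --config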

Verification

  • Confirm that the current state of the vCPUs for the VM reflects your changes.

    # virsh vcpucount testguest
    maximum      config         8
    maximum      live           4
    current      config         1
    current      live           4

17.6.2. Managing virtual CPUs using the web console

Using the RHEL 9 web console, you can review and configure virtual CPUs used by virtual machines (VMs) to which the web console is connected.

Procedure

  1. In the Virtual Machines interface, click the VM whose information you want to see.

    A new page opens with an Overview section with basic information about the selected VM and a Console section to access the VM’s graphical interface.

  2. Click edit next to the number of vCPUs in the Overview pane.

    The vCPU details dialog appears.

  3. Configure the virtual CPUs for the selected VM.

    • vCPU Count - The number of vCPUs currently in use.

      Note

      The vCPU count cannot be greater than the vCPU Maximum.

    • vCPU Maximum - The maximum number of virtual CPUs that can be configured for the VM. If this value is higher than the vCPU Count, additional vCPUs can be attached to the VM.
    • Sockets - The number of sockets to expose to the VM.
    • Cores per socket - The number of cores for each socket to expose to the VM.
    • Threads per core - The number of threads for each core to expose to the VM.

      Note that the Sockets, Cores per socket, and Threads per core options adjust the CPU topology of the VM. This may be beneficial for vCPU performance and may impact the functionality of certain software in the guest OS. If a different setting is not required by your deployment, keep the default values.

  4. Click Apply.

    The virtual CPUs for the VM are configured.

    Note

    Changes to virtual CPU settings only take effect after the VM is restarted.
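
    For reference, these web console settings correspond to elements in the VM’s libvirt XML configuration. As an illustrative sketch, a VM with a vCPU Maximum of 8, a vCPU Count of 4, and a topology of 2 sockets, 2 cores per socket, and 2 threads per core is described by XML similar to the following:

    <vcpu placement='static' current='4'>8</vcpu>
    <cpu>
      <topology sockets='2' cores='2' threads='2'/>
    </cpu>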

17.6.3. Configuring NUMA in a virtual machine

The following methods can be used to configure Non-Uniform Memory Access (NUMA) settings of a virtual machine (VM) on a RHEL 9 host.

Prerequisites

  • The host is a NUMA-compatible machine. To detect whether this is the case, use the virsh nodeinfo command and see the NUMA cell(s) line:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              48
    CPU frequency:       1200 MHz
    CPU socket(s):       1
    Core(s) per socket:  12
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         67012964 KiB

    If the value of the line is 2 or greater, the host is NUMA-compatible.
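
    Alternatively, you can display the NUMA layout of the host with the lscpu utility:

    # lscpu | grep -i numa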

Procedure

For ease of use, you can set up a VM’s NUMA configuration using automated utilities and services. However, manual NUMA setup is more likely to yield a significant performance improvement.

Automatic methods

  • Set the VM’s NUMA policy to Preferred. For example, to do so for the testguest5 VM:

    # virt-xml testguest5 --edit --vcpus placement=auto
    # virt-xml testguest5 --edit --numatune mode=preferred
  • Enable automatic NUMA balancing on the host. The following command enables it only until the next reboot; a persistent variant is sketched after this list:

    # echo 1 > /proc/sys/kernel/numa_balancing
  • Use the numad command to automatically align the VM CPU with memory resources.

    # numad
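
    To make the automatic NUMA balancing setting persist across reboots, you can use a sysctl drop-in file. The file name below is only an example:

    # echo "kernel.numa_balancing = 1" > /etc/sysctl.d/90-numa-balancing.conf
    # sysctl -p /etc/sysctl.d/90-numa-balancing.conf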

Manual methods

  1. Pin specific vCPU threads to a specific host CPU or range of CPUs. This is also possible on non-NUMA hosts and VMs, and is recommended as a safe method of vCPU performance improvement.

    For example, the following commands pin vCPU threads 0 to 5 of the testguest6 VM to host CPUs 1, 3, 5, 7, 9, and 11, respectively:

    # virsh vcpupin testguest6 0 1
    # virsh vcpupin testguest6 1 3
    # virsh vcpupin testguest6 2 5
    # virsh vcpupin testguest6 3 7
    # virsh vcpupin testguest6 4 9
    # virsh vcpupin testguest6 5 11

    Afterwards, you can verify whether this was successful:

    # virsh vcpupin testguest6
    VCPU   CPU Affinity
    ----------------------
    0      1
    1      3
    2      5
    3      7
    4      9
    5      11
  2. After pinning vCPU threads, you can also pin QEMU process threads associated with a specified VM to a specific host CPU or range of CPUs. For example, the following commands pin the QEMU process thread of testguest6 to CPUs 13 and 15, and verify this was successful:

    # virsh emulatorpin testguest6 13,15
    # virsh emulatorpin testguest6
    emulator: CPU Affinity
    ----------------------------------
           *: 13,15
  3. Finally, you can also specify which host NUMA nodes will be assigned specifically to a certain VM. This can improve the host memory usage by the VM’s vCPU. For example, the following commands set testguest6 to use host NUMA nodes 3 to 5, and verify this was successful:

    # virsh numatune testguest6 --nodeset 3-5
    # virsh numatune testguest6
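
  4. Optional: When applied to a running VM, the commands above change only its current configuration. To also record the pinning and NUMA settings in the VM’s persistent configuration, so that they survive a reboot, you can append the --config option, for example:

    # virsh vcpupin testguest6 0 1 --config
    # virsh emulatorpin testguest6 13,15 --config
    # virsh numatune testguest6 --nodeset 3-5 --config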
Note

For best performance results, it is recommended to use all of the manual tuning methods listed above.

17.6.4. Sample vCPU performance tuning scenario

To obtain the best vCPU performance possible, Red Hat recommends using manual vcpupin, emulatorpin, and numatune settings together, as in the following scenario.

Starting scenario

  • Your host has the following hardware specifics:

    • 2 NUMA nodes
    • 3 CPU cores on each node
    • 2 threads on each core

    The output of virsh nodeinfo of such a machine would look similar to:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              12
    CPU frequency:       3661 MHz
    CPU socket(s):       2
    Core(s) per socket:  3
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         31248692 KiB
  • You intend to modify an existing VM to have 8 vCPUs, which means that it will not fit in a single NUMA node.

    Therefore, you should distribute 4 vCPUs on each NUMA node and make the vCPU topology resemble the host topology as closely as possible. This means that vCPUs that run as sibling threads of a given physical CPU should be pinned to host threads on the same core. For details, see the Solution below:

Solution

  1. Obtain the information on the host topology:

    # virsh capabilities

    The output should include a section that looks similar to the following:

    <topology>
      <cells num="2">
        <cell id="0">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="10" />
            <sibling id="1" value="21" />
          </distances>
          <cpus num="6">
            <cpu id="0" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="1" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="2" socket_id="0" core_id="2" siblings="2,5" />
            <cpu id="3" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="4" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="5" socket_id="0" core_id="2" siblings="2,5" />
          </cpus>
        </cell>
        <cell id="1">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="21" />
            <sibling id="1" value="10" />
          </distances>
          <cpus num="6">
            <cpu id="6" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="7" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="8" socket_id="1" core_id="5" siblings="8,11" />
            <cpu id="9" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="10" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="11" socket_id="1" core_id="5" siblings="8,11" />
          </cpus>
        </cell>
      </cells>
    </topology>
  2. Optional: Test the performance of the VM using the applicable tools and utilities.
  3. Set up and mount 1 GiB huge pages on the host:

    1. Add the following line to the host’s kernel command line:

      default_hugepagesz=1G hugepagesz=1G
    2. Create the /etc/systemd/system/hugetlb-gigantic-pages.service file with the following content:

      [Unit]
      Description=HugeTLB Gigantic Pages Reservation
      DefaultDependencies=no
      Before=dev-hugepages.mount
      ConditionPathExists=/sys/devices/system/node
      ConditionKernelCommandLine=hugepagesz=1G
      
      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/systemd/hugetlb-reserve-pages.sh
      
      [Install]
      WantedBy=sysinit.target
    3. Create the /etc/systemd/hugetlb-reserve-pages.sh file with the following content:

      #!/bin/sh
      
      nodes_path=/sys/devices/system/node/
      if [ ! -d $nodes_path ]; then
      	echo "ERROR: $nodes_path does not exist"
      	exit 1
      fi
      
      reserve_pages()
      {
      	echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
      }
      
      reserve_pages 4 node0
      reserve_pages 4 node1

      This reserves four 1 GiB huge pages from node0 and four 1 GiB huge pages from node1, which corresponds to the two NUMA nodes of the host in this scenario.

    4. Make the script created in the previous step executable:

      # chmod +x /etc/systemd/hugetlb-reserve-pages.sh
    5. Enable huge page reservation on boot:

      # systemctl enable hugetlb-gigantic-pages
  4. Use the virsh edit command to edit the XML configuration of the VM that you want to optimize, in this example super-vm:

    # virsh edit super-vm
  5. Adjust the XML configuration of the VM in the following way:

    1. Set the VM to use 8 static vCPUs. Use the <vcpu/> element to do this.
    2. Pin each of the vCPU threads to the corresponding host CPU threads that it mirrors in the topology. To do so, use the <vcpupin/> elements in the <cputune> section.

      Note that, as shown by the virsh capabilities utility above, host CPU threads are not ordered sequentially in their respective cores. In addition, the vCPU threads should be pinned to the highest available set of host cores on the same NUMA node. For a table illustration, see the Sample topology section below.

      The XML configuration for the previous two steps can look similar to the following:

      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
    3. Set the VM to use 1 GiB huge pages:

      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
    4. Configure the VM’s NUMA nodes to use memory from the corresponding NUMA nodes on the host. To do so, use the <memnode/> elements in the <numatune/> section:

      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
    5. Ensure the CPU mode is set to host-passthrough, and that the CPU uses cache in passthrough mode:

      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
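
  6. Optional: If the VM is currently running, restart it so that the new configuration takes effect. Changes made with virsh edit are applied on the VM’s next start:

    # virsh shutdown super-vm
    # virsh start super-vm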

Verification

  1. Confirm that the resulting XML configuration of the VM includes a section similar to the following:

    [...]
      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
      <vcpu placement='static'>8</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
        <numa>
          <cell id="0" cpus="0-3" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="10"/>
              <sibling id="1" value="21"/>
            </distances>
          </cell>
          <cell id="1" cpus="4-7" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="21"/>
              <sibling id="1" value="10"/>
            </distances>
          </cell>
        </numa>
      </cpu>
    </domain>
  2. Optional: Test the performance of the VM using the applicable tools and utilities to evaluate the impact of the VM’s optimization.
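  3. Optional: Confirm from the host that the vCPU pinning, emulator pinning, and NUMA tuning of the running VM match the configuration:

    # virsh vcpupin super-vm
    # virsh emulatorpin super-vm
    # virsh numatune super-vm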

Sample topology

  • The following tables illustrate the connections between the vCPUs and the host CPUs they should be pinned to:

    Table 17.1. Host topology

    CPU threads   0   3   1   4   2   5   6   9   7   10  8   11
    Cores         0       1       2       3       4       5
    Sockets       0                       1
    NUMA nodes    0                       1

    Table 17.2. VM topology

    vCPU threads  0   1   2   3   4   5   6   7
    Cores         0       1       2       3
    Sockets       0               1
    NUMA nodes    0               1

    Table 17.3. Combined host and VM topology

    vCPU threads              0   1   2   3           4   5   6   7
    Host CPU threads  0   3   1   4   2   5   6   9   7   10  8   11
    Cores             0       1       2       3       4       5
    Sockets           0                       1
    NUMA nodes        0                       1

    In this scenario, there are 2 NUMA nodes and 8 vCPUs. Therefore, 4 vCPU threads should be pinned to each node.

    In addition, Red Hat recommends leaving at least a single CPU thread available on each node for host system operations.

    Because each NUMA node in this example houses 3 cores, each with 2 host CPU threads, the set for node 0 translates as follows:

    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='5'/>
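
    The remaining four vCPU threads are pinned in the same way to the threads of the two highest available cores on NUMA node 1:

    <vcpupin vcpu='4' cpuset='7'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='11'/>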

17.6.5. Managing kernel same-page merging

Kernel Same-Page Merging (KSM) improves memory density by sharing identical memory pages between virtual machines (VMs). However, enabling KSM increases CPU utilization, and might adversely affect overall performance depending on the workload.

Depending on your requirements, you can either enable or disable KSM for a single session or persistently.

Note

In RHEL 9 and later, KSM is disabled by default.

Prerequisites

  • Root access to your host system.

Procedure

  • Disable KSM:

    • To deactivate KSM for a single session, use the systemctl utility to stop ksm and ksmtuned services.

      # systemctl stop ksm
      
      # systemctl stop ksmtuned
    • To deactivate KSM persistently, use the systemctl utility to disable ksm and ksmtuned services.

      # systemctl disable ksm
      Removed /etc/systemd/system/multi-user.target.wants/ksm.service.
      # systemctl disable ksmtuned
      Removed /etc/systemd/system/multi-user.target.wants/ksmtuned.service.
Note

Memory pages shared between VMs before deactivating KSM will remain shared. To stop sharing, delete all the PageKSM pages in the system using the following command:

# echo 2 > /sys/kernel/mm/ksm/run

After anonymous pages replace the KSM pages, the khugepaged kernel service will rebuild transparent hugepages on the VM’s physical memory.
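
To verify that KSM is no longer running, you can check its state in sysfs. A value of 0 indicates that KSM is stopped:

# cat /sys/kernel/mm/ksm/run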

  • Enable KSM:
Warning

Enabling KSM increases CPU utilization and affects overall CPU performance.

  1. Install the ksmtuned service:

    # yum install ksmtuned
  2. Start the service:

    • To enable KSM for a single session, use the systemctl utility to start the ksm and ksmtuned services.

      # systemctl start ksm
      # systemctl start ksmtuned
    • To enable KSM persistently, use the systemctl utility to enable the ksm and ksmtuned services.

      # systemctl enable ksm
      Created symlink /etc/systemd/system/multi-user.target.wants/ksm.service → /usr/lib/systemd/system/ksm.service.
      
      # systemctl enable ksmtuned
      Created symlink /etc/systemd/system/multi-user.target.wants/ksmtuned.service → /usr/lib/systemd/system/ksmtuned.service.
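
  3. Optional: To check whether KSM is actively merging memory, inspect the KSM counters in sysfs. For example, a pages_shared value greater than 0 indicates that KSM is currently sharing memory pages:

    # cat /sys/kernel/mm/ksm/pages_shared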