Chapter 2. Recommended host practices for IBM Z & LinuxONE environments
This topic provides recommended host practices for OpenShift Container Platform on IBM Z and LinuxONE.
The s390x architecture is unique in many respects. Therefore, some of the recommendations made here might not apply to other platforms.
Unless stated otherwise, these practices apply to both z/VM and Red Hat Enterprise Linux (RHEL) KVM installations on IBM Z and LinuxONE.
2.1. Managing CPU overcommitment
In a highly virtualized IBM Z environment, you must carefully plan the infrastructure setup and sizing. One of the most important features of virtualization is the capability to overcommit resources, that is, to allocate more resources to the virtual machines than are actually available at the hypervisor level. Overcommitment is highly workload dependent; no single rule applies to all setups.
Depending on your setup, consider these best practices regarding CPU overcommitment:
- At LPAR level (PR/SM hypervisor), avoid assigning all available physical cores (IFLs) to each LPAR. For example, with four physical IFLs available, you should not define three LPARs with four logical IFLs each.
- Check and understand LPAR shares and weights.
- An excessive number of virtual CPUs can adversely affect performance. Do not define more virtual processors for a guest than the number of logical processors defined for the LPAR.
- Configure the number of virtual processors per guest for peak workload, not more.
- Start small and monitor the workload. Increase the vCPU number incrementally if necessary, as shown in the sketch after this list.
- Not all workloads are suitable for high overcommitment ratios. If the workload is CPU intensive, you will probably not be able to achieve high ratios without performance problems. Workloads that are more I/O intensive can keep consistent performance even with high overcommitment ratios.
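For example, before increasing a vCPU count, you can compare the processor counts at each layer. The following is a minimal sketch, assuming a RHEL KVM host; guest01 is a hypothetical domain name:

$ lscpu | grep '^CPU(s):'
$ virsh vcpucount guest01

The first command shows the logical processors visible to the host; the second shows the virtual CPUs defined for the guest.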
2.2. Disable Transparent Huge Pages
Transparent Huge Pages (THP) attempt to automate most aspects of creating, managing, and using huge pages. Because THP manages huge pages automatically, the result is not optimal for all types of workloads, and THP can lead to performance regressions because many applications handle huge pages on their own. Therefore, consider disabling THP.
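On OpenShift Container Platform nodes, one possible approach is to pass the transparent_hugepage=never kernel argument through an MCO profile, similar to the HyperPAV example later in this chapter. The following is a minimal sketch for worker nodes; the profile name is a hypothetical example:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-kernelarg-disable-thp
spec:
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
    - transparent_hugepage=never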
2.3. Boost networking performance with Receive Flow Steering
Receive Flow Steering (RFS) extends Receive Packet Steering (RPS) to further reduce network latency. RFS is technically based on RPS and improves the efficiency of packet processing by increasing the CPU cache hit rate. RFS determines the most convenient CPU for computation, taking the queue length into account, so that cache hits are more likely to occur within that CPU. As a result, the CPU cache is invalidated less often and requires fewer cycles to rebuild, which can reduce packet processing run time.
2.3.1. Use the Machine Config Operator (MCO) to activate RFS
Procedure
Copy the following MCO sample profile into a YAML file. For example, enable-rfs.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-enable-rfs
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=US-ASCII,%23%20turn%20on%20Receive%20Flow%20Steering%20%28RFS%29%20for%20all%20network%20interfaces%0ASUBSYSTEM%3D%3D%22net%22%2C%20ACTION%3D%3D%22add%22%2C%20RUN%7Bprogram%7D%2B%3D%22/bin/bash%20-c%20%27for%20x%20in%20/sys/%24DEVPATH/queues/rx-%2A%3B%20do%20echo%208192%20%3E%20%24x/rps_flow_cnt%3B%20%20done%27%22%0A
        filesystem: root
        mode: 0644
        path: /etc/udev/rules.d/70-persistent-net.rules
      - contents:
          source: data:text/plain;charset=US-ASCII,%23%20define%20sock%20flow%20entries%20for%20Receive%20Flow%20Steering%20%28RFS%29%0Anet.core.rps_sock_flow_entries%3D8192%0A
        filesystem: root
        mode: 0644
        path: /etc/sysctl.d/95-enable-rps.conf
Create the MCO profile:
$ oc create -f enable-rfs.yaml
Verify that an entry named 50-enable-rfs is listed:

$ oc get mc
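To confirm that the setting is active on a node, you can also check the sysctl value from a debug session. This is a minimal sketch; <node> is a placeholder for one of your worker node names:

$ oc debug node/<node> -- chroot /host sysctl net.core.rps_sock_flow_entries

The expected output is net.core.rps_sock_flow_entries = 8192.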
To deactivate, enter:
$ oc delete mc 50-enable-rfs
2.4. Choose your networking setup
The networking stack is one of the most important components for a Kubernetes-based product like OpenShift Container Platform. For IBM Z setups, the networking setup depends on the hypervisor of your choice. The best fit usually varies with the workload, the application, the use case, and the traffic pattern.
Depending on your setup, consider these best practices:
- Consider all options regarding networking devices to optimize your traffic pattern. Explore the advantages of OSA-Express, RoCE Express, HiperSockets, z/VM VSwitch, Linux Bridge (KVM), and others to decide which option leads to the greatest benefit for your setup.
- Always use the latest available NIC version. For example, OSA Express 7S 10 GbE shows great improvement compared to OSA Express 6S 10 GbE with transactional workload types, although both are 10 GbE adapters.
- Each virtual switch adds an additional layer of latency.
- The load balancer plays an important role for network communication outside the cluster. Consider using a production-grade hardware load balancer if this is critical for your application.
- OpenShift Container Platform SDN introduces flows and rules, which impact networking performance. Make sure to consider pod affinities and placements to benefit from the locality of services where communication is critical, as shown in the sketch after this list.
- Balance the trade-off between performance and functionality.
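For latency-critical communication between two services, a pod affinity rule can co-locate their pods on the same node. The following is a minimal sketch of such a rule; it is generic Kubernetes scheduling rather than IBM Z specific, and the app: backend label and image name are hypothetical examples:

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: backend
        topologyKey: kubernetes.io/hostname
  containers:
  - name: frontend
    image: registry.example.com/frontend:latest

With this rule, the frontend pod is scheduled onto a node that already runs a pod labeled app: backend, so traffic between the two stays node-local.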
2.5. Ensure high disk performance with HyperPAV on z/VM
DASD and ECKD devices are commonly used disk types in IBM Z environments. In a typical OpenShift Container Platform setup in z/VM environments, DASD disks commonly provide the local storage for the nodes. You can set up HyperPAV alias devices to provide more throughput and overall better I/O performance for the DASD disks that support the z/VM guests.
Using HyperPAV for the local storage devices leads to a significant performance benefit. However, you must be aware that there is a trade-off between throughput and CPU costs.
2.5.1. Use the Machine Config Operator (MCO) to activate HyperPAV aliases in nodes using z/VM full-pack minidisks
For z/VM-based OpenShift Container Platform setups that use full-pack minidisks, you can leverage the advantage of MCO profiles by activating HyperPAV aliases in all of the nodes. You must add YAML configurations for both control plane and compute nodes.
Procedure
Copy the following MCO sample profile into a YAML file for the control plane node. For example, 05-master-kernelarg-hpav.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 05-master-kernelarg-hpav
spec:
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
    - rd.dasd=800-805
Copy the following MCO sample profile into a YAML file for the compute node. For example, 05-worker-kernelarg-hpav.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-kernelarg-hpav
spec:
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
    - rd.dasd=800-805
Note: You must modify the rd.dasd arguments to fit the device IDs.

Create the MCO profiles:
$ oc create -f 05-master-kernelarg-hpav.yaml
$ oc create -f 05-worker-kernelarg-hpav.yaml
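After the nodes have rebooted with the new kernel arguments, you can verify that the base devices and their HyperPAV aliases are online. This is a minimal sketch; <node> is a placeholder for a node name:

$ oc debug node/<node> -- chroot /host lsdasd

The output should list the DASD base devices together with their alias devices.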
To deactivate, enter:
$ oc delete -f 05-master-kernelarg-hpav.yaml
$ oc delete -f 05-worker-kernelarg-hpav.yaml
2.6. RHEL KVM on IBM Z host recommendations
Optimizing a KVM virtual server environment strongly depends on the workloads of the virtual servers and on the available resources. The same action that enhances performance in one environment can have adverse effects in another. Finding the best balance for a particular setting can be a challenge and often involves experimentation.
The following section introduces some best practices when using OpenShift Container Platform with RHEL KVM on IBM Z and LinuxONE environments.
2.6.1. Use multiple queues for your VirtIO network interfaces
With multiple virtual CPUs, you can transfer packets in parallel if you provide multiple queues for incoming and outgoing packets. Use the queues attribute of the driver element to configure multiple queues. Specify an integer of at least 2 that does not exceed the number of virtual CPUs of the virtual server.
The following example specification configures two input and output queues for a network interface:
<interface type="direct">
  <source network="net01"/>
  <model type="virtio"/>
  <driver ... queues="2"/>
</interface>
Multiple queues are designed to provide enhanced performance for a network interface, but they also use memory and CPU resources. Start by defining two queues for busy interfaces. Then try two queues for interfaces with less traffic, or more than two queues for busy interfaces.
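From inside the guest, you can verify how many queues an interface actually uses with ethtool. A minimal sketch; eth0 is a hypothetical interface name:

# ethtool -l eth0

The Combined value under Current hardware settings should match the configured queues attribute.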
2.6.2. Use I/O threads for your virtual block devices
To make virtual block devices use I/O threads, you must configure one or more I/O threads for the virtual server and each virtual block device to use one of these I/O threads.
The following example specifies <iothreads>3</iothreads> to configure three I/O threads, with consecutive decimal thread IDs 1, 2, and 3. The iothread="2" parameter on the driver element of the disk device specifies that the device uses the I/O thread with ID 2.
Sample I/O thread specification
...
<domain>
  <iothreads>3</iothreads>
  ...
  <devices>
    ...
    <disk type="block" device="disk">
      <driver ... iothread="2"/>
    </disk>
    ...
  </devices>
  ...
</domain>
Threads can increase the performance of I/O operations for disk devices, but they also use memory and CPU resources. You can configure multiple devices to use the same thread. The best mapping of threads to devices depends on the available resources and the workload.
Start with a small number of I/O threads. Often, a single I/O thread for all disk devices is sufficient. Do not configure more threads than the number of virtual CPUs, and do not configure idle threads.
You can use the virsh iothreadadd command to add I/O threads with specific thread IDs to a running virtual server.
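For example, to add an I/O thread with ID 4 to a running virtual server, where guest01 is a hypothetical domain name:

# virsh iothreadadd guest01 4 --live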
2.6.3. Avoid virtual SCSI devices
Configure virtual SCSI devices only if you need to address the device through SCSI-specific interfaces. Configure disk space as virtual block devices rather than virtual SCSI devices, regardless of the backing on the host.
However, you might need SCSI-specific interfaces for:
- A LUN for a SCSI-attached tape drive on the host.
- A DVD ISO file on the host file system that is mounted on a virtual DVD drive.
2.6.4. Configure guest caching for disk
Configure your disk devices to do caching by the guest and not by the host.
Ensure that the driver element of the disk device includes the cache="none" and io="native" parameters.
<disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="native" iothread="1"/>
  ...
</disk>
2.6.5. Exclude the memory balloon device
Unless you need a dynamic memory size, do not define a memory balloon device and ensure that libvirt does not create one for you. Include the memballoon device as a child of the devices element in your domain configuration XML file, with the model set to none:
<memballoon model="none"/>
2.6.6. Tune the CPU migration algorithm of the host scheduler
Do not change the scheduler settings unless you are an expert who understands the implications. Do not apply changes to production systems without testing them and confirming that they have the intended effect.
The kernel.sched_migration_cost_ns parameter specifies a time interval in nanoseconds. After the last execution of a task, the CPU cache is considered to have useful content until this interval expires. Increasing this interval results in fewer task migrations. The default value is 500000 ns.
If the CPU idle time is higher than expected when there are runnable processes, try reducing this interval. If tasks bounce between CPUs or nodes too often, try increasing it.
To dynamically set the interval to 60000 ns, enter the following command:
# sysctl kernel.sched_migration_cost_ns=60000
To persistently change the value to 60000 ns, add the following entry to /etc/sysctl.conf:
kernel.sched_migration_cost_ns=60000
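To check the value that is currently in effect, enter:

# sysctl kernel.sched_migration_cost_ns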
2.6.7. Disable the cpuset cgroup controller
This setting applies only to KVM hosts with cgroups version 1. To enable CPU hotplug on the host, disable the cpuset cgroup controller.
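If you are unsure which cgroup version the host uses, you can check the file system type mounted at /sys/fs/cgroup:

$ stat -fc %T /sys/fs/cgroup/

The output is tmpfs for cgroups version 1 and cgroup2fs for cgroups version 2.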
Procedure
- Open /etc/libvirt/qemu.conf with an editor of your choice.
- Go to the cgroup_controllers line.
- Duplicate the entire line and remove the leading number sign (#) from the copy.
- Remove the cpuset entry, as follows:

cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuacct" ]
For the new setting to take effect, you must restart the libvirtd daemon:
- Stop all virtual machines.
- Run the following command:
# systemctl restart libvirtd
- Restart the virtual machines.
This setting persists across host reboots.
2.6.8. Tune the polling period for idle virtual CPUs
When a virtual CPU becomes idle, KVM polls for wakeup conditions for the virtual CPU before allocating the host resource. You can specify the time interval during which polling takes place in sysfs at /sys/module/kvm/parameters/halt_poll_ns. During the specified time, polling reduces the wakeup latency for the virtual CPU at the expense of resource usage. Depending on the workload, a longer or shorter time for polling can be beneficial. The time interval is specified in nanoseconds. The default is 50000 ns.
To optimize for low CPU consumption, enter a small value or write 0 to disable polling:
# echo 0 > /sys/module/kvm/parameters/halt_poll_ns
To optimize for low latency, for example for transactional workloads, enter a large value:
# echo 80000 > /sys/module/kvm/parameters/halt_poll_ns
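To check the polling period that is currently in effect, read the parameter:

# cat /sys/module/kvm/parameters/halt_poll_ns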