Chapter 1. Real-time kernel tuning in RHEL 9
Latency, or response time, is the time between an event and the system's response to it. It is generally measured in microseconds (μs).
For most applications running under a Linux environment, basic performance tuning can improve latency sufficiently. For those industries where latency must be low, accountable, and predictable, Red Hat has a replacement kernel that can be tuned so that latency meets those requirements. The RHEL for Real Time kernel provides seamless integration with RHEL 9 and offers clients the opportunity to measure, configure, and record latency times within their organization.
Use the RHEL for Real Time kernel on well-tuned systems, for applications with extremely high determinism requirements. Kernel system tuning can improve determinism considerably. Before you begin, perform general system tuning of the standard RHEL 9 system, and then deploy the RHEL for Real Time kernel.
Failure to perform these tasks might prevent consistent performance from a RHEL for Real Time deployment.
1.1. Tuning guidelines
Real-time tuning is an iterative process; you will almost never be able to tweak a few variables and know that the change is the best that can be achieved. Be prepared to spend days or weeks narrowing down the set of tuning configurations that work best for your system.
Additionally, always make long test runs. Changing some tuning parameters and then doing a five-minute test run is not a good validation of a particular set of tuning changes. Make the length of your test runs adjustable and run them for longer than a few minutes. You can narrow down to a few candidate tuning configurations with test runs of a few hours, then run those configurations for many hours or days at a time to detect corner cases of highest latency or resource exhaustion.
- Build a measurement mechanism into your application, so that you can accurately gauge how a particular set of tuning changes affects the application’s performance. Anecdotal evidence, for example, "The mouse moves more smoothly", is usually wrong and can vary. Take hard measurements and record them for later analysis.
- It is very tempting to make multiple changes to tuning variables between test runs, but doing so means that you do not have a way to narrow down which tuning parameter affected your test results. Keep the tuning changes between test runs as small as you can.
- It is also tempting to make large changes when tuning, but it is almost always better to make incremental changes. You will find that working your way up from the lowest to highest priority values will yield better results in the long run.
- Use the available tools. The tuna tuning tool makes it easy to change processor affinities for threads and interrupts, thread priorities and to isolate processors for application use. The taskset and chrt command line utilities allow you to do most of what tuna does. If you run into performance problems, the ftrace and perf utilities can help locate latency problems.
- Rather than hard-coding values into your application, use external tools to change policy, priority and affinity. Using external tools allows you to try many different combinations and simplifies your logic. Once you have found some settings that give good results, you can either add them to your application, or set up startup logic to implement the settings when the application starts.
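The last guideline, keeping policy, priority, and affinity outside the application, could look like the following startup wrapper. This is a minimal sketch rather than part of the guidance above: the function name rt_launch_cmd, the variables RT_CPUS and RT_PRIO, and their default values are hypothetical. The function only prints the command line, so you can review it before running it as root.

```shell
# Hypothetical startup logic: affinity and priority come from the
# environment, so test runs can change them without rebuilding the application.
RT_CPUS="${RT_CPUS:-2,3}"   # assumed CPU list, passed to taskset -c
RT_PRIO="${RT_PRIO:-10}"    # assumed SCHED_FIFO priority, passed to chrt -f

# Print the command line that pins a program to RT_CPUS and starts it
# under SCHED_FIFO at RT_PRIO. Nothing is executed here.
rt_launch_cmd() {
    printf 'taskset -c %s chrt -f %s' "$RT_CPUS" "$RT_PRIO"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# To start the program instead of printing the command, use:
#   exec taskset -c "$RT_CPUS" chrt -f "$RT_PRIO" "$@"
```

With the assumed defaults, rt_launch_cmd ./my_rt_app prints taskset -c 2,3 chrt -f 10 ./my_rt_app, where ./my_rt_app is a placeholder for your own program. Changing RT_PRIO between test runs then changes one tuning variable at a time.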
1.2. Thread scheduling policies
Linux uses three main thread scheduling policies.
- SCHED_OTHER (sometimes called SCHED_NORMAL): This is the default thread policy and has dynamic priority controlled by the kernel. The priority is changed based on thread activity. Threads with this policy are considered to have a real-time priority of 0 (zero).
- SCHED_FIFO (First in, first out): A real-time policy with a priority range of 1 - 99, with 1 being the lowest and 99 the highest. SCHED_FIFO threads always have a higher priority than SCHED_OTHER threads (for example, a SCHED_FIFO thread with a priority of 1 will have a higher priority than any SCHED_OTHER thread). Any thread created as a SCHED_FIFO thread has a fixed priority and will run until it is blocked or preempted by a higher priority thread.
- SCHED_RR (Round-Robin): SCHED_RR is a modification of SCHED_FIFO. Threads with the same priority have a quantum and are round-robin scheduled among all equal priority SCHED_RR threads. This policy is rarely used.
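The priority ranges above can be captured in a small check that a startup script runs before it applies a setting. This is a sketch only; the function name valid_rt_priority is hypothetical and the ranges are the ones stated in this section. On a running system, chrt -m lists the ranges that the kernel reports, and chrt -p followed by a process ID shows the policy and priority of that process.

```shell
# Return success (0) if PRIORITY is valid for POLICY as described above:
# SCHED_OTHER uses real-time priority 0; SCHED_FIFO and SCHED_RR use 1-99.
# Usage: valid_rt_priority POLICY PRIORITY
valid_rt_priority() {
    policy=$1
    prio=$2
    # Reject anything that is not a plain non-negative integer.
    case $prio in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $policy in
        SCHED_OTHER|SCHED_NORMAL) [ "$prio" -eq 0 ] ;;
        SCHED_FIFO|SCHED_RR)      [ "$prio" -ge 1 ] && [ "$prio" -le 99 ] ;;
        *)                        return 1 ;;
    esac
}
```

For example, valid_rt_priority SCHED_FIFO 50 succeeds, while valid_rt_priority SCHED_FIFO 0 and valid_rt_priority SCHED_OTHER 10 both fail.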
1.3. Balancing logging parameters
The syslog server forwards log messages from programs over a network. The less often this occurs, the larger the pending transaction is likely to be. If the transaction is very large, it can cause an I/O spike. To prevent this, keep the interval reasonably small.
The system logging daemon, syslogd, is used to collect messages from different programs. It also collects information reported by the kernel from the kernel logging daemon, klogd. Typically, syslogd logs to a local file, but it can also be configured to log over a network to a remote logging server.
Procedure
To enable remote logging:
- Configure the machine to which the logs will be sent. For more information, see Remote Syslogging with rsyslog on Red Hat Enterprise Linux.
- Configure each system that will send logs to the remote log server, so that its syslog output is written to the server, rather than to the local file system. To do so, edit the /etc/rsyslog.conf file on each client system. For each of the logging rules defined in that file, replace the local log file with the address of the remote logging server.
# Log all kernel messages to remote logging host.
kern.* @my.remote.logging.server
The example above configures the client system to log all kernel messages to the remote machine at @my.remote.logging.server.
Alternatively, you can configure syslogd to log all locally generated system messages, by adding the following line to the /etc/rsyslog.conf file:
# Log all messages to a remote logging server:
*.* @my.remote.logging.server
The syslogd daemon does not include built-in rate limiting on its generated network traffic. Therefore, Red Hat recommends that on RHEL for Real Time systems you remotely log only the messages that your organization requires to be logged remotely, for example, kernel warnings and authentication requests. Log other messages locally.
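One way to follow this recommendation is to forward only selected facilities and priorities. The sketch below generates such rules. It is an illustration under assumptions: the function name remote_log_rules, the choice of the kern.warning and authpriv.* selectors, and the drop-in file name in the comment are not taken from the procedure above, and your organization's policy might require a different selection.

```shell
# Print rsyslog rules that forward only kernel warnings (and more severe
# kernel messages) and authentication messages to a remote host.
# $1 is the remote logging server; the selectors are an assumed policy.
remote_log_rules() {
    server=$1
    cat <<EOF
# Forward kernel messages of priority warning and above.
kern.warning    @$server
# Forward authentication and authorization messages.
authpriv.*      @$server
EOF
}

# Example (as root): write the rules to a drop-in file, then restart rsyslog.
#   remote_log_rules my.remote.logging.server > /etc/rsyslog.d/remote.conf
#   systemctl restart rsyslog
```

Messages that match no forwarding rule continue to be handled by the local rules that are already present in /etc/rsyslog.conf.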
1.4. Improving performance by avoiding running unnecessary applications
Every running application uses system resources. Ensuring that there are no unnecessary applications running on your system can significantly improve performance.
Prerequisites
- You have root permissions on the system.
Procedure
- Do not run the graphical interface where it is not absolutely required, especially on servers. Check if the system is configured to boot into the GUI by default:
# systemctl get-default
- If the output of the command is graphical.target, configure the system to boot to text mode:
# systemctl set-default multi-user.target
- Unless you are actively using a Mail Transfer Agent (MTA) on the system you are tuning, disable it. If the MTA is required, ensure it is well-tuned or consider moving it to a dedicated machine. For more information, refer to the MTA’s documentation.
Important: MTAs are used to send system-generated messages, such as those produced by programs like cron. This includes reports generated by logging functions like logwatch(). You will not be able to receive these messages if the MTAs on your machine are disabled.
- Peripheral devices, such as mice, keyboards, and webcams, send interrupts that may negatively affect latency. If you are not using a graphical interface, remove all unused peripheral devices and disable them. For more information, refer to the devices' documentation.
- Check for automated cron jobs that might impact performance:
# crontab -l
Disable the crond service or any unneeded cron jobs.
- Check your system for third-party applications and any components added by external hardware vendors, and remove any that are unnecessary.
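The first two steps of this procedure can be wrapped in a check that reports whether a change is needed. This is a minimal sketch; the function name boot_target_advice and its messages are hypothetical. It only prints advice and changes nothing on the system.

```shell
# Decide what to do from the output of "systemctl get-default".
# With no argument, the function queries systemd itself.
boot_target_advice() {
    target=${1:-$(systemctl get-default)}
    if [ "$target" = "graphical.target" ]; then
        echo "run: systemctl set-default multi-user.target"
    else
        echo "ok: default target is $target"
    fi
}
```

Passing the target as an argument, for example boot_target_advice graphical.target, lets you try the logic on a machine other than the one you are tuning.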
Additional resources
- cron(8) man page on your system
1.5. Non-Uniform Memory Access
The taskset utility only works on CPU affinity and has no knowledge of other NUMA resources such as memory nodes. If you want to perform process binding in conjunction with NUMA, use the numactl command instead of taskset.
For more information about the NUMA API, see Andi Kleen’s whitepaper An NUMA API for Linux.
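As an illustration of the difference, the following sketch builds a numactl command line that binds both the CPUs and the memory allocations of a program to the same NUMA node, which taskset alone cannot express. The function name numa_bind_cmd is hypothetical, and the function only prints the command so that you can inspect it first.

```shell
# Print a numactl command line that runs a program on the CPUs of one
# NUMA node and allocates its memory from that same node.
# Usage: numa_bind_cmd NODE COMMAND [ARGS...]
numa_bind_cmd() {
    node=$1
    shift
    printf 'numactl --cpunodebind=%s --membind=%s' "$node" "$node"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

For example, numa_bind_cmd 0 ./my_rt_app prints numactl --cpunodebind=0 --membind=0 ./my_rt_app, where node 0 and ./my_rt_app are placeholders. The numactl --hardware command lists the nodes that are available on your system.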
1.6. Ensuring that debugfs is mounted
The debugfs file system is specially designed for debugging and making information available to users. It is mounted automatically in RHEL 9 in the /sys/kernel/debug/ directory.
The debugfs file system is used by the ftrace and trace-cmd commands.
Procedure
To verify that debugfs is mounted:
- Run the following command:
# mount | grep ^debugfs
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel)
If debugfs is mounted, the command displays the mount point and properties for debugfs.
If debugfs is not mounted, the command returns nothing.
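The same check can be scripted by reading the kernel's mount table instead of parsing the output of mount. This is a sketch under assumptions: the function name debugfs_mount_points is hypothetical, and the manual mount command in the comment is not part of the procedure above.

```shell
# Print the mount point of every debugfs entry found in a mount table in
# /proc/mounts format (device, mount point, type, options, ...).
# Reads the file named by $1, or /proc/mounts when no argument is given.
debugfs_mount_points() {
    awk '$3 == "debugfs" { print $2 }' "${1:-/proc/mounts}"
}

# Example (as root): mount debugfs only if it is missing.
#   [ -n "$(debugfs_mount_points)" ] || mount -t debugfs debugfs /sys/kernel/debug
```

The function prints nothing when debugfs is not mounted, which mirrors the behavior of the grep command in the procedure.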
1.7. InfiniBand in RHEL for Real Time
InfiniBand is a type of communications architecture often used to increase bandwidth, improve quality of service (QoS), and provide for failover. It can also be used to improve latency by using the Remote Direct Memory Access (RDMA) mechanism.
The support for InfiniBand on RHEL for Real Time is the same as the support available on Red Hat Enterprise Linux 9. For more information, see Configuring InfiniBand and RDMA networks.
1.8. Using RoCEE and High-Performance Networking
RoCEE (RDMA over Converged Enhanced Ethernet) is a protocol that implements Remote Direct Memory Access (RDMA) over Ethernet networks. It allows you to maintain a consistent, high-speed environment in your data centers, while providing deterministic, low latency data transport for critical transactions.
High Performance Networking (HPN) is a set of shared libraries that provides RoCEE interfaces into the kernel. Instead of going through an independent network infrastructure, HPN places data directly into remote system memory using standard Ethernet infrastructure, resulting in less CPU overhead and reduced infrastructure costs.
Support for RoCEE and HPN under RHEL for Real Time does not differ from the support offered under RHEL 9.
1.9. Tuning containers for RHEL for real-time
When testing the real-time workload in a container running on the main RHEL kernel, add the following options to the podman run command as necessary:
- --cpuset-cpus=<cpu_list> specifies the list of isolated CPU cores to use. If you have more than one CPU, use a comma-separated list or a hyphen-separated range of CPUs that a container can use.
- --cpuset-mems=<number_of_memory_nodes> specifies Non-Uniform Memory Access (NUMA) memory nodes to use, and therefore avoids cross-NUMA node memory access.
- --memory-reservation=<limit> verifies that the minimum amount of memory required by the real-time workload running in the container is available at container start time.
Procedure
- Start the real-time workloads in a container:
# podman run --cpuset-cpus=<cpu_list> --cpuset-mems=<number_of_memory_nodes> --memory-reservation=<limit> <my_rt_container_image>
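Because the CPU list is easy to mistype, a launch script can validate it before it builds the podman command. The following is a sketch only: the function names valid_cpu_list and rt_podman_cmd are hypothetical, and the function prints the command instead of running it.

```shell
# Return success if $1 is a CPU list such as "2", "2,3", "4-7" or "0,2-3".
valid_cpu_list() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

# Print the podman command for the given CPU list, NUMA memory nodes,
# memory reservation and image; refuse a malformed CPU list.
# Usage: rt_podman_cmd CPU_LIST MEM_NODES LIMIT IMAGE
rt_podman_cmd() {
    valid_cpu_list "$1" || { echo "invalid CPU list: $1" >&2; return 1; }
    printf 'podman run --cpuset-cpus=%s --cpuset-mems=%s --memory-reservation=%s %s\n' \
        "$1" "$2" "$3" "$4"
}
```

For example, rt_podman_cmd 2-3 0 512m my_rt_container_image prints podman run --cpuset-cpus=2-3 --cpuset-mems=0 --memory-reservation=512m my_rt_container_image. The values 2-3, 0, and 512m are placeholders; choose them to match the isolated CPUs, NUMA node, and memory needs of your own workload.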