Chapter 1. Before You Start Tuning Your Red Hat Enterprise Linux for Real Time System
Red Hat Enterprise Linux for Real Time is designed to be used on well-tuned systems for applications with extremely high determinism requirements. Kernel system tuning offers the vast majority of the improvement in determinism. For example, in many workloads thorough system tuning improves consistency of results by around 90%. This is why we typically recommend that customers first perform the tuning described in Chapter 2, General System Tuning of standard Red Hat Enterprise Linux before using Red Hat Enterprise Linux for Real Time.
Things to Remember While You Are Tuning Your Red Hat Enterprise Linux for Real Time Kernel
- Be Patient: Real-time tuning is an iterative process; you will almost never be able to tweak a few variables and know that the change is the best that can be achieved. Be prepared to spend days or weeks narrowing down the set of tunings that work best for your system. Additionally, always make long test runs. Changing some tuning parameters and then doing a five-minute test run is not a good validation of a set of tunings. Make the length of your test runs adjustable and run them for longer than a few minutes. Try to narrow down to a few different tuning sets with test runs of a few hours, then run those sets for many hours or days at a time, to try to catch corner cases of maximum latencies or resource exhaustion.
- Be Accurate: Build a measurement mechanism into your application, so that you can accurately gauge how a particular set of tuning changes affects the application's performance. Anecdotal evidence (for example, "The mouse moves more smoothly") is usually wrong and varies from person to person. Do hard measurements and record them for later analysis.
- Be Methodical: It is very tempting to make multiple changes to tuning variables between test runs, but doing so means that you have no way to narrow down which change affected your test results. Keep the tuning changes between test runs as small as you can.
- Be Conservative: It is also tempting to make large changes when tuning, but it is almost always better to make incremental changes. You will find that working your way up from the lowest to highest priority values will yield better results in the long run.
- Be Smart: Use the tools you have available. The Tuna graphical tuning tool makes it easy to change processor affinities for threads and interrupts, change thread priorities, and isolate processors for application use. The taskset and chrt command line utilities allow you to do most of what Tuna does. If you run into performance problems, the ftrace and perf tools can help locate latency issues.
- Be Flexible: Rather than hard-coding values into your application, use external tools to change policy, priority, and affinity; the example after this list shows one way to do this. This allows you to try many different combinations and simplifies your logic. Once you have found some settings that give good results, you can either add them to your application or set up some startup logic to implement the settings when the application starts.
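For example, assuming a hypothetical application thread with PID 12345, affinity and scheduling policy can be changed entirely from the command line:

# taskset -p -c 2 12345
# chrt -f -p 80 12345

The first command pins the thread to CPU 2; the second switches it to SCHED_FIFO with a priority of 80. Because nothing is hard-coded, the same application binary can be re-tested with different CPUs, policies, and priorities.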
Scheduling Policies
Linux uses three main scheduling policies:
- SCHED_OTHER (sometimes called SCHED_NORMAL): This is the default thread policy and has dynamic priority controlled by the kernel. The priority is changed based on thread activity. Threads with this policy are considered to have a real-time priority of 0 (zero).
- SCHED_FIFO (First in, first out): A real-time policy with a priority range of 1 - 99, where 1 is the lowest priority and 99 the highest. SCHED_FIFO threads always have a higher priority than SCHED_OTHER threads (for example, a SCHED_FIFO thread with a priority of 1 has a higher priority than any SCHED_OTHER thread). Any thread created as a SCHED_FIFO thread has a fixed priority and runs until it is blocked or preempted by a higher priority thread.
- SCHED_RR (Round-Robin): SCHED_RR is a modification of SCHED_FIFO. Threads with the same priority have a quantum and are round-robin scheduled among all equal priority SCHED_RR threads. This policy is rarely used.
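The chrt command line utility can be used to start a program under any of these policies or to query a running thread; my_app and the PID below are placeholders:

# chrt -o 0 ./my_app
# chrt -f 50 ./my_app
# chrt -r 10 ./my_app
# chrt -p 4242

The first command starts my_app under SCHED_OTHER (the priority must be 0), the second under SCHED_FIFO with a priority of 50, and the third under SCHED_RR with a priority of 10. The last command prints the scheduling policy and priority of the thread with PID 4242.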
1.1. Running Latency Tests and Interpreting Their Results
To verify that the potential hardware platform is suitable for real-time operations, you should run some latency and performance tests with the Real Time kernel. These tests can highlight BIOS or system tuning issues (including partitioning) that might be experienced under load.
1.1.1. Preliminary Steps
Procedure 1.1. To successfully test your system and interpret the results:
- Check the vendor documentation for any tuning steps required for low latency operation. This step aims to reduce or remove any System Management Interrupts (SMIs) that would transition the system into System Management Mode (SMM). While a system is in SMM, it is running firmware and not operating system code, meaning any timers that expire while in SMM have to wait until the system transitions back into normal operation. This can cause unexplained latencies, because SMIs cannot be blocked by Linux, and the only indication that an SMI actually occurred may be found in vendor-specific performance counter registers.
Warning
Red Hat strongly recommends that you do not completely disable SMIs, as doing so can result in catastrophic hardware failure.
- Ensure that the RHEL-RT kernel and the rt-tests package are installed. This step verifies that you have tuned the system properly.
- Run the hwlatdetect program. hwlatdetect looks for hardware- and firmware-induced latencies by polling the clock source and looking for unexplained gaps. Generally, you do not need to run any sort of load on the system while running hwlatdetect, because the program is looking for latencies introduced by the hardware architecture or the BIOS/EFI firmware. A typical output of hwlatdetect looks like this:
# hwlatdetect --duration=60s
hwlatdetect:  test duration 60 seconds
   detector: tracer
   parameters:
        Latency threshold:    10us
        Sample window:        1000000us
        Sample width:         500000us
        Non-sampling period:  500000us
        Output File:          None

Starting test
test finished
Max Latency: Below threshold
Samples recorded: 0
Samples exceeding threshold: 0

The above result represents a system that was tuned to minimize system interruptions from firmware. However, not all systems can be tuned to minimize system interruptions, as shown below:
# hwlatdetect --duration=10s
hwlatdetect:  test duration 10 seconds
   detector: tracer
   parameters:
        Latency threshold:    10us
        Sample window:        1000000us
        Sample width:         500000us
        Non-sampling period:  500000us
        Output File:          None

Starting test
test finished
Max Latency: 18us
Samples recorded: 10
Samples exceeding threshold: 10
SMIs during run: 0
ts: 1519674281.220664736, inner:17, outer:15
ts: 1519674282.721666674, inner:18, outer:17
ts: 1519674283.722667966, inner:16, outer:17
ts: 1519674284.723669259, inner:17, outer:18
ts: 1519674285.724670551, inner:16, outer:17
ts: 1519674286.725671843, inner:17, outer:17
ts: 1519674287.726673136, inner:17, outer:16
ts: 1519674288.727674428, inner:16, outer:18
ts: 1519674289.728675721, inner:17, outer:17
ts: 1519674290.729677013, inner:18, outer:17

The above result shows that, while doing consecutive reads of the system clocksource, there were 10 delays in the 15-18 us range.
hwlatdetect was using the tracer mechanism as the detector for unexplained latencies. Previous versions used a kernel module rather than the ftrace tracer. The parameters section reports the latency threshold and how the detection was run. The default latency threshold was 10 microseconds (10 us), the sample window was 1 second, and the sample width was 0.5 seconds. As a result, the tracer ran a detector thread that was active during one half of each second of the specified duration.
The detector thread runs a loop described by the following pseudocode:

t1 = timestamp()
loop:
    t0 = timestamp()
    if (t0 - t1) > threshold
        outer = (t0 - t1)
    t1 = timestamp()
    if (t1 - t0) > threshold
        inner = (t1 - t0)
    if inner or outer:
        print
    if t1 > duration:
        goto out
    goto loop
out:
The inner loop comparison checks that t1 - t0, the time between the two consecutive timestamp reads inside the loop, does not exceed the specified threshold (10 us by default). The outer loop comparison checks t0 - t1, the time between the bottom of one loop pass and the top of the next. The time between consecutive reads of the timestamp register should be dozens of nanoseconds (essentially a register read, a comparison, and a conditional jump), so any other delay between consecutive reads is introduced by firmware or by the way the system components were connected.
Note
The values printed out by hwlatdetect for inner and outer are the best-case maximum latencies. The latency values are the deltas between consecutive reads of the current system clocksource (usually the Time Stamp Counter or TSC register, but potentially the HPET or ACPI power management clock), plus any delays between those consecutive reads introduced by the hardware-firmware combination. You can check which clocksource the system is using, as shown in the example after this procedure.
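To see which clocksource the system is currently using, read it from sysfs; the path below is the standard location and tsc is only an example value:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

The other clocksources the kernel could use are listed in the available_clocksource file in the same directory.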
After finding a suitable hardware-firmware combination, the next step is to test the real-time performance of the system under load.
1.1.2. Testing the System Real-time Performance under Load
RHEL-RT provides the rteval utility to test the system real-time performance under load. rteval starts a heavy system load of SCHED_OTHER tasks and then measures real-time response on each online CPU. The loads are a parallel make of the Linux kernel tree in a loop and the hackbench synthetic benchmark.
The goal is to bring the system into a state where each core always has a job to schedule. The jobs perform various tasks, such as memory allocation/free, disk I/O, computational tasks, memory copies, and others.
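If you want to start such a load-and-measure run yourself, rteval can be invoked directly; the duration below is only an illustrative value:

# rteval --duration=12h

Longer durations increase the chance of catching rare worst-case latencies, in line with the "Be Patient" advice above.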
Once the loads have started up, rteval then starts the cyclictest measurement program. This program starts a SCHED_FIFO real-time thread on each online core and then measures the real-time scheduling response time. Each measurement thread takes a timestamp, sleeps for an interval, then takes another timestamp after waking up. The latency measured is t1 - (t0 + i), which is the difference between the actual wakeup time t1 and the theoretical wakeup time of the first timestamp t0 plus the sleep interval i.
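The same measurement can also be run on its own with cyclictest from the rt-tests package; the parameter values below are only examples:

# cyclictest --smp -p95 -m -i 1000 -D 60

Here --smp starts one measurement thread per online CPU, -p95 sets the measurement threads' real-time priority, -m locks the process memory to avoid page faults, -i 1000 sets the sleep interval i to 1000 microseconds, and -D 60 runs the test for 60 seconds. The Max value reported for each CPU is the worst observed t1 - (t0 + i).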
The details for the rteval run are written to an XML file, along with the boot log for the system. Then the rteval-<date>-N.tar.bz2 file is generated, where N is a counter for the Nth run on <date>. A report similar to the one below, generated from the XML file, is printed to the screen:
System:
 Statistics:
  Samples:           1440463955
  Mean:              4.40624790712us
  Median:            0.0us
  Mode:              4us
  Range:             54us
  Min:               2us
  Max:               56us
  Mean Absolute Dev: 1.0776661507us
  Std.dev:           1.81821060672us

CPU core 0       Priority: 95
 Statistics:
  Samples:           36011847
  Mean:              5.46434910711us
  Median:            4us
  Mode:              4us
  Range:             38us
  Min:               2us
  Max:               40us
  Mean Absolute Dev: 2.13785341159us
  Std.dev:           3.50155558554us
The report above provides details on the hardware, the length of the run, the options used, and the timing results, both per-CPU and system-wide. You can regenerate the report by running the following command:

# rteval --summarize rteval-<date>-N.tar.bz2