Chapter 8. Setting CPU affinity on RHEL for Real Time


Every thread and interrupt source in the system has a processor affinity property. The operating system scheduler uses this information to determine which threads and interrupts to run on which CPU. By setting processor affinity, along with effective policy and priority settings, you can achieve the maximum possible performance. Applications always compete for resources, especially CPU time, with other processes. Depending on the application, related threads are often run on the same core. Alternatively, each application thread can be allocated to its own core.

Systems that perform multitasking are naturally more prone to indeterminism. Even high priority applications can be delayed from executing while a lower priority application is in a critical section of code. After the low priority application exits the critical section, the kernel safely preempts the low priority application and schedules the high priority application on the processor. Additionally, migrating processes from one CPU to another can be costly due to cache invalidation. RHEL for Real Time includes tools that address some of these issues and allow latency to be better controlled.

Affinity is represented as a bit mask, where each bit in the mask represents a CPU core. If a bit is set to 1, the thread or interrupt can run on that core; if it is set to 0, the thread or interrupt is excluded from running on that core. The default value for an affinity bit mask is all ones, meaning the thread or interrupt can run on any core in the system.
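
For example, a mask of 3 (binary 11) allows a thread to run on CPU 0 or CPU 1. You can display the raw bit mask of a process with the taskset command; the PID 1000 shown here is illustrative:

# taskset -p 1000
pid 1000's current affinity mask: 3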

By default, processes can run on any CPU. However, by changing the affinity of the process, you can define a process to run on a predetermined set of CPUs. Child processes inherit the CPU affinities of their parents.
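
You can observe this inheritance from the shell. In the following illustrative example, a bash process started with an affinity of CPU 0 reports that same affinity for itself (the PID in the output depends on your system):

# taskset -c 0 bash -c 'taskset -p $$'
pid 4321's current affinity mask: 1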

The following typical affinity setups can help you achieve maximum possible performance:

  • Using a single CPU core for all system processes and setting the application to run on the remainder of the cores.
  • Running an application thread and a specific kernel thread, such as a network softirq or a driver thread, on the same CPU.
  • Pairing producer and consumer threads on the same CPU. Producers and consumers are two classes of threads, where producers insert data into the buffer and consumers remove it from the buffer.

The usual good practice for tuning affinities on a real-time system is to determine the number of cores required to run the application and then isolate those cores. You can achieve this with the Tuna tool or with shell commands that modify the bit mask value, such as the taskset command. The taskset command changes the affinity of a process; writing to the /proc/irq/<irq_number>/smp_affinity entry changes the affinity of an interrupt.
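
For example, the following commands restrict a process to CPUs 0-3 and steer an interrupt to CPU 1 (bit mask 2); the PID 2345 and IRQ number 45 are placeholders:

# taskset -p -c 0-3 2345
pid 2345's current affinity list: 0-7
pid 2345's new affinity list: 0-3
# echo 2 > /proc/irq/45/smp_affinity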

8.1. Tuning processor affinity using the taskset command

On RHEL for Real Time, the taskset command helps to set or retrieve the CPU affinity of a running process. The taskset command takes the -p and -c options. The -p or --pid option operates on an existing process and does not start a new task. The -c or --cpu-list option specifies a numerical list of processors instead of a bit mask. The list can contain more than one item, separated by commas, as well as ranges of processors. For example: 0,5,7,9-11.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To display the affinity of a specific process:

    # taskset -p -c 1000
    pid 1000's current affinity list: 0,1

    The command prints the affinity of the process with PID 1000. The process is set up to use CPU 0 or CPU 1.

    • Optional: To bind a process to a specific CPU:

      # taskset -p -c 1 1000
      pid 1000's current affinity list: 0,1
      pid 1000's new affinity list: 1
    • Optional: To set the affinity of a process to more than one CPU:

      # taskset -p -c 0,1 1000
      pid 1000's current affinity list: 1
      pid 1000's new affinity list: 0,1
    • Optional: To configure a priority level and a policy on a specific CPU:

      # taskset -c 5 chrt -f 78 /bin/my-app

      In this example, taskset binds the application to CPU 5, and chrt runs the /bin/my-app application with the SCHED_FIFO scheduling policy and a priority value of 78.
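
      Optionally, verify both settings while the application is running. The PID shown is illustrative; taskset -p prints the current affinity and chrt -p prints the current scheduling policy and priority:

      # taskset -p -c $(pidof my-app)
      pid 2423's current affinity list: 5
      # chrt -p $(pidof my-app)
      pid 2423's current scheduling policy: SCHED_FIFO
      pid 2423's current scheduling priority: 78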

8.2. Setting processor affinity using the sched_setaffinity() system call

You can also set processor affinity from within an application by using the sched_setaffinity() system call. The following example applies an affinity mask to the calling process and then reads the mask back to confirm the result.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To set the processor affinity with sched_setaffinity():

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sched.h>
    
    int main(int argc, char **argv)
    {
      long i, online = 0;
      long ncores = sysconf(_SC_NPROCESSORS_CONF);
      cpu_set_t *setp = CPU_ALLOC(ncores);
      size_t setsz = CPU_ALLOC_SIZE(ncores);
    
      /* Build a mask containing only CPU 0; the choice of CPU is illustrative. */
      CPU_ZERO_S(setsz, setp);
      CPU_SET_S(0, setsz, setp);
    
      /* Restrict the calling process (PID 0) to the CPUs in the mask. */
      if (sched_setaffinity(0, setsz, setp) == -1) {
        perror("sched_setaffinity(2) failed");
        exit(errno);
      }
    
      /* Read the mask back to confirm the new affinity. */
      CPU_ZERO_S(setsz, setp);
      if (sched_getaffinity(0, setsz, setp) == -1) {
        perror("sched_getaffinity(2) failed");
        exit(errno);
      }
    
      /* Count the CPUs the process is now allowed to run on. */
      for (i = 0; i < ncores; i++) {
        if (CPU_ISSET_S(i, setsz, setp))
          online++;
      }
    
      printf("%ld cores configured, %ld cpus allowed in affinity mask\n", ncores, online);
      CPU_FREE(setp);
      return 0;
    }
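
    You can compile and run the example as follows; the file name is illustrative, and the output shown assumes an 8-core system:

    # gcc -o set_affinity set_affinity.c
    # ./set_affinity
    8 cores configured, 1 cpus allowed in affinity mask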

8.3. Isolating a single CPU to run high utilization tasks

With the cpusets mechanism, you can assign a set of CPUs and memory nodes to SCHED_DEADLINE tasks. In a task set that contains both high and low CPU utilization tasks, isolating a CPU to run the high utilization task and scheduling the low utilization tasks on different sets of CPUs enables all tasks to meet their assigned runtime. You must add the cpuset configuration manually.

Prerequisites

  • You have root permissions on the system.

Procedure

  1. Create two control groups, named cluster and partition:

    # cd /sys/fs/cgroup
    # echo +cpuset > cgroup.subtree_control
    # mkdir cluster
    # mkdir partition
    # echo +cpuset | tee cluster/cgroup.subtree_control partition/cgroup.subtree_control
  2. In the cluster control group, configure the low utilization tasks to run on CPUs 1 to 7, mark those CPUs as exclusive, and make the control group a root partition:

    # cd cluster
    # echo 1-7 | tee cpuset.cpus cpuset.cpus.exclusive
    # echo root > cpuset.cpus.partition
  3. Move all low utilization tasks to the cluster control group:

    # ps -eLo lwp | while read thread; do echo $thread > cgroup.procs ; done
  4. In the partition control group, assign CPU 0 to the high utilization task and mark the CPU as isolated:

    # echo 0 | tee cpuset.cpus cpuset.cpus.exclusive
    # echo isolated > cpuset.cpus.partition
  5. Add the shell to the partition control group and start the high utilization task from it:

    # echo $$ > cgroup.procs

    With this setup, the task isolated in the partition control group does not interfere with the tasks in the cluster control group. This enables all real-time tasks to meet the scheduler deadline. If you use the deadline scheduler, deadlines are typically met even without this change; note, however, that other tasks have their own deadlines.
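
    You can verify the partition state of both control groups by reading their cpuset.cpus.partition files, which with this setup are expected to report root and isolated, respectively:

    # cat /sys/fs/cgroup/cluster/cpuset.cpus.partition
    root
    # cat /sys/fs/cgroup/partition/cpuset.cpus.partition
    isolated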

If the application is prepared to use proper pinning, you can further reduce the noise by adjusting the cgroups, giving more CPUs to the partition cgroup and assigning all real-time tasks to it:

# cd ..
# echo 4-7 | tee cluster/{cpuset.cpus,cpuset.cpus.exclusive}
# echo 0-3 | tee partition/{cpuset.cpus,cpuset.cpus.exclusive}

8.4. Reducing CPU performance spikes

A common source of latency spikes is multiple CPUs contending on common locks in the kernel timer tick handler. The usual locks responsible for the contention are xtime_lock, which is used by the timekeeping system, and the Read-Copy-Update (RCU) structure locks. By using skew_tick=1, you can offset the timer tick per CPU to start at a different time and avoid potential lock conflicts.

The skew_tick kernel command-line parameter might prevent latency fluctuations on moderate to large systems with high core counts and latency-sensitive workloads.

Prerequisites

  • You have administrator permissions.

Procedure

  1. Enable the skew_tick=1 parameter with grubby.

    # grubby --update-kernel=ALL --args="skew_tick=1"
  2. Reboot for changes to take effect.

    # reboot
Note

Enabling skew_tick=1 causes a significant increase in power consumption. Therefore, enable this boot parameter only if you are running latency-sensitive real-time workloads and consistent latency is more important to you than power consumption.

Verification

  • Display the /proc/cmdline file, which shows the parameters passed to the kernel, and ensure that skew_tick=1 is specified:

    # cat /proc/cmdline

8.5. Lowering CPU usage by disabling the PC card daemon

The pcscd daemon manages connections to smart card readers, including PCMCIA readers, over the PC/SC (Personal Computer/Smart Card) interface. Although pcscd is usually a low priority task, it can often use more CPU than any other daemon. This additional background noise can lead to higher preemption costs for real-time tasks and other undesirable impacts on determinism.

Prerequisites

  • You have root permissions on the system.

Procedure

  1. Check the status of the pcscd daemon.

    # systemctl status pcscd
    ● pcscd.service - PC/SC Smart Card Daemon
         Loaded: loaded (/usr/lib/systemd/system/pcscd.service; indirect; vendor preset: disabled)
         Active: active (running) since Mon 2021-03-01 17:15:06 IST; 4s ago
    TriggeredBy: ● pcscd.socket
           Docs: man:pcscd(8)
       Main PID: 2504609 (pcscd)
          Tasks: 3 (limit: 18732)
         Memory: 1.1M
            CPU: 24ms
         CGroup: /system.slice/pcscd.service
                 └─2504609 /usr/sbin/pcscd --foreground --auto-exit

    The Active parameter shows the status of the pcscd daemon.

  2. If the pcscd daemon is running, stop it.

    # systemctl stop pcscd
    Warning: Stopping pcscd.service, but it can still be activated by:
      pcscd.socket
  3. Configure the system to ensure that the pcscd daemon does not restart when the system boots.

    # systemctl disable pcscd
    Removed /etc/systemd/system/sockets.target.wants/pcscd.socket.

Verification

  1. Check the status of the pcscd daemon.

    # systemctl status pcscd
    ● pcscd.service - PC/SC Smart Card Daemon
         Loaded: loaded (/usr/lib/systemd/system/pcscd.service; indirect; vendor preset: disabled)
         Active: inactive (dead) since Mon 2021-03-01 17:10:56 IST; 1min 22s ago
    TriggeredBy: ● pcscd.socket
           Docs: man:pcscd(8)
       Main PID: 4494 (code=exited, status=0/SUCCESS)
            CPU: 37ms
  2. Ensure that the value for the Active parameter is inactive (dead).