
Chapter 27. Tuning scheduling policy


In Red Hat Enterprise Linux (RHEL), the thread is the smallest unit of process execution. The system scheduler chooses which processor runs a thread and for how long, prioritizing overall system utilization. Consequently, thread scheduling might not be optimized for a specific application's performance policy.

For example, consider an application on a NUMA system that is running on Node A when a processor on Node B becomes available. To keep the processor on Node B busy, the scheduler moves one of the application's threads to Node B. However, the thread still requires access to memory on Node A, which is now remote rather than local and therefore slower to access. As a result, running on Node B might take longer than waiting for a processor on Node A: the benefit of local memory access often outweighs the benefit of migration.

27.1. Categories of scheduling policies

Performance-sensitive applications often benefit from the designer or administrator determining where threads are run. The Linux scheduler implements a number of scheduling policies that determine where and for how long a thread runs. The following are the two major categories of scheduling policies:

Normal policies
Normal threads are used for tasks of normal priority.
Realtime policies
Real-time policies are used for time-sensitive tasks that must be completed without interruptions. Real-time threads are not subject to time slicing, which means that they run until they block, exit, voluntarily yield, or are preempted by a higher-priority thread.

The lowest priority realtime thread is scheduled before any thread with a normal policy. For more information, see the sched(7), sched_setaffinity(2), sched_getaffinity(2), sched_setscheduler(2), and sched_getscheduler(2) man pages on your system.
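To see the two categories side by side, you can print the valid priority range for each policy with the chrt tool from util-linux; normal policies report a 0/0 range, while the realtime policies report 1/99:

```shell
# Print the minimum and maximum valid priority for each scheduling
# policy; SCHED_OTHER reports 0/0, SCHED_FIFO and SCHED_RR report 1/99.
chrt -m
```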

27.2. Static priority scheduling with SCHED_FIFO

SCHED_FIFO, also called static priority scheduling, is a realtime policy that defines a fixed priority for each thread. This policy enables administrators to improve event response time and reduce latency. Avoid running time-sensitive tasks under this policy for an extended period of time.

The scheduler runs the highest-priority SCHED_FIFO thread from the list of ready threads. The SCHED_FIFO priority levels range from integer 1 to 99, where 99 is the highest. Start with a lower number and increase priority only when you identify latency issues.

Warning

Because real-time threads are not subject to time slicing, avoid setting a priority of 99. Doing so places your process at the same priority level as the migration and watchdog threads; if your thread goes into a computational loop, those threads are blocked and cannot run. Systems with a single processor eventually hang in this situation.

Administrators can limit SCHED_FIFO bandwidth to prevent realtime application programmers from initiating realtime tasks that monopolize the processor. The following are some of the parameters used in this policy:

/proc/sys/kernel/sched_rt_period_us
This parameter defines the time period, in microseconds, that is considered to be one hundred percent of the processor bandwidth. The default value is 1000000 μs or 1 second.
/proc/sys/kernel/sched_rt_runtime_us
This parameter defines the time period, in microseconds, that is devoted to running real-time threads. The default value is 950000 μs or 0.95 seconds.
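Taken together, these defaults reserve 5% of each period for non-realtime work. A minimal sketch of that arithmetic, using the default values quoted above (read the live values from /proc on a running system):

```shell
# Default values of the kernel tunables, shown literally here:
period=1000000    # sched_rt_period_us: one full period, in microseconds
runtime=950000    # sched_rt_runtime_us: realtime share of that period
pct=$(( runtime * 100 / period ))
echo "Realtime tasks may use up to ${pct}% of each period"
# → Realtime tasks may use up to 95% of each period
```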

27.3. Round robin priority scheduling with SCHED_RR

SCHED_RR is a round-robin variant of SCHED_FIFO. This policy is useful when multiple threads need to run at the same priority level. Like SCHED_FIFO, SCHED_RR is a realtime policy that defines a fixed priority for each thread. The scheduler scans the list of all SCHED_RR threads in order of priority and schedules the highest-priority thread that is ready to run.

However, unlike SCHED_FIFO, threads that have the same priority are scheduled in a round-robin style within a certain time slice. You can set the value of this time slice in milliseconds with the sched_rr_timeslice_ms kernel parameter in the /proc/sys/kernel/sched_rr_timeslice_ms file. The lowest value is 1 millisecond.
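You can read the current time slice directly from that file; a minimal sketch that degrades gracefully when /proc is unavailable (for example, in some containers):

```shell
# Read the current SCHED_RR time slice from the kernel tunable.
f=/proc/sys/kernel/sched_rr_timeslice_ms
if [ -r "$f" ]; then
    echo "SCHED_RR time slice: $(cat "$f") ms"
else
    echo "sched_rr_timeslice_ms is not available on this system"
fi
```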

27.4. Normal scheduling with SCHED_OTHER

SCHED_OTHER is the default scheduling policy. It uses the Completely Fair Scheduler (CFS) to provide fair processor access to all threads scheduled with this policy. SCHED_OTHER is useful when there are a large number of threads or when data throughput is a priority, because it enables more efficient scheduling of threads over time.

The scheduler creates a dynamic priority list based on each thread’s niceness value. Administrators can change the niceness value of a process, but cannot change the scheduler’s dynamic priority list directly.
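For example, you can start a command with an adjusted niceness value by using the nice utility; run with no arguments, nice prints the current niceness, so nesting it shows the value the child process inherits:

```shell
# Run `nice` itself at a niceness of 10; the inner `nice` prints the
# value it inherited. Unprivileged users can only raise niceness;
# lowering it (negative values) requires root.
nice -n 10 nice
# → 10
```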

27.5. Setting scheduler policies

You can check and adjust scheduler policies and priorities by using the chrt command-line tool. It can start new processes with the required properties or change the properties of a running process at runtime.

Procedure

  1. View the process ID (PID) of the active processes:

    # ps
  2. Use the --pid or -p option with the ps command to view the details of the particular PID.
  3. Check the scheduling policy, PID, and priority of a particular process:

    # chrt -p 468
    pid 468's current scheduling policy: SCHED_FIFO
    pid 468's current scheduling priority: 85
    # chrt -p 476
    pid 476's current scheduling policy: SCHED_OTHER
    pid 476's current scheduling priority: 0
  4. Set the scheduling policy of a process, for example:

    1. To set the process with PID 1000 to SCHED_FIFO, with a priority of 50:

      # chrt -f -p 50 1000
    2. To set the process with PID 1000 to SCHED_OTHER, with a priority of 0:

      # chrt -o -p 0 1000
    3. To set the process with PID 1000 to SCHED_RR, with a priority of 10:

      # chrt -r -p 10 1000
    4. To start a new application with a particular policy and priority, specify the name of the application:

      # chrt -f 36 /bin/my-app
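You can verify this workflow without root privileges by resetting your current shell to SCHED_OTHER at priority 0, which any user is permitted to do, and then querying the result:

```shell
# Reset the current shell to the default SCHED_OTHER policy at
# priority 0 (allowed without root, unlike the realtime policies),
# then confirm the change.
chrt -o -p 0 $$
chrt -p $$
```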

27.6. Policy options for the chrt command

You can view and set the scheduling policy of processes by using the chrt command. The following table describes the appropriate policy options, which can be used to set the scheduling policy of a process.

Table 27.1. Policy options for the chrt command

Short option | Long option | Description
-f | --fifo | Set schedule to SCHED_FIFO
-o | --other | Set schedule to SCHED_OTHER
-r | --rr | Set schedule to SCHED_RR

27.7. Changing the priority of a service during the boot process

You can configure real-time priorities for services launched during the boot process by using systemd. Unit configuration directives change the priority of a service during the boot process. Use the following directives in the [Service] section of the unit file:

  • CPUSchedulingPolicy=: Sets the CPU scheduling policy for executed processes. You can use it to set the other, fifo, and rr policies.
  • CPUSchedulingPriority=: Sets the CPU scheduling priority for executed processes. The available priority range depends on the selected CPU scheduling policy. For real-time scheduling policies, an integer between 1 (lowest priority) and 99 (highest priority) can be used.

The following procedure changes the priority of the mcelog service during the boot process.

Prerequisites

  • The tuna CLI tool is installed on your system.

Procedure

  1. View the scheduling priorities of running threads:

    # tuna --show_threads
        thread       ctxt_switches
        pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
      1      OTHER     0     0xff      3181          292         systemd
      2      OTHER     0     0xff       254            0        kthreadd
      3      OTHER     0     0xff         2            0          rcu_gp
      4      OTHER     0     0xff         2            0      rcu_par_gp
      6      OTHER     0        0         9            0 kworker/0:0H-kblockd
      7      OTHER     0     0xff      1301            1 kworker/u16:0-events_unbound
      8      OTHER     0     0xff         2            0    mm_percpu_wq
      9      OTHER     0        0       266            0     ksoftirqd/0
    [...]
  2. Create a supplementary configuration file in the mcelog service drop-in directory and insert the policy name and priority in this file:

    # cat << EOF > /etc/systemd/system/mcelog.service.d/priority.conf
    [Service]
    CPUSchedulingPolicy=fifo
    CPUSchedulingPriority=20
    EOF
  3. Reload the systemd manager configuration:

    # systemctl daemon-reload
  4. Restart the mcelog service:

    # systemctl restart mcelog

Verification

  • Display the mcelog priority set by systemd:

    # tuna -t mcelog -P
    thread       ctxt_switches
    pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
    826     FIFO    20  0,1,2,3        13            0          mcelog

27.8. Priority map

Priorities are defined in groups, with some groups dedicated to certain kernel functions. For real-time scheduling policies, an integer between 1 (lowest priority) and 99 (highest priority) is used.

The following table describes the priority range, which can be used while setting the scheduling policy of a process.

Table 27.2. Description of the priority range

Priority | Threads | Description
1 | Low priority kernel threads | This priority is usually reserved for tasks that need to run just above SCHED_OTHER.
2 - 49 | Available for use | The range used for typical application priorities.
50 | Default hard-IRQ value |
51 - 98 | High priority threads | Use this range for threads that run periodically and must have quick response times. Do not use this range for CPU-bound threads, as you will starve interrupts.
99 | Watchdogs and migration | System threads that must run at the highest priority.

27.9. TuneD cpu-partitioning profile

To tune Red Hat Enterprise Linux for latency-sensitive workloads, use the cpu-partitioning TuneD profile. In RHEL 9 and later, this profile makes low-latency tuning more efficient.

This profile is easily customizable according to the requirements of individual low-latency applications. The following figure demonstrates how to use the cpu-partitioning profile, based on an example CPU and node layout.

Figure: cpu-partitioning CPU and node layout

Configure the cpu-partitioning profile in the /etc/tuned/cpu-partitioning-variables.conf file by using the following configuration options:

Isolated CPUs with load balancing

In the cpu-partitioning figure, the blocks numbered 4 to 23 are the default isolated CPUs. The kernel scheduler's process load balancing is enabled on these CPUs. They are intended for low-latency processes with multiple threads that need the kernel scheduler load balancing. Specify these CPUs with the isolated_cores=cpu-list option, which lists the CPUs to isolate that use the kernel scheduler load balancing.

Specify the isolated CPUs as a comma-separated list, or as a range by using a dash, such as 3-5. This option is mandatory. Any CPU missing from this list is automatically considered a housekeeping CPU.

Isolated CPUs without load balancing

In the cpu-partitioning figure, CPUs 2 and 3 are isolated and exclude kernel scheduler load balancing.

You can configure the cpu-partitioning profile in the /etc/tuned/cpu-partitioning-variables.conf file by using the no_balance_cores=cpu-list option, which lists CPUs to isolate that will not use the kernel scheduler load balancing.

Specifying the no_balance_cores option is optional; however, any CPUs in this list must be a subset of the CPUs listed in isolated_cores. Application threads using these CPUs need to be pinned individually to each CPU.

Housekeeping CPUs
Any CPU not isolated in the cpu-partitioning-variables.conf file is automatically considered a housekeeping CPU. On the housekeeping CPUs, all services, daemons, user processes, movable kernel threads, interrupt handlers, and kernel timers are permitted to run.

27.10. Using the cpu-partitioning TuneD profile for low-latency tuning

You can tune a system for low latency by using the TuneD cpu-partitioning profile. The application in this example uses:

  • One dedicated reader thread that reads data from the network, pinned to CPU 2.
  • A large number of threads that process this network data, pinned to CPUs 4-23.
  • A dedicated writer thread that writes the processed data to the network, pinned to CPU 3.

Prerequisites

  • You have installed the cpu-partitioning TuneD profile by using the dnf install tuned-profiles-cpu-partitioning command as root.

Procedure

  1. Edit the /etc/tuned/cpu-partitioning-variables.conf file with the following changes:

    1. Comment out the isolated_cores=${f:calc_isolated_cores:1} line:

      # isolated_cores=${f:calc_isolated_cores:1}
    2. Add the following information for the isolated CPUs:

      # All isolated CPUs:
      isolated_cores=2-23
      # Isolated CPUs without the kernel’s scheduler load balancing:
      no_balance_cores=2,3
    3. Set the cpu-partitioning TuneD profile:

      # tuned-adm profile cpu-partitioning
  2. Reboot the system.

    After rebooting, the system is tuned for low latency, according to the isolation in the cpu-partitioning figure. The application can then use taskset to pin the reader and writer threads to CPUs 2 and 3, and the remaining application threads to CPUs 4-23.
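The taskset pinning can be sketched as a self-contained demonstration; CPU 0 is used here so the command runs on any system, and you would substitute 2, 3, or 4-23 for the reader, writer, and worker threads of the example layout:

```shell
# Launch a shell pinned to CPU 0 and print the affinity it actually
# received; taskset is part of util-linux.
taskset -c 0 sh -c 'taskset -cp $$'
```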

Verification

  • Verify that the isolated CPUs are not reflected in the Cpus_allowed_list field:

    # cat /proc/self/status | grep Cpu
    Cpus_allowed:	003
    Cpus_allowed_list:	0-1
  • To see the affinity of all processes, enter:

    # ps -ae -o pid= | xargs -n 1 taskset -cp
    pid 1's current affinity list: 0,1
    pid 2's current affinity list: 0,1
    pid 3's current affinity list: 0,1
    pid 4's current affinity list: 0-5
    pid 5's current affinity list: 0,1
    pid 6's current affinity list: 0,1
    pid 7's current affinity list: 0,1
    pid 9's current affinity list: 0
    ...
    Note

    TuneD cannot change the affinity of some processes, mostly kernel processes. In this example, processes with PID 4 and 9 remain unchanged.

27.11. Customizing the cpu-partitioning TuneD profile

You can extend a TuneD profile to make additional tuning changes. For example, the cpu-partitioning profile sets the CPUs to use C-state 1. To use the cpu-partitioning profile but additionally change the CPU C-state from 1 to 0, follow this procedure.

Procedure

  1. Create the /etc/tuned/profiles/my_profile directory:

    # mkdir /etc/tuned/profiles/my_profile
  2. Create a tuned.conf file in this directory, and add the following content:

    # vi /etc/tuned/profiles/my_profile/tuned.conf
    [main]
    summary=Customized tuning on top of cpu-partitioning
    include=cpu-partitioning
    [cpu]
    force_latency=cstate.id:0|1
  3. Use the new profile.

    # tuned-adm profile my_profile
    Note

    In this example, a reboot is not required. However, if the changes in the my_profile profile require a reboot to take effect, then reboot your machine.
