Chapter 2. Tuning IRQ balancing
On multi-core hosts, you can increase performance by ensuring that Red Hat Enterprise Linux balances interrupt requests (IRQs) to distribute them across CPU cores.
2.1. Interrupts and interrupt handlers
When a network interface controller (NIC) receives incoming data, it copies the data into kernel buffers by using Direct Memory Access (DMA). The NIC then notifies the kernel about this data by triggering a hard interrupt. These interrupts are processed by interrupt handlers, which do minimal work because they have already interrupted another task and the handlers cannot interrupt themselves. Hard interrupts can be costly in terms of CPU usage, especially if they use kernel locks.
The hard interrupt handler then leaves the majority of packet reception to a software interrupt request (SoftIRQ) process. The kernel can schedule these processes more fairly.
Example 2.1. Displaying hardware interrupts
The kernel stores the interrupt counters in the /proc/interrupts file. To display the counters for a specific NIC, such as enp1s0, enter:

# grep -E "CPU|enp1s0" /proc/interrupts
         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5
105:   141606        0        0        0        0        0   IR-PCI-MSI-edge   enp1s0-rx-0
106:        0   141091        0        0        0        0   IR-PCI-MSI-edge   enp1s0-rx-1
107:        2        0   163785        0        0        0   IR-PCI-MSI-edge   enp1s0-rx-2
108:        3        0        0   194370        0        0   IR-PCI-MSI-edge   enp1s0-rx-3
109:        0        0        0        0        0        0   IR-PCI-MSI-edge   enp1s0-tx
Each queue has an interrupt vector in the first column assigned to it. The kernel initializes these vectors when the system boots or when a user loads the NIC driver module. Each receive (RX) and transmit (TX) queue is assigned a unique vector that informs the interrupt handler which NIC or queue the interrupt is coming from. The columns represent the number of incoming interrupts for every CPU core.
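To observe how quickly these counters grow, and which cores receive most of the interrupts for a NIC, you can wrap the same grep in watch. The interface name enp1s0 is the example device from above:

# watch -n1 'grep -E "CPU|enp1s0" /proc/interrupts'

Counters that increase noticeably faster than the others identify the CPU cores that handle most of the interrupts for that NIC.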
2.2. Software interrupt requests
Software interrupt requests (SoftIRQs) clear the receive ring buffers of network adapters. The kernel schedules SoftIRQ routines to run at a time when other tasks will not be interrupted. On Red Hat Enterprise Linux, processes named ksoftirqd/cpu-number run these routines and call driver-specific functions.
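To list these threads, one per CPU core, you can use ps. The PSR column shows the core that each thread currently runs on:

# ps -eo pid,psr,comm | grep ksoftirqd

On a six-core host, for example, this shows the threads ksoftirqd/0 through ksoftirqd/5.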
To monitor the SoftIRQ counters for each CPU core, enter:
# watch -n1 'grep -E "CPU|NET_RX|NET_TX" /proc/softirqs'
          CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6    CPU7
NET_TX:  49672  52610  28175  97288  12633  19843  18746  220689
NET_RX:     96   1615    789     46     31   1735   1315  470798
The command dynamically updates the output. Press Ctrl+C to exit.
2.3. NAPI polling
New API (NAPI) is an extension to the device driver packet processing framework that improves the efficiency of processing incoming network packets. Hard interrupts are expensive because they usually cause a context switch from user space to kernel space and back again, and they cannot interrupt themselves. Even with interrupt coalescence, the interrupt handler can monopolize a CPU core completely. With NAPI, the driver can use a polling mode instead of being hard-interrupted by the kernel for every packet that is received.
Under normal operation, the kernel issues an initial hard interrupt, followed by a soft interrupt request (SoftIRQ) handler that polls the network card by using NAPI routines. To prevent SoftIRQs from monopolizing a CPU core, the polling routine has a budget that determines the CPU time the SoftIRQ can consume. On completion of the SoftIRQ poll routine, the kernel exits the routine and schedules it to run again at a later time to repeat the process of receiving packets from the network card.
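The budget is exposed through the net.core.netdev_budget and net.core.netdev_budget_usecs kernel parameters. To display the current limits, enter:

# sysctl net.core.netdev_budget net.core.netdev_budget_usecs

Section 2.5 describes how to decide whether these defaults are sufficient and how to raise them.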
2.4. The irqbalance service
On systems both with and without Non-Uniform Memory Access (NUMA) architecture, the irqbalance service balances interrupts effectively across CPU cores, based on system conditions. The irqbalance service runs in the background and monitors the CPU load every 10 seconds. The service moves interrupts to other CPU cores when a CPU's load is too high. As a result, the system performs well and handles load more efficiently.
If irqbalance is not running, CPU core 0 usually handles most of the interrupts. Even at moderate load, this CPU core can become busy trying to handle the workload of all the hardware in the system. As a consequence, interrupts or interrupt-based work can be missed or delayed. This can result in low network and storage performance, packet loss, and potentially other issues.
Disabling irqbalance can negatively impact the network throughput.
On systems with only a single CPU core, the irqbalance service provides no benefit and exits on its own.
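To check whether irqbalance is running, and to see which cores an interrupt vector may run on, you can query systemd and the per-IRQ affinity files under /proc/irq/. The IRQ number 105 below corresponds to the enp1s0-rx-0 queue from Example 2.1 and is only an example; the affinity output is illustrative:

# systemctl is-active irqbalance
active
# cat /proc/irq/105/smp_affinity_list
0-5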
By default, the irqbalance service is enabled and running on Red Hat Enterprise Linux. To re-enable the service if you disabled it, enter:

# systemctl enable --now irqbalance
2.5. Increasing the time SoftIRQs can run on the CPU
If SoftIRQs do not run long enough, the rate of incoming data can exceed the kernel's capability to drain the buffer. As a result, the network interface controller (NIC) buffers overflow and packets are lost.
If ksoftirqd processes could not retrieve all packets from interfaces in one NAPI polling cycle, it is an indicator that the SoftIRQs do not have enough CPU time. This could be the case on hosts with fast NICs, such as 10 Gbps and faster. By increasing the values of the net.core.netdev_budget and net.core.netdev_budget_usecs kernel parameters, you can control the time and the number of packets that ksoftirqd can process in a polling cycle.
Procedure
1. To determine whether tuning the net.core.netdev_budget parameter is needed, display the counters in the /proc/net/softnet_stat file:

   # awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -t
   221951548  0  0      0  0  0  0  0  0  0  0  0  0
   192058677  0  20380  0  0  0  0  0  0  0  0  0  1
   455324886  0  0      0  0  0  0  0  0  0  0  0  2
   ...

   This awk command converts the values in /proc/net/softnet_stat from hexadecimal to decimal format and displays them in table format. Each line represents a CPU core, starting with core 0. The relevant columns are:
   - First column: The total number of received frames.
   - Third column: The number of times ksoftirqd processes could not retrieve all packets from interfaces in one NAPI polling cycle.
   - Last column: The CPU core number.
2. If the counters in the third column of the /proc/net/softnet_stat file increment over time, tune the system:

   a. Display the current values of the net.core.netdev_budget_usecs and net.core.netdev_budget parameters:

      # sysctl net.core.netdev_budget_usecs net.core.netdev_budget
      net.core.netdev_budget_usecs = 2000
      net.core.netdev_budget = 300

      With these settings, ksoftirqd processes have up to 2000 microseconds to process up to 300 messages from the NIC in one polling cycle. Polling ends based on which condition is met first.

   b. Create the /etc/sysctl.d/10-netdev_budget.conf file with the following content, setting the parameters to double their current values:

      net.core.netdev_budget = 600
      net.core.netdev_budget_usecs = 4000

   c. Load the settings from the /etc/sysctl.d/10-netdev_budget.conf file:

      # sysctl -p /etc/sysctl.d/10-netdev_budget.conf
Verification
Monitor the third column in the /proc/net/softnet_stat file:

# awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -t

If the values still increase, set net.core.netdev_budget_usecs and net.core.netdev_budget to higher values. Repeat this process until the counters no longer increase.
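When monitoring over a longer period, comparing full snapshots by eye is error-prone. The following is a minimal sketch of a loop that prints only the third column for each core, relying on the fact that each line of /proc/net/softnet_stat represents one CPU core starting with core 0; the 5-second interval is an arbitrary choice:

# while true; do
    # Print the per-core counter from the third column, converted from hexadecimal
    awk '{printf "core %d: %d\n", NR-1, strtonum("0x" $3)}' /proc/net/softnet_stat
    echo "---"
    sleep 5
  done

If a core's value keeps climbing between iterations, that core still runs out of budget during NAPI polling.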