Chapter 1. Tuning network adapter settings
In high-speed networks with 40 Gbps and faster, certain default values of network adapter-related kernel settings can be a cause of packet drops and performance degradation. Tuning these settings can prevent such problems.
1.1. Increasing the ring buffer size to reduce a high packet drop rate by using nmcli Copy linkLink copied to clipboard!
Increase the size of an Ethernet device’s ring buffers if the packet drop rate causes applications to report a loss of data, timeouts, or other issues.
Receive ring buffers are shared between the device driver and network interface controller (NIC). The card assigns a transmit (TX) and receive (RX) ring buffer. As the name implies, the ring buffer is a circular buffer where an overflow overwrites existing data. There are two ways to move data from the NIC to the kernel, hardware interrupts and software interrupts, also called SoftIRQs.
The kernel uses the RX ring buffer to store incoming packets until the device driver can process them. The device driver drains the RX ring, typically by using SoftIRQs, which puts the incoming packets into a kernel data structure called an sk_buff or skb to begin its journey through the kernel and up to the application that owns the relevant socket.
The kernel uses the TX ring buffer to hold outgoing packets which should be sent to the network. These ring buffers reside at the bottom of the stack and are a crucial point at which packet drop can occur, which in turn will adversely affect network performance.
Procedure
Display the packet drop statistics of the interface:
# ethtool -S enp1s0 ... rx_queue_0_drops: 97326 rx_queue_1_drops: 63783 ...Note that the output of the command depends on the network card and the driver.
High values in
discardordropcounters indicate that the available buffer fills up faster than the kernel can process the packets. Increasing the ring buffers can help to avoid such loss.Display the maximum ring buffer sizes:
# ethtool -g enp1s0 Ring parameters for enp1s0: Pre-set maximums: RX: 4096 RX Mini: 0 RX Jumbo: 16320 TX: 4096 Current hardware settings: RX: 255 RX Mini: 0 RX Jumbo: 0 TX: 255If the values in the
Pre-set maximumssection are higher than in theCurrent hardware settingssection, you can change the settings in the next steps.Identify the NetworkManager connection profile that uses the interface:
# nmcli connection show NAME UUID TYPE DEVICE Example-Connection a5eb6490-cc20-3668-81f8-0314a27f3f75 ethernet enp1s0Update the connection profile, and increase the ring buffers:
To increase the RX ring buffer, enter:
# nmcli connection modify Example-Connection ethtool.ring-rx 4096To increase the TX ring buffer, enter:
# nmcli connection modify Example-Connection ethtool.ring-tx 4096
Reload the NetworkManager connection:
# nmcli connection up Example-ConnectionImportantDepending on the driver your NIC uses, changing in the ring buffer can shortly interrupt the network connection.
1.2. Increasing the ring buffer size to reduce a high packet drop rate by using the network RHEL system role Copy linkLink copied to clipboard!
Increase the size of an Ethernet device’s ring buffers if the packet drop rate causes applications to report a loss of data, timeouts, or other issues.
Ring buffers are circular buffers where an overflow overwrites existing data. The network card assigns a transmit (TX) and receive (RX) ring buffer. Receive ring buffers are shared between the device driver and the network interface controller (NIC). Data can move from NIC to the kernel through either hardware interrupts or software interrupts, also called SoftIRQs.
The kernel uses the RX ring buffer to store incoming packets until the device driver can process them. The device driver drains the RX ring, typically by using SoftIRQs, which puts the incoming packets into a kernel data structure called an sk_buff or skb to begin its journey through the kernel and up to the application that owns the relevant socket.
The kernel uses the TX ring buffer to hold outgoing packets which should be sent to the network. These ring buffers reside at the bottom of the stack and are a crucial point at which packet drop can occur, which in turn will adversely affect network performance.
You configure ring buffer settings in the NetworkManager connection profiles. By using Ansible and the network RHEL system role, you can automate this process and remotely configure connection profiles on the hosts defined in a playbook.
You cannot use the network RHEL system role to update only specific values in an existing connection profile. The role ensures that a connection profile exactly matches the settings in a playbook. If a connection profile with the same name already exists, the role applies the settings from the playbook and resets all other settings in the profile to their defaults. To prevent resetting values, always specify the whole configuration of the network connection profile in the playbook, including the settings that you do not want to change.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
-
The account you use to connect to the managed nodes has
sudopermissions for these nodes. - You know the maximum ring buffer sizes that the device supports.
Procedure
Create a playbook file, for example,
~/playbook.yml, with the following content:--- - name: Configure the network hosts: managed-node-01.example.com tasks: - name: Ethernet connection profile with dynamic IP address setting and increased ring buffer sizes ansible.builtin.include_role: name: redhat.rhel_system_roles.network vars: network_connections: - name: enp1s0 type: ethernet autoconnect: yes ip: dhcp4: yes auto6: yes ethtool: ring: rx: 4096 tx: 4096 state: upThe settings specified in the example playbook include the following:
rx: <value>- Sets the maximum number of received ring buffer entries.
tx: <value>- Sets the maximum number of transmitted ring buffer entries.
For details about all variables used in the playbook, see the
/usr/share/ansible/roles/rhel-system-roles.network/README.mdfile on the control node.Validate the playbook syntax:
$ ansible-playbook --syntax-check ~/playbook.ymlNote that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
$ ansible-playbook ~/playbook.yml
Verification
Display the maximum ring buffer sizes:
# ansible managed-node-01.example.com -m command -a 'ethtool -g enp1s0' managed-node-01.example.com | CHANGED | rc=0 >> ... Current hardware settings: RX: 4096 RX Mini: 0 RX Jumbo: 0 TX: 4096
1.3. Tuning the network device backlog queue to avoid packet drops Copy linkLink copied to clipboard!
When a network card receives packets and before the kernel protocol stack processes them, the kernel stores these packets in backlog queues. The kernel maintains a separate queue for each CPU core.
If the backlog queue for a core is full, the kernel drops all further incoming packets that the netif_receive_skb() kernel function assigns to this queue. If the server contains a 10 Gbps or faster network adapter or multiple 1 Gbps adapters, tune the backlog queue size to avoid this problem.
Prerequisites
- A 10 Gbps or faster or multiple 1 Gbps network adapters
Procedure
Determine whether tuning the backlog queue is needed, display the counters in the
/proc/net/softnet_statfile:# awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -t 221951548 0 0 0 0 0 0 0 0 0 0 0 0 192058677 18862 0 0 0 0 0 0 0 0 0 0 1 455324886 0 0 0 0 0 0 0 0 0 0 0 2 ...This
awkcommand converts the values in/proc/net/softnet_statfrom hexadecimal to decimal format and displays them in table format. Each line represents a CPU core starting with core 0.The relevant columns are:
- First column: The total number of received frames
- Second column: The number of dropped frames because of a full backlog queue
- Last column: The CPU core number
If the values in the second column of the
/proc/net/softnet_statfile increment over time, increase the size of the backlog queue:Display the current backlog queue size:
# sysctl net.core.netdev_max_backlog net.core.netdev_max_backlog = 1000Create the
/etc/sysctl.d/10-netdev_max_backlog.conffile with the following content:net.core.netdev_max_backlog = 2000Set the
net.core.netdev_max_backlogparameter to a double of the current value.Load the settings from the
/etc/sysctl.d/10-netdev_max_backlog.conffile:# sysctl -p /etc/sysctl.d/10-netdev_max_backlog.conf
Verification
Monitor the second column in the
/proc/net/softnet_statfile:# awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -tIf the values still increase, double the
net.core.netdev_max_backlogvalue again. Repeat this process until the packet drop counters no longer increase.
1.4. Increasing the transmit queue length of a NIC to reduce the number of transmit errors Copy linkLink copied to clipboard!
The kernel stores packets in a transmit queue before transmitting them. The default queue length (1000 packets) is often sufficient for 10 Gbps networks. However, in faster networks, or if you encounter an increasing number of transmit errors on an adapter, increase the queue length.
Procedure
Display the current transmit queue length:
# ip -s link show enp1s0 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 ...In this example, the transmit queue length (
qlen) of theenp1s0interface is1000.Monitor the dropped packets counter of a network interface’s software transmit queue:
# tc -s qdisc show dev enp1s0 qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 Sent 16889923 bytes 426862765 pkt (dropped 191980, overlimits 0 requeues 2) ...If you encounter a high or increasing transmit error count, set a higher transmit queue length:
Identify the NetworkManager connection profile that uses this interface:
# nmcli connection show NAME UUID TYPE DEVICE Example-Connection a5eb6490-cc20-3668-81f8-0314a27f3f75 ethernet enp1s0Update the connection profile, and increase the transmit queue length:
# nmcli connection modify Example-Connection link.tx-queue-length 2000Set the queue length to double of the current value.
Apply the changes:
# nmcli connection up Example-Connection
Verification
Display the transmit queue length:
# ip -s link show enp1s0 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 2000 ...Monitor the dropped packets counter:
# tc -s qdisc show dev enp1s0If the
droppedcounter still increases, double the transmit queue length again. Repeat this process until the counter no longer increases.