8.4. Resolving Common Queuing/Frame Loss Issues
By far, the most common reason for frame loss is a queue overrun. The kernel sets a limit to the length of a queue, and in some cases the queue fills faster than it drains. When this occurs for too long, frames start to get dropped.
As illustrated in Figure 8.1, “Network receive path diagram”, there are two major queues in the receive path: the NIC hardware buffer and the socket queue. Both queues need to be configured accordingly to protect against queue overruns.
8.4.1. NIC Hardware Buffer
The NIC fills its hardware buffer with frames; the buffer is then drained by the
softirq
, which the NIC asserts via an interrupt. To interrogate the status of this queue, use the following command:
ethtool -S ethX
Replace
ethX
with the NIC's corresponding device name. This will display how many frames have been dropped within ethX
. Often, a drop occurs because the queue runs out of buffer space in which to store frames.
There are different ways to address this problem, namely:
- Input traffic
- You can help prevent queue overruns by slowing down input traffic. This can be achieved by filtering, reducing the number of joined multicast groups, lowering broadcast traffic, and the like.
- Queue length
- Alternatively, you can also increase the queue length. This involves increasing the number of buffers in a specified queue to whatever maximum the driver will allow. To do so, edit the
rx
/tx
ring parameters ofethX
using:ethtool --set-ring ethX
Append the appropriaterx
ortx
values to the aforementioned command. For more information, refer toman ethtool
. - Device weight
- You can also increase the rate at which a queue is drained. To do this, adjust the NIC's device weight accordingly. This attribute refers to the maximum number of frames that the NIC can receive before the
softirq
context has to yield the CPU and reschedule itself. It is controlled by the/proc/sys/net/core/dev_weight
variable.
Most administrators have a tendency to choose the third option. However, keep in mind that there are consequences for doing so. Increasing the number of frames that can be received from a NIC in one iteration implies extra CPU cycles, during which no applications can be scheduled on that CPU.