Chapter 25. Linux traffic control
Linux offers tools for managing and manipulating the transmission of packets. The Linux Traffic Control (TC) subsystem helps in policing, classifying, shaping, and scheduling network traffic. TC also mangles the packet content during classification by using filters and actions. The TC subsystem achieves this by using queuing disciplines (qdisc), a fundamental element of the TC architecture.
The scheduling mechanism arranges or rearranges the packets before they enter or exit different queues. The most common scheduler is the First-In-First-Out (FIFO) scheduler. You can perform qdisc operations temporarily by using the tc utility or permanently by using NetworkManager.
In Red Hat Enterprise Linux, you can configure default queueing disciplines in various ways to manage the traffic on a network interface.
25.1. Overview of queuing disciplines
Queuing disciplines (qdiscs) help with queuing up and, later, scheduling of traffic transmission by a network interface. A qdisc has two operations:
- enqueue requests so that a packet can be queued up for later transmission and
- dequeue requests so that one of the queued-up packets can be chosen for immediate transmission.
Every qdisc has a 16-bit hexadecimal identification number called a handle, with an attached colon, such as 1: or abcd:. This number is called the qdisc major number. If a qdisc has classes, then the identifiers are formed as a pair of two numbers with the major number before the minor, <major>:<minor>, for example abcd:1. The numbering scheme for the minor numbers depends on the qdisc type. Sometimes the numbering is systematic, where the first class has the ID <major>:1, the second one <major>:2, and so on. Some qdiscs allow the user to set class minor numbers arbitrarily when creating the class.
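As a small illustration of the numbering scheme, the following shell sketch splits an identifier into its major and minor parts and prints both in decimal. The function name split_classid is hypothetical and not part of the tc utility; the sketch only relies on the handles being hexadecimal.

```shell
#!/bin/sh
# Hypothetical helper (not part of tc): split "<major>:<minor>" and
# print both parts in decimal. Both parts are read as hexadecimal.
split_classid() {
    major=${1%%:*}   # text before the colon
    minor=${1#*:}    # text after the colon (empty for a bare qdisc handle)
    printf 'major=%d minor=%d\n' "0x${major:-0}" "0x${minor:-0}"
}

split_classid abcd:1   # class identifier from the text above
split_classid 1:       # bare qdisc handle, minor part is empty
```

For abcd:1, the sketch prints major=43981 minor=1, because abcd is the hexadecimal notation of 43981.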
- Classful qdiscs
Different types of qdiscs exist and help in the transfer of packets to and from a networking interface. You can configure qdiscs with root, parent, or child classes. The points where children can be attached are called classes. Classes in a qdisc are flexible and can always contain either multiple child classes or a single child qdisc. There is no prohibition against a class containing a classful qdisc itself; this facilitates complex traffic control scenarios. Classful qdiscs do not store any packets themselves. Instead, they pass enqueue and dequeue requests down to one of their children according to criteria specific to the qdisc. Eventually, this recursive packet passing ends up where the packets are stored (or picked up from in the case of dequeuing).
- Classless qdiscs
Some qdiscs contain no child classes and they are called classless qdiscs. Classless qdiscs require less customization compared to classful qdiscs. It is usually enough to attach them to an interface.
Additional resources
- tc(8) and tc-actions(8) man pages on your system
25.2. Introduction to connection tracking
At a firewall, the Netfilter framework filters packets from an external network. After a packet arrives, Netfilter assigns a connection tracking entry. Connection tracking is a Linux kernel networking feature for logical networks that tracks connections and identifies packet flow in those connections. This feature filters and analyzes every packet, sets up the connection tracking table to store connection status, and updates the connection status based on identified packets. For example, in the case of an FTP connection, Netfilter assigns a connection tracking entry to ensure that all packets of the FTP connection work in the same manner. The connection tracking entry stores a Netfilter mark and tracks the connection state information in the memory table in which a new packet tuple maps with an existing entry. If the packet tuple does not map with an existing entry, the packet adds a new connection tracking entry that groups packets of the same connection.
You can control and analyze traffic on the network interface. The tc traffic controller utility uses the qdisc discipline to configure the packet scheduler in the network. The qdisc kernel-configured queuing discipline enqueues packets to the interface. By using qdisc, the kernel catches all the traffic before a network interface transmits it. Also, to limit the bandwidth rate of packets belonging to the same connection, use the tc qdisc command.
To retrieve data from connection tracking marks into various fields, use the tc utility with the ctinfo module and the connmark functionality. For storing packet mark information, the ctinfo module copies the Netfilter mark and the connection state information into a socket buffer (skb) mark metadata field.
Transmitting a packet over a physical medium removes all the metadata of a packet. Before the packet loses its metadata, the ctinfo module maps and copies the Netfilter mark value to a specific value of the Diffserv code point (DSCP) in the packet’s IP field.
Additional resources
- tc(8) and tc-ctinfo(8) man pages on your system
25.3. Inspecting qdiscs of a network interface by using the tc utility
By default, Red Hat Enterprise Linux systems use the fq_codel qdisc. You can inspect the qdisc counters by using the tc utility.
Procedure
Optional: View your current qdisc:
# tc qdisc show dev enp0s1
Inspect the current qdisc counters:
# tc -s qdisc show dev enp0s1
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
 Sent 1008193 bytes 5559 pkt (dropped 233, overlimits 55 requeues 77)
 backlog 0b 0p requeues 0
- dropped - the number of times a packet is dropped because all queues are full
- overlimits - the number of times the configured link capacity is filled
- sent - the number of dequeues
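To compare these counters over time, it can help to extract them from the tc output. The following is a minimal sketch that parses the statistics line with sed. The sample line is copied from the example output in this section, and the function name qdisc_counters is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `tc -s qdisc show` output on standard input and
# print the byte, packet, dropped, and overlimits counters of each "Sent" line.
qdisc_counters() {
    sed -n 's/.*Sent \([0-9]*\) bytes \([0-9]*\) pkt (dropped \([0-9]*\), overlimits \([0-9]*\) requeues [0-9]*).*/bytes=\1 packets=\2 dropped=\3 overlimits=\4/p'
}

# Sample statistics line taken from the example output. On a live system,
# you could pipe the output of `tc -s qdisc show dev enp0s1` into the function.
echo ' Sent 1008193 bytes 5559 pkt (dropped 233, overlimits 55 requeues 77)' | qdisc_counters
```

For the sample line, the sketch prints bytes=1008193 packets=5559 dropped=233 overlimits=55.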
25.4. Updating the default qdisc
If you observe networking packet losses with the current qdisc, you can change the qdisc based on your network requirements.
Procedure
View the current default qdisc:
# sysctl -a | grep qdisc
net.core.default_qdisc = fq_codel
View the qdisc of the current Ethernet connection:
# tc -s qdisc show dev enp0s1
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
Update the existing qdisc:
# sysctl -w net.core.default_qdisc=pfifo_fast
To apply the changes, reload the network driver:
# modprobe -r NETWORKDRIVERNAME
# modprobe NETWORKDRIVERNAME
Start the network interface:
# ip link set enp0s1 up
Verification
View the qdisc of the Ethernet connection:
# tc -s qdisc show dev enp0s1
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 373186 bytes 5333 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
...
Additional resources
- How to set sysctl variables on Red Hat Enterprise Linux (Red Hat Knowledgebase)
25.5. Temporarily setting the current qdisc of a network interface by using the tc utility
You can update the current qdisc without changing the default one.
Procedure
Optional: View the current qdisc:
# tc -s qdisc show dev enp0s1
Update the current qdisc:
# tc qdisc replace dev enp0s1 root htb
Verification
View the updated current qdisc:
# tc -s qdisc show dev enp0s1
qdisc htb 8001: root refcnt 2 r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
25.6. Permanently setting the current qdisc of a network interface by using NetworkManager
You can update the current qdisc value of a NetworkManager connection.
Procedure
Optional: View the current qdisc:
# tc qdisc show dev enp0s1
qdisc fq_codel 0: root refcnt 2
Update the current qdisc:
# nmcli connection modify enp0s1 tc.qdiscs 'root pfifo_fast'
Optional: To add another qdisc over the existing qdisc, use the +tc.qdisc option:
# nmcli connection modify enp0s1 +tc.qdisc 'ingress handle ffff:'
Activate the changes:
# nmcli connection up enp0s1
Verification
View the current qdisc of the network interface:
# tc qdisc show dev enp0s1
qdisc pfifo_fast 8001: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: parent ffff:fff1 ................
Additional resources
- nm-settings(5) man page on your system
25.7. Configuring the rate limiting of packets by using the tc-ctinfo utility
You can limit network traffic and prevent the exhaustion of resources in the network by using rate limiting. With rate limiting, you can also reduce the load on servers by limiting repetitive packet requests in a specific time frame. In addition, you can manage the bandwidth rate by configuring traffic control in the kernel with the tc-ctinfo utility.
The connection tracking entry stores the Netfilter mark and connection information. When a router forwards a packet from the firewall, the router either removes or modifies the connection tracking entry from the packet. The connection tracking information (ctinfo) module retrieves data from connection tracking marks into various fields. This kernel module preserves the Netfilter mark by copying it into a socket buffer (skb) mark metadata field.
Prerequisites
- The iperf3 utility is installed on a server and a client.
Procedure
Perform the following steps on the server:
Add a virtual link to the network interface:
# ip link add name ifb4eth0 numtxqueues 48 numrxqueues 48 type ifb
This command has the following parameters:
name ifb4eth0
- Sets a new virtual device interface.
numtxqueues 48
- Sets the number of transmit queues.
numrxqueues 48
- Sets the number of receive queues.
type ifb
- Sets the type of the new device.
Change the state of the interface:
# ip link set dev ifb4eth0 up
Add the qdisc attribute on the physical network interface and apply it to the incoming traffic:
# tc qdisc add dev enp1s0 handle ffff: ingress
In the handle ffff: option, the handle parameter assigns the major number ffff: as a default value to a classful qdisc on the enp1s0 physical network interface, where qdisc is a queueing discipline parameter to analyze traffic control.
Add a filter on the physical interface of the ip protocol to classify packets:
# tc filter add dev enp1s0 parent ffff: protocol ip u32 match u32 0 0 action ctinfo cpmark 100 action mirred egress redirect dev ifb4eth0
This command has the following attributes:
parent ffff:
- Sets major number ffff: for the parent qdisc.
u32 match u32 0 0
- Sets the u32 filter to match every packet. The first 0 is the value to compare against the packet data, and the second 0 is the mask that tells the filter which bits to compare. Because the mask is 0, no bits are compared and all packets match.
action ctinfo
- Sets action to retrieve data from the connection tracking mark into various fields.
cpmark 100
- Copies the connection tracking mark (connmark) into the packet mark (skb mark) field. The value 100 (0x64) is the mask that selects the mark bits to copy.
action mirred egress redirect dev ifb4eth0
- Sets action mirred to redirect the received packets to the ifb4eth0 destination interface.
Add a classful qdisc to the interface:
# tc qdisc add dev ifb4eth0 root handle 1: htb default 1000
This command sets the major number 1 to the root qdisc and uses the htb hierarchy token bucket classful qdisc. The default 1000 option sends unclassified traffic to the class with the minor ID 1000.
Limit the traffic on the interface to 1 Mbit/s with an upper limit of 2 Mbit/s:
# tc class add dev ifb4eth0 parent 1:1 classid 1:100 htb ceil 2mbit rate 1mbit prio 100
This command has the following parameters:
parent 1:1
- Sets the parent of the new class to 1:1, where the first 1 is the major number of the root qdisc and the second 1 is the minor number of the parent class.
classid 1:100
- Sets classid as 1:100, where 1 is the major number of the parent qdisc and 100 is the minor number of the class.
htb ceil 2mbit
- The htb classful qdisc allows an upper limit bandwidth of 2 Mbit/s as the ceil rate limit.
Apply the Stochastic Fairness Queuing (sfq) classless qdisc to the interface with a time interval of 60 seconds to reduce queue algorithm perturbation:
# tc qdisc add dev ifb4eth0 parent 1:100 sfq perturb 60
Add the firewall mark (fw) filter to the interface:
# tc filter add dev ifb4eth0 parent 1:0 protocol ip prio 100 handle 100 fw classid 1:100
Restore the packet meta mark from the connection mark (CONNMARK):
# nft add rule ip mangle PREROUTING counter meta mark set ct mark
In this command, the nft utility has a mangle table with the PREROUTING chain rule specification that alters incoming packets before routing to replace the packet mark with CONNMARK.
If no nft table and chain exist, create a table and add a chain rule:
# nft add table ip mangle
# nft add chain ip mangle PREROUTING {type filter hook prerouting priority mangle \;}
Set the meta mark on tcp packets that are received on the specified destination address 192.0.2.3:
# nft add rule ip mangle PREROUTING ip daddr 192.0.2.3 counter meta mark set 0x64
Save the packet mark into the connection mark:
# nft add rule ip mangle PREROUTING counter ct mark set mark
Run the iperf3 utility as the server on a system by using the -s parameter. The server then waits for the client connection:
# iperf3 -s
On the client, run iperf3 as a client, connect to the server that listens on IP address 192.0.2.3, and save the output to the rate file:
# iperf3 -c 192.0.2.3 -t TCP_STREAM | tee rate
192.0.2.3 is the IP address of the server while 192.0.2.4 is the IP address of the client.
Terminate the iperf3 utility on the server by pressing Ctrl+C:
Accepted connection from 192.0.2.4, port 52128
[5] local 192.0.2.3 port 5201 connected to 192.0.2.4 port 52130
[ID] Interval Transfer Bitrate
[5] 0.00-1.00 sec 119 KBytes 973 Kbits/sec
[5] 1.00-2.00 sec 116 KBytes 950 Kbits/sec
...
[ID] Interval Transfer Bitrate
[5] 0.00-14.81 sec 1.51 MBytes 853 Kbits/sec receiver
iperf3: interrupt - the server has terminated
Terminate the iperf3 utility on the client by pressing Ctrl+C:
Connecting to host 192.0.2.3, port 5201
[5] local 192.0.2.4 port 52130 connected to 192.0.2.3 port 5201
[ID] Interval Transfer Bitrate Retr Cwnd
[5] 0.00-1.00 sec 481 KBytes 3.94 Mbits/sec 0 76.4 KBytes
[5] 1.00-2.00 sec 223 KBytes 1.83 Mbits/sec 0 82.0 KBytes
...
[ID] Interval Transfer Bitrate Retr
[5] 0.00-14.00 sec 3.92 MBytes 2.35 Mbits/sec 32 sender
[5] 0.00-14.00 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: error - the server has terminated
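The procedure writes the same mark in two notations: 100 in the tc commands and 0x64 in the nft rule, while the tc filter statistics report it as 0x00000064. The following shell sketch only confirms that these are the same number and does not touch any kernel state. The function name mark_to_hex is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a packet mark as an 8-digit hexadecimal value,
# similar to the cpmark field in the tc filter statistics.
mark_to_hex() {
    printf '0x%08x\n' "$1"
}

mark_to_hex 100    # decimal notation used in the tc commands
mark_to_hex 0x64   # hexadecimal notation used in the nft rule
```

Both calls print 0x00000064, so the fw filter with handle 100 matches packets that the nft rule marks with 0x64.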
Verification
Display the statistics about packet counts of the htb and sfq classes on the interface:
# tc -s qdisc show dev ifb4eth0
qdisc htb 1: root
...
 Sent 26611455 bytes 3054 pkt (dropped 76, overlimits 4887 requeues 0)
...
qdisc sfq 8001: parent
...
 Sent 26535030 bytes 2296 pkt (dropped 76, overlimits 0 requeues 0)
...
Display the statistics of packet counts for the mirred and ctinfo actions:
# tc -s filter show dev enp1s0 ingress
filter parent ffff: protocol ip pref 49152 u32 chain 0
filter parent ffff: protocol ip pref 49152 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol ip pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw (rule hit 8075 success 8075)
  match 00000000/00000000 at 0 (success 8075 )
    action order 1: ctinfo zone 0 pipe
    index 1 ref 1 bind 1 cpmark 0x00000064 installed 3105 sec firstused 3105 sec DSCP set 0 error 0 CPMARK set 7712
    Action statistics:
    Sent 25891504 bytes 3137 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
    index 1 ref 1 bind 1 installed 3105 sec firstused 3105 sec
    Action statistics:
    Sent 25891504 bytes 3137 pkt (dropped 0, overlimits 61 requeues 0)
    backlog 0b 0p requeues 0
Display the statistics of the htb rate-limiter and its configuration:
# tc -s class show dev ifb4eth0
class htb 1:100 root leaf 8001: prio 7 rate 1Mbit ceil 2Mbit burst 1600b cburst 1600b
 Sent 26541716 bytes 2373 pkt (dropped 61, overlimits 4887 requeues 0)
 backlog 0b 0p requeues 0
 lended: 7248 borrowed: 0 giants: 0
 tokens: 187250 ctokens: 93625
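To relate the configured rate and ceil values to the iperf3 results, recall that tc interprets the mbit suffix as 1,000,000 bits per second. The following sketch converts such a rate to bytes per second. The function name rate_to_bytes_per_sec is hypothetical, and the sketch handles only the kbit, mbit, and gbit suffixes.

```shell
#!/bin/sh
# Hypothetical helper: convert a tc rate such as "1mbit" to bytes per second.
# Only the kbit, mbit, and gbit suffixes are handled in this sketch.
rate_to_bytes_per_sec() {
    case $1 in
        *kbit) echo $(( ${1%kbit} * 1000 / 8 )) ;;
        *mbit) echo $(( ${1%mbit} * 1000000 / 8 )) ;;
        *gbit) echo $(( ${1%gbit} * 1000000000 / 8 )) ;;
        *)     echo "unsupported rate: $1" >&2; return 1 ;;
    esac
}

rate_to_bytes_per_sec 1mbit   # rate of class 1:100
rate_to_bytes_per_sec 2mbit   # ceil of class 1:100
```

The sketch prints 125000 and 250000. The bitrates of 973 Kbits/sec and 950 Kbits/sec that the iperf3 server reports are consistent with the configured rate of 1 Mbit/s.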
Additional resources
- tc(8), tc-ctinfo(8), and nft(8) man pages on your system
25.8. Available qdiscs in RHEL
Each qdisc addresses unique networking-related issues. The following is the list of qdiscs available in RHEL. You can use any of the following qdiscs to shape network traffic based on your networking requirements.
qdisc name | Included in | Offload support |
---|---|---|
Asynchronous Transfer Mode (ATM) | | |
Class-Based Queueing | | |
Credit-Based Shaper | | Yes |
CHOose and Keep for responsive flows, CHOose and Kill for unresponsive flows (CHOKE) | | |
Controlled Delay (CoDel) | | |
Deficit Round Robin (DRR) | | |
Differentiated Services marker (DSMARK) | | |
Enhanced Transmission Selection (ETS) | | Yes |
Fair Queue (FQ) | | |
Fair Queuing Controlled Delay (FQ_CODel) | | |
Generalized Random Early Detection (GRED) | | |
Hierarchical Fair Service Curve (HFSC) | | |
Heavy-Hitter Filter (HHF) | | |
Hierarchy Token Bucket (HTB) | | |
INGRESS | | Yes |
Multi Queue Priority (MQPRIO) | | Yes |
Multiqueue (MULTIQ) | | Yes |
Network Emulator (NETEM) | | |
Proportional Integral-controller Enhanced (PIE) | | |
PLUG | | |
Quick Fair Queueing (QFQ) | | |
Random Early Detection (RED) | | Yes |
Stochastic Fair Blue (SFB) | | |
Stochastic Fairness Queueing (SFQ) | | |
Token Bucket Filter (TBF) | | Yes |
Trivial Link Equalizer (TEQL) | | |
The qdisc offload requires hardware and driver support on the NIC.
Additional resources
- tc(8) man page on your system