7.7. Using Channel Bonding
To enhance performance, adjust available module options to ascertain what combination works best. Pay particular attention to the miimon or arp_interval and the arp_ip_target parameters. See Section 7.7.1, “Bonding Module Directives” for a list of available options and how to quickly determine the best ones for your bonded interface.
7.7.1. Bonding Module Directives
It is a good idea to test which channel bonding module parameters work best for your bonded interfaces before adding them to the BONDING_OPTS="bonding parameters" directive in your bonding interface configuration file (ifcfg-bond0 for example). Parameters for bonded interfaces can be configured without unloading (and reloading) the bonding module by manipulating files in the sysfs file system.
sysfs is a virtual file system that represents kernel objects as directories, files and symbolic links. sysfs can be used to query for information about kernel objects, and can also manipulate those objects through the use of normal file system commands. The sysfs virtual file system is mounted under the /sys/ directory. All bonding interfaces can be configured dynamically by interacting with and manipulating files under the /sys/class/net/ directory.
In order to determine the best parameters for your bonding interface, create a channel bonding interface file such as ifcfg-bond0 by following the instructions in Section 7.4.2, “Create a Channel Bonding Interface”. Insert the SLAVE=yes and MASTER=bond0 directives in the configuration files for each interface bonded to bond0. Once this is completed, you can proceed to testing the parameters.
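As a rough illustration of the two directives named above, a port interface file might look like the following fragment. The file name, the interface name enp1s0, and the DEVICE, ONBOOT and BOOTPROTO values are assumptions added for illustration; Section 7.4.2 remains the reference for the full procedure.

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-enp1s0 (port of bond0)
# Assumed interface name; substitute your own device.
DEVICE=enp1s0
# Assumed values: start at boot, no IP configuration on the port itself.
ONBOOT=yes
BOOTPROTO=none
# Directives from this section: attach the interface to bond0 as a port.
MASTER=bond0
SLAVE=yes
```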
First, bring up the bond you created by running ifup bondN as root:
~]# ifup bond0
If you have correctly created the ifcfg-bond0 bonding interface file, you will be able to see bond0 listed in the output of running ip link show as root:
~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 1000
link/ether 52:54:00:e9:ce:d2 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 1000
link/ether 52:54:00:38:a6:4c brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
link/ether 52:54:00:38:a6:4c brd ff:ff:ff:ff:ff:ff
To view all existing bonds, even if they are not up, run:
~]$ cat /sys/class/net/bonding_masters
bond0
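Building on the bonding_masters file shown above, the following is a minimal sketch that prints the current mode and miimon value for each listed bond. The function name show_bonds and its optional argument (an alternative sysfs root, useful for dry runs) are hypothetical and not part of any tool.

```shell
# Print mode and miimon for every bond listed in bonding_masters.
# $1 (optional, hypothetical): alternative sysfs root; defaults to /sys.
show_bonds() (
    net_dir="${1:-/sys}/class/net"
    # bonding_masters is absent when the bonding module is not loaded.
    [ -r "$net_dir/bonding_masters" ] || exit 0
    for bond in $(cat "$net_dir/bonding_masters"); do
        mode=$(cat "$net_dir/$bond/bonding/mode")
        miimon=$(cat "$net_dir/$bond/bonding/miimon")
        printf '%s: mode=%s miimon=%s\n' "$bond" "$mode" "$miimon"
    done
)
```

Calling show_bonds with no arguments reads the live /sys tree; reading these files does not require taking the bond down.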
You can configure each bond individually by manipulating the files located in the /sys/class/net/bondN/bonding/ directory. First, the bond you are configuring must be taken down:
~]# ifdown bond0
As an example, to enable MII monitoring on bond0 with a 1-second interval, run as root:
~]# echo 1000 > /sys/class/net/bond0/bonding/miimon
To configure bond0 for balance-alb mode, run either:
~]# echo 6 > /sys/class/net/bond0/bonding/mode
...or, using the name of the mode:
~]# echo balance-alb > /sys/class/net/bond0/bonding/mode
After configuring options for the bond in question, you can bring it up and test it by running ifup bondN. If you decide to change the options, take the interface down, modify its parameters using sysfs, bring it back up, and re-test.
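The down, modify, up cycle described above can be wrapped in a small helper. This is only a sketch: the function name set_bond_opt and the SYSFS_ROOT, IFUP and IFDOWN override variables are hypothetical, added so the cycle can be rehearsed without touching a live bond.

```shell
# Apply one bonding option through sysfs: take the bond down, write the
# value, then bring the bond back up for testing.
# SYSFS_ROOT, IFUP and IFDOWN are hypothetical overrides for dry runs.
set_bond_opt() (
    bond=$1
    opt=$2
    value=$3
    file="${SYSFS_ROOT:-/sys}/class/net/$bond/bonding/$opt"
    # Refuse to continue if the option file is missing or not writable.
    [ -w "$file" ] || { echo "cannot write $file" >&2; exit 1; }
    "${IFDOWN:-ifdown}" "$bond" || exit 1
    printf '%s\n' "$value" > "$file" || exit 1
    "${IFUP:-ifup}" "$bond"
)
```

Run as root, set_bond_opt bond0 miimon 1000 would correspond to the miimon example in this section.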
Once you have determined the best set of parameters for your bond, add those parameters as a space-separated list to the BONDING_OPTS= directive of the /etc/sysconfig/network-scripts/ifcfg-bondN file for the bonding interface you are configuring. Whenever that bond is brought up (for example, by the system during the boot sequence if the ONBOOT=yes directive is set), the bonding options specified in the BONDING_OPTS directive will take effect for that bond.
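For instance, if testing settled on the balance-alb mode and the 1000 ms MII monitoring interval used in the sysfs examples earlier in this section, the resulting line might look like this fragment. The combination is illustrative only, not a recommendation.

```shell
# Fragment of /etc/sysconfig/network-scripts/ifcfg-bond0 (other directives omitted).
# Options are separated by spaces inside a single quoted string.
BONDING_OPTS="mode=balance-alb miimon=1000"
```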
The following list provides the names of many of the more common channel bonding parameters, along with a description of what they do. For more information, see the brief descriptions for each parm in modinfo bonding output, or for more detailed information, see https://www.kernel.org/doc/Documentation/networking/bonding.txt.
Bonding Interface Parameters
-
ad_select=value
- Specifies the 802.3ad aggregation selection logic to use. Possible values are:
  - stable or 0: Default setting. The active aggregator is chosen by largest aggregate bandwidth. Reselection of the active aggregator occurs only when all ports of the active aggregator are down or if the active aggregator has no ports.
  - bandwidth or 1: The active aggregator is chosen by largest aggregate bandwidth. Reselection occurs if:
    - A port is added to or removed from the bond;
    - Any port's link state changes;
    - Any port's 802.3ad association state changes;
    - The bond's administrative state changes to up.
  - count or 2: The active aggregator is chosen by the largest number of ports. Reselection occurs as described for the bandwidth setting above.
  The bandwidth and count selection policies permit failover of 802.3ad aggregations when partial failure of the active aggregator occurs. This keeps the aggregator with the highest availability, either in bandwidth or in number of ports, active at all times.
-
arp_interval=time_in_milliseconds
- Specifies, in milliseconds, how often ARP monitoring occurs.
  Important
  It is essential that both arp_interval and arp_ip_target parameters are specified, or, alternatively, the miimon parameter is specified. Failure to do so can cause degradation of network performance in the event that a link fails.
  If using this setting while in mode=0 or mode=2 (the two load-balancing modes), the network switch must be configured to distribute packets evenly across the NICs. For more information on how to accomplish this, see https://www.kernel.org/doc/Documentation/networking/bonding.txt.
  The value is set to 0 by default, which disables it.
-
arp_ip_target=ip_address[,ip_address_2,…ip_address_16]
- Specifies the target IP address of ARP requests when the arp_interval parameter is enabled. Up to 16 IP addresses can be specified in a comma-separated list.
-
arp_validate=value
- Validate source/distribution of ARP probes; default is none. Other valid values are active, backup, and all.
-
downdelay=time_in_milliseconds
- Specifies (in milliseconds) how long to wait after link failure before disabling the link. The value must be a multiple of the value specified in the miimon parameter. The value is set to 0 by default, which disables it.
-
fail_over_mac=value
- Specifies whether active-backup mode should set all ports to the same MAC address at the point of assignment (the traditional behavior), or, when enabled, perform special handling of the bond's MAC address in accordance with the selected policy. Possible values are:
  - none or 0: Default setting. This setting disables fail_over_mac, and causes bonding to set all ports of an active-backup bond to the same MAC address at the point of assignment.
  - active or 1: The “active” fail_over_mac policy indicates that the MAC address of the bond should always be the MAC address of the currently active port. The MAC address of the ports is not changed; instead, the MAC address of the bond changes during a failover.
    This policy is useful for devices that cannot ever alter their MAC address, or for devices that refuse incoming broadcasts with their own source MAC (which interferes with the ARP monitor). The disadvantage of this policy is that every device on the network must be updated by gratuitous ARP, as opposed to the normal method of switches snooping incoming traffic to update their ARP tables. If the gratuitous ARP is lost, communication may be disrupted.
    When this policy is used in conjunction with the MII monitor, devices which assert link up prior to being able to actually transmit and receive are particularly susceptible to loss of the gratuitous ARP, and an appropriate updelay setting may be required.
  - follow or 2: The “follow” fail_over_mac policy causes the MAC address of the bond to be selected normally (normally the MAC address of the first port added to the bond). However, the second and subsequent ports are not set to this MAC address while they are in a backup role; a port is programmed with the bond's MAC address at failover time (and the formerly active port receives the newly active port's MAC address).
    This policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address.
- lacp_rate=value
- Specifies the rate at which link partners should transmit LACPDU packets in 802.3ad mode. Possible values are:
  - slow or 0: Default setting. This specifies that partners should transmit LACPDUs every 30 seconds.
  - fast or 1: Specifies that partners should transmit LACPDUs every 1 second.
-
miimon=time_in_milliseconds
- Specifies (in milliseconds) how often MII link monitoring occurs. This is useful if high availability is required because MII is used to verify that the NIC is active. To verify that the driver for a particular NIC supports the MII tool, type the following command as root:
  ~]# ethtool interface_name | grep "Link detected:"
  In this command, replace interface_name with the name of the device interface, such as enp1s0, not the bond interface. If MII is supported, the command returns:
  Link detected: yes
  If using a bonded interface for high availability, the module for each NIC must support MII. Setting the value to 0 (the default) turns this feature off. When configuring this setting, a good starting point for this parameter is 100.
  Important
  It is essential that both arp_interval and arp_ip_target parameters are specified, or, alternatively, the miimon parameter is specified. Failure to do so can cause degradation of network performance in the event that a link fails.
-
mode=value
- Allows you to specify the bonding policy. The value can be one of:
  - balance-rr or 0: Sets a round-robin policy for fault tolerance and load balancing. Transmissions are received and sent out sequentially on each bonded port interface beginning with the first one available.
  - active-backup or 1: Sets an active-backup policy for fault tolerance. Transmissions are received and sent out through the first available bonded port interface. Another bonded port interface is only used if the active bonded port interface fails.
  - balance-xor or 2: Transmissions are based on the selected hash policy. The default is to derive a hash by XOR of the source and destination MAC addresses, taken modulo the number of port interfaces. In this mode, traffic destined for specific peers will always be sent over the same interface. As the destination is determined by the MAC addresses, this method works best for traffic to peers on the same link or local network. If traffic has to pass through a single router, then this mode of traffic balancing will be suboptimal.
  - broadcast or 3: Sets a broadcast policy for fault tolerance. All transmissions are sent on all port interfaces.
  - 802.3ad or 4: Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the same speed and duplex settings. Transmits and receives on all ports in the active aggregator. Requires a switch that is 802.3ad compliant.
  - balance-tlb or 5: Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing traffic is distributed according to the current load on each port interface. Incoming traffic is received by the current port. If the receiving port fails, another port takes over the MAC address of the failed port. This mode is only suitable for local addresses known to the kernel bonding module and therefore cannot be used behind a bridge with virtual machines.
  - balance-alb or 6: Sets an Adaptive Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPv4 traffic. Receive load balancing is achieved through ARP negotiation. This mode is only suitable for local addresses known to the kernel bonding module and therefore cannot be used behind a bridge with virtual machines.
  For details about required settings on the upstream switch, see Section 7.6, “Overview of Bonding Modes and the Required Settings on the Switch”.
-
primary=interface_name
- Specifies the interface name, such as enp1s0, of the primary device. The primary device is the first of the bonding interfaces to be used and is not abandoned unless it fails. This setting is particularly useful when one NIC in the bonding interface is faster and, therefore, able to handle a bigger load.
  This setting is only valid when the bonding interface is in active-backup mode. See https://www.kernel.org/doc/Documentation/networking/bonding.txt for more information.
-
primary_reselect=value
- Specifies the reselection policy for the primary port. This affects how the primary port is chosen to become the active port when failure of the active port or recovery of the primary port occurs. This parameter is designed to prevent flip-flopping between the primary port and other ports. Possible values are:
  - always or 0 (default): The primary port becomes the active port whenever it comes back up.
  - better or 1: The primary port becomes the active port when it comes back up, if the speed and duplex of the primary port is better than the speed and duplex of the current active port.
  - failure or 2: The primary port becomes the active port only if the current active port fails and the primary port is up.
  The primary_reselect setting is ignored in two cases:
  - If no ports are active, the first port to recover is made the active port.
  - When initially assigned to a bond, the primary port is always made the active port.
  Changing the primary_reselect policy through sysfs will cause an immediate selection of the best active port according to the new policy. This may or may not result in a change of the active port, depending upon the circumstances.
-
resend_igmp=range
- Specifies the number of IGMP membership reports to be issued after a failover event. One membership report is issued immediately after the failover; subsequent reports are sent at 200 ms intervals.
  The valid range is 0 to 255; the default value is 1. A value of 0 prevents the IGMP membership report from being issued in response to the failover event.
  This option is useful for bonding modes balance-rr (mode 0), active-backup (mode 1), balance-tlb (mode 5) and balance-alb (mode 6), in which a failover can switch the IGMP traffic from one port to another. Therefore, a fresh IGMP report must be issued to cause the switch to forward the incoming IGMP traffic over the newly selected port.
-
updelay=time_in_milliseconds
- Specifies (in milliseconds) how long to wait before enabling a link. The value must be a multiple of the value specified in the miimon parameter. The value is set to 0 by default, which disables it.
-
use_carrier=number
- Specifies whether or not miimon should use MII/ETHTOOL ioctls or netif_carrier_ok() to determine the link state. The netif_carrier_ok() function relies on the device driver to maintain its state with netif_carrier_on/off; most device drivers support this function.
  The MII/ETHTOOL ioctls use a deprecated calling sequence within the kernel. However, this is still configurable in case your device driver does not support netif_carrier_on/off.
  Valid values are:
  - 1: Default setting. Enables the use of netif_carrier_ok().
  - 0: Enables the use of MII/ETHTOOL ioctls.
  Note
  If the bonding interface insists that the link is up when it should not be, it is possible that your network device driver does not support netif_carrier_on/off.
-
xmit_hash_policy=value
- Selects the transmit hash policy used for port selection in balance-xor and 802.3ad modes. Possible values are:
  - 0 or layer2: Default setting. This parameter uses the XOR of hardware MAC addresses to generate the hash. The formula used is:
    (source_MAC_address XOR destination_MAC) MODULO slave_count
    This algorithm will place all traffic to a particular network peer on the same port, and is 802.3ad compliant.
  - 1 or layer3+4: Uses upper layer protocol information (when available) to generate the hash. This allows for traffic to a particular network peer to span multiple ports, although a single connection will not span multiple ports.
    The formula used for unfragmented TCP and UDP packets is:
    ((source_port XOR dest_port) XOR ((source_IP XOR dest_IP) AND 0xffff)) MODULO slave_count
    For fragmented TCP or UDP packets and all other IP protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as the layer2 transmit hash policy.
    This policy is intended to mimic the behavior of certain switches; particularly, Cisco switches with PFC2 as well as some Foundry and IBM products.
    The algorithm used by this policy is not 802.3ad compliant.
  - 2 or layer2+3: Uses a combination of layer2 and layer3 protocol information to generate the hash.
    Uses XOR of hardware MAC addresses and IP addresses to generate the hash. The formula is:
    (((source_IP XOR dest_IP) AND 0xffff) XOR (source_MAC XOR destination_MAC)) MODULO slave_count
    This algorithm will place all traffic to a particular network peer on the same port. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy.
    This policy is intended to provide a more balanced distribution of traffic than layer2 alone, especially in environments where a layer3 gateway device is required to reach most destinations.
    This algorithm is 802.3ad compliant.
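To make the layer2 formula concrete, the following sketch evaluates (source_MAC XOR destination_MAC) MODULO slave_count in the shell. The function names mac_to_int and layer2_hash are hypothetical, and the sketch assumes each MAC address is treated as a single 48-bit integer; the kernel's actual implementation may differ in detail, so treat the result as an illustration of the formula rather than a prediction of which port the driver selects.

```shell
# Convert a colon-separated MAC address to an integer.
mac_to_int() (
    # Strip the colons and read the remaining 12 hex digits as one number.
    hex=$(printf '%s' "$1" | tr -d ':')
    echo $(( 0x$hex ))
)

# layer2 sketch: (source_MAC XOR destination_MAC) MODULO slave_count
# $1 = source MAC, $2 = destination MAC, $3 = number of ports in the bond
layer2_hash() (
    src=$(mac_to_int "$1")
    dst=$(mac_to_int "$2")
    echo $(( ($src ^ $dst) % $3 ))
)
```

For example, layer2_hash 52:54:00:e9:ce:d2 52:54:00:38:a6:4c 2 prints the port index the formula yields for a two-port bond. Because the inputs are fixed per peer, the same pair of addresses always maps to the same index, which matches the statement that all traffic to a particular peer uses one port.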