Chapter 3. Configuring IPoIB
By default, InfiniBand does not use the internet protocol (IP) for communication. However, IP over InfiniBand (IPoIB) provides an IP network emulation layer on top of InfiniBand remote direct memory access (RDMA) networks. This allows existing unmodified applications to transmit data over InfiniBand networks, but the performance is lower than if the application used RDMA natively.
On RHEL 8 and later, Mellanox devices starting from ConnectX-4 use Enhanced IPoIB mode by default (datagram only). Connected mode is not supported on these devices.
3.1. The IPoIB communication modes
An IPoIB device is configurable in either Datagram or Connected mode. The difference is the type of queue pair the IPoIB layer attempts to open with the machine at the other end of the communication:
- In the Datagram mode, the system opens an unreliable, disconnected queue pair. This mode does not support packets larger than the Maximum Transmission Unit (MTU) of the InfiniBand link layer. During transmission of data, the IPoIB layer adds a 4-byte IPoIB header on top of the IP packet. As a result, the IPoIB MTU is 4 bytes less than the InfiniBand link-layer MTU. As 2048 is a common InfiniBand link-layer MTU, the common IPoIB device MTU in Datagram mode is 2044.
- In the Connected mode, the system opens a reliable, connected queue pair. This mode allows messages larger than the InfiniBand link-layer MTU. The host adapter handles packet segmentation and reassembly. As a result, in the Connected mode, the size of messages sent from InfiniBand adapters is not limited. However, the size of IP packets is limited by the data field and the TCP/IP header field. For this reason, the IPoIB MTU in the Connected mode is 65520 bytes. The Connected mode has a higher performance but consumes more kernel memory.
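The MTU arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the numbers in the text; the function and constant names are hypothetical and not part of any RHEL tool.

```python
# Minimal sketch of the IPoIB MTU rules described above.
# All names here are illustrative assumptions, not an existing API.

IPOIB_HEADER_BYTES = 4      # IPoIB header added on top of each IP packet
CONNECTED_MODE_MTU = 65520  # IPoIB MTU in Connected mode, as stated in the text


def ipoib_mtu(mode: str, link_layer_mtu: int = 2048) -> int:
    """Return the IPoIB device MTU for the given transport mode."""
    normalized = mode.lower()
    if normalized == "datagram":
        # Datagram mode: link-layer MTU minus the 4-byte IPoIB header.
        return link_layer_mtu - IPOIB_HEADER_BYTES
    if normalized == "connected":
        # Connected mode: fixed value, independent of the link-layer MTU.
        return CONNECTED_MODE_MTU
    raise ValueError(f"unknown IPoIB transport mode: {mode!r}")


print(ipoib_mtu("datagram"))   # link-layer MTU 2048 minus 4 bytes
print(ipoib_mtu("connected"))
```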
Even when a system is configured to use the Connected mode, it still sends multicast traffic in the Datagram mode, because InfiniBand switches and fabric cannot pass multicast traffic in the Connected mode. Also, when the host is not configured to use the Connected mode, the system falls back to the Datagram mode.
When you run an application that sends multicast data up to the MTU of the interface, either configure the interface in Datagram mode or configure the application to cap its send size so that each packet fits in a datagram-sized packet.
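As a rough illustration of capping the send size, the following sketch splits application data into chunks that fit into a Datagram-mode packet. It assumes the multicast traffic uses UDP over IPv4 without IP options (20-byte IP header, 8-byte UDP header) and the common Datagram-mode MTU of 2044; the function names are hypothetical.

```python
# Minimal sketch: cap the payload size so each multicast packet fits in
# a datagram-sized IPoIB packet. Assumes UDP over IPv4 without options.
from typing import List

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
UDP_HEADER_BYTES = 8


def max_udp_payload(ipoib_mtu: int = 2044) -> int:
    """Largest UDP payload that fits in one packet of the given MTU."""
    return ipoib_mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


def split_for_datagram(data: bytes, ipoib_mtu: int = 2044) -> List[bytes]:
    """Split data into chunks that each fit in one datagram-sized packet."""
    size = max_udp_payload(ipoib_mtu)
    return [data[i:i + size] for i in range(0, len(data), size)]


chunks = split_for_datagram(b"x" * 5000)
print([len(chunk) for chunk in chunks])
```

An application would send each chunk in its own datagram instead of sending the full buffer at once.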
3.2. Understanding IPoIB hardware addresses
IPoIB devices have a 20-byte hardware address that consists of the following parts:
- The first 4 bytes are flags and queue pair numbers.
- The next 8 bytes are the subnet prefix. The default subnet prefix is 0xfe:80:00:00:00:00:00:00. After the device connects to the subnet manager, the device changes this prefix to match the one configured on the subnet manager.
- The last 8 bytes are the Globally Unique Identifier (GUID) of the InfiniBand port that attaches to the IPoIB device.
As the first 12 bytes can change, do not use them in udev device manager rules.
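To make the layout concrete, the following sketch splits a 20-byte IPoIB hardware address into the three parts listed above. The class and function names are hypothetical; the sample address is the one used later in this chapter.

```python
# Minimal sketch: split a 20-byte IPoIB hardware address into its parts.
# Names are illustrative assumptions, not an existing API.
from typing import NamedTuple


class IPoIBAddress(NamedTuple):
    flags_and_qpn: str   # first 4 bytes: flags and queue pair number
    subnet_prefix: str   # next 8 bytes: subnet prefix (can change)
    port_guid: str       # last 8 bytes: GUID of the InfiniBand port


def parse_ipoib_address(address: str) -> IPoIBAddress:
    octets = address.lower().split(":")
    if len(octets) != 20 or any(len(octet) != 2 for octet in octets):
        raise ValueError("expected 20 colon-separated two-digit hex octets")
    for octet in octets:
        int(octet, 16)  # raises ValueError if the octet is not hexadecimal
    return IPoIBAddress(
        flags_and_qpn=":".join(octets[:4]),
        subnet_prefix=":".join(octets[4:12]),
        port_guid=":".join(octets[12:]),
    )


parts = parse_ipoib_address(
    "80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1"
)
print(parts.port_guid)  # the only part that stays stable
```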
3.3. Renaming IPoIB devices by using systemd link file
By default, the kernel assigns Internet Protocol over InfiniBand (IPoIB) devices names such as ib0, ib1, and so on. To avoid conflicts, create a systemd link file to assign persistent and meaningful names such as mlx4_ib0.
Prerequisites
- You have installed an InfiniBand device.
Procedure
Display the hardware address of the device ib0:
    # ip addr show ib0
    7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 256
        link/infiniband 80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        altname ibp7s0
        altname ibs2
        inet 172.31.0.181/24 brd 172.31.0.255 scope global dynamic noprefixroute ib0
           valid_lft 2899sec preferred_lft 2899sec
        inet6 fe80::f652:1403:7b:e1b1/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
To name the interface with the MAC address 80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1 mlx4_ib0, create the /etc/systemd/network/70-custom-ifnames.link file with the following content:
    [Match]
    MACAddress=80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1

    [Link]
    Name=mlx4_ib0
This link file matches a MAC address and renames the network interface to the name set in the Name parameter.
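If you manage several hosts, you might generate the link file content instead of typing it. The following sketch only renders the text of a link file like the one above; the function name is hypothetical, and writing the result to /etc/systemd/network/ is left to you. The length check reflects the Linux limit of 15 characters for interface names.

```python
# Minimal sketch: render the content of a systemd link file that renames
# an interface based on its MAC address. The helper name is hypothetical.

MAX_IFNAME_LENGTH = 15  # Linux interface names are limited to 15 characters


def render_link_file(mac_address: str, name: str) -> str:
    if not name or len(name) > MAX_IFNAME_LENGTH:
        raise ValueError(f"interface name must be 1-{MAX_IFNAME_LENGTH} characters")
    return (
        "[Match]\n"
        f"MACAddress={mac_address}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


print(render_link_file(
    "80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1",
    "mlx4_ib0",
))
```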
Verification
Reboot the host:
    # reboot
Verify that the device with the MAC address you specified in the link file has been assigned to mlx4_ib0:
    # ip addr show mlx4_ib0
    7: mlx4_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 256
        link/infiniband 80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        altname ibp7s0
        altname ibs2
        inet 172.31.0.181/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx4_ib0
           valid_lft 2899sec preferred_lft 2899sec
        inet6 fe80::f652:1403:7b:e1b1/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
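The manual check above could also be scripted. The following sketch parses captured `ip addr show` output and compares the interface name and the link/infiniband address with the expected values. The function names are hypothetical, and the parser assumes the output layout shown in this section.

```python
# Minimal sketch: check captured `ip addr show` output after the rename.
# Assumes the output layout shown in this section; names are hypothetical.
import re

LINK_RE = re.compile(
    r"link/infiniband\s+((?:[0-9a-f]{2}:){19}[0-9a-f]{2})", re.IGNORECASE
)


def hardware_address(ip_output: str) -> str:
    match = LINK_RE.search(ip_output)
    if match is None:
        raise ValueError("no link/infiniband address found")
    return match.group(1).lower()


def rename_succeeded(ip_output: str, expected_name: str, expected_address: str) -> bool:
    # The first line looks like "7: mlx4_ib0: <BROADCAST,...> mtu 65520 ..."
    first_line = ip_output.strip().splitlines()[0]
    name = first_line.split(":")[1].strip()
    return name == expected_name and hardware_address(ip_output) == expected_address.lower()


SAMPLE = """7: mlx4_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 256
    link/infiniband 80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
"""

print(rename_succeeded(
    SAMPLE,
    "mlx4_ib0",
    "80:00:0a:28:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:e1:b1",
))
```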
Additional resources
- systemd.link(5) man page on your system
3.4. Configuring an IPoIB connection by using nmcli
You can use the nmcli utility to create an IP over InfiniBand connection on the command line.
Prerequisites
- An InfiniBand device is installed on the server
- The corresponding kernel module is loaded
Procedure
Create the InfiniBand connection to use the mlx4_ib0 interface in the Connected transport mode and the maximum MTU of 65520 bytes:
    # nmcli connection add type infiniband con-name mlx4_ib0 ifname mlx4_ib0 transport-mode Connected mtu 65520
Set a P_Key, for example:
    # nmcli connection modify mlx4_ib0 infiniband.p-key 0x8002
Configure the IPv4 settings:
- To use DHCP, enter:
    # nmcli connection modify mlx4_ib0 ipv4.method auto
  Skip this step if ipv4.method is already set to auto (default).
- To set a static IPv4 address, network mask, default gateway, DNS servers, and search domain, enter:
    # nmcli connection modify mlx4_ib0 ipv4.method manual ipv4.addresses 192.0.2.1/24 ipv4.gateway 192.0.2.254 ipv4.dns 192.0.2.200 ipv4.dns-search example.com
Configure the IPv6 settings:
- To use stateless address autoconfiguration (SLAAC), enter:
    # nmcli connection modify mlx4_ib0 ipv6.method auto
  Skip this step if ipv6.method is already set to auto (default).
- To set a static IPv6 address, network mask, default gateway, DNS servers, and search domain, enter:
    # nmcli connection modify mlx4_ib0 ipv6.method manual ipv6.addresses 2001:db8:1::fffe/64 ipv6.gateway 2001:db8:1::fffe ipv6.dns 2001:db8:1::ffbb ipv6.dns-search example.com
To customize other settings in the profile, use the following command:
    # nmcli connection modify mlx4_ib0 <setting> <value>
Enclose values with spaces or semicolons in quotes.
Activate the profile:
    # nmcli connection up mlx4_ib0
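When you script this procedure, building the nmcli argument list programmatically avoids quoting mistakes. The following sketch assembles the same `nmcli connection add` command that the first step of this procedure uses; it only uses options shown in this section, the function name is hypothetical, and the command is printed rather than executed.

```python
# Minimal sketch: assemble the `nmcli connection add` command from the
# procedure above. The helper name is hypothetical; nothing is executed.
import shlex
from typing import List, Optional


def nmcli_add_infiniband(
    con_name: str,
    ifname: str,
    transport_mode: str = "datagram",
    mtu: Optional[int] = None,
) -> List[str]:
    args = [
        "nmcli", "connection", "add",
        "type", "infiniband",
        "con-name", con_name,
        "ifname", ifname,
        "transport-mode", transport_mode,
    ]
    if mtu is not None:
        args += ["mtu", str(mtu)]
    return args


command = nmcli_add_infiniband("mlx4_ib0", "mlx4_ib0", "Connected", 65520)
print(shlex.join(command))
```

To run the command, you could pass the list to `subprocess.run(command, check=True)` as the root user.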
Verification
Use the ping utility to send ICMP packets to the remote host’s InfiniBand adapter, for example:
    # ping -c5 192.0.2.2
3.5. Configuring an IPoIB connection by using the network RHEL system role
You can use IP over InfiniBand (IPoIB) to send IP packets over an InfiniBand interface. To configure IPoIB, create a NetworkManager connection profile. By using Ansible and the network system role, you can automate this process and remotely configure connection profiles on the hosts defined in a playbook.
You can use the network RHEL system role to configure IPoIB and, if a connection profile for the InfiniBand’s parent device does not exist, the role can create it as well.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
- The account you use to connect to the managed nodes has sudo permissions on them.
- An InfiniBand device named mlx4_ib0 is installed in the managed nodes.
- The managed nodes use NetworkManager to configure the network.
Procedure
Create a playbook file, for example ~/playbook.yml, with the following content:
    ---
    - name: Configure the network
      hosts: managed-node-01.example.com
      tasks:
        - name: IPoIB connection profile with static IP address settings
          ansible.builtin.include_role:
            name: redhat.rhel_system_roles.network
          vars:
            network_connections:
              # InfiniBand connection mlx4_ib0
              - name: mlx4_ib0
                interface_name: mlx4_ib0
                type: infiniband

              # IPoIB device mlx4_ib0.8002 on top of mlx4_ib0
              - name: mlx4_ib0.8002
                type: infiniband
                autoconnect: yes
                infiniband:
                  p_key: 0x8002
                  transport_mode: datagram
                parent: mlx4_ib0
                ip:
                  address:
                    - 192.0.2.1/24
                    - 2001:db8:1::1/64
                state: up
The settings specified in the example playbook include the following:
type: <profile_type>
- Sets the type of the profile to create. The example playbook creates two connection profiles: one for the InfiniBand connection and one for the IPoIB device.
parent: <parent_device>
- Sets the parent device of the IPoIB connection profile.
p_key: <value>
- Sets the InfiniBand partition key. If you set this variable, do not set interface_name on the IPoIB device.
transport_mode: <mode>
- Sets the IPoIB connection operation mode. You can set this variable to datagram (default) or connected.
For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.network/README.md file on the control node.
Validate the playbook syntax:
    $ ansible-playbook --syntax-check ~/playbook.yml
Note that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
    $ ansible-playbook ~/playbook.yml
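Because the syntax check does not catch a wrong but valid configuration, a small pre-flight check of the connection entries can help. The following sketch tests only the two rules stated in the variable descriptions above: do not combine p_key with interface_name, and use datagram or connected as the transport mode. The function name is hypothetical, and the check is not part of the network system role.

```python
# Minimal sketch: pre-flight check of one network_connections entry,
# limited to the rules described above. Not part of the system role.
from typing import Any, Dict, List

VALID_TRANSPORT_MODES = {"datagram", "connected"}


def check_ipoib_profile(profile: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an IPoIB connection entry."""
    problems = []
    infiniband = profile.get("infiniband", {})
    if "p_key" in infiniband and "interface_name" in profile:
        problems.append("do not set interface_name when p_key is set")
    mode = infiniband.get("transport_mode", "datagram")  # datagram is the default
    if mode not in VALID_TRANSPORT_MODES:
        problems.append(f"invalid transport_mode: {mode!r}")
    return problems


profile = {
    "name": "mlx4_ib0.8002",
    "type": "infiniband",
    "infiniband": {"p_key": 0x8002, "transport_mode": "datagram"},
    "parent": "mlx4_ib0",
}
print(check_ipoib_profile(profile))
```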
Verification
Display the IP settings of the mlx4_ib0.8002 device:
    # ansible managed-node-01.example.com -m command -a 'ip address show mlx4_ib0.8002'
    managed-node-01.example.com | CHANGED | rc=0 >>
    ...
    inet 192.0.2.1/24 brd 192.0.2.255 scope global noprefixroute ib0.8002
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::1/64 scope link tentative noprefixroute
       valid_lft forever preferred_lft forever
Display the partition key (P_Key) of the mlx4_ib0.8002 device:
    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx4_ib0.8002/pkey'
    managed-node-01.example.com | CHANGED | rc=0 >>
    0x8002
Display the mode of the mlx4_ib0.8002 device:
    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx4_ib0.8002/mode'
    managed-node-01.example.com | CHANGED | rc=0 >>
    datagram
Additional resources
- /usr/share/ansible/roles/rhel-system-roles.network/README.md file
- /usr/share/doc/rhel-system-roles/network/ directory
3.6. Configuring an IPoIB connection by using nmstatectl
You can use the nmstatectl utility to configure an IP over InfiniBand (IPoIB) connection through the Nmstate API. The Nmstate API ensures that, after setting the configuration, the result matches the configuration file. If anything fails, nmstatectl automatically rolls back the changes to avoid leaving the system in an incorrect state.
Prerequisites
- An InfiniBand device is installed on the server.
- The kernel module for the InfiniBand device is loaded.
Procedure
Create a YAML file, for example ~/create-IPoIB-profile.yml, with the following content:
    interfaces:
    - name: mlx4_ib0.8002
      type: infiniband
      state: up
      ipv4:
        enabled: true
        address:
        - ip: 192.0.2.1
          prefix-length: 24
        dhcp: false
      ipv6:
        enabled: true
        address:
        - ip: 2001:db8:1::1
          prefix-length: 64
        autoconf: false
        dhcp: false
      infiniband:
        base-iface: "mlx4_ib0"
        mode: datagram
        pkey: "0x8002"
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.0.2.254
        next-hop-interface: mlx4_ib0.8002
      - destination: ::/0
        next-hop-address: 2001:db8:1::fffe
        next-hop-interface: mlx4_ib0.8002
These settings define an IPoIB connection profile with the following settings:
- IPoIB device name: mlx4_ib0.8002
- Base interface (parent): mlx4_ib0
- InfiniBand partition key: 0x8002
- Transport mode: datagram
- Static IPv4 address: 192.0.2.1 with the /24 subnet mask
- Static IPv6 address: 2001:db8:1::1 with the /64 subnet mask
- IPv4 default gateway: 192.0.2.254
- IPv6 default gateway: 2001:db8:1::fffe
Apply the settings to the system:
    # nmstatectl apply ~/create-IPoIB-profile.yml
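If you need the same kind of profile for several partitions, you might generate the state document instead of editing it by hand. The following sketch builds a reduced version of the state shown above (IPv4 only, no routes) as a Python dictionary and prints it as JSON, which is also valid YAML. The function name is hypothetical, and deriving the device name as base interface plus four-digit hexadecimal P_Key is an assumption that mirrors the example name mlx4_ib0.8002.

```python
# Minimal sketch: generate a reduced Nmstate-style state document for an
# IPoIB device. Helper name and naming convention are assumptions.
import json
from typing import Any, Dict


def ipoib_state(
    base_iface: str,
    pkey: int,
    ipv4_address: str,
    prefix_length: int,
    mode: str = "datagram",
) -> Dict[str, Any]:
    # Assumption: device name is "<base interface>.<P_Key as 4 hex digits>",
    # mirroring the example name mlx4_ib0.8002.
    name = f"{base_iface}.{pkey:04x}"
    return {
        "interfaces": [
            {
                "name": name,
                "type": "infiniband",
                "state": "up",
                "ipv4": {
                    "enabled": True,
                    "address": [{"ip": ipv4_address, "prefix-length": prefix_length}],
                    "dhcp": False,
                },
                "infiniband": {
                    "base-iface": base_iface,
                    "mode": mode,
                    "pkey": f"0x{pkey:04x}",
                },
            }
        ]
    }


print(json.dumps(ipoib_state("mlx4_ib0", 0x8002, "192.0.2.1", 24), indent=2))
```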
Verification
Display the IP settings of the mlx4_ib0.8002 device:
    # ip address show mlx4_ib0.8002
    ...
    inet 192.0.2.1/24 brd 192.0.2.255 scope global noprefixroute ib0.8002
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::1/64 scope link tentative noprefixroute
       valid_lft forever preferred_lft forever
Display the partition key (P_Key) of the mlx4_ib0.8002 device:
    # cat /sys/class/net/mlx4_ib0.8002/pkey
    0x8002
Display the mode of the mlx4_ib0.8002 device:
    # cat /sys/class/net/mlx4_ib0.8002/mode
    datagram
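The two sysfs checks above can be combined in a short script. The following sketch reads the pkey and mode attributes of a device from the paths used in the verification steps; the function name and the sysfs_root parameter are hypothetical, and the parameter exists only so that the function can be tried against a test directory.

```python
# Minimal sketch: read the P_Key and mode of an IPoIB device from sysfs,
# using the same paths as the verification steps above.
from pathlib import Path
from typing import Dict


def read_ipoib_attrs(device: str, sysfs_root: str = "/sys/class/net") -> Dict[str, str]:
    """Return the 'pkey' and 'mode' attributes of the given device."""
    base = Path(sysfs_root) / device
    # Raises FileNotFoundError if the device or attribute does not exist.
    return {attr: (base / attr).read_text().strip() for attr in ("pkey", "mode")}
```

On a host configured as in this section, calling `read_ipoib_attrs("mlx4_ib0.8002")` should return the values that the cat commands display.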
Additional resources
- nmstatectl(8) man page on your system
- /usr/share/doc/nmstate/examples/ directory
3.7. Configuring an IPoIB connection by using nm-connection-editor
The nm-connection-editor application configures and manages network connections stored by NetworkManager by using a graphical interface.
Prerequisites
- An InfiniBand device is installed on the server.
- The corresponding kernel module is loaded.
- The nm-connection-editor package is installed.
Procedure
Enter the command:
    $ nm-connection-editor
- Click the + button to add a new connection.
- Select the InfiniBand connection type and confirm the selection.
- On the InfiniBand tab:
  - Change the connection name if you want to.
  - Select the transport mode.
  - Select the device.
  - Set an MTU if needed.
- On the IPv4 Settings tab, configure the IPv4 settings. For example, set a static IPv4 address, network mask, default gateway, and DNS server.
- On the IPv6 Settings tab, configure the IPv6 settings. For example, set a static IPv6 address, network mask, default gateway, and DNS server.
- Save the InfiniBand connection.
- Close nm-connection-editor.
- You can set a P_Key interface. As this setting is not available in nm-connection-editor, you must set this parameter on the command line. For example, to set 0x8002 as the P_Key interface of the mlx4_ib0 connection:
    # nmcli connection modify mlx4_ib0 infiniband.p-key 0x8002
3.8. Testing an RDMA network by using iperf3 after IPoIB is configured
In the following example, the iperf3 utility runs a 60-second test with a large buffer size to measure the maximum throughput between two hosts and to make full use of the available bandwidth.
Prerequisites
- You have configured IPoIB on both hosts.
Procedure
To run iperf3 as a server on a system, define a time interval for periodic bandwidth updates (-i) and start in server mode (-s) so that iperf3 waits for client connections:
    # iperf3 -i 5 -s
To run iperf3 as a client on another system, define a time interval for periodic bandwidth updates (-i), the IP address of the listening server to connect to (-c, here 192.168.2.2), and the test duration in seconds (-t):
    # iperf3 -i 5 -t 60 -c 192.168.2.2
Use the following commands:
Display test results on the system that acts as a server:
    # iperf3 -i 10 -s
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from 192.168.2.3, port 22216
    [5] local 192.168.2.2 port 5201 connected to 192.168.2.3 port 22218
    [ID] Interval           Transfer     Bandwidth
    [5]   0.00-10.00  sec  17.5 GBytes  15.0 Gbits/sec
    [5]  10.00-20.00  sec  17.6 GBytes  15.2 Gbits/sec
    [5]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec
    [5]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec
    [5]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec
    [5]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec
    [5]  60.00-60.04  sec  82.2 MBytes  17.3 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ID] Interval           Transfer     Bandwidth
    [5]   0.00-60.04  sec  0.00 Bytes   0.00 bits/sec   sender
    [5]   0.00-60.04  sec   107 GBytes  15.3 Gbits/sec  receiver
Display test results on the system that acts as a client:
    # iperf3 -i 1 -t 60 -c 192.168.2.2
    Connecting to host 192.168.2.2, port 5201
    [4] local 192.168.2.3 port 22218 connected to 192.168.2.2 port 5201
    [ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [4]   0.00-10.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
    [4]  10.00-20.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
    [4]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec    0   6.01 MBytes
    [4]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec    0   6.01 MBytes
    [4]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec    0   6.01 MBytes
    [4]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec    0   6.01 MBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ID] Interval           Transfer     Bandwidth       Retr
    [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec    0   sender
    [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec        receiver
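To compare several test runs, you might extract the per-interval bandwidth from a saved report. The following sketch parses interval lines in the text format shown above and computes a time-weighted mean. The function names are hypothetical, the parser assumes that the bandwidth is reported in Gbits/sec, and it skips the sender and receiver summary lines.

```python
# Minimal sketch: parse interval lines from a saved iperf3 text report and
# compute the time-weighted mean bandwidth. Assumes values in Gbits/sec.
import re
from typing import List, Tuple

INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec"
)


def parse_intervals(report: str) -> List[Tuple[float, float, float]]:
    """Return (start, end, gbits_per_sec) for each interval line."""
    intervals = []
    for line in report.splitlines():
        if line.rstrip().endswith(("sender", "receiver")):
            continue  # skip the summary lines at the end of the report
        match = INTERVAL_RE.search(line)
        if match:
            start, end, gbits = (float(group) for group in match.groups())
            intervals.append((start, end, gbits))
    return intervals


def mean_gbits(intervals: List[Tuple[float, float, float]]) -> float:
    """Time-weighted mean bandwidth over all intervals."""
    total_time = sum(end - start for start, end, _ in intervals)
    if total_time == 0:
        raise ValueError("no intervals to average")
    return sum((end - start) * gbits for start, end, gbits in intervals) / total_time


REPORT = """[5]   0.00-10.00  sec  17.5 GBytes  15.0 Gbits/sec
[5]  10.00-20.00  sec  17.6 GBytes  15.2 Gbits/sec
[5]   0.00-60.04  sec   107 GBytes  15.3 Gbits/sec  receiver
"""

print(f"{mean_gbits(parse_intervals(REPORT)):.2f} Gbits/sec")
```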
Additional resources
- iperf3 man page on your system