Chapter 3. Configuring IPoIB

By default, InfiniBand does not use the Internet Protocol (IP) for communication. However, IP over InfiniBand (IPoIB) provides an IP network emulation layer on top of InfiniBand remote direct memory access (RDMA) networks. This allows existing, unmodified applications to transmit data over InfiniBand networks, but the performance is lower than if the application used RDMA natively.

Note

On RHEL 8 and later, Mellanox devices starting from ConnectX-4 use Enhanced IPoIB mode by default, which supports the Datagram mode only. Connected mode is not supported on these devices.

3.1. The IPoIB communication modes

An IPoIB device is configurable in either Datagram or Connected mode. The difference is the type of queue pair the IPoIB layer attempts to open with the machine at the other end of the communication:

  • In the Datagram mode, the system opens an unreliable, disconnected queue pair.

    This mode does not support packets larger than the Maximum Transmission Unit (MTU) of the InfiniBand link layer. During transmission of data, the IPoIB layer adds a 4-byte IPoIB header on top of the IP packet. As a result, the IPoIB MTU is 4 bytes less than the InfiniBand link-layer MTU. Because 2048 bytes is a common InfiniBand link-layer MTU, the common IPoIB device MTU in Datagram mode is 2044 bytes.

  • In the Connected mode, the system opens a reliable, connected queue pair.

    This mode allows messages larger than the InfiniBand link-layer MTU. The host adapter handles packet segmentation and reassembly. As a result, in the Connected mode, the messages sent from InfiniBand adapters have no size limit. However, the size of IP packets is still limited by the data field and the TCP/IP header fields. For this reason, the IPoIB MTU in the Connected mode is 65520 bytes.

    The Connected mode has a higher performance but consumes more kernel memory.
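The MTU values for both modes follow from simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper named ipoib_mtu that is not part of any RHEL package:

```shell
#!/bin/sh
# Hypothetical helper: derive the IPoIB MTU described above.
# usage: ipoib_mtu <mode> [link_layer_mtu]
ipoib_mtu() {
    mode=$1
    link_mtu=${2:-2048}   # 2048 is the common InfiniBand link-layer MTU
    case "$mode" in
        datagram)  echo $(( link_mtu - 4 )) ;;  # subtract the 4-byte IPoIB header
        connected) echo 65520 ;;                # fixed maximum in Connected mode
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

ipoib_mtu datagram 2048   # prints 2044
ipoib_mtu datagram 4096   # prints 4092
ipoib_mtu connected       # prints 65520
```

In Datagram mode the result depends on the link-layer MTU of the fabric, while the Connected mode value is independent of it.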

Even if a system is configured to use the Connected mode, it still sends multicast traffic by using the Datagram mode because InfiniBand switches and fabric cannot pass multicast traffic in the Connected mode. Also, when the host is not configured to use the Connected mode, the system falls back to the Datagram mode.

When you run an application that sends multicast data up to the MTU of the interface, either configure the interface in the Datagram mode or configure the application to cap the send size so that each packet fits in a datagram-sized packet.
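To choose a send size cap for such an application, you can start from the interface MTU. The following sketch assumes that the multicast data is sent over UDP and IPv4 without IP options (20-byte IPv4 header, 8-byte UDP header). Both function names are hypothetical, and the second argument of show_ipoib_mode_mtu exists only so that the function can be tried without InfiniBand hardware:

```shell
#!/bin/sh
# Hypothetical helper: largest UDP payload that fits into one IPv4 packet
# on an interface with the given MTU.
# Assumption: 20-byte IPv4 header without options, 8-byte UDP header.
max_udp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Hypothetical helper: read the IPoIB mode and MTU of an interface from sysfs.
# usage: show_ipoib_mode_mtu <interface> [sysfs_root]
show_ipoib_mode_mtu() {
    dir="${2:-/sys}/class/net/$1"
    echo "mode=$(cat "$dir/mode") mtu=$(cat "$dir/mtu")"
}

max_udp_payload 2044   # prints 2016
```

On a host with an IPoIB device, show_ipoib_mode_mtu mlx4_ib0 reports the values to compare against the cap.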

3.2. Understanding IPoIB hardware addresses

IPoIB devices have a 20-byte hardware address that consists of the following parts:

  • The first 4 bytes are flags and queue pair numbers
  • The next 8 bytes are the subnet prefix

    The default subnet prefix is 0xfe:80:00:00:00:00:00:00. After the device connects to the subnet manager, the device changes this prefix to match with the configured subnet manager.

  • The last 8 bytes are the Globally Unique Identifier (GUID) of the InfiniBand port that attaches to the IPoIB device

Note

Because the first 12 bytes can change, do not use them in udev device manager rules.
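The layout of the address can be illustrated by splitting it into its three parts. The following is a minimal sketch with a hypothetical split_ipoib_address helper and a made-up example address. On a real system, the address is available in /sys/class/net/<device>/address:

```shell
#!/bin/sh
# Hypothetical helper: split a 20-byte IPoIB hardware address
# (20 colon-separated hex bytes) into the three parts described above.
split_ipoib_address() {
    addr=$1
    echo "flags_qpn=$(echo "$addr" | cut -d: -f1-4)"       # bytes 1-4
    echo "subnet_prefix=$(echo "$addr" | cut -d: -f5-12)"  # bytes 5-12
    echo "port_guid=$(echo "$addr" | cut -d: -f13-20)"     # bytes 13-20
}

# Made-up example address for illustration only.
split_ipoib_address 80:00:02:00:fe:80:00:00:00:00:00:00:00:11:22:33:44:55:66:77
```

Only the port_guid part is stable, which is why it is the part to match on when you identify a device.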

3.4. Configuring an IPoIB connection by using nmcli commands

The nmcli command-line utility controls NetworkManager and reports the network status from the command line.

Prerequisites

  • An InfiniBand device is installed on the server.
  • The corresponding kernel module is loaded.

Procedure

  1. Create the InfiniBand connection to use the mlx4_ib0 interface in the Connected transport mode and the maximum MTU of 65520 bytes:

    # nmcli connection add type infiniband con-name mlx4_ib0 ifname mlx4_ib0 transport-mode Connected mtu 65520
  2. Set a P_Key, for example:

    # nmcli connection modify mlx4_ib0 infiniband.p-key 0x8002
  3. Configure the IPv4 settings:

    • To use DHCP, enter:

      # nmcli connection modify mlx4_ib0 ipv4.method auto

      Skip this step if ipv4.method is already set to auto (default).

    • To set a static IPv4 address, network mask, default gateway, DNS servers, and search domain, enter:

      # nmcli connection modify mlx4_ib0 ipv4.method manual ipv4.addresses 192.0.2.1/24 ipv4.gateway 192.0.2.254 ipv4.dns 192.0.2.200 ipv4.dns-search example.com
  4. Configure the IPv6 settings:

    • To use stateless address autoconfiguration (SLAAC), enter:

      # nmcli connection modify mlx4_ib0 ipv6.method auto

      Skip this step if ipv6.method is already set to auto (default).

    • To set a static IPv6 address, network mask, default gateway, DNS servers, and search domain, enter:

      # nmcli connection modify mlx4_ib0 ipv6.method manual ipv6.addresses 2001:db8:1::1/64 ipv6.gateway 2001:db8:1::fffe ipv6.dns 2001:db8:1::ffbb ipv6.dns-search example.com
  5. To customize other settings in the profile, use the following command:

    # nmcli connection modify mlx4_ib0 <setting> <value>

    Enclose values that contain spaces or semicolons in quotes.

  6. Activate the profile:

    # nmcli connection up mlx4_ib0

Verification

  • Use the ping utility to send ICMP packets to the remote host’s InfiniBand adapter, for example:

    # ping -c5 192.0.2.2
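The P_Key set in the procedure is a 16-bit value. In InfiniBand, the most significant bit marks full membership in the partition, and the remaining 15 bits identify the partition. The following is a minimal sketch with a hypothetical decode_pkey helper that shows how 0x8002 breaks down:

```shell
#!/bin/sh
# Hypothetical helper: decode a 16-bit InfiniBand P_Key.
# usage: decode_pkey <pkey>   (for example 0x8002)
decode_pkey() {
    pkey=$(( $1 ))
    # The high bit (0x8000) indicates full membership in the partition.
    if [ $(( pkey & 0x8000 )) -ne 0 ]; then
        membership=full
    else
        membership=limited
    fi
    # The lower 15 bits identify the partition.
    printf 'partition=0x%04x membership=%s\n' $(( pkey & 0x7fff )) "$membership"
}

decode_pkey 0x8002   # prints partition=0x0002 membership=full
decode_pkey 0x0002   # prints partition=0x0002 membership=limited
```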

3.5. Configuring an IPoIB connection by using the network RHEL system role

You can use IP over InfiniBand (IPoIB) to send IP packets over an InfiniBand interface. To configure IPoIB, create a NetworkManager connection profile. By using Ansible and the network system role, you can automate this process and remotely configure connection profiles on the hosts defined in a playbook.

You can use the network RHEL system role to configure IPoIB and, if a connection profile for the parent InfiniBand device does not exist, the role can create it as well.

Prerequisites

  • You have prepared the control node and the managed nodes.
  • You are logged in to the control node as a user who can run playbooks on the managed nodes.
  • The account you use to connect to the managed nodes has sudo permissions on them.
  • An InfiniBand device named mlx4_ib0 is installed in the managed nodes.
  • The managed nodes use NetworkManager to configure the network.

Procedure

  1. Create a playbook file, for example ~/playbook.yml, with the following content:

    ---
    - name: Configure the network
      hosts: managed-node-01.example.com
      tasks:
        - name: IPoIB connection profile with static IP address settings
          ansible.builtin.include_role:
            name: rhel-system-roles.network
          vars:
            network_connections:
              # InfiniBand connection mlx4_ib0
              - name: mlx4_ib0
                interface_name: mlx4_ib0
                type: infiniband
    
              # IPoIB device mlx4_ib0.8002 on top of mlx4_ib0
              - name: mlx4_ib0.8002
                type: infiniband
                autoconnect: yes
                infiniband:
                  p_key: 0x8002
                  transport_mode: datagram
                parent: mlx4_ib0
                ip:
                  address:
                    - 192.0.2.1/24
                    - 2001:db8:1::1/64
                state: up

    The settings specified in the example playbook include the following:

    type: <profile_type>
        Sets the type of the profile to create. The example playbook creates two connection profiles: one for the InfiniBand connection and one for the IPoIB device.
    parent: <parent_device>
        Sets the parent device of the IPoIB connection profile.
    p_key: <value>
        Sets the InfiniBand partition key. If you set this variable, do not set interface_name on the IPoIB device.
    transport_mode: <mode>
        Sets the IPoIB connection operation mode. You can set this variable to datagram (default) or connected.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.network/README.md file on the control node.

  2. Validate the playbook syntax:

    $ ansible-playbook --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  3. Run the playbook:

    $ ansible-playbook ~/playbook.yml

Verification

  1. Display the IP settings of the mlx4_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'ip address show mlx4_ib0.8002'
    managed-node-01.example.com | CHANGED | rc=0 >>
    ...
    inet 192.0.2.1/24 brd 192.0.2.255 scope global noprefixroute mlx4_ib0.8002
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::1/64 scope link tentative noprefixroute
       valid_lft forever preferred_lft forever
  2. Display the partition key (P_Key) of the mlx4_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx4_ib0.8002/pkey'
    managed-node-01.example.com | CHANGED | rc=0 >>
    0x8002
  3. Display the mode of the mlx4_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx4_ib0.8002/mode'
    managed-node-01.example.com | CHANGED | rc=0 >>
    datagram
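The playbook does not set interface_name on the IPoIB device, and the verification commands address the device as mlx4_ib0.8002. This name combines the parent device name with the P_Key written as four hexadecimal digits. The following is a minimal sketch, assuming that naming pattern and a hypothetical ipoib_child_name helper:

```shell
#!/bin/sh
# Hypothetical helper: build the IPoIB child device name from the parent
# device name and the P_Key.
# Assumption: the name follows the <parent>.<pkey as 4 hex digits> pattern
# seen in the verification steps above.
ipoib_child_name() {
    printf '%s.%04x\n' "$1" $(( $2 ))
}

ipoib_child_name mlx4_ib0 0x8002   # prints mlx4_ib0.8002
```

You can use the result to build the sysfs paths used in the verification, for example /sys/class/net/$(ipoib_child_name mlx4_ib0 0x8002)/pkey.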

Additional resources

  • /usr/share/ansible/roles/rhel-system-roles.network/README.md file
  • /usr/share/doc/rhel-system-roles/network/ directory

3.6. Configuring an IPoIB connection by using nm-connection-editor

The nm-connection-editor application configures and manages network connections stored by NetworkManager by using a graphical interface.

Prerequisites

  • An InfiniBand device is installed on the server.
  • The corresponding kernel module is loaded.
  • The nm-connection-editor package is installed.

Procedure

  1. Enter the command:

    $ nm-connection-editor
  2. Click the + button to add a new connection.
  3. Select the InfiniBand connection type and click Create.
  4. On the InfiniBand tab:

    1. Change the connection name if you want to.
    2. Select the transport mode.
    3. Select the device.
    4. Set an MTU if needed.
  5. On the IPv4 Settings tab, configure the IPv4 settings. For example, set a static IPv4 address, network mask, default gateway, and DNS server.
  6. On the IPv6 Settings tab, configure the IPv6 settings. For example, set a static IPv6 address, network mask, default gateway, and DNS server.
  7. Click Save to save the InfiniBand connection.
  8. Close nm-connection-editor.
  9. Optional: Set a P_Key interface. Because this setting is not available in nm-connection-editor, you must set this parameter on the command line.

    For example, to set 0x8002 as the P_Key interface of the mlx4_ib0 connection:

    # nmcli connection modify mlx4_ib0 infiniband.p-key 0x8002

3.7. Testing an RDMA network by using iperf3 after IPoIB is configured

The following example uses the iperf3 utility with a large buffer size to run a 60-second test that measures the maximum throughput and fully uses the bandwidth between two hosts.

Prerequisites

  • You have configured IPoIB on both hosts.

Procedure

  1. Run iperf3 as a server on one system. The -s option starts iperf3 in server mode, where it waits for client connections, and the -i option sets the interval in seconds between periodic bandwidth reports:

    # iperf3 -i 10 -s
  2. Run iperf3 as a client on the other system. The -c option connects to the listening server at the IP address 192.168.2.2, the -t option sets the test duration in seconds, and the -i option sets the interval in seconds between periodic bandwidth reports:

    # iperf3 -i 10 -t 60 -c 192.168.2.2
  3. Review the test results:

    1. Display test results on the system that acts as a server:

      # iperf3 -i 10 -s
      -----------------------------------------------------------
      Server listening on 5201
      -----------------------------------------------------------
      Accepted connection from 192.168.2.3, port 22216
      [5] local 192.168.2.2 port 5201 connected to 192.168.2.3 port 22218
      [ID] Interval           Transfer     Bandwidth
      [5]   0.00-10.00  sec  17.5 GBytes  15.0 Gbits/sec
      [5]  10.00-20.00  sec  17.6 GBytes  15.2 Gbits/sec
      [5]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec
      [5]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec
      [5]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec
      [5]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec
      [5]  60.00-60.04  sec  82.2 MBytes  17.3 Gbits/sec
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ID] Interval           Transfer     Bandwidth
      [5]   0.00-60.04  sec  0.00 Bytes    0.00 bits/sec  sender
      [5]   0.00-60.04  sec   107 GBytes  15.3 Gbits/sec  receiver
    2. Display test results on the system that acts as a client:

      # iperf3 -i 10 -t 60 -c 192.168.2.2
      
      Connecting to host 192.168.2.2, port 5201
      [4] local 192.168.2.3 port 22218 connected to 192.168.2.2 port 5201
      [ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [4]   0.00-10.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  10.00-20.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec    0   6.01 MBytes
      [4]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec    0   6.01 MBytes
      [4]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec    0   6.01 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ID] Interval           Transfer     Bandwidth       Retr
      [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec    0   sender
      [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec        receiver
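To cross-check a row of the report, you can convert the transferred volume and the interval length into a bit rate. The following sketch uses a hypothetical gbits_per_sec helper and assumes that iperf3 reports bytes with binary prefixes (1 GByte = 2^30 bytes) and bit rates with decimal prefixes (1 Gbit = 10^9 bits), which is consistent with the values in the server report:

```shell
#!/bin/sh
# Hypothetical helper: convert a transfer volume in GBytes and an interval
# in seconds into Gbits/sec.
# Assumption: 1 GByte = 2^30 bytes, 1 Gbit = 10^9 bits.
# usage: gbits_per_sec <gbytes> <seconds>
gbits_per_sec() {
    awk -v gbytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", gbytes * 1073741824 * 8 / secs / 1e9 }'
}

gbits_per_sec 17.5 10   # prints 15.0 (first interval of the server report)
gbits_per_sec 18.4 10   # prints 15.8 (third interval)
```

Small differences from the reported bandwidth are expected because the transfer column is rounded.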

Additional resources

  • iperf3 man page