Chapter 26. Configuring InfiniBand and RDMA networks


You can configure and manage Remote Direct Memory Access (RDMA) networks and InfiniBand hardware at an enterprise level by using various protocols. These include InfiniBand (IB), RDMA over Converged Ethernet (RoCE), iWARP, which implements RDMA over IP networks, and the Omni-Path Architecture (OPA) protocol, which is natively supported on RDMA-capable hardware. For low-latency and high-throughput connections, you can configure IP over InfiniBand (IPoIB).

26.1. Introduction to InfiniBand and RDMA

InfiniBand refers to two distinct things:

  • The physical link-layer protocol for InfiniBand networks
  • The InfiniBand Verbs API, an implementation of the remote direct memory access (RDMA) technology

RDMA provides access between the main memory of two computers without involving the operating system, cache, or storage. With RDMA, data transfers achieve high throughput and low latency while keeping CPU utilization low.

In a typical IP data transfer, when an application on one machine sends data to an application on another machine, the following actions happen on the receiving end:

  1. The kernel must receive the data.
  2. The kernel must determine that the data belongs to the application.
  3. The kernel wakes up the application.
  4. The kernel waits for the application to perform a system call into the kernel.
  5. The application copies the data from the internal memory space of the kernel into the buffer provided by the application.

This process means that most network traffic is copied across the main memory of the system at least once, if the host adapter uses direct memory access (DMA), or otherwise at least twice. Additionally, the computer performs context switches between the kernel and the application. These context switches can cause a higher CPU load at high traffic rates while slowing down other tasks.

Unlike traditional IP communication, RDMA communication bypasses the kernel intervention in the communication process. This reduces the CPU overhead. After a packet enters a network, the RDMA protocol enables the host adapter to decide which application should receive it and where to store it in the memory space of that application. Instead of sending the packet for processing to the kernel and copying it into the memory of the user application, the host adapter directly places the packet contents in the application buffer. This process requires a separate API, the InfiniBand Verbs API, and applications need to implement the InfiniBand Verbs API to use RDMA.
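
For example, as a quick way to see a verbs-level data exchange without writing code, you can run the ibv_rc_pingpong test program from the libibverbs-utils package; the device name mlx5_0 and the host name server.example.com are placeholders for your environment.

On the first host, start the server side:

    # ibv_rc_pingpong -d mlx5_0

On the second host, connect to the server as a client:

    # ibv_rc_pingpong -d mlx5_0 server.example.com

Both sides report the amount of data transferred and the measured bandwidth when the test completes.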

Red Hat Enterprise Linux supports both the InfiniBand hardware and the InfiniBand Verbs API. Additionally, it supports the following technologies to use the InfiniBand Verbs API on non-InfiniBand hardware:

  • iWARP: A network protocol that implements RDMA over IP networks
  • RDMA over Converged Ethernet (RoCE), which is also known as InfiniBand over Ethernet (IBoE): A network protocol that implements RDMA over Ethernet networks

26.2. Configuring the core RDMA subsystem

The rdma-core package provides the user-space libraries and configuration for the core RDMA subsystem, which manages network protocols and communication standards such as InfiniBand, iWARP, and RoCE.

Procedure

  • Install the rdma-core package:

    # dnf install rdma-core

Verification

  1. Install the libibverbs-utils and infiniband-diags packages:

    # dnf install libibverbs-utils infiniband-diags
  2. List the available InfiniBand devices:

    # ibv_devices
    
        mlx5_0          	946dae0300418002
        mlx5_1          	946dae0300418003
  3. Display the information of the mlx5_0 device:

    # ibv_devinfo -d mlx5_0
    
    hca_id:	mlx5_0
    	transport:			InfiniBand (0)
    	fw_ver:				28.42.1000
    	node_guid:			946d:ae03:0041:8002
    	sys_image_guid:			946d:ae03:0041:8002
    	vendor_id:			0x02c9
    	vendor_part_id:			4129
    	hw_ver:				0x0
    	board_id:			MT_0000000834
    	phys_port_cnt:			1
    		port:	1
    			state:			PORT_ACTIVE (4)
    			max_mtu:		4096 (5)
    			active_mtu:		4096 (5)
    			sm_lid:			3
    			port_lid:		66
    			port_lmc:		0x00
    			link_layer:		InfiniBand
  4. Display the status of the mlx5_0 device:

    # ibstat mlx5_0
    
    CA 'mlx5_0'
    	CA type: MT4129
    	Number of ports: 1
    	Firmware version: 28.42.1000
    	Hardware version: 0
    	Node GUID: 0x946dae0300418002
    	System image GUID: 0x946dae0300418002
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 100
    		Base lid: 66
    		LMC: 0
    		SM lid: 3
    		Capability mask: 0xa759e848
    		Port GUID: 0x946dae0300418002
    		Link layer: InfiniBand
  5. Ping an InfiniBand address by using the ibping utility, which runs as a server or a client depending on the parameters:

    1. On one host, start ibping in server mode (-S) on port 1 (-P 1) of the InfiniBand channel adapter (CA) mlx5_0 (-C):

      # ibping -S -C mlx5_0 -P 1
    2. On another host, start ibping in client mode to send 50 packets (-c 50) through port 1 (-P 1) of the CA mlx5_0 (-C) to the server port with the Local Identifier (LID) 2 (-L 2):

      # ibping -c 50 -C mlx5_0 -P 1 -L 2

26.3. Configuring IPoIB

By default, InfiniBand does not use the Internet Protocol (IP) for communication. However, IP over InfiniBand (IPoIB) provides an IP network emulation layer on top of InfiniBand remote direct memory access (RDMA) networks. This allows existing unmodified applications to transmit data over InfiniBand networks, but the performance is lower than if the application used RDMA natively.

Note

Mellanox devices, starting with ConnectX-4, use Enhanced IPoIB mode by default on RHEL 8 and later (datagram only). Connected mode is not supported on these devices.

26.3.1. The IPoIB communication modes

An IPoIB device is configurable in either Datagram or Connected mode. The difference is the type of queue pair the IPoIB layer attempts to open with the machine at the other end of the communication:

  • In the Datagram mode, the system opens an unreliable, disconnected queue pair.

    This mode does not support packets larger than the Maximum Transmission Unit (MTU) of the InfiniBand link layer. During transmission of data, the IPoIB layer adds a 4-byte IPoIB header on top of the IP packet. As a result, the IPoIB MTU is 4 bytes less than the InfiniBand link-layer MTU. Because 2048 is a common InfiniBand link-layer MTU, the common IPoIB device MTU in Datagram mode is 2044.

  • In the Connected mode, the system opens a reliable, connected queue pair.

    This mode allows messages larger than the InfiniBand link-layer MTU because the host adapter handles packet segmentation and reassembly. As a result, the InfiniBand adapter places no limit on the size of messages sent in Connected mode. However, IP packets are limited by the data field and the TCP/IP header field. For this reason, the IPoIB MTU in Connected mode is 65520 bytes.

    The Connected mode has a higher performance but consumes more kernel memory.

Even when a system is configured to use Connected mode, it still sends multicast traffic by using Datagram mode, because InfiniBand switches and fabrics cannot pass multicast traffic in Connected mode. Also, when the host is not configured to use Connected mode, the system falls back to Datagram mode.

When you run an application that sends multicast data up to the MTU of the interface, configure the interface in Datagram mode, or configure the application to cap the send size so that packets fit into datagram-sized packets.
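
To check which mode and MTU an existing IPoIB interface currently uses, you can read the mode file in sysfs and display the interface MTU; the interface name mlx5_ib0 is only an example, and the values shown assume Datagram mode with a 2048-byte InfiniBand link-layer MTU:

    # cat /sys/class/net/mlx5_ib0/mode
    datagram
    # ip link show mlx5_ib0 | grep -o 'mtu [0-9]*'
    mtu 2044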

26.3.2. Understanding IPoIB hardware addresses

IPoIB devices have a 20-byte hardware address that consists of the following parts:

  • The first 4 bytes are flags and queue pair numbers
  • The next 8 bytes are the subnet prefix

    The default subnet prefix is 0xfe:80:00:00:00:00:00:00. After the device connects to the subnet manager, the device changes this prefix to match the prefix configured on the subnet manager.

  • The last 8 bytes are the Globally Unique Identifier (GUID) of the InfiniBand port that attaches to the IPoIB device
Note

As the first 12 bytes can change, do not use them in the udev device manager rules.
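
You can display the full 20-byte hardware address of an IPoIB interface with the ip utility; the interface name mlx5_ib0 is only an example:

    # ip link show mlx5_ib0

The link/infiniband field in the output contains the 20-byte address. Its last 8 bytes match the Port GUID that utilities such as ibstat report for the underlying InfiniBand port.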

26.3.3. Configuring an IPoIB connection by using nmcli

You can use the nmcli utility to create an IP over InfiniBand connection on the command line.

Prerequisites

  • An InfiniBand device is installed on the server.
  • The corresponding kernel module is loaded.

Procedure

  1. Create the InfiniBand connection to use the mlx5_ib0 interface in the Connected transport mode and the maximum MTU of 65520 bytes:

    # nmcli connection add type infiniband con-name mlx5_ib0 ifname mlx5_ib0 transport-mode Connected mtu 65520
  2. Set a P_Key, for example:

    # nmcli connection modify mlx5_ib0 infiniband.p-key 0x8002
  3. Configure the IPv4 settings:

    • To use DHCP, enter:

      # nmcli connection modify mlx5_ib0 ipv4.method auto

      Skip this step if ipv4.method is already set to auto (default).

    • To set a static IPv4 address, network mask, default gateway, DNS servers, and search domain, enter:

      # nmcli connection modify mlx5_ib0 ipv4.method manual ipv4.addresses 192.0.2.1/24 ipv4.gateway 192.0.2.254 ipv4.dns 192.0.2.200 ipv4.dns-search example.com
  4. Configure the IPv6 settings:

    • To use stateless address autoconfiguration (SLAAC), enter:

      # nmcli connection modify mlx5_ib0 ipv6.method auto

      Skip this step if ipv6.method is already set to auto (default).

    • To set a static IPv6 address, network mask, default gateway, DNS servers, and search domain, enter:

      # nmcli connection modify mlx5_ib0 ipv6.method manual ipv6.addresses 2001:db8:1::fffe/64 ipv6.gateway 2001:db8:1::fffe ipv6.dns 2001:db8:1::ffbb ipv6.dns-search example.com
  5. To customize other settings in the profile, use the following command:

    # nmcli connection modify mlx5_ib0 <setting> <value>

    Enclose values with spaces or semicolons in quotes.

  6. Activate the profile:

    # nmcli connection up mlx5_ib0

Verification

  • Use the ping utility to send ICMP packets to the remote host’s InfiniBand adapter, for example:

    # ping -c5 192.0.2.2
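
  • Optionally, confirm that the connection profile is active; the -g option limits the output to the requested field, and the profile name mlx5_ib0 matches the example above:

    # nmcli -g GENERAL.STATE connection show mlx5_ib0
    activated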

26.3.4. Configuring an IPoIB connection by using the network RHEL system role

You can use IP over InfiniBand (IPoIB) to send IP packets over an InfiniBand interface. To configure IPoIB, create a NetworkManager connection profile. By using Ansible and the network RHEL system role, you can automate this process and remotely configure connection profiles on the hosts defined in a playbook.

You can use the network RHEL system role to configure IPoIB and, if a connection profile for the InfiniBand’s parent device does not exist, the role can create it as well.

Prerequisites

  • You have prepared the control node and the managed nodes.
  • You are logged in to the control node as a user who can run playbooks on the managed nodes.
  • The account you use to connect to the managed nodes has sudo permissions on them.
  • An InfiniBand device named mlx5_ib0 is installed in the managed nodes.
  • The managed nodes use NetworkManager to configure the network.

Procedure

  1. Create a playbook file, for example, ~/playbook.yml, with the following content:

    ---
    - name: Configure the network
      hosts: managed-node-01.example.com
      tasks:
        - name: IPoIB connection profile with static IP address settings
          ansible.builtin.include_role:
            name: redhat.rhel_system_roles.network
          vars:
            network_connections:
              # InfiniBand connection mlx5_ib0
              - name: mlx5_ib0
                interface_name: mlx5_ib0
                type: infiniband
    
              # IPoIB device mlx5_ib0.8002 on top of mlx5_ib0
              - name: mlx5_ib0.8002
                type: infiniband
                autoconnect: yes
                infiniband:
                  p_key: 0x8002
                  transport_mode: datagram
                parent: mlx5_ib0
                ip:
                  address:
                    - 192.0.2.1/24
                    - 2001:db8:1::1/64
                state: up

    The settings specified in the example playbook include the following:

    type: <profile_type>
    Sets the type of the profile to create. The example playbook creates two connection profiles: one for the parent InfiniBand connection and one for the IPoIB device.
    parent: <parent_device>
    Sets the parent device of the IPoIB connection profile.
    p_key: <value>
    Sets the InfiniBand partition key. If you set this variable, do not set interface_name on the IPoIB device.
    transport_mode: <mode>
    Sets the IPoIB connection operation mode. You can set this variable to datagram (default) or connected.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.network/README.md file on the control node.

  2. Validate the playbook syntax:

    $ ansible-playbook --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  3. Run the playbook:

    $ ansible-playbook ~/playbook.yml

Verification

  1. Display the IP settings of the mlx5_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'ip address show mlx5_ib0.8002'
    managed-node-01.example.com | CHANGED | rc=0 >>
    ...
    inet 192.0.2.1/24 brd 192.0.2.255 scope global noprefixroute mlx5_ib0.8002
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::1/64 scope global tentative noprefixroute
       valid_lft forever preferred_lft forever
  2. Display the partition key (P_Key) of the mlx5_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx5_ib0.8002/pkey'
    managed-node-01.example.com | CHANGED | rc=0 >>
    0x8002
  3. Display the mode of the mlx5_ib0.8002 device:

    # ansible managed-node-01.example.com -m command -a 'cat /sys/class/net/mlx5_ib0.8002/mode'
    managed-node-01.example.com | CHANGED | rc=0 >>
    datagram

26.3.5. Configuring an IPoIB connection by using nmstatectl

You can use the nmstatectl utility to configure an IP over InfiniBand (IPoIB) connection through the Nmstate API. The Nmstate API ensures that, after setting the configuration, the result matches the configuration file. If anything fails, nmstatectl automatically rolls back the changes to avoid leaving the system in an incorrect state.

Prerequisites

  • An InfiniBand device is installed on the server.
  • The kernel module for the InfiniBand device is loaded.

Procedure

  1. Create a YAML file, for example ~/create-IPoIB-profile.yml, with the following content:

    interfaces:
    - name: mlx5_ib0.8002
      type: infiniband
      state: up
      ipv4:
        enabled: true
        address:
        - ip: 192.0.2.1
          prefix-length: 24
        dhcp: false
      ipv6:
        enabled: true
        address:
        - ip: 2001:db8:1::1
          prefix-length: 64
        autoconf: false
        dhcp: false
      infiniband:
        base-iface: "mlx5_ib0"
        mode: datagram
        pkey: "0x8002"
    
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.0.2.254
        next-hop-interface: mlx5_ib0.8002
      - destination: ::/0
        next-hop-address: 2001:db8:1::fffe
        next-hop-interface: mlx5_ib0.8002

    The IPoIB connection now has the following settings:

    • IPoIB device name: mlx5_ib0.8002
    • Base interface (parent): mlx5_ib0
    • InfiniBand partition key: 0x8002
    • Transport mode: datagram
    • Static IPv4 address: 192.0.2.1 with the /24 subnet mask
    • Static IPv6 address: 2001:db8:1::1 with the /64 subnet mask
    • IPv4 default gateway: 192.0.2.254
    • IPv6 default gateway: 2001:db8:1::fffe
  2. Apply the settings to the system:

    # nmstatectl apply ~/create-IPoIB-profile.yml

Verification

  1. Display the IP settings of the mlx5_ib0.8002 device:

    # ip address show mlx5_ib0.8002
    ...
    inet 192.0.2.1/24 brd 192.0.2.255 scope global noprefixroute mlx5_ib0.8002
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::1/64 scope global tentative noprefixroute
       valid_lft forever preferred_lft forever
  2. Display the partition key (P_Key) of the mlx5_ib0.8002 device:

    # cat /sys/class/net/mlx5_ib0.8002/pkey
    0x8002
  3. Display the mode of the mlx5_ib0.8002 device:

    # cat /sys/class/net/mlx5_ib0.8002/mode
    datagram
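
  4. Optionally, display the applied state through the Nmstate API to confirm that it matches the YAML file; the interface name is the one used in this example:

    # nmstatectl show mlx5_ib0.8002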

26.3.6. Testing an RDMA network by using iperf3 after IPoIB is configured

In the following example, a large buffer size is used in a 60-second test with the iperf3 utility to measure the maximum throughput and latency and to fully utilize the bandwidth between two hosts.

Prerequisites

  • You have configured IPoIB on both hosts.

Procedure

  1. Run iperf3 as a server on one system. Define a time interval for periodic bandwidth updates (-i) and listen as a server (-s) waiting for client connections:

    # iperf3 -i 5 -s
  2. Run iperf3 as a client on the other system. Define a time interval for periodic bandwidth updates (-i), set the test duration in seconds (-t), and connect (-c) to the server with the IP address 192.168.2.2:

    # iperf3 -i 5 -t 60 -c 192.168.2.2
  3. Review the test results:

    1. Display test results on the system that acts as a server:

      # iperf3 -i 10 -s
      -----------------------------------------------------------
      Server listening on 5201
      -----------------------------------------------------------
      Accepted connection from 192.168.2.3, port 22216
      [5] local 192.168.2.2 port 5201 connected to 192.168.2.3 port 22218
      [ID] Interval           Transfer     Bandwidth
      [5]   0.00-10.00  sec  17.5 GBytes  15.0 Gbits/sec
      [5]  10.00-20.00  sec  17.6 GBytes  15.2 Gbits/sec
      [5]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec
      [5]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec
      [5]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec
      [5]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec
      [5]  60.00-60.04  sec  82.2 MBytes  17.3 Gbits/sec
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ID] Interval           Transfer     Bandwidth
      [5]   0.00-60.04  sec  0.00 Bytes    0.00 bits/sec  sender
      [5]   0.00-60.04  sec   107 GBytes  15.3 Gbits/sec  receiver
    2. Display test results on the system that acts as a client:

      # iperf3 -i 1 -t 60 -c 192.168.2.2
      
      Connecting to host 192.168.2.2, port 5201
      [4] local 192.168.2.3 port 22218 connected to 192.168.2.2 port 5201
      [ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [4]   0.00-10.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  10.00-20.00  sec  17.6 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  20.00-30.00  sec  18.4 GBytes  15.8 Gbits/sec    0   6.01 MBytes
      [4]  30.00-40.00  sec  18.0 GBytes  15.5 Gbits/sec    0   6.01 MBytes
      [4]  40.00-50.00  sec  17.5 GBytes  15.1 Gbits/sec    0   6.01 MBytes
      [4]  50.00-60.00  sec  18.1 GBytes  15.5 Gbits/sec    0   6.01 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ID] Interval           Transfer     Bandwidth       Retr
      [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec    0   sender
      [4]   0.00-60.00  sec   107 GBytes  15.4 Gbits/sec        receiver
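
A single TCP stream might not saturate a fast IPoIB link. If the measured throughput stays well below the link rate, you can, for example, run several parallel streams with the -P option of iperf3; the value 4 is only an example:

    # iperf3 -P 4 -t 60 -c 192.168.2.2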

26.4. Configuring RoCE

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) is a network protocol that utilizes RDMA over an Ethernet network. RoCE requires specific hardware; vendors that provide RoCE-capable adapters include Mellanox, Broadcom, and QLogic.

26.4.1. Overview of RoCE protocol versions

The following are the different RoCE versions:

RoCE v1
The RoCE version 1 protocol is an Ethernet link layer protocol with Ethertype 0x8915 that enables the communication between any two hosts in the same Ethernet broadcast domain.
RoCE v2
The RoCE version 2 protocol exists on top of either the UDP over IPv4 or the UDP over IPv6 protocol. For RoCE v2, the UDP destination port number is 4791.

The RDMA_CM sets up a reliable connection between a client and a server for transferring data. RDMA_CM provides an RDMA transport-neutral interface for establishing connections. The communication uses a specific RDMA device and message-based data transfers.

Important

Using different versions, such as RoCE v2 on the client and RoCE v1 on the server, is not supported. In such a case, configure both the server and the client to communicate over RoCE v1.

RoCE v1 works at the data link layer (Layer 2) and supports only communication between two machines in the same Ethernet broadcast domain. RoCE v2, which is the default, works at the network layer (Layer 3) and supports packet routing, which provides connectivity across multiple Ethernet networks.
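
You can check which RoCE version each populated GID table entry of a device advertises by reading sysfs; the device name mlx5_0 and port number 1 are only examples, and reads of unused GID indexes return errors that the command below discards:

    # grep . /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/* 2>/dev/null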

26.4.2. Temporarily changing the default RoCE version

Using the RoCE v2 protocol on the client and RoCE v1 on the server is not supported. If the hardware in your server supports RoCE v1 only, configure your clients for RoCE v1 to communicate with the server. For example, you can configure a client that uses the mlx5_0 driver for a Mellanox ConnectX-5 InfiniBand device that only supports RoCE v1.

Note

The changes described here will remain effective until you reboot the host.

Prerequisites

  • The client uses an InfiniBand device with RoCE v2 protocol.
  • The server uses an InfiniBand device that only supports RoCE v1.

Procedure

  1. Create the /sys/kernel/config/rdma_cm/mlx5_0/ directory:

    # mkdir /sys/kernel/config/rdma_cm/mlx5_0/
  2. Display the default RoCE mode:

    # cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
    
    RoCE v2
  3. Change the default RoCE mode to version 1:

    # echo "IB/RoCE v1" > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
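
Verification

  • Read the file again to confirm the change; the output now shows the RoCE v1 mode:

    # cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
    IB/RoCE v1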

26.5. Increasing the amount of memory that users are allowed to pin in the system

Remote direct memory access (RDMA) operations require the pinning of physical memory. As a consequence, the kernel is not allowed to write this memory to the swap space. If a user pins too much memory, the system can run out of memory, and the kernel terminates processes to free up more memory. Therefore, memory pinning is a privileged operation.

If non-root users need to run large RDMA applications, it is necessary to increase the amount of memory that these users can keep pinned in primary memory at all times.

Procedure

  • As the root user, create the file /etc/security/limits.conf with the following contents:

    @rdma soft memlock unlimited
    @rdma hard memlock unlimited

    For further details, see the limits.conf(5) man page on your system.
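
If you want to grant the privilege to a single user instead of the whole rdma group, an equivalent pair of entries looks as follows; the user name user1 and the 64 GiB limit (limits.conf expects the memlock value in KiB) are only examples:

    user1 soft memlock 67108864
    user1 hard memlock 67108864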

Verification

  1. Log in as a member of the rdma group after editing the /etc/security/limits.conf file.

    Note that Red Hat Enterprise Linux applies updated ulimit settings when the user logs in.

  2. Use the ulimit -l command to display the limit:

    $ ulimit -l
    unlimited

    If the command returns unlimited, the user can pin an unlimited amount of memory.

26.6. Enabling NFS over RDMA on an NFS server

Remote Direct Memory Access (RDMA) is a protocol that enables a client system to directly transfer data from the memory of a storage server into its own memory. This enhances storage throughput, decreases latency in data transfer between the server and client, and reduces CPU load on both ends. If both the NFS server and clients are connected over RDMA, clients can use NFSoRDMA to mount an exported directory.

Prerequisites

  • The NFS service is running and configured.
  • An InfiniBand or RDMA over Converged Ethernet (RoCE) device is installed on the server.
  • IP over InfiniBand (IPoIB) is configured on the server, and the InfiniBand device has an IP address assigned.

Procedure

  1. Install the rdma-core package:

    # dnf install rdma-core
  2. If the package was already installed, verify that the xprtrdma and svcrdma modules in the /etc/rdma/modules/rdma.conf file are uncommented:

    # NFS over RDMA client support
    xprtrdma
    # NFS over RDMA server support
    svcrdma
  3. Optional: By default, NFS over RDMA uses port 20049. If you want to use a different port, set the rdma-port setting in the [nfsd] section of the /etc/nfs.conf file:

    rdma-port=<port>
  4. Open the NFSoRDMA port in firewalld:

    # firewall-cmd --permanent --add-port={20049/tcp,20049/udp}
    # firewall-cmd --reload

    Adjust the port numbers if you set a different port than 20049.

  5. Restart the nfs-server service:

    # systemctl restart nfs-server
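
  6. Optional: Confirm on the server that the NFS server registered an RDMA listener. Assuming the nfsd file system is mounted under /proc/fs/nfsd, which is the default while the NFS server runs, the output contains an rdma entry with the configured port:

    # cat /proc/fs/nfsd/portlist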

Verification

  1. On a client with InfiniBand hardware, perform the following steps:

    1. Install the following packages:

      # dnf install nfs-utils rdma-core
    2. Mount an exported NFS share over RDMA:

      # mount -o rdma server.example.com:/nfs/projects/ /mnt/

      If you set a port number other than the default (20049), pass port=<port_number> to the command:

      # mount -o rdma,port=<port_number> server.example.com:/nfs/projects/ /mnt/
    3. Verify that the share was mounted with the rdma option:

      # mount | grep "/mnt"
      server.example.com:/nfs/projects/ on /mnt type nfs (...,proto=rdma,...)

26.7. InfiniBand subnet manager

All InfiniBand networks must have a subnet manager running for the network to function. This is true even if two machines are connected directly with no switch involved.

It is possible to have more than one subnet manager. In that case, one acts as the primary subnet manager and the others act as standby subnet managers that take over if the primary subnet manager fails.

Red Hat Enterprise Linux provides OpenSM, an implementation of an InfiniBand subnet manager. However, the features of OpenSM are limited and there is no active upstream development. Typically, embedded subnet managers in InfiniBand switches provide more features and support up-to-date InfiniBand hardware. For further details, see Installing and configuring the OpenSM InfiniBand subnet manager.
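
For example, on a small fabric such as two hosts connected back to back, a minimal sketch of providing a subnet manager is to install and start OpenSM on one of the hosts; the package and service are both named opensm:

    # dnf install opensm
    # systemctl enable --now opensm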
