Chapter 4. Configuring the core RDMA subsystem
The rdma service configuration manages network protocols and communication standards such as InfiniBand, iWARP, and RoCE.
4.1. Renaming IPoIB devices
By default, the kernel names Internet Protocol over InfiniBand (IPoIB) devices ib0, ib1, and so on. To avoid conflicts, Red Hat recommends creating a rule in the udev device manager that assigns persistent and meaningful names, such as mlx4_ib0.
Prerequisites
- You have installed an InfiniBand device.
Procedure
Display the hardware address of the ib0 device:
# ip link show ib0
8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT qlen 256
    link/infiniband 80:00:02:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:31:78:f2 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
The last eight bytes of the address are required to create a udev rule in the next step.
To configure a rule that renames the device with the 00:02:c9:03:00:31:78:f2 hardware address to mlx4_ib0, edit the /etc/udev/rules.d/70-persistent-ipoib.rules file and add an ACTION rule:
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*00:02:c9:03:00:31:78:f2", NAME="mlx4_ib0"
Reboot the host:
# reboot
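The address-to-rule step can also be scripted. The following is a minimal sketch, assuming bash and awk; the ipoib_rule_from_address function name is hypothetical and not part of any Red Hat tool. It keeps the last eight bytes of the 20-byte IPoIB address and prints a rule in the same form as the one in the procedure.

```shell
# Hypothetical helper: build the udev rename rule for an IPoIB device.
# $1: full 20-byte IPoIB address, as printed after "link/infiniband" by "ip link show"
# $2: persistent name to assign, for example mlx4_ib0
ipoib_rule_from_address() {
    local address=$1 name=$2 suffix
    # Keep only the last eight colon-separated bytes of the address.
    suffix=$(printf '%s\n' "$address" | awk -F: '{
        out = $(NF-7)
        for (i = NF-6; i <= NF; i++) out = out ":" $i
        print out
    }')
    printf 'ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*%s", NAME="%s"\n' \
        "$suffix" "$name"
}

# Usage on a host with an ib0 device:
#   addr=$(ip -o link show ib0 | awk '{for (i = 1; i < NF; i++) if ($i == "link/infiniband") print $(i+1)}')
#   ipoib_rule_from_address "$addr" mlx4_ib0
```

Review the generated rule before you append it to the /etc/udev/rules.d/70-persistent-ipoib.rules file.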
Additional resources
- udev(7) man page
- Understanding IPoIB hardware addresses
4.2. Increasing the amount of memory that users are allowed to pin in the system
Remote direct memory access (RDMA) operations require pinning physical memory. As a consequence, the kernel cannot move the pinned pages to the swap space. If a user pins too much memory, the system can run out of memory, and the kernel terminates processes to free up more memory. Therefore, memory pinning is a privileged operation.
If non-root users need to run large RDMA applications, increase the amount of memory that these users can keep pinned in primary memory at all times.
Procedure
As the root user, add the following lines to the /etc/security/limits.conf file:
@rdma soft memlock unlimited
@rdma hard memlock unlimited
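If you manage this file with a script, a minimal sketch such as the following, assuming bash and GNU grep, appends the two lines only when they are missing, so running it twice does not duplicate them. The add_rdma_memlock_limits name is hypothetical, and the check only recognizes lines that match exactly.

```shell
# Hypothetical helper: append the RDMA memlock limits to a limits.conf-style file
# only if the exact lines are not already present.
add_rdma_memlock_limits() {
    local file=$1 line
    # Start on a new line if the file exists but does not end with a newline.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    for line in '@rdma soft memlock unlimited' '@rdma hard memlock unlimited'; do
        # -q: no output, -x: match the whole line, -F: treat the pattern as a fixed string.
        if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
            printf '%s\n' "$line" >> "$file"
        fi
    done
}

# Usage (as root):
#   add_rdma_memlock_limits /etc/security/limits.conf
```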
Verification
Log in as a member of the rdma group after editing the /etc/security/limits.conf file. Red Hat Enterprise Linux applies updated ulimit settings when the user logs in.
Use the ulimit -l command to display the limit:
$ ulimit -l
unlimited
If the command returns unlimited, the user can pin an unlimited amount of memory.
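To check the limit from a script instead of reading the output manually, the following is a minimal sketch, assuming bash, where ulimit -l reports the locked-memory limit in 1024-byte units. The memlock_status helper is hypothetical.

```shell
# Hypothetical helper: interpret a memlock limit as printed by "ulimit -l".
# Without an argument, it checks the limit of the current shell.
memlock_status() {
    local limit=${1:-$(ulimit -l)}
    if [ "$limit" = "unlimited" ]; then
        echo "unlimited: the user can pin an unlimited amount of memory"
    else
        # bash reports this limit in 1024-byte units.
        echo "limited: the user can pin at most ${limit} KiB"
        return 1
    fi
}

# Usage, after logging in as a member of the rdma group:
#   memlock_status
```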
Additional resources
- limits.conf(5) man page
4.3. Configuring the rdma service
With the Remote Direct Memory Access (RDMA) protocol, you can transfer data between RDMA-enabled systems over the network by directly accessing their main memory. The RDMA protocol provides low latency and high throughput. To manage the supported network protocols and communication standards, configure the rdma service. This configuration includes high-speed network protocols, such as RoCE and iWARP, and their software implementations, such as Soft-RoCE and Soft-iWARP. When Red Hat Enterprise Linux detects InfiniBand, iWARP, or RoCE devices and their configuration files in the /etc/rdma/modules/* directory, the udev device manager instructs systemd to start the rdma service. The module configuration in the /etc/rdma/modules/rdma.conf file persists across reboots. To apply changes, restart the rdma-load-modules@rdma.service configuration service.
Procedure
Edit the /etc/rdma/modules/rdma.conf file and uncomment the modules that you want to enable:
# These modules are loaded by the system if any RDMA devices is installed
# iSCSI over RDMA client support
ib_iser
# iSCSI over RDMA target support
ib_isert
# SCSI RDMA Protocol target driver
ib_srpt
# User access to RDMA verbs (supports libibverbs)
ib_uverbs
# User access to RDMA connection management (supports librdmacm)
rdma_ucm
# RDS over RDMA support
# rds_rdma
# NFS over RDMA client support
xprtrdma
# NFS over RDMA server support
svcrdma
Restart the service to make the changes effective:
# systemctl restart rdma-load-modules@rdma.service
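Instead of editing the file manually, you can uncomment a single module with sed. The following is a minimal sketch, assuming GNU sed for the -i option; enable_rdma_module is a hypothetical helper that only changes a line consisting of # followed by the module name, so descriptive comment lines stay commented.

```shell
# Hypothetical helper: uncomment the line "# <module>" in an rdma.conf-style file.
# Descriptive comments such as "# RDS over RDMA support" do not match and stay as they are.
enable_rdma_module() {
    local file=$1 module=$2
    # GNU sed -i edits the file in place; the pattern must match the whole line.
    sed -i "s/^#[[:space:]]*${module}[[:space:]]*\$/${module}/" "$file"
}

# Usage (as root), followed by a restart of the service:
#   enable_rdma_module /etc/rdma/modules/rdma.conf rds_rdma
#   systemctl restart rdma-load-modules@rdma.service
```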
Verification
After a reboot, check the service status:
# systemctl status rdma-load-modules@rdma.service
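To additionally confirm which of the enabled modules are loaded, the following minimal sketch, assuming bash and awk, compares the modules enabled in rdma.conf with the first column of /proc/modules. The missing_rdma_modules helper is hypothetical. Modules built into the kernel do not appear in /proc/modules, so the sketch would report them as missing.

```shell
# Hypothetical helper: print modules enabled in an rdma.conf-style file that are
# not listed in the first column of a /proc/modules-style file.
missing_rdma_modules() {
    local conf=${1:-/etc/rdma/modules/rdma.conf}
    local loaded=${2:-/proc/modules}
    local module
    # Skip comment lines and blank lines; every remaining line names a module.
    grep -Ev '^[[:space:]]*(#|$)' "$conf" | while read -r module; do
        # awk exits 0 only if some line has the module name in its first column.
        if ! awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$loaded"; then
            echo "$module"
        fi
    done
}

# Usage: no output means that every enabled module is loaded.
#   missing_rdma_modules
```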
4.4. Enabling NFS over RDMA on an NFS server
Remote Direct Memory Access (RDMA) is a protocol that enables a client system to directly transfer data from the memory of a storage server into its own memory. This increases storage throughput, decreases the latency of data transfers between the server and clients, and reduces the CPU load on both ends. If both the NFS server and the clients are connected over RDMA, the clients can use NFS over RDMA (NFSoRDMA) to mount an exported directory.
Prerequisites
- The NFS service is configured and running.
- An InfiniBand or RDMA over Converged Ethernet (RoCE) device is installed on the server.
- IP over InfiniBand (IPoIB) is configured on the server, and the InfiniBand device has an IP address assigned.
Procedure
Install the rdma-core package:
# dnf install rdma-core
If the package was already installed, verify that the xprtrdma and svcrdma modules in the /etc/rdma/modules/rdma.conf file are uncommented:
# NFS over RDMA client support
xprtrdma
# NFS over RDMA server support
svcrdma
Optional: By default, NFS over RDMA uses port 20049. If you want to use a different port, set the rdma-port setting in the [nfsd] section of the /etc/nfs.conf file:
rdma-port=<port>
Open the NFSoRDMA port in firewalld:
# firewall-cmd --permanent --add-port={20049/tcp,20049/udp}
# firewall-cmd --reload
Adjust the port numbers if you set a port other than 20049.
Restart the nfs-server service:
# systemctl restart nfs-server
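The port that you open in firewalld must match the port that nfsd uses. The following minimal sketch, assuming bash and awk, reads rdma-port from the [nfsd] section of an nfs.conf-style file and falls back to 20049. The nfsordma_port helper is hypothetical; it ignores commented-out lines and rdma-port settings in other sections.

```shell
# Hypothetical helper: print the NFS over RDMA port from the [nfsd] section of an
# nfs.conf-style file, or the default port 20049 if rdma-port is not set.
nfsordma_port() {
    local conf=${1:-/etc/nfs.conf}
    if [ ! -r "$conf" ]; then
        echo 20049
        return
    fi
    awk -F= '
        # A section header switches the in_nfsd flag on or off.
        /^[[:space:]]*\[/ { in_nfsd = ($0 ~ /^[[:space:]]*\[nfsd\][[:space:]]*$/); next }
        in_nfsd {
            key = $1; gsub(/[[:space:]]/, "", key)
            if (key == "rdma-port") { val = $2; gsub(/[[:space:]]/, "", val); port = val }
        }
        END { print (port != "" ? port : 20049) }
    ' "$conf"
}

# Usage (as root):
#   port=$(nfsordma_port)
#   firewall-cmd --permanent --add-port="${port}/tcp" --add-port="${port}/udp"
#   firewall-cmd --reload
```

The braces in the firewall-cmd command of the procedure are bash brace expansion, so --add-port={20049/tcp,20049/udp} expands to --add-port=20049/tcp --add-port=20049/udp, which is the form used in the usage comment.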
Verification
On a client with InfiniBand hardware, perform the following steps:
Install the following packages:
# dnf install nfs-utils rdma-core
Mount an exported NFS share over RDMA:
# mount -o rdma server.example.com:/nfs/projects/ /mnt/
If you set a port number other than the default (20049), pass port=<port_number> to the command:
# mount -o rdma,port=<port_number> server.example.com:/nfs/projects/ /mnt/
Verify that the share was mounted with the rdma option:
# mount | grep "/mnt"
server.example.com:/nfs/projects/ on /mnt type nfs (...,proto=rdma,...)
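To check the mount in a script, the following minimal sketch, assuming bash and awk, looks for the mount point in the mount output and requires proto=rdma in its option list. The is_rdma_mount helper is hypothetical.

```shell
# Hypothetical helper: read "mount"-style lines on stdin and succeed only if the
# given mount point is listed with proto=rdma in its options.
is_rdma_mount() {
    local mountpoint=$1
    # Expected line format: <source> on <mount point> type <fstype> (<options>)
    awk -v mp="$mountpoint" '
        $2 == "on" && $3 == mp && $0 ~ /[(,]proto=rdma[,)]/ { found = 1 }
        END { exit !found }
    '
}

# Usage on the client:
#   mount | is_rdma_mount /mnt && echo "/mnt is mounted over RDMA"
```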
Additional resources