13.5. Configuring the Base RDMA Subsystem
Startup of the rdma service is automatic. When RDMA-capable hardware, whether InfiniBand, iWARP, or RoCE/IBoE, is detected, udev instructs systemd to start the rdma service.
~]# systemctl status rdma
● rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: file:/etc/rdma/rdma.conf
Users need not enable the rdma service, but they can enable it if they want to force it to start at every boot. To do that, enter the following command as root:
~]# systemctl enable rdma
13.5.1. Configuration of the rdma.conf file
The rdma service reads /etc/rdma/rdma.conf to find out which kernel-level and user-level RDMA protocols the administrator wants to be loaded by default. Users should edit this file to turn various drivers on or off.
The various drivers that can be enabled and disabled are:
- IPoIB: This is an IP network emulation layer that allows IP applications to run over InfiniBand networks.
- SRP: This is the SCSI RDMA Protocol. It allows a machine to mount a remote drive or drive array that is exported through the SRP protocol as though it were a local hard disk.
- SRPT: This is the target mode, or server mode, of the SRP protocol. This loads the kernel support necessary for exporting a drive or drive array for other machines to mount as though it were local on their machine. Further configuration of the target mode support is required before any devices are actually exported. See the documentation in the targetd and targetcli packages for further information.
- ISER: This is a low-level driver for the general iSCSI layer of the Linux kernel that provides transport over InfiniBand networks for iSCSI devices.
- RDS: This is the Reliable Datagram Sockets protocol in the Linux kernel. It is not enabled in Red Hat Enterprise Linux 7 kernels and so cannot be loaded.
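A driver can be switched on or off by editing its line in /etc/rdma/rdma.conf by hand, or from a script with a small shell function such as the following. This is a minimal sketch, assuming that the file uses KEY=yes or KEY=no lines (for example IPOIB_LOAD=yes); the key names and the rdma_conf_set helper are assumptions, so check the comments in your own copy of the file first.

```shell
# Hypothetical helper: set KEY=yes or KEY=no in an rdma.conf-style file.
# ASSUMPTION: the file holds lines such as IPOIB_LOAD=yes or RDS_LOAD=no;
# check the key names in your own /etc/rdma/rdma.conf before using it.
rdma_conf_set() {
    local file=$1 key=$2 value=$3
    # Only accept plain upper-case key names so they are safe inside sed.
    case $key in
        ''|*[!A-Z0-9_]*) echo "invalid key: $key" >&2; return 1 ;;
    esac
    case $value in
        yes|no) ;;
        *) echo "value must be yes or no" >&2; return 1 ;;
    esac
    # Refuse to edit if the key is missing instead of silently doing nothing.
    grep -q "^${key}=" "$file" || { echo "$key not found in $file" >&2; return 1; }
    # Rewrite the existing KEY=... line in place (GNU sed -i).
    sed -i "s/^${key}=.*/${key}=${value}/" "$file"
}
```

For example, running rdma_conf_set /etc/rdma/rdma.conf SRPT_LOAD yes as root would enable SRP target support the next time the rdma service starts and reads the file.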
13.5.2. Usage of 70-persistent-ipoib.rules
The rdma package provides the file /etc/udev/rules.d/70-persistent-ipoib.rules. This udev rules file is used to rename IPoIB devices from their default names (such as ib0 and ib1) to more descriptive names. Users must edit this file to change how their devices are named. First, find the hardware address, which contains the port GUID, of the device to be renamed:
~]$ ip link show ib0
8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT qlen 256
    link/infiniband 80:00:02:00:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:cb:a1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
Immediately after link/infiniband is the 20-byte hardware address for the IPoIB interface. The final 8 bytes of the address (f4:52:14:03:00:7b:cb:a1 in this example) are all that is required to make a new name. These 8 bytes are the port GUID; the leading bytes of the address include the queue pair number, which can change. Users can choose whatever naming scheme suits them. For example, use a device_fabric naming convention such as mlx4_ib0 if an mlx4 device is connected to an ib0 subnet fabric. The only thing that is not recommended is to use the standard names, like ib0 or ib1, as these can conflict with the names the kernel assigns automatically. Next, add an entry in the rules file: copy the existing example in the rules file, replace the 8 bytes in the ATTR{address} entry with the final 8 bytes of the address of the device to be renamed, and enter the new name in the NAME field.
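The steps above can be sketched as two shell functions: one pulls the hardware address out of ip link show output, and one prints a rules-file entry that matches only the final 8 bytes. The function names are hypothetical, and the match keys in the generated rule (ACTION, SUBSYSTEM, DRIVERS, and ATTR{type}=="32", the ARP hardware type for InfiniBand) are an assumption; compare the result with the example line already present in 70-persistent-ipoib.rules before appending it.

```shell
# Hypothetical helpers for 70-persistent-ipoib.rules entries.

# Read `ip link show` output on stdin and print the 20-byte hardware
# address that follows "link/infiniband".
ipoib_hwaddr() {
    awk '$1 == "link/infiniband" { print $2; exit }'
}

# Print a rule that gives NAME to the interface whose address ends in
# the final 8 bytes (fields 13-20) of the given 20-byte address.
# ASSUMPTION: the match keys mirror the example line in the rules file.
ipoib_rule() {
    local addr=$1 name=$2 guid
    if [ "$(printf '%s\n' "$addr" | awk -F: '{ print NF }')" -ne 20 ]; then
        echo "expected a 20-byte address" >&2
        return 1
    fi
    guid=$(printf '%s\n' "$addr" | cut -d: -f13-20)
    printf 'ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*%s", NAME="%s"\n' "$guid" "$name"
}
```

For example, as root: ipoib_rule "$(ip link show ib0 | ipoib_hwaddr)" mlx4_ib0 >> /etc/udev/rules.d/70-persistent-ipoib.rules. The leading ?* wildcard is what lets the rule ignore the first 12 bytes of the address.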
13.5.3. Relaxing memlock restrictions for users
RDMA communication requires that physical memory in the computer be pinned, meaning that the kernel is not allowed to swap that memory out to a swap device if the system starts running short of available memory. Pinning memory is normally a privileged operation. To allow users other than root to run large RDMA applications, it will likely be necessary to increase the amount of memory that non-root users are allowed to pin in the system. This is done by adding a file in the /etc/security/limits.d/ directory with contents such as the following:
~]$ more /etc/security/limits.d/rdma.conf
# configuration for rdma tuning
* soft memlock unlimited
* hard memlock unlimited
# rdma tuning end
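A minimal sketch for installing that file from a script follows. write_rdma_memlock_limits is a hypothetical helper; the directory is a parameter so the result can be checked in a scratch directory before writing to /etc/security/limits.d.

```shell
# Hypothetical helper: write the memlock limits shown above to DIR/rdma.conf.
# Defaults to /etc/security/limits.d, which requires root.
write_rdma_memlock_limits() {
    local dir=${1:-/etc/security/limits.d}
    # Quoted heredoc delimiter: the "*" lines are written literally.
    cat > "$dir/rdma.conf" <<'EOF'
# configuration for rdma tuning
* soft memlock unlimited
* hard memlock unlimited
# rdma tuning end
EOF
}
```

These limits are applied by pam_limits when a user logs in, so they affect new login sessions only; in a new session, ulimit -l should then report unlimited.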
13.5.4. Configuring Mellanox cards for Ethernet operation
Certain hardware from Mellanox is capable of running in either InfiniBand or Ethernet mode. These cards generally default to InfiniBand. Users can set the cards to Ethernet mode. There is currently support for setting the mode only on ConnectX family hardware (which uses either the mlx5 or mlx4 driver).
To configure Mellanox mlx5 cards, use the mstconfig program from the mstflint package. For more details, see the "Configuring Mellanox mlx5 cards in Red Hat Enterprise Linux 7" Knowledge Base article on the Red Hat Customer Portal.
To configure Mellanox mlx4 cards, use mstconfig to set the port types on the card as described in the Knowledge Base article. If mstconfig does not support your card, edit the /etc/rdma/mlx4.conf file and follow the instructions in that file to set the port types properly for RoCE/IBoE usage. In this case, it is also necessary to rebuild the initramfs to make sure the updated port settings are copied into it.
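For the mlx4.conf route, the following sketch builds a single entry and checks the port types before anything is written. It assumes that entries take the form "<pci_device> <port1_type> [port2_type]" with each type one of auto, ib, or eth; mlx4_conf_line is a hypothetical helper, so confirm the format against the instructions in your copy of /etc/rdma/mlx4.conf. On Red Hat Enterprise Linux 7, dracut -f rebuilds the initramfs for the running kernel.

```shell
# Hypothetical helper: print one mlx4.conf entry after validating it.
# ASSUMPTION: format "<pci_device> <port1_type> [port2_type]", where
# each type is auto, ib, or eth (check the comments in mlx4.conf).
mlx4_conf_line() {
    local pci=$1; shift
    # One type for single-port cards, two for dual-port cards.
    [ $# -ge 1 ] && [ $# -le 2 ] || { echo "need 1 or 2 port types" >&2; return 1; }
    local t
    for t in "$@"; do
        case $t in
            auto|ib|eth) ;;
            *) echo "invalid port type: $t" >&2; return 1 ;;
        esac
    done
    echo "$pci $*"
}

# Example (as root): set both ports of the card at 0000:05:00.0 to
# Ethernet, then rebuild the initramfs so it picks up the new settings.
#   mlx4_conf_line 0000:05:00.0 eth eth >> /etc/rdma/mlx4.conf
#   dracut -f
```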
If one or both ports are set to Ethernet and mstconfig was not used to set the port types, users might see the following message in their logs after the port type has been set:
mlx4_core 0000:05:00.0: Requested port type for port 1 is not supported on this HCA
This is normal and does not affect operation. The script responsible for setting the port type has no way of knowing when the driver has finished switching port 2 to the requested type internally. From the time the script requests the switch for port 2 until that switch completes, attempts to set port 1 to a different type are rejected. The script therefore retries until the command succeeds or until a timeout passes, which indicates that the port switch never completed.
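The retry-until-timeout behavior described above can be illustrated with a generic shell loop. This is only an illustration of the pattern, not the script shipped with the rdma package; retry_until and its arguments are hypothetical names.

```shell
# Illustration only: retry a command until it succeeds or a timeout passes.
# NOT the rdma package's script; retry_until is a hypothetical name.
# Usage: retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    # Keep trying while the command fails (for example, a rejected request
    # to change port 1 while port 2 is still switching).
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

A failure after the deadline corresponds to the case in which the port switch never completed.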
13.5.5. Connecting to a Remote Linux SRP Target
The SCSI RDMA Protocol (SRP) is a network protocol that enables a system to use RDMA to access SCSI devices that are attached to another system. To allow an SRP initiator to connect to an SRP target, you must add, on the SRP target side, an access control list (ACL) entry for the host channel adapter (HCA) port used by the initiator.
ACL IDs for HCA ports are not unique: the same port can present different ACL IDs. The ACL IDs depend on the GID format of the HCAs, and HCAs that use the same driver, for example ib_qib, can have different GID formats. The ACL ID also depends on how you initiate the connection request.
Connecting to a Remote Linux SRP Target: High-Level Overview
- Prepare the target side in targetcli:
  - Create a storage back end, for example from the /dev/sdc1 partition:
    /> /backstores/block create vol1 /dev/sdc1
  - Create an SRP target:
    /> /srpt create 0xfe80000000000000001175000077dd7e
  - Create a LUN based on the back end created in the first substep:
    /> /srpt/ib.fe80000000000000001175000077dd7e/luns create /backstores/block/vol1
  - Create a Node ACL for the remote SRP client:
    /> /srpt/ib.fe80000000000000001175000077dd7e/acls create 0x7edd770000751100001175000077d708
  Note that the Node ACL is different for srp_daemon and ibsrpdm.
- Initiate an SRP connection on the client side with either srp_daemon or ibsrpdm:
  [root@initiator]# srp_daemon -e -n -i qib0 -p 1 -R 60 &
  [root@initiator]# ibsrpdm -c -d /dev/infiniband/umad0 > /sys/class/infiniband_srp/srp-qib0-1/add_target
- Optional: it is recommended to verify the SRP connection with tools such as lsscsi or dmesg.
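The target-side steps in the overview can also be run as one-shot targetcli commands instead of in the interactive shell. The following sketch only prints those commands so they can be reviewed first. srpt_setup_cmds is a hypothetical helper, the IDs passed to it must come from your own hardware, and the final saveconfig call is included on the assumption that the configuration should persist across reboots.

```shell
# Hypothetical helper: print the target-side targetcli commands.
# Usage: srpt_setup_cmds VOLUME BLOCK_DEVICE TARGET_ID ACL_ID
srpt_setup_cmds() {
    local vol=$1 dev=$2 target=$3 acl=$4
    # targetcli names the target "ib." followed by the ID without "0x".
    local tpath="/srpt/ib.${target#0x}"
    echo "targetcli /backstores/block create $vol $dev"
    echo "targetcli /srpt create $target"
    echo "targetcli $tpath/luns create /backstores/block/$vol"
    echo "targetcli $tpath/acls create $acl"
    echo "targetcli saveconfig"
}
```

With the values from this example, srpt_setup_cmds vol1 /dev/sdc1 0xfe80000000000000001175000077dd7e 0x7edd770000751100001175000077d708 prints the four commands from the overview plus saveconfig; after reviewing them, pipe the output to sh as root to run them.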
Procedure 13.3. Connecting to a Remote Linux SRP Target with srp_daemon or ibsrpdm
- Use the ibstat command on the target to determine the State and Port GUID values. The HCA must be in the Active state. The ACL ID is based on the Port GUID:
  [root@target]# ibstat
  CA 'qib0'
    CA type: InfiniPath_QLE7342
    Number of ports: 1
    Firmware version:
    Hardware version: 2
    Node GUID: 0x001175000077dd7e
    System image GUID: 0x001175000077dd7e
    Port 1:
      State: Active
      Physical state: LinkUp
      Rate: 40
      Base lid: 1
      LMC: 0
      SM lid: 1
      Capability mask: 0x0769086a
      Port GUID: 0x001175000077dd7e
      Link layer: InfiniBand
- Get the SRP target ID, which is based on the GUID of the HCA port. Note that you need a dedicated disk partition as a back end for an SRP target, for example /dev/sdc1. The following command extracts the default GID of the port from the ibstatus output and removes the colons:
  [root@target]# ibstatus | grep '<default-gid>' | sed -e 's/<default-gid>://' -e 's/://g' | grep 001175000077dd7e
  fe80000000000000001175000077dd7e
- Use the targetcli tool to create the LUN vol1 on the block device, create an SRP target, and export the LUN:
  [root@target]# targetcli
  /> /backstores/block create vol1 /dev/sdc1
  Created block storage object vol1 using /dev/sdc1.
  /> /srpt create 0xfe80000000000000001175000077dd7e
  Created target ib.fe80000000000000001175000077dd7e.
  /> /srpt/ib.fe80000000000000001175000077dd7e/luns create /backstores/block/vol1
  Created LUN 0.
  /> ls /
  o- / ........................................................... [...]
    o- backstores ................................................ [...]
    | o- block ..................................... [Storage Objects: 1]
    | | o- vol1 ............. [/dev/sdc1 (77.8GiB) write-thru activated]
    | o- fileio .................................... [Storage Objects: 0]
    | o- pscsi ..................................... [Storage Objects: 0]
    | o- ramdisk ................................... [Storage Objects: 0]
    o- iscsi .............................................. [Targets: 0]
    o- loopback ........................................... [Targets: 0]
    o- srpt ............................................... [Targets: 1]
      o- ib.fe80000000000000001175000077dd7e ............. [no-gen-acls]
        o- acls ............................................. [ACLs: 0]
        o- luns ............................................. [LUNs: 1]
          o- lun0 ............................. [block/vol1 (/dev/sdc1)]
  />
- Use the ibstat command on the initiator to check that the state is Active and to determine the Port GUID:
  [root@initiator]# ibstat
  CA 'qib0'
    CA type: InfiniPath_QLE7342
    Number of ports: 1
    Firmware version:
    Hardware version: 2
    Node GUID: 0x001175000077d708
    System image GUID: 0x001175000077d708
    Port 1:
      State: Active
      Physical state: LinkUp
      Rate: 40
      Base lid: 2
      LMC: 0
      SM lid: 1
      Capability mask: 0x07690868
      Port GUID: 0x001175000077d708
      Link layer: InfiniBand
- Use the following command to scan for remote SRP targets without connecting to them. The target GUID shows that the initiator has found the remote target. The ID string shows that the remote target is a Linux software target (ib_srpt.ko).
  [root@initiator]# srp_daemon -a -o
  IO Unit Info:
    port LID: 0001
    port GID: fe80000000000000001175000077dd7e
    change ID: 0001
    max controllers: 0x10
    controller[ 1]
      GUID: 001175000077dd7e
      vendor ID: 000011
      device ID: 007322
      IO class : 0100
      ID: Linux SRP target
      service entries: 1
      service[ 0]: 001175000077dd7e / SRP.T10:001175000077dd7e
- To verify the SRP connection, use the lsscsi command to list SCSI devices, and compare the lsscsi output before and after the initiator connects to the target:
  [root@initiator]# lsscsi
  [0:0:10:0] disk IBM-ESXS ST9146803SS B53C /dev/sda
- To connect to a remote target without a valid ACL configured for the initiator port, which is expected to fail, use one of the following commands for srp_daemon or ibsrpdm:
  [root@initiator]# srp_daemon -e -n -i qib0 -p 1 -R 60 &
  [1] 4184
  [root@initiator]# ibsrpdm -c -d /dev/infiniband/umad0 > /sys/class/infiniband_srp/srp-qib0-1/add_target
- The output of the dmesg command shows why the SRP connection operation failed. In a later step, the dmesg command on the target side is used to clarify the situation.
  [root@initiator]# dmesg -c
  [ 1230.059652] scsi host5: ib_srp: REJ received
  [ 1230.059659] scsi host5: ib_srp: SRP LOGIN from fe80:0000:0000:0000:0011:7500:0077:d708 to fe80:0000:0000:0000:0011:7500:0077:dd7e REJECTED, reason 0x00010006
  [ 1230.073792] scsi host5: ib_srp: Connection 0/2 failed
  [ 1230.078848] scsi host5: ib_srp: Sending CM DREQ failed
- Because of the failed LOGIN, the output of the lsscsi command is the same as in the earlier step.
  [root@initiator]# lsscsi
  [0:0:10:0] disk IBM-ESXS ST9146803SS B53C /dev/sda
- Running dmesg on the target side (ib_srpt.ko) explains why the LOGIN failed. The output also contains the valid ACL ID provided by srp_daemon: 0x7edd770000751100001175000077d708.
  [root@target]# dmesg
  [ 1200.303001] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x7edd770000751100:0x1175000077d708, t_port_id 0x1175000077dd7e:0x1175000077dd7e and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x1175000077dd7e)
  [ 1200.322207] ib_srpt Rejected login because no ACL has been configured yet for initiator 0x7edd770000751100001175000077d708.
- Use the targetcli tool to add a valid ACL:
  [root@target]# targetcli
  targetcli shell version 2.1.fb41
  Copyright 2011-2013 by Datera, Inc and others.
  For help on commands, type 'help'.

  /> /srpt/ib.fe80000000000000001175000077dd7e/acls create 0x7edd770000751100001175000077d708
  Created Node ACL for ib.7edd770000751100001175000077d708
  Created mapped LUN 0.
- Verify the SRP LOGIN operation:
  - Wait for 60 seconds to allow srp_daemon to retry logging in:
    [root@initiator]# sleep 60
  - Verify that the new disk now appears in the lsscsi output:
    [root@initiator]# lsscsi
    [0:0:10:0] disk IBM-ESXS ST9146803SS B53C /dev/sda
    [7:0:0:0] disk LIO-ORG vol1 4.0 /dev/sdb
  - For a kernel log of SRP target discovery, use:
    [root@initiator]# dmesg -c
    [ 1354.182072] scsi host7: SRP.T10:001175000077DD7E
    [ 1354.187258] scsi 7:0:0:0: Direct-Access LIO-ORG vol1 4.0 PQ: 0 ANSI: 5
    [ 1354.208688] scsi 7:0:0:0: alua: supports implicit and explicit TPGS
    [ 1354.215698] scsi 7:0:0:0: alua: port group 00 rel port 01
    [ 1354.221409] scsi 7:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
    [ 1354.229147] scsi 7:0:0:0: alua: Attached
    [ 1354.233402] sd 7:0:0:0: Attached scsi generic sg1 type 0
    [ 1354.233694] sd 7:0:0:0: [sdb] 163258368 512-byte logical blocks: (83.5 GB/77.8 GiB)
    [ 1354.235127] sd 7:0:0:0: [sdb] Write Protect is off
    [ 1354.235128] sd 7:0:0:0: [sdb] Mode Sense: 43 00 00 08
    [ 1354.235550] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    [ 1354.255491] sd 7:0:0:0: [sdb] Attached SCSI disk
    [ 1354.265233] scsi host7: ib_srp: new target: id_ext 001175000077dd7e ioc_guid 001175000077dd7e pkey ffff service_id 001175000077dd7e sgid fe80:0000:0000:0000:0011:7500:0077:d708 dgid fe80:0000:0000:0000:0011:7500:0077:dd7e
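The text transformations used in the procedure above can also be scripted. The following is a minimal sketch with hypothetical function names: srp_target_id builds the target ID from a port GUID, assuming the default fe80 subnet prefix seen in this example; srpt_missing_acl extracts the initiator ID from the ib_srpt rejection message in the target's kernel log; and new_scsi_devices compares two saved lsscsi listings. They only transform text, so they can be tried without RDMA hardware.

```shell
# Hypothetical helpers for the SRP procedure; they only transform text.

# Build the SRP target ID from a port GUID reported by ibstat, assuming
# the default fe80:0000:0000:0000 subnet prefix (16 hex digits).
srp_target_id() {
    local guid=${1#0x}
    echo "0xfe80000000000000${guid}"
}

# Read target-side dmesg output on stdin and print each initiator ID that
# ib_srpt rejected for lack of an ACL; pass it to ".../acls create".
srpt_missing_acl() {
    sed -n 's/.*no ACL has been configured yet for initiator \(0x[0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Print lines of the AFTER lsscsi listing that are absent from BEFORE,
# that is, SCSI devices that appeared after the SRP login.
# Usage: new_scsi_devices BEFORE_FILE AFTER_FILE
new_scsi_devices() {
    grep -vxF -f "$1" "$2"
}
```

For the logs shown above, dmesg | srpt_missing_acl on the target prints 0x7edd770000751100001175000077d708. On the initiator, save lsscsi > before.txt before connecting and lsscsi > after.txt afterward, then run new_scsi_devices before.txt after.txt to see only the new LIO-ORG disk.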