5.275. RDMA
Updated RDMA packages that fix various bugs and add various enhancements are now available for Red Hat Enterprise Linux 6.
Red Hat Enterprise Linux includes a collection of InfiniBand and iWARP utilities, libraries and development packages for writing applications that use Remote Direct Memory Access (RDMA) technology.
Note
The RDMA packages have been upgraded to the latest upstream versions which provide a number of bug fixes and enhancements over the previous versions (BZ#739138).
- BZ#814845
- The rdma_bw and rdma_lat utilities provided by the perftest package are now deprecated and will be removed from the perftest package in a future update. Users should use the following utilities instead: ib_write_bw, ib_write_lat, ib_read_bw, and ib_read_lat.
Bug Fixes
- BZ#696019
- Previously, the rping utility did not properly join threads on shutdown. Consequently, on iWARP connections in particular, a race condition was triggered that resulted in the rping utility terminating unexpectedly with a segmentation fault. This update modifies rping to properly handle thread teardown. As a result, rping no longer crashes on iWARP connections.
- BZ#700289
- Previously, the kernel RDMA Connection Manager (rdmacm) did not have an option to reuse a socket port before the timeout had expired on that port after the last close. Consequently, when trying to open and close large numbers of sockets rapidly, it was possible to run out of suitable sockets that were not waiting in the timewait state. This update improves the kernel rdmacm provider to implement the SO_REUSEADDR option available for TCP/UDP sockets, which allows a socket that is closed but still in the timewait state to be reused when needed. As a result, it is now much more difficult to run out of sockets because the rdmacm provider does not need to wait for them to expire from the timewait state before they can be reused.
- BZ#735954
- The framework of the MVAPICH2 process manager mpirun_rsh in the mvapich2 package was broken. Consequently, all attempts to use mpirun_rsh failed. This update upgrades mpirun_rsh to later MVAPICH2 upstream sources that resolve the problem. As a result, mpirun_rsh works as expected.
- BZ#747406
- Previously, the permissions on the
/dev/ipath*
files were not permissive enough for normal users to access. Consequently, when a normal user attempted to run a Message Passing Interface (MPI) application using the Performance Scaled Messaging (PSM) Byte Transfer Layer (BTL), it failed due to the inability to open files starting with/dev/ipath
. This update makes sure that the files starting with/dev/ipath
have the correct permissions to be opened in read-write mode by normal users. As a result, attempts to run an MPI application using the PSM BTL succeed. - BZ#750609
- Previously, mappings from InfiniBand bit values to link speeds only extended to Quad Data Rate (QDR). Consequently, attempts to use newer InfiniBand cards that supported speeds faster than QDR did not work because the stack did not understand the bit values in the link speed field. This update adds FDR (Fourteen Data Rate), FDR10, and EDR (Enhanced Data Rate) link speeds to the kernel and user space libraries. Users can now make use of newer InfiniBand cards at these higher speeds.
- BZ#754196
- OpenSM did not support the
subnet_prefix
option on the command line. Consequently, in order to have two instances of OpenSM running on two different fabrics at the same time and on the same machine, the sysadmin had to edit two differentopensm.conf
files and specify the subnet_prefix separately in each file in order to have different prefixes on the different subnets. With this update, OpenSM accepts a subnet_prefix option and the OpenSM init script now starts OpenSM using this option when it is being started on multiple fabrics. As a result, a sysadmin is no longer required to hand edit multipleopensm.conf
files to create otherwise identical configurations that only vary by which fabric they are managing. - BZ#755459
- Previously, ibv_devinfo (a program included in libibverbs-utils) did not catch bad port numbers on the command line and return an error code. Consequently, scripts could not reliably tell whether or not the command had succeeded or failed due to a bad port number. This update fixes ibv_devinfo so that it returns a non-zero error condition when a user attempts to run it on a non-existent InfiniBand device port. As a result, scripts can now tell for certain if the port value they pass to ibv_devinfo was a valid port or was out of range.
- BZ#758498
- Initialization of RDMA over Converged Ethernet (RoCE) based queue pairs (QPs) was not completed successfully when initialization was done through libibverbs and not through librdmacm. Consequently, attempting to open the connection failed and the following error message was displayed:
cannot transition QP to RTR state
This updated kernel stack provides a fix for the libibverbs based RoCE QP creation and now users can properly create QPs whether they use libibverbs or librdmacm as the connection initiation method. - BZ#768109
- Previously, the openmpi library did not honor the tcp_port_range settings. Consequently, if users wished to limit the
TCP
ports that openmpi used they could not do so. This update to a later upstream version that does not have this problem allows users to now limit which TCP ports openmpi attempts to use. - BZ#768457
- Previously, the shared OpenType font library “libotf.so.0” was provided by both the openmpi package and the libotf package. Consequently, when an RPM spec file requested libotf.so.0 in order to operate properly, Yum could install either openmpi or libotf to satisfy the dependency, but as these two packages do not provide compatible libotf.so.0 libraries, the program might or might not work depending on whether or not the right provider was selected. The libotf.so.0 in openmpi is not intended for other applications to link against, it is an internal library. With this update, libotf.so.0 in openmpi is excluded from RPM's library identification searches. As a result, applications linking against libotf will get the right libotf, and openmpi will not accidentally be installed to satisfy the need for libotf.
- BZ#773713
- There was a race condition in handling of completion events in the perftest programs. Under certain conditions, the perftest program being used would terminate unexpectedly with a segmentation fault. This update adds separate send receive completion queues in place of the single completion queue for both send and receive operations. The race between the finish of a send and the finish of a receive is thereby avoided. As a result, the perftest applications no longer crash with a segmentation fault.
- BZ#804002
- The rds-ping tool did not check to make sure that a socket was available before sending the next ping packet. Consequently, when the timeout between packets was set very small by the user, packets could fill up all available sockets and then overwrite one of the sockets before any ping-packets were returned. This resulted in corruption in the rds-ping data structures and eventually rds-ping terminated unexpectedly with a segmentation fault. With this update, the rds-ping program stalls on sending any more packets if there are no sockets without outstanding packets. As a result, rds-ping no longer crashes with a segmentation fault when the timeout between packets is very small.
- BZ#805129
- Due to a bug in the
libmlx4.conf
modprobe configuration, usage of modprobe could result in an infinite loop of modprobe processes. If the bug was encountered, the processes would continually fork until there were no processes able to run and the system would become unresponsive. This update improves the code and as a result an incorrect configuration of options in/etc/modprobe.d/libmlx4.conf
no longer results in a system that is unresponsive and that requires a hard reboot in order to be restored to proper operation. - BZ#808673
- The qperf application had an outdated constant for
PF_RDS
in its source code that did not match the officially assigned value for PF_RDS and so qperf would compile with the wrong PF_RDS constant. Consequently, when it was run it would mistakenly think RDS (Reliable Datagram Service) was not supported on the machine even when it was and would refuse to run any RDS tests. This update removes the PF_RDS constant from the qperf source code so that it will pick up the correct constant from the system header files. As a result, qperf now properly runs RDS performance tests. - BZ#815215
- The srptools RPM did not automatically add the SCSI Remote Protocol daemon (srpd) to the service list. Consequently, the
chkconfig --list
command would not show the srpd service at all and the service could not be enabled. The srptools RPM now properly adds the srpd init script to the list of available services (it is disabled by default). Users can now see the srpd service using chkconfig --list and can enable the srpd service with thechkconfig --level 345 srpd on
command. - BZ#815622
- There was a bad test in the rdma init script. Consequently, the rds module would be loaded even if the user had configured it not to load. This update corrects the test in the init script so that all conditions must be met instead of just the first condition. As a result, the rds module is only loaded when the user has configured it to be loaded or if autoloaded by the kernel due to rds usage on the local machine.
Enhancements
- BZ#700285
- On large InfiniBand networks, Subnet Administration service lookups consumed a large amount of bandwidth. Consequently, it could take upwards of 1 minute to look up a route from one machine to another if the network InfiniBand Subnet Manager (OpenSM) was heavily congested. This update adds the InfiniBand Communication Management Assistant (ibacm) that caches routes in a similar manner to the ARP cache for Ethernet. The ibacm program caches PathRecords from the Subnet Administration service (SA) which includes information such as MTU (Maximum Transmission Unit), SL (Service Level), SLID (Source Local Identifier) and DLID (Destination Local Identifier) for InfiniBand paths. This information is important to set up QP's properly. As a result, large subnets with many nodes will have reduced overall SA Query traffic and route lookup times.
Users of RDMA should upgrade to these updated packages, which provide numerous bug fixes and enhancements.