5.26. cluster and gfs2-utils
Updated cluster and gfs2-utils packages that fix multiple bugs and add various enhancements are now available for Red Hat Enterprise Linux 6.
The cluster and gfs2-utils packages contain the core clustering libraries for Red Hat High Availability as well as utilities to maintain GFS2 file systems for users of Red Hat Resilient Storage.
Bug Fixes
- BZ#759603
- A race condition existed when a node lost contact with the quorum device at the same time as the token timeout period expired. The nodes raced to fence, which could lead to a cluster failure. To prevent the race condition from occurring, the cman and qdiskd interaction timer has been improved.
- BZ#750314
- Previously, a cluster partition and merge during startup fencing was not detected correctly. As a consequence, the DLM (Distributed Lock Manager) lockspace operations could become unresponsive. With this update, the partition and merge event is now detected and handled properly. DLM lockspace operations no longer become unresponsive in the described scenario.
- BZ#745538
- Multiple ping command examples on the qdisk(5) manual page did not include the -w option. If the ping command is run without the option, the action can time out. With this update, the -w option has been added to those ping commands.
- BZ#745161
- Due to a bug in libgfs2, sentinel directory entries were counted as if they were real entries. As a consequence, the mkfs.gfs2 utility created file systems which did not pass the fsck check when a large number of journal metadata blocks were required (for example, a file system with block size of 512, and 9 or more journals). With this update, incrementing the count of the directory entry is now avoided when dealing with sentinel entries. GFS2 file systems created with large numbers of journal metadata blocks now pass the fsck check cleanly.
- BZ#806002
- When a node fails and gets fenced, the node is usually rebooted and joins the cluster with a fresh state. However, if a block occurs during the rejoin operation, the node cannot rejoin the cluster and the attempt fails during boot. Previously, in such a case, the cman init script did not revert actions that had happened during startup and some daemons could be erroneously left running on a node. The underlying source code has been modified so that the cman init script now performs a full rollback when errors are encountered. No daemons are left running unnecessarily in this scenario.
- BZ#804938
- The RELAX NG schema used to validate the cluster.conf file previously did not recognize the totem.miss_count_const constant as a valid option. As a consequence, users were not able to validate cluster.conf when this option was in use. This option is now recognized correctly by the RELAX NG schema, and the cluster.conf file can be validated as expected.
- BZ#819787
- The cmannotifyd daemon is often started after the cman utility, which means that cmannotifyd does not receive or dispatch any notifications on the current cluster status at startup. This update modifies the cman connection loop to generate a notification that the configuration and membership have changed.
- BZ#749864
- Incorrect use of the free() function in the gfs2_edit code could lead to memory leaks and so cause various problems. For example, when the user executed the gfs2_edit savemeta command, the gfs2_edit utility could become unresponsive or even terminate unexpectedly. This update applies multiple upstream patches so that the free() function is now used correctly and memory leaks no longer occur. With this update, save statistics for the gfs2_edit savemeta command are now reported more often so that users know that the process is still running when saving a large dinode with a huge amount of metadata.
- BZ#742595
- Previously, the gfs2_grow utility failed to expand a GFS file system if the file system contained only one resource group. This was due to the old code being based on GFS1 (which had different fields) that calculated distances between resource groups and did not work with only one resource group. This update adds the rgrp_size() function in libgfs2, which calculates the size of the resource group instead of determining its distance from the previous resource group. A file system with only one resource group can now be expanded successfully.
- BZ#742293
- Previously, the gfs2_edit utility printed unclear error messages when the underlying device did not contain a valid GFS2 file system, which could be confusing. With this update, users are provided with additional information in the aforementioned scenario.
- BZ#769400
- Previously, the mkfs utility provided users with insufficient error messages when creating a GFS2 file system. The messages also contained absolute build paths and source code references, which was unwanted. A patch has been applied to provide users with comprehensive error messages in the described scenario.
- BZ#753300
- The gfs_controld daemon ignored an error returned by the dlm_controld daemon for the dlmc_fs_register() function while mounting a file system. This resulted in a successful mount, but recovery of a GFS file system could not be coordinated using Distributed Lock Manager (DLM). With this update, mounting a file system is not successful under these circumstances and an error message is returned instead.
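To illustrate the ping deadline fix described under BZ#745538 above, the following is a minimal sketch of a reachability check bounded by the -w option. It is not taken from the qdisk(5) manual page: the helper names, the documentation address 192.0.2.1, and the one-second deadline are assumptions chosen for illustration only.

```python
import subprocess


def build_ping_command(host, count=1, deadline_s=1):
    """Build a ping invocation bounded by a deadline (-w), in seconds."""
    if count < 1 or deadline_s < 1:
        raise ValueError("count and deadline_s must be positive integers")
    # -c limits the number of echo requests sent; -w makes ping exit once
    # deadline_s seconds have elapsed, so the check cannot hang on a
    # host that never answers.
    return ["ping", "-c", str(count), "-w", str(deadline_s), host]


def host_reachable(host, deadline_s=1):
    """Return True if the host answered before the deadline expired."""
    cmd = build_ping_command(host, deadline_s=deadline_s)
    try:
        # The subprocess timeout is a second safety net, set slightly
        # longer than the -w deadline passed to ping itself.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical address from the documentation range (RFC 5737).
    print(" ".join(build_ping_command("192.0.2.1")))
```

The point of the sketch is that the deadline belongs to the command itself, so a caller with its own timing budget gets a definite pass or fail result within a known interval.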
Enhancements
- BZ#675723, BZ#803510
- The gfs2_convert utility can be used to convert a file system from GFS1 to GFS2. However, gfs2_convert required the user to run the gfs_fsck utility prior to conversion. Because this tool is not included in Red Hat Enterprise Linux 6, users had to use Red Hat Enterprise Linux 5 to run it. With this update, the gfs2_fsck utility now allows users to perform a complete GFS1 to GFS2 conversion on Red Hat Enterprise Linux 6 systems.
- BZ#678372
- Cluster tuning using the qdiskd daemon and the device-mapper-multipath utility is a very complex operation, and it was previously easy to misconfigure qdiskd in this setup, which could consequently lead to cluster node failures. Input and output operations of the qdiskd daemon have been improved to automatically detect multipath-related timeouts without requiring manual configuration. Users can now easily deploy qdiskd with device-mapper-multipath.
- BZ#733298, BZ#740552
- Previously, the cman utility was not able to configure Redundant Ring Protocol (RRP) correctly in corosync, resulting in RRP deployments not working properly. With this update, cman has been improved to configure RRP properly and to perform extra sanity checks on user configurations. It is now easier to deploy a cluster with RRP and the user is provided with more extensive error reports.
- BZ#745150
- With this update, Red Hat Enterprise Linux High Availability has been validated against the VMware vSphere 5.0 release.
- BZ#749228
- With this update, the fence_scsi fencing agent has been validated for use in a two-node cluster with High Availability LVM (HA-LVM).
All users of cluster and gfs2-utils are advised to upgrade to these updated packages, which fix these bugs and add these enhancements.