8.15. cluster and gfs2-utils
Updated cluster and gfs2-utils packages that fix several bugs are now available for Red Hat Enterprise Linux 6.
The Red Hat Cluster Manager is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.
Bug Fixes
- BZ#996233
- Prior to this update, if one of the gfs2_tool, gfs2_quota, gfs2_grow, or gfs2_jadd commands was killed unexpectedly, a temporary GFS2 metadata mount point used by those tools could be left mounted. The mount point was also not registered in the /etc/mtab file, and so the "umount -a -t gfs2" command did not unmount it. This mount point could prevent systems from rebooting properly, and cause the kernel to panic in cases where it was manually unmounted after the normal GFS2 mount point. This update corrects the problem in two ways: the tools now create an mtab entry for the temporary mount point, and they unmount it before exiting when a signal is received.
- BZ#893925
- Previously, the cman utility did not work correctly if there was a brief network failure in a cluster running in two_node mode with no fence delay. Consequently, the two nodes killed each other when the connection was re-established. This update adds a 5-second delay to the "fenced" daemon for the node with the higher node ID, and the described problem no longer occurs. Alternatively, a fence delay can be added to the "cluster.conf" file, as documented in the Red Hat Knowledgebase (see https://access.redhat.com/site/solutions/54829).
- BZ#982670
- Prior to this update, the cman init script did not handle its lock file correctly when executing the "restart" command. Consequently, the node could be removed from the cluster by other members during the node reboot. The cman init script has been modified to handle the lock file correctly, and other nodes of the cluster no longer take any fencing action.
- BZ#889564
- Previously, when the corosync utility detected a "process pause", an old, and therefore invalid, control group ID was occasionally sent to the gfs_controld daemon. Consequently, gfs_controld became unresponsive. This update fixes gfs_controld to discard messages with old control group IDs, and gfs_controld no longer hangs in this scenario.
- BZ#888857
- Prior to this update, the "fenced" daemon and other related daemons occasionally closed a file descriptor that was still referenced by the corosync libraries during an attempt to stop the daemons. Consequently, the daemons did not terminate properly and shutting down the cluster utility failed. This bug has been fixed: the file descriptor now stays open and is marked as unused by the daemons, and the daemons terminate properly.
- BZ#989647
- Previously, the fsck.gfs2 utility did not handle a certain type of file system corruption properly. As a consequence, fsck.gfs2 terminated with an error message and did not repair the corruption. This update extends the abilities of fsck.gfs2 to handle file system corruption and the described problems no longer occur.
- BZ#1007970
- Previously, the "-K" option was unavailable in the mkfs.gfs2 utility. Consequently, mkfs.gfs2 returned the "invalid option" error message, and it was impossible to use this option to keep, rather than discard, unused blocks. With this update, mkfs.gfs2 handles the "-K" option properly.
- BZ#896191
- The cluster.conf(5) manual page incorrectly stated that the default syslog facility was "daemon". This update corrects the stated default to "local4".
- BZ#902920
- Previously, the fsck.gfs2 utility did not correctly recognize cases when information about a directory in the Global File System 2 (GFS2) was misplaced. Also, fsck.gfs2 did not properly check consistency of the GFS2 directory hash table. As a consequence, fsck.gfs2 did not report problems with the file system and the files in the corrupted directories were unusable. With this update, fsck.gfs2 has been modified to do extensive sanity checking and it is now able to identify and fix the described problems among others.
- BZ#963657
- Prior to this update, nested Global File System 2 (GFS2) mount points were not taken into account when stopping the GFS2 resources. Consequently, the mount points were not unmounted in the correct order and the gfs2 service failed to stop. The gfs2 init script has been modified to unmount GFS2 mount points in the correct order, and stopping gfs2 no longer fails in this scenario.
- BZ#920358
- Previously, the qdiskd daemon did not correctly handle newly rejoined nodes that had been rebooted uncleanly. Consequently, qdiskd removed such nodes after its initialization. With this update, qdiskd skips counting of the missed updates for nodes in the "S_NONE" state, and it no longer removes nodes in the described scenario.
- BZ#888318
- Previously, the qdiskd daemon did not issue a specific error message for cases when the token timeout was set incorrectly in the "cluster.conf" file. Consequently, qdiskd terminated with the "qdiskd: configuration failed" error message, which gave no details. This update adds a specific error message for the described cases.
- BZ#886585
- Previously, the gfs2_grow utility returned a zero exit status even in cases where no growth was possible because the device had not grown enough. Consequently, automated scripts, used especially for testing of gfs2_grow, received an incorrect "0" return code. With this update, gfs2_grow has been modified to return a non-zero exit status when its operations fail.
- BZ#871603
- Previously, the help text for the "ccs_tool create" command contained incorrect parameters for the "addfence" subcommand, namely "user" instead of "login". Consequently, users could create an incorrect "cluster.conf" file. With this update, the help text has been corrected.
- BZ#985796
- Previously, when the fsck.gfs2 utility was repairing the superblock, it looked up the locking configuration fields from the "cluster.conf" file. Consequently, the "lockproto" and "locktable" fields could be set improperly when the superblock was repaired. With this update, the "lockproto" and "locktable" fields are set to sensible default values, and the user is instructed to set the fields with the tunegfs2 utility at the end of the fsck.gfs2 run.
- BZ#984085
- Previously, the fsck.gfs2 utility did not properly handle cases when directory leaf blocks were duplicated. As a consequence, files in the corrupted directories were occasionally not found and fsck.gfs2 became unresponsive. With this update, fsck.gfs2 checks for duplicate blocks in all directories, identifies and fixes corruptions, and it no longer hangs in this scenario.
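The unmount ordering described for BZ#963657 can be illustrated with a short sketch. The following Python code is not taken from the gfs2 init script; it only shows one possible way to order nested mount points so that inner mounts are unmounted before the mounts that contain them. The function names "path_depth" and "unmount_order" and the sample paths are hypothetical.

```python
import posixpath


def path_depth(path):
    """Count the non-empty components of an absolute path ("/" has depth 0)."""
    return len([part for part in path.split("/") if part])


def unmount_order(mount_points):
    """Return the mount points ordered so nested mounts precede their parents."""
    # Normalize so that "/mnt/gfs2/" and "/mnt/gfs2" are treated the same.
    normalized = [posixpath.normpath(p) for p in mount_points]
    # A nested mount point always has more path components than the mount
    # point that contains it, so sorting deepest-first never places a parent
    # before one of its children. Ties are broken alphabetically.
    return sorted(normalized, key=lambda p: (-path_depth(p), p))


# Hypothetical mount points, used only for illustration.
example = ["/mnt/gfs2", "/mnt/gfs2/data", "/mnt/other", "/mnt/gfs2/data/logs"]
for mount_point in unmount_order(example):
    print(mount_point)
```

In this sketch, "/mnt/gfs2/data/logs" is listed first and "/mnt/gfs2" only after "/mnt/gfs2/data", which is the order in which the mount points could be unmounted safely.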
Users of cluster and gfs2-utils are advised to upgrade to these updated packages, which fix these bugs.
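For automated scripts affected by BZ#886585, the corrected exit status can be checked as in the following minimal Python sketch. The helper names "command_succeeded" and "grow_filesystem" are hypothetical, and the sketch assumes only that gfs2_grow accepts the mount point of the file system as its argument and returns a non-zero exit status on failure.

```python
import subprocess
import sys


def command_succeeded(argv):
    """Run the command given as an argument list; True only if it exits with 0."""
    # subprocess.call waits for the command and returns its exit status.
    return subprocess.call(argv) == 0


def grow_filesystem(mount_point):
    """Hypothetical wrapper: run gfs2_grow and report a non-zero exit status."""
    # Raises OSError if the gfs2_grow binary is not installed on the system.
    if not command_succeeded(["gfs2_grow", mount_point]):
        sys.stderr.write("gfs2_grow failed for %s\n" % mount_point)
        return False
    return True
```

A test script could call grow_filesystem("/mnt/gfs2"), where the mount point is only an example, and treat a False result as a failed grow operation instead of assuming success.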