3.11. The GFS2 Withdraw Function
The GFS2 withdraw function is a data integrity feature of the GFS2 file system that prevents potential file system damage due to faulty hardware or kernel software. If the GFS2 kernel module detects an inconsistency while using a GFS2 file system on any given cluster node, it withdraws from the file system, leaving it unavailable to that node until it is unmounted and remounted (or the machine detecting the problem is rebooted). All other mounted GFS2 file systems remain fully functional on that node. (The GFS2 withdraw function is less severe than a kernel panic, which causes the node to be fenced.)
The main categories of inconsistency that can cause a GFS2 withdraw are as follows:
- Inode consistency error
- Resource group consistency error
- Journal consistency error
- Magic number metadata consistency error
- Metadata type consistency error
An example of an inconsistency that would cause a GFS2 withdraw is an incorrect block count for a file’s inode. When GFS2 deletes a file, it systematically removes all the data and metadata blocks referenced by that file. When done, it checks the inode’s block count. If the block count is not 1 (meaning all that is left is the disk inode itself), that indicates a file system inconsistency, since the inode’s block count did not match the actual blocks used for the file.
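For example (the numbers here are purely illustrative), a small file whose inode references four data blocks has a block count of 5: the disk inode plus the four data blocks. Once GFS2 has freed those four blocks during deletion, the count must drop back to 1; any other value indicates an inconsistency.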
In many cases, the problem may have been caused by faulty hardware (faulty memory, motherboard, HBA, disk drives, cables, and so forth). It may also have been caused by a kernel bug (another kernel module accidentally overwriting GFS2’s memory), or actual file system damage (caused by a GFS2 bug).
In most cases, a GFS2 withdraw can be addressed by rebooting the affected cluster node. Before rebooting the node, disable the GFS2 file system clone service in Pacemaker, which unmounts the file system on that node only.
# pcs resource disable --wait=100 mydata_fs_clone
# /sbin/reboot
Warning
Do not try to unmount and remount the file system manually with the umount and mount commands. You must use the pcs command; otherwise, Pacemaker will detect that the file system service has disappeared and fence the node.
The consistency problem that caused the withdraw may make it impossible to stop the file system service, as it may cause the system to hang.
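Before issuing the reboot, you can optionally confirm that the clone instance has stopped on the affected node by checking the cluster status. This is a minimal check using the resource name from the example above; the exact output layout depends on your pcs version.
# pcs status
The mydata_fs_clone resource should no longer be reported as started on the node you are about to reboot.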
If the problem persists after a remount, stop the file system service to unmount the file system from all nodes in the cluster, run a file system check with the fsck.gfs2 command, and then restart the service, as described in the following procedure.
- Reboot the affected node.
- Disable the non-clone file system service in Pacemaker to unmount the file system from every node in the cluster.
# pcs resource disable --wait=100 mydata_fs
- From one node of the cluster, run the fsck.gfs2 command on the file system device to check for and repair any file system damage, saving the output for later review (see the note after this procedure).
# fsck.gfs2 -y /dev/vg_mydata/mydata > /tmp/fsck.out
- Remount the GFS2 file system from all nodes by re-enabling the file system service:
# pcs resource enable --wait=100 mydata_fs
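After the service has been re-enabled, it is worth reviewing the saved fsck.gfs2 output and confirming that the file system is mounted again on every node. The following is a minimal sketch that assumes the output file, resource name, and device used in the procedure above; review the output on the node where you ran fsck.gfs2, and run the status checks on each node.
# less /tmp/fsck.out
# pcs status
# mount | grep gfs2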
You can override the GFS2 withdraw function by mounting the file system with the -o errors=panic option specified in the file system service.
# pcs resource update mydata_fs "options=noatime,errors=panic"
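To confirm that the updated mount options are now part of the resource definition, you can display the resource configuration. This is an optional check; on older versions of pcs, pcs resource show mydata_fs provides the equivalent output.
# pcs resource config mydata_fs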
When this option is specified, any errors that would normally cause the system to withdraw force a kernel panic instead. This stops the node's communications, which causes the node to be fenced. This is especially useful for clusters that are left unattended for long periods of time without monitoring or intervention.
Internally, the GFS2 withdraw function works by disconnecting the locking protocol to ensure that all further file system operations result in I/O errors. As a result, when the withdraw occurs, it is normal to see a number of I/O errors from the device mapper device reported in the system logs.
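If you need to confirm that a withdraw has occurred on a node, those I/O errors, along with the GFS2 withdraw messages themselves, appear in the kernel log of the affected node. A minimal way to review them (the exact message text varies by kernel version) is to search the kernel ring buffer or the journal:
# dmesg | grep -i gfs2
# journalctl -k | grep -i gfs2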