Chapter 7. Diagnosing and correcting problems with GFS2 file systems
The following procedures describe some common GFS2 issues and provide information on how to address them.
7.2. GFS2 file system hangs and requires reboot of one node
If your GFS2 file system hangs and does not return commands run against it, but rebooting one specific node returns the system to normal, this may indicate a locking problem or bug. Should this occur, gather GFS2 data during one of these occurrences and open a support ticket with Red Hat Support, as described in Gathering GFS2 data for troubleshooting.
7.3. GFS2 file system hangs and requires reboot of all nodes
If your GFS2 file system hangs and does not return commands run against it, requiring that you reboot all nodes in the cluster before using it, check for the following issues.
- You may have had a failed fence. GFS2 file systems will freeze to ensure data integrity in the event of a failed fence. Check the messages logs to see if there are any failed fences at the time of the hang. Ensure that fencing is configured correctly.
- The GFS2 file system may have withdrawn. Check through the messages logs for the word withdraw and check for any messages and call traces from GFS2 indicating that the file system has been withdrawn. A withdraw is indicative of file system corruption, a storage failure, or a bug. At the earliest time when it is convenient to unmount the file system, you should perform the following procedure:
  1. Reboot the node on which the withdraw occurred.
     # /sbin/reboot
  2. Stop the file system resource to unmount the GFS2 file system on all nodes.
     # pcs resource disable --wait=100 mydata_fs
  3. Capture the metadata with the gfs2_edit savemeta… command. You should ensure that there is sufficient space for the file, which in some cases may be large. In this example, the metadata is saved to a file in the /root directory.
     # gfs2_edit savemeta /dev/vg_mydata/mydata /root/gfs2metadata.gz
  4. Update the gfs2-utils package.
     # sudo yum update gfs2-utils
  5. On one node, run the fsck.gfs2 command on the file system to ensure file system integrity and repair any damage.
     # fsck.gfs2 -y /dev/vg_mydata/mydata > /tmp/fsck.out
  6. After the fsck.gfs2 command has completed, re-enable the file system resource to return it to service:
     # pcs resource enable --wait=100 mydata_fs
  7. Open a support ticket with Red Hat Support. Inform them you experienced a GFS2 withdraw and provide logs and the debugging information generated by the sosreport and gfs2_edit savemeta commands.
  In some instances of a GFS2 withdraw, commands that are trying to access the file system or its block device can hang. In these cases a hard reboot is required to reboot the cluster.
  For information about the GFS2 withdraw function, see GFS2 filesystem unavailable to a node (the GFS2 withdraw function).
- This error may be indicative of a locking problem or bug. Gather data during one of these occurrences and open a support ticket with Red Hat Support, as described in Gathering GFS2 data for troubleshooting.
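The log checks mentioned in the list above can be sketched as a small shell function. This is a minimal sketch rather than a Red Hat tool: the function name scan_gfs2_log is hypothetical, the default path /var/log/messages is assumed, and the search term "fence" is a deliberately broad guess that also matches messages about successful fencing, so the output is meant to be read, not counted.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention a GFS2 withdraw or fencing.
# Usage: scan_gfs2_log [logfile]
scan_gfs2_log() {
    logfile="${1:-/var/log/messages}"   # assumed default log location
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # -i: case-insensitive, -n: prefix each match with its line number,
    # -E: extended regular expression.
    # "withdraw" comes from the text above; "fence" is a broad assumption.
    grep -inE 'withdraw|fence' "$logfile"
}
```

The function returns a non-zero status when the file is unreadable or when nothing matches, so it can be combined with other checks in a script. Compare the timestamps of any matching lines with the time of the hang.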
7.4. GFS2 file system does not mount on newly added cluster node
If you add a new node to a cluster and find that you cannot mount your GFS2 file system on that node, you may have fewer journals on the GFS2 file system than nodes attempting to access the GFS2 file system. You must have one journal per GFS2 host you intend to mount the file system on (with the exception of GFS2 file systems mounted with the spectator mount option set, since these do not require a journal). You can add journals to a GFS2 file system with the gfs2_jadd command, as described in Adding journals to a GFS2 file system.
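The journal requirement amounts to a simple comparison between the number of nodes that mount the file system (excluding spectator mounts) and the number of existing journals. The following is a minimal sketch under that assumption. The function name journals_to_add is hypothetical, and the device and mount point in the commented commands are placeholders, not values from your cluster.

```shell
#!/bin/sh
# Hypothetical helper: how many journals must be added so that every
# non-spectator node has one.
# Usage: journals_to_add <mounting_nodes> <existing_journals>
journals_to_add() {
    nodes="$1"
    journals="$2"
    if [ "$nodes" -gt "$journals" ]; then
        echo $((nodes - journals))
    else
        echo 0
    fi
}

# Assumed workflow (placeholders; run as root):
#   gfs2_edit -p jindex /dev/vg_mydata/mydata | grep journal   # list existing journals
#   gfs2_jadd -j "$(journals_to_add 3 2)" /mnt/mydata          # add the missing ones
```

For example, journals_to_add 3 2 prints 1, meaning that one more journal is needed before a third node can mount the file system. A result of 0 means the journal count is not the cause of the mount failure.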
7.5. Space indicated as used in empty file system
If you have an empty GFS2 file system, the df command will show that there is space being taken up. This is because GFS2 file system journals consume space (number of journals * journal size) on disk. If you created a GFS2 file system with a large number of journals or specified a large journal size, then you will see (number of journals * journal size) as already in use when you execute the df command. Even if you did not specify a large number of journals or large journals, small GFS2 file systems (in the 1GB or less range) will show a large amount of space as being in use with the default GFS2 journal size.
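The (number of journals * journal size) arithmetic can be worked through with a short sketch. The function names below are hypothetical, and the figures in the usage note are illustrative assumptions; substitute the journal count and journal size your file system was actually created with.

```shell
#!/bin/sh
# Hypothetical helper: space consumed by journals, in MB.
# Usage: journal_overhead_mb <number_of_journals> <journal_size_mb>
journal_overhead_mb() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: share of the file system taken by journals,
# as a whole percentage (integer division, so the result is rounded down).
# Usage: journal_overhead_pct <number_of_journals> <journal_size_mb> <fs_size_mb>
journal_overhead_pct() {
    overhead=$(journal_overhead_mb "$1" "$2")
    echo $(( overhead * 100 / $3 ))
}
```

Assuming two journals of 128 MB each on a 1024 MB file system, journal_overhead_mb 2 128 prints 256 and journal_overhead_pct 2 128 1024 prints 25, which is why a small, empty file system can appear to be a quarter full.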
7.6. Gathering GFS2 data for troubleshooting
If your GFS2 file system hangs and does not return commands run against it and you find that you need to open a ticket with Red Hat Support, you should first gather the following data:
- The GFS2 lock dump for the file system on each node:
  cat /sys/kernel/debug/gfs2/fsname/glocks >glocks.fsname.nodename
- The DLM lock dump for the file system on each node. You can get this information with the dlm_tool command:
  dlm_tool lockdebug -sv lsname
  In this command, lsname is the lockspace name used by DLM for the file system in question. You can find this value in the output from the group_tool command.
- The output from the sysrq -t command.
- The contents of the /var/log/messages file.
Once you have gathered that data, you can open a ticket with Red Hat Support and provide the data you have collected.
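The two lock dumps can be collected with a small wrapper, run on each node. This is a minimal sketch, not a supported tool: the function name gather_gfs2_locks, the DLM output file name, and the optional debugfs root argument are all assumptions made for illustration, and debugfs is assumed to be mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Hypothetical helper: collect the GFS2 and DLM lock dumps for one file system.
# Usage: gather_gfs2_locks <fsname> <lsname> [outdir] [debugfs_root]
gather_gfs2_locks() {
    fsname="$1"
    lsname="$2"
    outdir="${3:-.}"
    debugfs="${4:-/sys/kernel/debug}"   # assumes debugfs is mounted here
    node="$(uname -n)"
    glocks="$debugfs/gfs2/$fsname/glocks"

    mkdir -p "$outdir" || return 1

    # GFS2 lock dump, named as in the text: glocks.fsname.nodename
    if [ -r "$glocks" ]; then
        cat "$glocks" > "$outdir/glocks.$fsname.$node"
    else
        echo "glocks file not readable: $glocks" >&2
    fi

    # DLM lock dump; the output file name is an assumption.
    if command -v dlm_tool >/dev/null 2>&1; then
        dlm_tool lockdebug -sv "$lsname" > "$outdir/dlm.$lsname.$node"
    else
        echo "dlm_tool not found; skipping DLM lock dump" >&2
    fi
}
```

The sysrq output and the contents of /var/log/messages are not covered by this sketch and still need to be collected separately.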