Chapter 8. Troubleshooting
If you see error messages when you try to configure your system, or if the system does not behave as expected after configuration, you can perform the following checks and examine the following areas.
- Connect to one of the nodes in the cluster and execute the clustat(8) command. This command runs a utility that displays the status of the cluster. It shows membership information, quorum view, and the state of all configured user services. The following example shows the output of the clustat(8) command:

      [root@clusternode4 ~]# clustat
      Cluster Status for nfsclust @ Wed Dec  3 12:37:22 2008
      Member Status: Quorate

       Member Name                          ID   Status
       ------ ----                          ---- ------
       clusternode5.example.com                 1 Online, rgmanager
       clusternode4.example.com                 2 Online, Local, rgmanager
       clusternode3.example.com                 3 Online, rgmanager
       clusternode2.example.com                 4 Online, rgmanager
       clusternode1.example.com                 5 Online, rgmanager

       Service Name              Owner (Last)                   State
       ------- ----              ----- ------                   -----
       service:nfssvc            clusternode2.example.com       starting

  In this example, clusternode4 is the local node, since it is the host from which the command was run. If rgmanager did not appear in the Status column, it could indicate that cluster services are not running on the node.
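  A quick way to confirm that every node reports the same membership view is to run clustat on each node in turn. The following loop is a minimal sketch; it assumes the node names used in this document and passwordless SSH between the nodes:

      # Compare each node's view of cluster membership; every node
      # should report the same member list and quorum state.
      for node in clusternode1.example.com clusternode2.example.com \
                  clusternode3.example.com clusternode4.example.com \
                  clusternode5.example.com; do
          echo "=== ${node} ==="
          ssh "${node}" clustat
      done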
- Connect to one of the nodes in the cluster and execute the group_tool(8) command. This command provides information that you may find helpful in debugging your system. The following example shows the output of the group_tool(8) command:

      [root@clusternode1 ~]# group_tool
      type             level name       id       state
      fence            0     default    00010005 none
      [1 2 3 4 5]
      dlm              1     clvmd      00020005 none
      [1 2 3 4 5]
      dlm              1     rgmanager  00030005 none
      [3 4 5]
      dlm              1     mygfs      007f0005 none
      [5]
      gfs              2     mygfs      007e0005 none
      [5]

  The state of each group should be none. The numbers in the brackets are the node ID numbers of the cluster nodes in the group. The clustat output shows which node IDs are associated with which nodes. If you do not see a node number in a group, that node is not a member of the group. For example, if a node ID is not in the dlm/rgmanager group, that node is not using the rgmanager dlm lock space (and probably is not running rgmanager). The level of a group indicates the recovery ordering: level 0 is recovered first, level 1 is recovered second, and so forth.
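  When the summary output is not enough, groupd can also dump its internal debug buffer, which records group events in more detail. This is a hedged aside; check the group_tool(8) man page on your release for the exact subcommands it supports:

      # Dump groupd's internal debug log for deeper analysis
      group_tool dump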
- Connect to one of the nodes in the cluster and execute the cman_tool nodes -f command. This command provides information about the cluster nodes that you may want to look at. The following example shows the output of the cman_tool nodes -f command:

      [root@clusternode1 ~]# cman_tool nodes -f
      Node  Sts   Inc   Joined               Name
         1   M    752   2008-10-27 11:17:15  clusternode5.example.com
         2   M    752   2008-10-27 11:17:15  clusternode4.example.com
         3   M    760   2008-12-03 11:28:44  clusternode3.example.com
         4   M    756   2008-12-03 11:28:26  clusternode2.example.com
         5   M    744   2008-10-27 11:17:15  clusternode1.example.com

  The Sts heading indicates the status of a node: a status of M indicates that the node is a member of the cluster, and a status of X indicates that the node is dead. The Inc heading indicates the incarnation number of the node, which is for debugging purposes only.
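  To see the cluster's own view of quorum (vote counts, expected votes, and whether the cluster is quorate), cman_tool also provides a status subcommand:

      # Show quorum state, vote counts, and cluster name
      cman_tool status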
- Check whether the cluster.conf file is identical on each node of the cluster. If you configured your system with Conga, as in the example provided in this document, these files should be identical, but one of the files may have been accidentally deleted or altered.
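  A simple way to verify this is to compare a checksum of /etc/cluster/cluster.conf across the nodes. The loop below is a minimal sketch, again assuming the node names used in this document and passwordless SSH:

      # All five checksums should be identical
      for node in clusternode1.example.com clusternode2.example.com \
                  clusternode3.example.com clusternode4.example.com \
                  clusternode5.example.com; do
          ssh "${node}" md5sum /etc/cluster/cluster.conf
      done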
- In addition to using Conga to fence a node in order to test whether failover is working properly, as described in Chapter 7, Testing the NFS Cluster Service, you can disconnect the Ethernet connection between cluster members. You might try disconnecting one, two, or three nodes, for example. This can help isolate where the problem is.
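  If you cannot physically unplug a cable, taking the cluster interface down has a similar effect. This is a sketch only; it assumes the cluster traffic runs over eth0, and the node will most likely be fenced as a result:

      # On the node under test: simulate a network failure
      ifdown eth0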
- If you are having trouble mounting or modifying an NFS volume, check whether the cause is one of the following (a combined check script appears after this list):
- The network between server and client is down.
- The storage devices are not connected to the system.
- More than half of the nodes in the cluster have crashed, rendering the cluster inquorate. This stops the cluster.
- The GFS file system is not mounted on the cluster nodes.
- The GFS file system is not writable.
- The IP address you defined in the cluster.conf file is not bound to the correct interface/NIC (sometimes the ip.sh script does not perform as expected).
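  The sketch below walks through several of these causes in one pass; run the ping from a client and the remaining checks on a cluster node. The service IP address 10.0.0.10, the server name nfsserver.example.com, and the mount point /mnt/gfs are placeholders; substitute the values from your own cluster.conf:

      # 1. Is the network between server and client up?
      ping -c 3 nfsserver.example.com
      # 2. Is the cluster quorate?
      cman_tool status | grep -i quorum
      # 3. Is the GFS file system mounted?
      mount | grep gfs
      # 4. Is the GFS file system writable?
      touch /mnt/gfs/.write-test && rm /mnt/gfs/.write-test
      # 5. Is the service IP bound to the expected NIC?
      ip addr show | grep 10.0.0.10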
- Execute a showmount -e command on the node running the cluster service. If it displays the correct five exports, check your firewall configuration for all of the ports that NFS requires.
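  NFS involves several RPC services in addition to port 2049, and the firewall must pass all of them. A quick way to see which ports are actually in use on the server is rpcinfo; the host name nfsserver.example.com below is a placeholder:

      # List the RPC programs (and their ports) registered on the server
      rpcinfo -p
      # From a client, verify the export list is reachable through the firewall
      showmount -e nfsserver.example.com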
- If SELinux is currently in enforcing mode on your system, check your /var/log/audit/audit.log file for any relevant messages. If you are using NFS to serve home directories, check whether the correct SELinux boolean value for nfs_home_dirs has been set to 1; this is required if you want to use NFS-based home directories on a client that is running SELinux. If you do not set this value, you can mount the directories as root but cannot use them as home directories for your users.
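  You can inspect and set the boolean from the command line. The exact boolean name can vary by release (on many Red Hat releases it is use_nfs_home_dirs), so list the NFS-related booleans first:

      # Find the exact boolean name on your release
      getsebool -a | grep nfs
      # Enable NFS-served home directories persistently (name may differ)
      setsebool -P use_nfs_home_dirs 1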
- Check the /var/log/messages file for error messages from the NFS daemon.
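  To narrow the log down to NFS-related entries, a simple grep is usually enough:

      # Show the most recent NFS-related messages
      grep -i nfs /var/log/messages | tail -n 50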
- If you see the expected results locally at the cluster nodes and between the cluster nodes, but not at the defined clients, check the firewall configuration at the clients.
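  On the client side, you can check whether a local firewall is the culprit by listing the active rules and confirming that the client can reach the server's RPC services. This assumes an iptables-based firewall; nfsserver.example.com is again a placeholder:

      # List the active firewall rules on the client
      iptables -L -n
      # Verify that the client can reach the server's RPC services
      rpcinfo -p nfsserver.example.com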