10.4. Cluster Daemon crashes
RGManager has a watchdog process that reboots the host if the main rgmanager process fails unexpectedly. When the watchdog detects that the main rgmanager process has crashed, it reboots the cluster node. The remaining active cluster nodes detect that the node has left and evict it from the cluster; the node is fenced, and rgmanager recovers the service on another host.
The process with the lower process ID (PID) is the watchdog, which takes action if its child (the process with the higher PID) crashes. Capturing the core of the process with the higher PID using gcore can aid in troubleshooting a crashed daemon.
Install the packages that are required to capture and view the core, and ensure that the rgmanager and rgmanager-debuginfo packages are the same version; otherwise the captured application core might be unusable.
$ yum -y --enablerepo=rhel-debuginfo install gdb rgmanager-debuginfo
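The version requirement above can be checked before capturing a core. The following is a minimal sketch, not part of the original procedure: the function names same_version and check_debuginfo_match are hypothetical, and it assumes a single installed build of each package.

```shell
# Succeed only if the two version-release strings are identical.
same_version() {
    [ "$1" = "$2" ]
}

# Compare the installed rgmanager and rgmanager-debuginfo builds.
# Returns 0 on a match, 1 on a mismatch, 2 if either package is not installed.
check_debuginfo_match() {
    main=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' rgmanager) || return 2
    dbg=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' rgmanager-debuginfo) || return 2
    if same_version "$main" "$dbg"; then
        echo "match: $main"
    else
        echo "mismatch: rgmanager $main, rgmanager-debuginfo $dbg" >&2
        return 1
    fi
}
```

Running check_debuginfo_match on the cluster node prints the shared version-release string when the two packages agree.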
10.4.1. Capturing the rgmanager Core at Runtime
Two rgmanager processes run once rgmanager is started. You must capture the core for the rgmanager process with the higher PID.
The following is example output from the ps command showing the two rgmanager processes.
$ ps aux | grep rgmanager | grep -v grep
root 22482 0.0 0.5 23544 5136 ? S<Ls Dec01 0:00 rgmanager
root 22483 0.0 0.2 78372 2060 ? S<l Dec01 0:47 rgmanager
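Because the watchdog is the parent of the main rgmanager process, the child can also be identified by its parent PID rather than by comparing PID numbers. The following is a rough sketch under that assumption; find_child_pid is a hypothetical helper name, and the demonstration input assumes that the watchdog (22482) was started by PID 1.

```shell
# Read "PID PPID" pairs on stdin and print each PID whose parent
# also appears in the list (that is, the child of the watchdog).
find_child_pid() {
    awk '{ parent[$1] = $2 }
         END { for (p in parent) if (parent[p] in parent) print p }'
}

# On a cluster node (pid= and ppid= suppress the ps header line):
#   ps -C rgmanager -o pid=,ppid= | find_child_pid

# Demonstration with the PIDs from the sample output:
printf '22482 1\n22483 22482\n' | find_child_pid
```

With the sample PIDs, the demonstration line prints 22483, the process whose core should be captured.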
In the following example, the pidof program is used to determine the higher-numbered PID automatically, which is the PID to capture the core from. The full command captures the application core for process 22483, which has the higher PID.
$ gcore -o /tmp/rgmanager-$(date '+%F_%s').core $(pidof -s rgmanager)
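If you prefer not to depend on which PID pidof -s returns, the higher PID can be selected explicitly. The following is a minimal sketch; highest_pid and capture_rgmanager_core are hypothetical helper names, and the output path simply reuses the pattern from the command above.

```shell
# Print the numerically highest PID among the arguments.
highest_pid() {
    printf '%s\n' "$@" | sort -n | tail -n 1
}

# Capture a core of the highest-numbered rgmanager process.
# gcore appends ".<PID>" to the name given with -o.
capture_rgmanager_core() {
    core_prefix="/tmp/rgmanager-$(date '+%F_%s').core"
    # Unquoted on purpose: pidof prints all PIDs on one line.
    pid=$(highest_pid $(pidof rgmanager))
    [ -n "$pid" ] || { echo "rgmanager is not running" >&2; return 1; }
    gcore -o "$core_prefix" "$pid"
}
```

Calling capture_rgmanager_core as root on the cluster node writes the core to a file whose name ends in the PID of the captured process.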