10.4. Cluster Daemon crashes
RGManager has a watchdog process that reboots the host if the main rgmanager process fails unexpectedly. When the watchdog detects that the main rgmanager process has crashed, it reboots the cluster node. The remaining active cluster nodes detect that the node has left, evict it from the cluster, and fence it, and rgmanager then recovers the service on another host.
Of the two rgmanager processes, the one with the lower process ID (PID) is the watchdog, which takes action if its child (the process with the higher PID) crashes. Capturing the core of the process with the higher PID using gcore can aid in troubleshooting a crashed daemon.
Install the packages that are required to capture and view the core, and ensure that the rgmanager and rgmanager-debuginfo packages are the same version; otherwise the captured application core might be unusable.
$ yum -y --enablerepo=rhel-debuginfo install gdb rgmanager-debuginfo
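The version check described above can be sketched as a small shell helper. This is only an illustration: the function names versions_match, pkg_version, and check_debuginfo are hypothetical, and the sketch assumes that comparing the VERSION-RELEASE strings reported by rpm is a sufficient test that the two packages match.

```shell
# Hypothetical helper: succeeds only if both version strings are
# non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical helper: prints VERSION-RELEASE of an installed package.
pkg_version() {
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' "$1"
}

# Hypothetical wrapper: compares rgmanager against rgmanager-debuginfo.
check_debuginfo() {
    if versions_match "$(pkg_version rgmanager)" \
                      "$(pkg_version rgmanager-debuginfo)"; then
        echo "rgmanager and rgmanager-debuginfo versions match"
    else
        echo "version mismatch: a captured core might be unusable" >&2
        return 1
    fi
}
```

After defining these functions in a shell, running check_debuginfo before capturing a core reports whether the two packages agree.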
10.4.1. Capturing the rgmanager Core at Runtime
Two rgmanager processes are running once rgmanager has started. You must capture the core for the rgmanager process with the higher PID.
The following is example output from the ps command showing the two rgmanager processes.
$ ps aux | grep rgmanager | grep -v grep
root     22482  0.0  0.5  23544  5136 ?        S<Ls Dec01   0:00 rgmanager
root     22483  0.0  0.2  78372  2060 ?        S<l  Dec01   0:47 rgmanager
In the following example, the pidof program is used to automatically determine the higher-numbered PID, which is the appropriate PID for creating the core. The full command captures the application core for process 22483, which has the higher PID.
$ gcore -o /tmp/rgmanager-$(date '+%F_%s').core $(pidof -s rgmanager)
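If you prefer to select the higher PID explicitly rather than rely on the order in which PIDs are reported, the same capture could be sketched as below. The function names highest_pid and capture_rgmanager_core are hypothetical, and the sketch assumes pgrep is available on the node.

```shell
# Hypothetical helper: reads one PID per line on stdin and prints the
# numerically largest one.
highest_pid() {
    sort -n | tail -n 1
}

# Hypothetical wrapper: captures a core of the rgmanager process with
# the higher PID, using the same file naming as the command above.
capture_rgmanager_core() {
    pid=$(pgrep -x rgmanager | highest_pid)
    if [ -z "$pid" ]; then
        echo "rgmanager is not running" >&2
        return 1
    fi
    gcore -o "/tmp/rgmanager-$(date '+%F_%s').core" "$pid"
}
```

Note that gcore appends the PID to the name given with -o, so the resulting file name ends in .core.22483 for the processes shown in the example output.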