4.5. Configuring a Watchdog
4.5.1. Adding a Watchdog Card to a Virtual Machine
You can add a watchdog card to a virtual machine to monitor the operating system's responsiveness.
Procedure 4.9. Adding Watchdog Cards to Virtual Machines
- Click the Virtual Machines tab and select a virtual machine.
- Click Edit.
- Click the High Availability tab.
- Select the watchdog model to use from the Watchdog Model drop-down list.
- Select an action from the Watchdog Action drop-down list. This is the action that the virtual machine takes when the watchdog is triggered.
- Click OK.
4.5.2. Installing a Watchdog
To activate a watchdog card attached to a virtual machine, you must install the watchdog package on that virtual machine and start the watchdog service.
Procedure 4.10. Installing Watchdogs
- Log in to the virtual machine to which the watchdog card is attached.
- Install the watchdog package and dependencies:
# yum install watchdog
- Edit the /etc/watchdog.conf file and uncomment the following line:
watchdog-device = /dev/watchdog
- Save the changes.
- Start the watchdog service and ensure this service starts on boot:
- Red Hat Enterprise Linux 6:
# service watchdog start
# chkconfig watchdog on
- Red Hat Enterprise Linux 7:
# systemctl start watchdog.service
# systemctl enable watchdog.service
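The edit to /etc/watchdog.conf in this procedure can also be scripted. The following is a minimal sketch, not part of the watchdog package: the function name enable_watchdog_device is hypothetical, and the demonstration runs against a scratch file so that the real configuration file is not modified.

```shell
#!/bin/sh
# enable_watchdog_device (hypothetical helper): uncomment the
# "watchdog-device" line in a watchdog.conf-style file.
# Usage: enable_watchdog_device /path/to/watchdog.conf
enable_watchdog_device() {
    conf_file=$1
    tmp_file=$(mktemp) || return 1
    # Remove the leading "#" from the watchdog-device line only;
    # every other line is copied through unchanged.
    sed 's|^[[:space:]]*#[[:space:]]*\(watchdog-device[[:space:]]*=.*\)$|\1|' \
        "$conf_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write the result back in place so the file keeps its permissions.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Demonstrate on a scratch file rather than on /etc/watchdog.conf.
scratch=$(mktemp)
printf '%s\n' '#admin = root' '#watchdog-device = /dev/watchdog' > "$scratch"
enable_watchdog_device "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On a virtual machine, the same function could be pointed at /etc/watchdog.conf as root, after which the watchdog service is started as described in the final step.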
4.5.3. Confirming Watchdog Functionality
Confirm that a watchdog card has been attached to a virtual machine and that the watchdog service is active.
Warning
This procedure is provided for testing the functionality of watchdogs only and must not be run on production machines.
Procedure 4.11. Confirming Watchdog Functionality
- Log in to the virtual machine to which the watchdog card is attached.
- Confirm that the watchdog card has been identified by the virtual machine:
# lspci | grep -i watchdog
- Run one of the following commands to confirm that the watchdog is active:
- Trigger a kernel panic:
# echo c > /proc/sysrq-trigger
- Terminate the watchdog service:
# kill -9 `pgrep watchdog`
In either case, the watchdog timer can no longer be reset, so the watchdog counter reaches zero after a short period of time. When the watchdog counter reaches zero, the action specified in the Watchdog Action drop-down list for that virtual machine is performed.
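Before running either destructive test, the card detection step can be wrapped in a small check. The following is a sketch only: the function name has_watchdog_card is hypothetical, and the sample line is illustrative input, since real lspci output depends on the watchdog model selected for the virtual machine.

```shell
#!/bin/sh
# has_watchdog_card (hypothetical helper): read lspci-style output on
# standard input and succeed if any line mentions a watchdog,
# ignoring case.
has_watchdog_card() {
    grep -qi 'watchdog'
}

# Illustrative input only; actual lspci output varies by watchdog model.
sample='00:08.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer'

if printf '%s\n' "$sample" | has_watchdog_card; then
    echo "watchdog card detected"
else
    echo "no watchdog card found"
fi
```

On a virtual machine, the function would be fed real output with `lspci | has_watchdog_card`. On Red Hat Enterprise Linux 7, `systemctl is-active watchdog.service` offers a similar non-destructive check that the service is running.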
4.5.4. Parameters for Watchdogs in watchdog.conf
The following is a list of options for configuring the watchdog service available in the /etc/watchdog.conf file. To configure an option, you must uncomment that option and restart the watchdog service after saving the changes.
Note
For a more detailed explanation of options for configuring the watchdog service and using the watchdog command, see the watchdog man page.
Variable Name | Default Value | Remarks |
---|---|---|
ping | N/A | An IP address that the watchdog attempts to ping to verify whether that address is reachable. You can specify multiple IP addresses by adding additional ping lines. |
interface | N/A | A network interface that the watchdog will monitor to verify the presence of network traffic. You can specify multiple network interfaces by adding additional interface lines. |
file | /var/log/messages | A file on the local system that the watchdog will monitor for changes. You can specify multiple files by adding additional file lines. |
change | 1407 | The number of watchdog intervals after which the watchdog checks for changes to files. A change line must be specified on the line directly after each file line, and applies to the file line directly above that change line. |
max-load-1 | 24 | The maximum average load that the virtual machine can sustain over a one-minute period. If this average is exceeded, then the watchdog is triggered. A value of 0 disables this feature. |
max-load-5 | 18 | The maximum average load that the virtual machine can sustain over a five-minute period. If this average is exceeded, then the watchdog is triggered. A value of 0 disables this feature. By default, the value of this variable is set to a value approximately three quarters that of max-load-1. |
max-load-15 | 12 | The maximum average load that the virtual machine can sustain over a fifteen-minute period. If this average is exceeded, then the watchdog is triggered. A value of 0 disables this feature. By default, the value of this variable is set to a value approximately one half that of max-load-1. |
min-memory | 1 | The minimum amount of virtual memory that must remain free on the virtual machine. This value is measured in pages. A value of 0 disables this feature. |
repair-binary | /usr/sbin/repair | The path and file name of a binary file on the local system that will be run when the watchdog is triggered. If the specified file resolves the issues preventing the watchdog from resetting the watchdog counter, then the watchdog action is not triggered. |
test-binary | N/A | The path and file name of a binary file on the local system that the watchdog will attempt to run during each interval. A test binary allows you to specify a file for running user-defined tests. |
test-timeout | N/A | The time limit, in seconds, for which user-defined tests can run. A value of 0 allows user-defined tests to continue for an unlimited duration. |
temperature-device | N/A | The path to and name of a device for checking the temperature of the machine on which the watchdog service is running. |
max-temperature | 120 | The maximum allowed temperature for the machine on which the watchdog service is running. The machine will be halted if this temperature is reached. Unit conversion is not taken into account, so you must specify a value that matches the watchdog card being used. |
admin | root | The email address to which email notifications are sent. |
interval | 10 | The interval, in seconds, between updates to the watchdog device. The watchdog device expects an update at least once every minute, and if there are no updates over a one-minute period, then the watchdog is triggered. This one-minute period is hard-coded into the drivers for the watchdog device, and cannot be configured. |
logtick | 1 | When verbose logging is enabled for the watchdog service, the watchdog service periodically writes log messages to the local system. The logtick value represents the number of watchdog intervals after which a message is written. |
realtime | yes | Specifies whether the watchdog is locked in memory. A value of yes locks the watchdog in memory so that it is not swapped out of memory, while a value of no allows the watchdog to be swapped out of memory. If the watchdog is swapped out of memory and is not swapped back in before the watchdog counter reaches zero, then the watchdog is triggered. |
priority | 1 | The scheduling priority of the watchdog when the value of realtime is set to yes. |
pidfile | /var/run/syslogd.pid | The path and file name of a PID file that the watchdog monitors to see if the corresponding process is still active. If the corresponding process is not active, then the watchdog is triggered. |
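Several defaults in the table are related by simple arithmetic. The following sketch uses only values listed above and assumes, as the remarks state, that change is counted in watchdog intervals and that the max-load-5 and max-load-15 defaults are three quarters and one half of max-load-1. The shell variable names are illustrative.

```shell
#!/bin/sh
# Default values taken from the table above.
interval=10      # seconds between updates to the watchdog device
change=1407      # watchdog intervals between file-change checks
max_load_1=24    # one-minute load average threshold

# Time between file-change checks, in seconds: change * interval.
change_seconds=$((change * interval))

# Load thresholds derived from max-load-1 using the ratios in the remarks.
max_load_5=$((max_load_1 * 3 / 4))
max_load_15=$((max_load_1 / 2))

echo "file-change check every ${change_seconds} seconds"
echo "max-load-5 = ${max_load_5}"
echo "max-load-15 = ${max_load_15}"
```

With the defaults, a monitored file is checked for changes every 14070 seconds (a little under four hours), and the derived thresholds match the listed defaults of 18 and 12.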