9.3.2. Slow Recovery Times
The following configuration settings affect recovery time. The example values shown give fast recovery on a lightly loaded system. Run tests to determine the appropriate values for your system and load conditions.
cluster.conf
<rm status_poll_interval="1">
- status_poll_interval is the interval, in seconds, at which the resource manager checks the status of managed services. This affects how quickly the manager detects failed services.
<ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/>
- This is a virtual IP address for client traffic. monitor_link="yes" means monitor the health of the NIC used for the VIP. sleeptime="0" means do not delay when failing over the VIP to a new address.
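As a quick way to check which recovery-related values a cluster.conf actually carries, the following minimal Python sketch reads the three attributes described above. The fragment is a simplified, hypothetical stand-in (a real cluster.conf nests these elements inside more structure), and the function name recovery_settings is an illustration only, not part of any cluster tooling.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: a real cluster.conf wraps these
# elements in additional structure that is omitted here.
FRAGMENT = """
<rm status_poll_interval="1">
  <ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/>
</rm>
"""

def recovery_settings(xml_text):
    """Return the recovery-related attributes found in the fragment."""
    rm = ET.fromstring(xml_text.strip())
    ip = rm.find(".//ip")  # first <ip> element below <rm>
    return {
        # Seconds between status checks of managed services.
        "status_poll_interval": float(rm.get("status_poll_interval")),
        # True when the NIC used for the VIP is monitored.
        "monitor_link": ip.get("monitor_link") == "yes",
        # Delay in seconds before the VIP fails over.
        "sleeptime": float(ip.get("sleeptime")),
    }

settings = recovery_settings(FRAGMENT)
print(settings)
```

Reading the values back this way can help confirm that the file under test really contains the settings you intended to measure.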
qpidd.conf
link-maintenance-interval=0.1
- The interval, in seconds, at which backup brokers check the link to the primary and reconnect if required. This value defaults to 2. The value can be set lower for a faster failover (for example, 0.1).
Note
Setting the value too low will result in excessive link-checking activity on the brokers.
link-heartbeat-interval=5
- Heartbeat interval, in seconds, for federation links. The HA cluster uses federation links between the primary and each backup. The primary can take up to twice the heartbeat interval to detect a failed backup. When a sender sends a message, the primary waits for all backups to acknowledge before acknowledging to the sender. A disconnected backup may cause the primary to block senders until it is detected via heartbeat.
This interval is also used as the timeout for broker status checks by rgmanager. It may take up to this interval for rgmanager to detect a hung broker.
The default is 120 seconds. This may be too high for many production scenarios where availability and response time are important. However, if it is set too low, rgmanager may restart a slow-to-respond broker under network congestion or heavy load.
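The timing relationships described for the qpidd.conf settings can be worked through with a small Python sketch. It assumes qpidd.conf holds simple key=value lines and uses the defaults stated in this section (2 seconds for link-maintenance-interval, 120 seconds for link-heartbeat-interval). The function names are hypothetical, and the results are upper bounds taken from the descriptions above, not measured recovery times.

```python
def parse_qpidd_conf(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options

def detection_bounds(options):
    """Worst-case detection times, in seconds, implied by the settings."""
    # Defaults are the ones stated in this section.
    maintenance = float(options.get("link-maintenance-interval", 2))
    heartbeat = float(options.get("link-heartbeat-interval", 120))
    return {
        # How often backups check the link to the primary.
        "link_check_interval": maintenance,
        # The primary can take up to twice the heartbeat interval.
        "failed_backup_detection_max": 2 * heartbeat,
        # rgmanager uses the heartbeat interval as its status timeout.
        "hung_broker_detection_max": heartbeat,
    }

example = parse_qpidd_conf("""
link-maintenance-interval=0.1
link-heartbeat-interval=5
""")
bounds = detection_bounds(example)
defaults = detection_bounds({})
print(bounds)
print(defaults)
```

With the example values, the primary may need up to 10 seconds to detect a failed backup, compared with up to 240 seconds at the default heartbeat interval.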