3.2.3. Restart Policy Extensions
When the restart recovery policy is used, you can also cap how many restarts may occur on the same node within a given time. Two service parameters control this: max_restarts and restart_expire_time.
The max_restarts parameter is an integer that specifies the maximum number of restarts rgmanager attempts before it gives up and relocates the service to another host in the cluster.
The restart_expire_time parameter tells rgmanager how long, in seconds, to remember a restart event.
Used together, the two parameters create a sliding window for the number of restarts tolerated in a given amount of time. For example:
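A minimal sketch of such a service definition in cluster.conf might look like the following. The service name myservice is a hypothetical placeholder, and the child resources of the service are omitted.

```xml
<!-- Hypothetical service name; resource children omitted for brevity. -->
<!-- max_restarts="3" with restart_expire_time="300" allows 3 restarts per 300 seconds. -->
<service name="myservice" recovery="restart" max_restarts="3" restart_expire_time="300">
    <!-- resources for the service go here -->
</service>
```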
With max_restarts set to 3 and restart_expire_time set to 300, the service tolerates 3 restarts in 5 minutes. On the fourth service failure within 300 seconds, rgmanager does not restart the service and instead relocates it to another available host in the cluster.