17.4. Moving resources in a cluster
Pacemaker provides a variety of mechanisms for configuring a resource to move from one node to another and to manually move a resource when needed.
You can manually move resources in a cluster with the pcs resource move and pcs resource relocate commands, as described in Manually moving cluster resources. In addition to these commands, you can also control the behavior of cluster resources by enabling, disabling, and banning resources, as described in Disabling, enabling, and banning cluster resources.
You can configure a resource so that it will move to a new node after a defined number of failures, and you can configure a cluster to move resources when external connectivity is lost.
Moving resources due to failure
When you create a resource, you can configure the resource so that it will move to a new node after a defined number of failures by setting the migration-threshold option for that resource. Once the threshold has been reached, this node will no longer be allowed to run the failed resource until:
-
The resource’s
failure-timeoutvalue is reached. -
The administrator manually resets the resource’s failure count by using the
pcs resource cleanupcommand.
The value of migration-threshold is set to INFINITY by default. INFINITY is defined internally as a very large but finite number. A value of 0 disables the migration-threshold feature.
Setting a migration-threshold for a resource is not the same as configuring a resource for migration, in which the resource moves to another location without loss of state.
The following example adds a migration threshold of 10 to the resource named dummy_resource, which indicates that the resource will move to a new node after 10 failures.
# pcs resource meta dummy_resource migration-threshold=10
You can add a migration threshold to the defaults for the whole cluster with the following command.
# pcs resource defaults update migration-threshold=10
To determine the resource’s current failure status and limits, use the pcs resource failcount show command.
There are two exceptions to the migration threshold concept; they occur when a resource either fails to start or fails to stop. If the cluster property start-failure-is-fatal is set to true (which is the default), start failures cause the failcount to be set to INFINITY and always cause the resource to move immediately. For information about the start-failure-is-fatal cluster property, see Summary of cluster properties and options.
Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, then the cluster will fence the node to be able to start the resource elsewhere. If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout.
Moving resources due to connectivity changes
Setting up the cluster to move resources when external connectivity is lost is a two step process.
-
Add a
pingresource to the cluster. Thepingresource uses the system utility of the same name to test if a list of machines (specified by DNS host name or IPv4/IPv6 address) are reachable and uses the results to maintain a node attribute calledpingd. - Configure a location constraint for the resource that will move the resource to a different node when connectivity is lost.
The following table describes the properties you can set for a ping resource.
| Field | Description |
|---|---|
|
| The time to wait (dampening) for further changes to occur. This prevents a resource from bouncing around the cluster when cluster nodes notice the loss of connectivity at slightly different times. |
|
| The number of connected ping nodes gets multiplied by this value to get a score. Useful when there are multiple ping nodes configured. |
|
| The machines to contact to determine the current connectivity status. Allowed values include resolvable DNS host names, IPv4 and IPv6 addresses. The entries in the host list are space separated. |
This example procedure creates a ping resource that verifies connectivity to gateway.example.com. In practice, you would verify connectivity to your network gateway/router.
Procedure
Configure a
pingresource. You configure the resource as a clone so that the resource will run on all cluster nodes.# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=gateway.example.com cloneConfigure a location constraint rule for the existing resource named
Webserver. This causes theWebserverresource to move to a host that is able to pinggateway.example.comif the host that it is currently running on cannot pinggateway.example.com.# pcs constraint location Webserver rule score=-INFINITY pingd lt 1 or not_defined pingd