Chapter 23. Configuring resources to remain stopped on clean node shutdown
When a cluster node shuts down, Pacemaker’s default response is to stop all resources running on that node and recover them elsewhere, even when the shutdown is a clean shutdown. As of RHEL 8.2, you can configure Pacemaker so that when a node shuts down cleanly, the resources attached to that node are locked to it and unable to start elsewhere until the node rejoins the cluster and the resources start on it again. This allows you to power down nodes during maintenance windows when service outages are acceptable, without that node’s resources failing over to other nodes in the cluster.
23.1. Cluster properties to configure resources to remain stopped on clean node shutdown
The ability to prevent resources from failing over on a clean node shutdown is implemented by means of the following cluster properties.
shutdown-lock
When this cluster property is set to the default value of false, the cluster will recover resources that are active on nodes being cleanly shut down. When this property is set to true, resources that are active on the nodes being cleanly shut down are unable to start elsewhere until they start on the node again after it rejoins the cluster.

The shutdown-lock property will work for either cluster nodes or remote nodes, but not guest nodes.

If shutdown-lock is set to true, you can remove the lock on one cluster resource when a node is down so that the resource can start elsewhere by performing a manual refresh on the node with the following command:

pcs resource refresh resource node=nodename
Note that once the resources are unlocked, the cluster is free to move the resources elsewhere. You can control the likelihood of this occurring by using stickiness values or location preferences for the resource.
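For example, the following is a minimal sketch of biasing where an unlocked resource runs, assuming a resource named third and the node z1.example.com (both names are illustrative, taken from the example cluster later in this chapter):

[root@z3 ~]# pcs resource meta third resource-stickiness=100
[root@z3 ~]# pcs constraint location third prefers z1.example.com=50

The stickiness value makes the resource tend to stay on whichever node it is currently running, while the location constraint expresses a preference for a particular node; the scores shown are arbitrary.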
Note
A manual refresh will work with remote nodes only if you first run the following commands:

- Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
- Run the pcs resource disable remote-connection-resource command.

You can then perform a manual refresh on the remote node.
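For example, assuming a hypothetical remote node named remote1 whose connection resource is also named remote1, and a locked resource named webserver (all names are illustrative), the sequence might look like this:

[root@remote1 ~]# systemctl stop pacemaker_remote
[root@z3 ~]# pcs resource disable remote1
[root@z3 ~]# pcs resource refresh webserver node=remote1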
shutdown-lock-limit
When this cluster property is set to a time other than the default value of 0, resources will be available for recovery on other nodes if the node does not rejoin within the specified time since the shutdown was initiated.

Note
The shutdown-lock-limit property will work with remote nodes only if you first run the following commands:

- Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
- Run the pcs resource disable remote-connection-resource command.

After you run these commands, the resources that had been running on the remote node will be available for recovery on other nodes when the amount of time specified as the shutdown-lock-limit has passed.
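For example, to lock resources to a cleanly shut down node but allow them to be recovered elsewhere if the node has not rejoined within 30 minutes (the time value shown is illustrative), you might set both properties together:

[root@z3 ~]# pcs property set shutdown-lock=true shutdown-lock-limit=30min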
23.2. Setting the shutdown-lock cluster property
The following example sets the shutdown-lock cluster property to true in an example cluster and shows the effect this has when the node is shut down and started again. This example cluster consists of three nodes: z1.example.com, z2.example.com, and z3.example.com.
Procedure
Set the shutdown-lock property to true and verify its value. In this example the shutdown-lock-limit property maintains its default value of 0.

[root@z3 ~]# pcs property set shutdown-lock=true
[root@z3 ~]# pcs property list --all | grep shutdown-lock
shutdown-lock: true
shutdown-lock-limit: 0
Check the status of the cluster. In this example, resources third and fifth are running on z1.example.com.

[root@z3 ~]# pcs status
...
Full List of Resources:
...
  * first   (ocf::pacemaker:Dummy):  Started z3.example.com
  * second  (ocf::pacemaker:Dummy):  Started z2.example.com
  * third   (ocf::pacemaker:Dummy):  Started z1.example.com
  * fourth  (ocf::pacemaker:Dummy):  Started z2.example.com
  * fifth   (ocf::pacemaker:Dummy):  Started z1.example.com
...
Shut down z1.example.com, which will stop the resources that are running on that node.

[root@z3 ~]# pcs cluster stop z1.example.com
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
Running the pcs status command shows that node z1.example.com is offline and that the resources that had been running on z1.example.com are LOCKED while the node is down.

[root@z3 ~]# pcs status
...
Node List:
  * Online: [ z2.example.com z3.example.com ]
  * OFFLINE: [ z1.example.com ]

Full List of Resources:
...
  * first   (ocf::pacemaker:Dummy):  Started z3.example.com
  * second  (ocf::pacemaker:Dummy):  Started z2.example.com
  * third   (ocf::pacemaker:Dummy):  Stopped z1.example.com (LOCKED)
  * fourth  (ocf::pacemaker:Dummy):  Started z3.example.com
  * fifth   (ocf::pacemaker:Dummy):  Stopped z1.example.com (LOCKED)
...
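If you needed one of the locked resources to start elsewhere before z1.example.com rejoins, you could release its lock with a manual refresh, as described in the previous section. A sketch using resource third from this cluster:

[root@z3 ~]# pcs resource refresh third node=z1.example.com

This step is optional and is not performed in this example; the following steps assume the resources remain locked.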
Start cluster services again on z1.example.com so that it rejoins the cluster. Locked resources should get started on that node, although once they start they will not necessarily remain on the same node.

[root@z3 ~]# pcs cluster start z1.example.com
Starting Cluster...
In this example, resources third and fifth are recovered on node z1.example.com.

[root@z3 ~]# pcs status
...
Node List:
  * Online: [ z1.example.com z2.example.com z3.example.com ]

Full List of Resources:
...
  * first   (ocf::pacemaker:Dummy):  Started z3.example.com
  * second  (ocf::pacemaker:Dummy):  Started z2.example.com
  * third   (ocf::pacemaker:Dummy):  Started z1.example.com
  * fourth  (ocf::pacemaker:Dummy):  Started z3.example.com
  * fifth   (ocf::pacemaker:Dummy):  Started z1.example.com
...
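When the maintenance window is over and locking is no longer needed, you can restore the default failover behavior by setting the property back to its default value:

[root@z3 ~]# pcs property set shutdown-lock=false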