Chapter 31. Performing cluster maintenance
To perform maintenance on cluster nodes, you might need to stop or move the resources and services running on the cluster. In some cases, you can stop the cluster software without affecting services. Pacemaker provides several methods to support system maintenance.
31.1. Putting a node into standby mode
When a cluster node is in standby mode, the node is no longer able to host resources. Any resources currently active on the node will be moved to another node.
The following command puts the specified node into standby mode. If you specify the --all option, this command puts all nodes into standby mode.
You can use this command when updating a resource’s packages. You can also use this command when testing a configuration, to simulate recovery without actually shutting down a node.
Procedure
Put the specified node into standby mode:
# pcs node standby node | --all

Remove the specified node from standby mode. After running this command, the specified node is then able to host resources. If you specify the --all option, this command removes all nodes from standby mode:

# pcs node unstandby node | --all

Note that when you execute the pcs node standby command, this prevents resources from running on the indicated node. When you execute the pcs node unstandby command, this allows resources to run on the indicated node. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially.
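For example, the following sequence (using node1.example.com as a hypothetical node name) puts a single node into standby mode, verifies with pcs status that its resources have moved elsewhere, and then returns the node to service:

# pcs node standby node1.example.com
# pcs status
# pcs node unstandby node1.example.com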
31.2. Manually moving cluster resources
You can override the cluster and force resources to move from their current location. There are two occasions when you would want to do this.
- When a node is under maintenance, and you need to move all resources running on that node to a different node
- When individually specified resources need to be moved
To move all resources running on a node to a different node, you put the node in standby mode. For information about putting a cluster node into standby mode, see Putting a node into standby mode.
You can move individually specified resources in either of the following ways:

- You can use the pcs resource move command to move a resource off a node on which it is currently running, as described in Moving a resource from its current node.
- You can use the pcs resource relocate run command to move a resource to its preferred node, as determined by current cluster status, constraints, location of resources, and other settings. For information about this command, see Moving a resource to its preferred node.
Moving a resource from its current node
To move a resource off the node on which it is currently running, use the following command, specifying the resource_id of the resource as defined. Specify the destination_node if you want to indicate on which node to run the resource that you are moving.
# pcs resource move resource_id [destination_node] [--promoted] [--strict] [--wait[=n]]
When you execute the pcs resource move command, this adds a constraint to the resource to prevent it from running on the node on which it is currently running. By default, the location constraint that the command creates is automatically removed once the resource has been moved. If removing the constraint would cause the resource to move back to the original node, as might happen if the resource-stickiness value for the resource is 0, the pcs resource move command fails. If you would like to move a resource and leave the resulting constraint in place, use the pcs resource move-with-constraint command.
- If you specify the --promoted parameter of the pcs resource move command, the constraint applies only to promoted instances of the resource.
- If you specify the --strict parameter of the pcs resource move command, the command will fail if resources other than the one specified in the command would be affected.
- You can optionally configure a --wait[=n] parameter for the pcs resource move command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, it defaults to a value of 60 minutes.

For more information about location constraints, see Determining which nodes a resource can run on.
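For example, the following command moves a hypothetical resource named webserver to a hypothetical node named node2.example.com and waits up to 60 seconds for it to start there:

# pcs resource move webserver node2.example.com --wait=60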
Moving a resource to its preferred node
After a resource has moved, either due to a failover or to an administrator manually moving the resource, it will not necessarily move back to its original node even after the circumstances that caused the failover have been corrected. To relocate resources to their preferred nodes, use the following command. A preferred node is determined by the current cluster status, constraints, resource location, and other settings, and may change over time.
# pcs resource relocate run [resource1] [resource2] ...
If you do not specify any resources, all resources are relocated to their preferred nodes.
This command calculates the preferred node for each resource while ignoring resource stickiness. After calculating the preferred node, it creates location constraints which will cause the resources to move to their preferred nodes. Once the resources have been moved, the constraints are deleted automatically. To remove all constraints created by the pcs resource relocate run command, you can enter the pcs resource relocate clear command. To display the current status of resources and their optimal node ignoring resource stickiness, enter the pcs resource relocate show command.
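For example, using a hypothetical resource named webserver, you can first display its optimal node, then relocate it, and finally remove any leftover constraints if the automatic cleanup did not run:

# pcs resource relocate show
# pcs resource relocate run webserver
# pcs resource relocate clear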
31.3. Disabling, enabling, and banning cluster resources
In addition to the pcs resource move and pcs resource relocate commands, there are a variety of other commands you can use to control the behavior of cluster resources.
31.3.1. Disabling a cluster resource
Stop a resource and prevent the cluster from restarting it. Constraints or failures may keep the resource active. Use --wait=n to pause until the resource stops (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.
Simulating disabling a resource
When complex resource relations are configured, it can be difficult to determine by hand whether disabling a resource will affect other resources. To determine what effect disabling a resource will have on other resources, use the pcs resource disable --simulate command, which shows the effects of disabling a resource without changing the cluster configuration.
Safely disabling resources
You can specify that a resource be disabled only if disabling the resource would not have an effect on other resources.
- The pcs resource disable --safe command disables a resource only if no other resources would be affected in any way, such as being migrated from one node to another. The pcs resource safe-disable command is an alias for the pcs resource disable --safe command.
- The pcs resource disable --safe --no-strict command disables a resource only if no other resources would be stopped or demoted.
Determining the resource IDs of affected resources
The error report that the pcs resource disable --safe command generates if the safe disable operation fails contains the affected resource IDs. If you need to know only the resource IDs of resources that would be affected by disabling a resource, use the --brief option for the pcs resource disable --safe command, which does not provide the full simulation result and prints errors only.
Procedure
Stop a running resource and prevent the cluster from starting it again:
# pcs resource disable resource_id [--wait[=n]]
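For example, with a hypothetical resource named webserver, you can first simulate the disable and then attempt a safe disable that prints only the IDs of any resources that would be affected:

# pcs resource disable webserver --simulate
# pcs resource disable webserver --safe --brief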
31.3.2. Enabling a cluster resource
Enable a resource to allow the cluster to start it. Depending on the configuration, the resource might remain stopped. Use --wait=n to pause until the resource starts (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.
Procedure
Use the following command to allow the cluster to start a resource:
# pcs resource enable resource_id [--wait[=n]]
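For example, to re-enable a hypothetical resource named webserver and wait up to 30 seconds for it to start:

# pcs resource enable webserver --wait=30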
31.3.3. Preventing a resource from running on a particular node
You can prevent a resource from running on a specified node, or on the current node if no node is specified.
Procedure
Prevent a resource from running on a specified node, or on the current node if no node is specified:
# pcs resource ban resource_id [node] [--promoted] [lifetime=lifetime] [--wait[=n]]

Note: When you execute the pcs resource ban command, this adds a -INFINITY location constraint to the resource to prevent it from running on the indicated node. You can execute the pcs resource clear or the pcs constraint delete command to remove the constraint. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially. For information about resource constraints, see Determining which nodes a resource can run on.

- If you specify the --promoted parameter of the pcs resource ban command, the scope of the constraint is limited to the promoted role and you must specify promotable_id rather than resource_id.
- You can optionally configure a lifetime parameter for the pcs resource ban command to indicate a period of time the constraint should remain.
- You can optionally configure a --wait[=n] parameter for the pcs resource ban command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, the default resource timeout is used.
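For example, the following sketch bans a hypothetical resource named webserver from a hypothetical node named node1.example.com for 30 minutes (assuming your pcs version accepts an ISO 8601 duration for the lifetime value), and then removes the constraint once the maintenance is finished:

# pcs resource ban webserver node1.example.com lifetime=PT30M
# pcs resource clear webserver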
31.3.4. Forcing a resource to start on the current node
Use pcs resource debug-start to force a resource to start on the current node for debugging. This command prints the output and ignores cluster recommendations. Do not use this for normal operations; Pacemaker manages starting cluster resources.
Procedure
Use the debug-start command to force a specified resource to start on the current node:

# pcs resource debug-start resource_id
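For example, with a hypothetical resource named webserver; adding the --full option, where your pcs version supports it, prints more verbose debug output from the resource agent:

# pcs resource debug-start webserver --full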
31.4. Setting a resource to unmanaged mode
When a resource is in unmanaged mode, the resource is still in the configuration but Pacemaker does not manage the resource.
Procedure
Set the indicated resources to unmanaged mode:

# pcs resource unmanage resource1 [resource2] ...

Set resources to managed mode, which is the default state:

# pcs resource manage resource1 [resource2] ...

You can specify the name of a resource group with the pcs resource manage or pcs resource unmanage command. The command will act on all of the resources in the group, so that you can set all of the resources in a group to managed or unmanaged mode with a single command and then manage the contained resources individually.
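For example, assuming a hypothetical resource group named apachegroup, you can take the whole group out of Pacemaker's control, perform your maintenance, and then return it to managed mode:

# pcs resource unmanage apachegroup
# pcs resource manage apachegroup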
31.5. Putting a cluster in maintenance mode
When a cluster is in maintenance mode, the cluster does not start or stop any services until told otherwise. When maintenance mode is completed, the cluster does a sanity check of the current state of any services, and then stops or starts any that need it.
To put a cluster in maintenance mode, use the following command to set the maintenance-mode cluster property to true.
# pcs property set maintenance-mode=true
To remove a cluster from maintenance mode, use the following command to set the maintenance-mode cluster property to false.
# pcs property set maintenance-mode=false
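For example, a typical maintenance window brackets the work with the two property settings; while maintenance-mode is true, pcs status normally reports the resources as unmanaged:

# pcs property set maintenance-mode=true
# pcs status

After completing the maintenance, set maintenance-mode back to false so that the cluster re-checks the state of its services and stops or starts any that need it.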
For general information on setting and removing cluster properties, see Setting and removing cluster properties.
31.6. Updating a RHEL high availability cluster
Updating packages that make up the RHEL High Availability Add-On, either individually or as a whole, can be done in one of two general ways:
- Rolling Updates: Remove one node at a time from service, update its software, then integrate it back into the cluster. This allows the cluster to continue providing service and managing resources while each node is updated.
- Entire Cluster Update: Stop the entire cluster, apply updates to all nodes, then start the cluster back up.
When performing software updates on a Red Hat Enterprise Linux High Availability cluster, it is critical to ensure that any node that will undergo updates is not an active member of the cluster before those updates are initiated.
For a full description of each of these methods and the procedures to follow for the updates, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
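As a minimal sketch of a single rolling-update iteration, run on the node being updated (see the Knowledgebase article referenced above for the full recommended procedure):

# pcs cluster stop
# dnf update
# pcs cluster start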
31.7. Upgrading remote nodes and guest nodes
Stopping the pacemaker_remote service on an active node triggers a graceful resource migration, enabling seamless maintenance. However, the cluster attempts to reconnect immediately. If the service does not restart within the monitor timeout, the cluster detects a failure.
To avoid monitor failures when the pacemaker_remote service is stopped on an active Pacemaker Remote node, use the following procedure to take the node out of the cluster before performing any system administration that might stop pacemaker_remote.
Procedure
- Stop the node’s connection resource with the pcs resource disable resourcename command, which will move all services off the node. The connection resource would be the ocf:pacemaker:remote resource for a remote node or, commonly, the ocf:heartbeat:VirtualDomain resource for a guest node. For guest nodes, this command will also stop the VM, so the VM must be started outside the cluster (for example, using virsh) to perform any maintenance.

# pcs resource disable resourcename

- Perform the required maintenance.
- When ready to return the node to the cluster, re-enable the resource with the pcs resource enable command:

# pcs resource enable resourcename
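For example, assuming a remote node whose connection resource is named remote-node-1 (a hypothetical name), the sequence is as follows, with the maintenance performed between the two commands:

# pcs resource disable remote-node-1
# pcs resource enable remote-node-1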
31.8. Migrating VMs in a RHEL cluster
Red Hat does not support live migration of active cluster nodes. To migrate a VM, stop the cluster services to remove the node from operation, migrate the VM, and then restart the services. For details, see Support Policies for RHEL High Availability Clusters - General Conditions with Virtualized Cluster Members.
The following steps outline the procedure for removing a VM from a cluster, migrating the VM, and restoring the VM to the cluster.
This procedure applies to VMs that are used as full cluster nodes, not to VMs managed as cluster resources (including VMs used as guest nodes) which can be live-migrated without special precautions. For general information about the fuller procedure required for updating packages that make up the RHEL High Availability and Resilient Storage Add-Ons, either individually or as a whole, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
Before performing this procedure, consider the effect on cluster quorum of removing a cluster node. For example, if you have a three-node cluster and you remove one node, your cluster cannot withstand any node failure. This is because if one node of a three-node cluster is already down, removing a second node will lose quorum.
Procedure
- If any preparations need to be made before stopping or moving the resources or software running on the VM to migrate, perform those steps.
- Run the following command on the VM to stop the cluster software on the VM:

# pcs cluster stop

- Perform the live migration of the VM.
- Start cluster services on the VM:

# pcs cluster start
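As an illustration only, the live migration step itself is typically performed from the source hypervisor, for example with virsh; the domain name vm-node1 and the destination URI below are hypothetical:

# virsh migrate --live vm-node1 qemu+ssh://dest-hypervisor.example.com/system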
31.9. Identifying clusters by UUID
When you create a cluster it has an associated UUID. Since a cluster name is not a unique cluster identifier, a third-party tool such as a configuration management database that manages multiple clusters with the same name can uniquely identify a cluster by means of its UUID. You can display the current cluster UUID with the pcs cluster config [show] command, which includes the cluster UUID in its output.
Procedure
Add a UUID to an existing cluster:

# pcs cluster config uuid generate

Regenerate a UUID for a cluster with an existing UUID:

# pcs cluster config uuid generate --force
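For example, after generating the UUID you can confirm that it is present by displaying the cluster configuration, which includes the cluster UUID in its output:

# pcs cluster config show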
31.10. Renaming a cluster
You can change the name of an existing cluster using the pcs cluster rename command.
Procedure
To rename your cluster, run the pcs cluster rename command from any cluster node. Replace <new-name> with the new name you want to assign to the cluster:

# pcs cluster rename <new-name>
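For example, to rename the cluster to a hypothetical name newcluster:

# pcs cluster rename newcluster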