Chapter 31. Performing cluster maintenance


To perform maintenance on cluster nodes, you might need to stop or move the resources and services running on the cluster. In some cases, you can stop the cluster software without affecting services. Pacemaker provides several methods to support system maintenance.

31.1. Putting a node into standby mode

When a cluster node is in standby mode, the node is no longer able to host resources. Any resources currently active on the node will be moved to another node.

The following command puts the specified node into standby mode. If you specify the --all option, this command puts all nodes into standby mode.

You can use this command when updating a resource’s packages. You can also use this command when testing a configuration, to simulate recovery without actually shutting down a node.

Procedure

  • Put the specified node into standby mode:

    # pcs node standby node | --all
  • Remove the specified node from standby mode. After running this command, the specified node is then able to host resources. If you specify the --all option, this command removes all nodes from standby mode:

    # pcs node unstandby node | --all

    Note that when you execute the pcs node standby command, this prevents resources from running on the indicated node. When you execute the pcs node unstandby command, this allows resources to run on the indicated node. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially.
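The standby cycle described above can be sketched as a short dry-run script, assuming a hypothetical node name node1 and a hypothetical package update as the maintenance task. The run() wrapper echoes each command instead of executing it, so you can preview the sequence before applying it on a real cluster.

```shell
# Dry-run sketch of a standby maintenance cycle (hypothetical node "node1").
# run() echoes each command; replace 'echo "$@"' with "$@" to execute for real.
run() { echo "$@"; }

run pcs node standby node1       # drain resources off node1
run dnf update resource-agents   # example maintenance task on the node
run pcs node unstandby node1     # allow node1 to host resources again
```

Note that after unstandby, where resources run depends on your constraints and stickiness settings, as described above.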

31.2. Manually moving cluster resources

You can override the cluster and force resources to move from their current location. There are two occasions when you would want to do this.

  • When a node is under maintenance and you need to move all resources running on that node to a different node
  • When individually specified resources need to be moved

To move all resources running on a node to a different node, you put the node in standby mode. For information about putting a cluster node in standby mode, see Putting a node in standby mode.

You can move individually specified resources in either of the following ways.

  • You can use the pcs resource move command to move a resource off a node on which it is currently running, as described in Moving a resource from its current node.
  • You can use the pcs resource relocate run command to move a resource to its preferred node, as determined by current cluster status, constraints, location of resources and other settings. For information about this command, see Moving a resource to its preferred node.

Moving a resource from its current node

To move a resource off the node on which it is currently running, use the following command, specifying the resource_id of the resource as defined in the cluster configuration. Specify the destination_node if you want to indicate on which node to run the resource that you are moving.

# pcs resource move resource_id [destination_node] [--promoted] [--strict] [--wait[=n]]

When you execute the pcs resource move command, this adds a constraint to the resource to prevent it from running on the node on which it is currently running. By default, the location constraint that the command creates is automatically removed once the resource has been moved. If removing the constraint would cause the resource to move back to the original node, as might happen if the resource-stickiness value for the resource is 0, the pcs resource move command fails. If you would like to move a resource and leave the resulting constraint in place, use the pcs resource move-with-constraint command.

  • If you specify the --promoted parameter of the pcs resource move command, the constraint applies only to promoted instances of the resource.
  • If you specify the --strict parameter of the pcs resource move command, the command will fail if other resources than the one specified in the command would be affected.
  • You can optionally configure a --wait[=n] parameter for the pcs resource move command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, it defaults to a value of 60 minutes.

    For more information about location constraints, see Determining which nodes a resource can run on.
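As an illustration of the options above, the following dry-run sketch moves a hypothetical resource named webserver to a hypothetical node node2, waiting up to 30 seconds for it to start. The run() wrapper prints the command rather than executing it.

```shell
# Dry-run sketch: move a hypothetical resource "webserver" to "node2" and
# wait up to 30 seconds for it to start. run() echoes instead of executing.
run() { echo "$@"; }

run pcs resource move webserver node2 --wait=30
```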

Moving a resource to its preferred node

After a resource has moved, either due to a failover or to an administrator manually moving the resource, it will not necessarily move back to its original node even after the circumstances that caused the failover have been corrected. To relocate resources to their preferred node, use the following command. A preferred node is determined by the current cluster status, constraints, resource location, and other settings and may change over time.

# pcs resource relocate run [resource1] [resource2] ...

If you do not specify any resources, all resources are relocated to their preferred nodes.

This command calculates the preferred node for each resource while ignoring resource stickiness. After calculating the preferred node, it creates location constraints which will cause the resources to move to their preferred nodes. Once the resources have been moved, the constraints are deleted automatically. To remove all constraints created by the pcs resource relocate run command, you can enter the pcs resource relocate clear command. To display the current status of resources and their optimal node ignoring resource stickiness, enter the pcs resource relocate show command.
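The relocate workflow can be previewed with the following dry-run sketch, using a hypothetical resource named webserver; run() echoes each command instead of executing it.

```shell
# Dry-run sketch of a relocate cycle (hypothetical resource "webserver").
run() { echo "$@"; }

run pcs resource relocate show            # preview optimal nodes, ignoring stickiness
run pcs resource relocate run webserver   # move the resource to its preferred node
run pcs resource relocate clear           # remove any leftover relocate constraints
```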

31.3. Disabling, enabling, and banning cluster resources

In addition to the pcs resource move and pcs resource relocate commands, there are a variety of other commands you can use to control the behavior of cluster resources.

31.3.1. Disabling a cluster resource

Stop a resource and prevent the cluster from restarting it. Constraints or failures may keep the resource active. Use --wait=n to pause until the resource stops (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.

Simulating disabling a resource

When complex resource relations are configured, it can be impossible to determine by hand whether disabling one resource would affect other resources. To determine what effect disabling a resource will have on other resources, use the pcs resource disable --simulate command, which shows the effects of disabling a resource without changing the cluster configuration.

Safely disabling resources

You can specify that a resource be disabled only if disabling the resource would not have an effect on other resources.

  • The pcs resource disable --safe command disables a resource only if no other resources would be affected in any way, such as being migrated from one node to another. The pcs resource safe-disable command is an alias for the pcs resource disable --safe command.
  • The pcs resource disable --safe --no-strict command disables a resource only if no other resources would be stopped or demoted.

Determining the resource IDs of affected resources

The error report that the pcs resource disable --safe command generates if the safe disable operation fails contains the affected resource IDs. If you need to know only the resource IDs of resources that would be affected by disabling a resource, use the --brief option for the pcs resource disable --safe command, which does not provide the full simulation result and prints errors only.

Procedure

  • Stop a running resource and prevent the cluster from starting it again:

    # pcs resource disable resource_id [--wait[=n]]
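The simulate and safe-disable options described above can be combined into a cautious workflow, sketched here as a dry run for a hypothetical resource named webserver; run() echoes each command instead of executing it.

```shell
# Dry-run sketch of a cautious disable (hypothetical resource "webserver").
run() { echo "$@"; }

run pcs resource disable webserver --simulate       # preview the effect only
run pcs resource disable webserver --safe --brief   # disable only if nothing else is affected;
                                                    # on failure, print affected resource IDs only
```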

31.3.2. Enabling a cluster resource

Enable a resource to allow the cluster to start it. Depending on the configuration, the resource might remain stopped. Use --wait=n to pause until the resource starts (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.

Procedure

  • Use the following command to allow the cluster to start a resource:

    # pcs resource enable resource_id [--wait[=n]]

31.3.3. Preventing a resource from running on a node

You can prevent a resource from running on a specified node, or on the current node if no node is specified.

Procedure

  • Prevent a resource from running on a specified node, or on the current node if no node is specified:

    # pcs resource ban resource_id [node] [--promoted] [lifetime=lifetime] [--wait[=n]]
    Note

    When you execute the pcs resource ban command, this adds a -INFINITY location constraint to the resource to prevent it from running on the indicated node. You can execute the pcs resource clear or the pcs constraint delete command to remove the constraint. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially. For information about resource constraints, see Determining which nodes a resource can run on.

  • If you specify the --promoted parameter of the pcs resource ban command, the scope of the constraint is limited to the promoted role and you must specify promotable_id rather than resource_id.
  • You can optionally configure a lifetime parameter for the pcs resource ban command to indicate a period of time the constraint should remain.
  • You can optionally configure a --wait[=n] parameter for the pcs resource ban command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, the default resource timeout is used.
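As an example of the lifetime parameter, the following dry-run sketch bans a hypothetical resource webserver from a hypothetical node node1 for one hour, then lifts the ban early; run() echoes instead of executing. The lifetime value shown is an ISO 8601 duration.

```shell
# Dry-run sketch: ban "webserver" from "node1" for one hour, then lift the ban.
run() { echo "$@"; }

run pcs resource ban webserver node1 lifetime=PT1H   # ISO 8601 duration: one hour
run pcs resource clear webserver node1               # remove the ban constraint early
```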

31.3.4. Forcing a resource to start on the current node

Use pcs resource debug-start to force a resource to start on the current node for debugging. This command prints the output and ignores cluster recommendations. Do not use this for normal operations; Pacemaker manages starting cluster resources.

Procedure

  • Use the debug-start command to force a specified resource to start on the current node:

    # pcs resource debug-start resource_id

31.4. Setting a resource to unmanaged mode

When a resource is in unmanaged mode, the resource is still in the configuration but Pacemaker does not manage the resource.

Procedure

  • Set the indicated resources to unmanaged mode:

    # pcs resource unmanage resource1 [resource2] ...
  • Set resources to managed mode, which is the default state:

    # pcs resource manage resource1 [resource2] ...

    You can specify the name of a resource group with the pcs resource manage or pcs resource unmanage command. The command will act on all of the resources in the group, so that you can set all of the resources in a group to managed or unmanaged mode with a single command and then manage the contained resources individually.
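The group behavior described above can be sketched as a dry run for a hypothetical resource group named webgroup; run() echoes each command instead of executing it.

```shell
# Dry-run sketch: unmanage a hypothetical group "webgroup" during maintenance,
# then return it to the default managed state.
run() { echo "$@"; }

run pcs resource unmanage webgroup   # acts on every resource in the group
run pcs resource manage webgroup     # restore the default managed state
```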

31.5. Putting a cluster in maintenance mode

When a cluster is in maintenance mode, the cluster does not start or stop any services until told otherwise. When maintenance mode is disabled, the cluster does a sanity check of the current state of any services, and then stops or starts any that need it.

To put a cluster in maintenance mode, use the following command to set the maintenance-mode cluster property to true.

# pcs property set maintenance-mode=true

To remove a cluster from maintenance mode, use the following command to set the maintenance-mode cluster property to false.

# pcs property set maintenance-mode=false

For general information on setting and removing cluster properties, see Setting and removing cluster properties.
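A full maintenance-mode cycle can be sketched as a dry run, with a hypothetical package update standing in for the maintenance work; run() echoes each command instead of executing it.

```shell
# Dry-run sketch of a maintenance-mode cycle.
run() { echo "$@"; }

run pcs property set maintenance-mode=true    # cluster stops managing services
run dnf update pacemaker                      # example maintenance work
run pcs property set maintenance-mode=false   # cluster re-checks and reconciles services
```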

31.6. Updating a RHEL high availability cluster

Updating packages that make up the RHEL High Availability Add-On, either individually or as a whole, can be done in one of two general ways:

  • Rolling Updates: Remove one node at a time from service, update its software, then integrate it back into the cluster. This allows the cluster to continue providing service and managing resources while each node is updated.
  • Entire Cluster Update: Stop the entire cluster, apply updates to all nodes, then start the cluster back up.
Warning

It is critical that when performing software update procedures for Red Hat Enterprise Linux High Availability clusters, you ensure that any node that will undergo updates is not an active member of the cluster before those updates are initiated.

For a full description of each of these methods and the procedures to follow for the updates, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.

31.7. Upgrading remote nodes and guest nodes

Stopping the pacemaker_remote service on an active node triggers a graceful resource migration, enabling seamless maintenance. However, the cluster attempts to reconnect immediately. If the service does not restart within the monitor timeout, the cluster detects a failure.

To avoid monitor failures when the pacemaker_remote service is stopped on an active Pacemaker Remote node, use the following procedure to take the node out of the cluster before performing any system administration that might stop pacemaker_remote.

Procedure

  1. Stop the node’s connection resource with the pcs resource disable resourcename command, which will move all services off the node. The connection resource would be the ocf:pacemaker:remote resource for a remote node or, commonly, the ocf:heartbeat:VirtualDomain resource for a guest node. For guest nodes, this command will also stop the VM, so the VM must be started outside the cluster (for example, using virsh) to perform any maintenance.

    # pcs resource disable resourcename
  2. Perform the required maintenance.
  3. When ready to return the node to the cluster, re-enable the resource with the pcs resource enable command.

    # pcs resource enable resourcename
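The three steps above can be sketched as a dry run, assuming a hypothetical connection resource named remote-node1 and a hypothetical package update as the maintenance task; run() echoes each command instead of executing it.

```shell
# Dry-run sketch of the remote-node procedure (hypothetical connection
# resource "remote-node1").
run() { echo "$@"; }

run pcs resource disable remote-node1   # moves services off and stops the connection
run dnf update pacemaker-remote         # example maintenance on the remote node
run pcs resource enable remote-node1    # return the node to the cluster
```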

31.8. Migrating VMs in a RHEL cluster

Red Hat does not support live migration of active cluster nodes. To migrate a VM, stop the cluster services to remove the node from operation, migrate the VM, and then restart the services. For details, see Support Policies for RHEL High Availability Clusters - General Conditions with Virtualized Cluster Members.

The following steps outline the procedure for removing a VM from a cluster, migrating the VM, and restoring the VM to the cluster.

This procedure applies to VMs that are used as full cluster nodes, not to VMs managed as cluster resources (including VMs used as guest nodes) which can be live-migrated without special precautions. For general information about the fuller procedure required for updating packages that make up the RHEL High Availability and Resilient Storage Add-Ons, either individually or as a whole, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.

Note

Before performing this procedure, consider the effect on cluster quorum of removing a cluster node. For example, if you have a three-node cluster and you remove one node, your cluster cannot withstand any node failure. This is because if one node of a three-node cluster is already down, removing a second node will lose quorum.

Procedure

  1. If any preparations need to be made before stopping or moving the resources or software running on the VM to migrate, perform those steps.
  2. Run the following command on the VM to stop the cluster software on the VM.

    # pcs cluster stop
  3. Perform the live migration of the VM.
  4. Start cluster services on the VM.

    # pcs cluster start
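The procedure above can be sketched as a dry run, with a hypothetical VM named vm-node1 and a hypothetical target hypervisor host2; the virsh migrate step is shown for illustration only, and run() echoes each command instead of executing it.

```shell
# Dry-run sketch of the VM-migration steps (hypothetical VM "vm-node1",
# hypothetical target hypervisor "host2").
run() { echo "$@"; }

run pcs cluster stop                                          # on the VM: stop cluster software
run virsh migrate --live vm-node1 qemu+ssh://host2/system     # on the source hypervisor
run pcs cluster start                                         # on the VM: rejoin the cluster
```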

31.9. Identifying clusters by UUID

When you create a cluster it has an associated UUID. Since a cluster name is not a unique cluster identifier, a third-party tool such as a configuration management database that manages multiple clusters with the same name can uniquely identify a cluster by means of its UUID. You can display the current cluster UUID with the pcs cluster config [show] command, which includes the cluster UUID in its output.

Procedure

  • Add a UUID to an existing cluster:

    # pcs cluster config uuid generate
  • Regenerate a UUID for a cluster with an existing UUID:

    # pcs cluster config uuid generate --force
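The UUID workflow can be sketched as a dry run pairing generation with display; run() echoes each command instead of executing it.

```shell
# Dry-run sketch: add a UUID to an existing cluster, then display it.
run() { echo "$@"; }

run pcs cluster config uuid generate   # add a UUID to an existing cluster
run pcs cluster config show            # output includes the cluster UUID
```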

31.10. Renaming a cluster

You can change the name of an existing cluster using the pcs cluster rename command.

Procedure

  • To rename your cluster, run the pcs cluster rename command from any cluster node. Replace <new-name> with the new name you want to assign to the cluster:

    # pcs cluster rename <new-name>