Configuring and managing high availability clusters
Using the Red Hat High Availability Add-On to create and maintain Pacemaker clusters
Providing feedback on Red Hat documentation
We are committed to providing high-quality documentation and value your feedback. To help us improve, you can submit suggestions or report errors through the Red Hat Jira tracking system.
Procedure
- Log in to the Jira website. If you do not have an account, select the option to create one.
- Click Create in the top navigation bar.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Click Create at the bottom of the window.
Chapter 1. High Availability Add-On overview
The High Availability Add-On is a clustered system that provides reliability, scalability, and availability to critical production services.
High availability clusters, sometimes called failover clusters, provide highly available services by eliminating single points of failure and by failing over services from one cluster node to another in case a node becomes inoperative. Typically, services in a high availability cluster read and write data by means of read-write mounted file systems. A high availability cluster must maintain data integrity as one cluster node takes over control of a service from another cluster node. Node failures in a high availability cluster are not visible from clients outside the cluster. The High Availability Add-On provides high availability clustering through its high availability service management component, Pacemaker.
Pacemaker is the cluster resource manager for the High Availability Add-On. It achieves maximum availability for your cluster services and resources by making use of the cluster infrastructure's messaging and membership capabilities to detect and recover from node- and resource-level failure.
Red Hat provides a variety of documentation for planning, configuring, and maintaining a Red Hat high availability cluster. For a listing of articles that provide guided indexes to the various areas of Red Hat cluster documentation, see the Red Hat Knowledgebase article Red Hat High Availability Add-On Documentation Guide.
1.1. Pacemaker architecture components
A cluster configured with Pacemaker comprises separate component daemons that monitor cluster membership, scripts that manage the services, and resource management subsystems that monitor the disparate resources.
The following components form the Pacemaker architecture:
- Cluster Information Base (CIB)
- The Pacemaker information daemon, which uses XML internally to distribute and synchronize current configuration and status information from the Designated Coordinator (DC) to all other cluster nodes. The DC is a node assigned by Pacemaker to store and distribute cluster state and actions by means of the CIB.
- Cluster Resource Management Daemon (CRMd)
Pacemaker cluster resource actions are routed through this daemon. Resources managed by CRMd can be queried by client systems, moved, instantiated, and changed when needed.
Each cluster node also includes a local resource manager daemon (LRMd) that acts as an interface between CRMd and resources. LRMd passes commands, such as start and stop, from CRMd to the agents and relays status information.
- Shoot the Other Node in the Head (STONITH)
- STONITH is the Pacemaker fencing implementation. It acts as a cluster resource in Pacemaker that processes fence requests, forcefully shutting down nodes and removing them from the cluster to ensure data integrity. STONITH is configured in the CIB and can be monitored as a normal cluster resource.
- corosync
corosync is the component, and a daemon of the same name, that serves the core membership and member-communication needs for high availability clusters. It is required for the High Availability Add-On to function. In addition to those membership and messaging functions, corosync also:
- Manages quorum rules and determination.
- Provides messaging capabilities for applications that coordinate or operate across multiple members of the cluster and thus must communicate stateful or other information between instances.
- Uses the kronosnet library as its network transport to provide multiple redundant links and automatic failover.
1.2. Pacemaker configuration and management tools
The High Availability Add-On features three configuration tools for cluster deployment, monitoring, and management.
- pcs command-line interface
The pcs command-line interface controls and configures Pacemaker and the corosync heartbeat daemon. A command-line based program, pcs can perform the following cluster management tasks (sample commands follow this list):
- Create and configure a Pacemaker cluster
- Modify configuration of the cluster while it is running
- Start, stop, and display status information of the cluster
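As a brief sketch of this command-line workflow, the following standard pcs subcommands report on a running cluster; they assume no particular configuration:
# pcs status
# pcs cluster status
# pcs resource status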
- HA Cluster Management RHEL web console add-on
The HA Cluster Management RHEL web console add-on is a graphical user interface to create and configure Pacemaker clusters. It is available through the RHEL web console when the cockpit-ha-cluster package is installed. For information about the RHEL web console, see Getting started with the HA Cluster Management add-on for the RHEL web console.
- ha_cluster RHEL system role
With the ha_cluster RHEL system role, you can configure and manage a high-availability cluster that uses the Pacemaker high availability cluster resource manager. For information about using RHEL system roles, see Automating system administration by using RHEL system roles.
1.3. The cluster and Pacemaker configuration files
The configuration files for the Red Hat High Availability Add-On are corosync.conf and cib.xml.
- The corosync.conf file provides the cluster parameters used by corosync, the cluster manager that Pacemaker is built on. In general, you should not edit corosync.conf directly; instead, use the pcs interface, the HA Cluster Management RHEL web console add-on, or the ha_cluster RHEL system role.
- The cib.xml file is an XML file that represents both the cluster's configuration and the current state of all resources in the cluster. This file is used by Pacemaker's Cluster Information Base (CIB). The contents of the CIB are automatically kept in sync across the entire cluster. Do not edit the cib.xml file directly; use the pcs interface, the HA Cluster Management RHEL web console add-on, or the ha_cluster RHEL system role.
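You can inspect both files safely from the command line rather than opening them in an editor. As a minimal sketch, pcs cluster corosync prints the corosync.conf content and pcs cluster cib prints the raw CIB XML:
# pcs cluster corosync
# pcs cluster cib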
Chapter 2. Getting started with the HA Cluster Management add-on for the RHEL web console
The HA Cluster Management RHEL web console add-on is a graphical user interface to create and configure Pacemaker clusters. It is available through the RHEL web console when the cockpit-ha-cluster package is installed.
Previous releases of Red Hat Enterprise Linux used the pcsd Web UI as the standalone graphical user interface for HA cluster configuration. This interface has been reworked as a RHEL web console add-on and is no longer available as a standalone interface.
2.1. Installing and enabling the HA Cluster Management add-on for the RHEL web console
To use the HA Cluster Management add-on to configure a high availability cluster, add the HA Cluster Management application to the RHEL web console and install and enable the necessary Red Hat High Availability Add-On software packages and services on each node in your cluster.
Prerequisites
- You have installed the RHEL 10 web console. For instructions, see Installing and enabling the web console.
Procedure
- From the system on which you are running the RHEL web console, log in to the console and install the HA Cluster Management add-on application. See Add-on applications for the RHEL web console in the "Managing systems in the RHEL web console" document for details.
- On each cluster node, install the Red Hat High Availability fence agents from the High Availability channel.
# dnf install fence-agents-all
You can install only the fence agent that you require with the following command.
# dnf install fence-agents-model
- On each cluster node, ensure that the pcsd service is running.
# systemctl status pcsd.service
If the pcsd service is not running on a cluster node, enter the following command to start the pcsd service and to enable it at system start.
# systemctl enable --now pcsd.service
- Ensure you are logged in to the RHEL web console. To use the RHEL web console to create clusters, the user account used to sign in to the web console must have sudo access to the system.
Note: The hacluster user account is the Pacemaker service account and you cannot use this account to log in to the RHEL web console.
- In the RHEL web console, switch to administrative access mode. For information about administrative access mode, see Administrative access in the web console in the "Managing systems in the RHEL web console" document.
Note: Only a user with sudo access can create clusters and add nodes to existing ones. After you create a cluster, by default, users in the haclient group can manage the cluster and change permissions. For information about granting different permissions to any other users and groups that require them, or for modifying the default haclient permissions, see Granting HA Cluster Management permissions.
2.2. Granting HA Cluster Management permissions
Each cluster can have a different set of permissions used for its administration. A user with administrative access or full permissions can grant full permissions to other users and groups for the HA Cluster Management web console add-on.
The following table summarizes the cluster management permissions you can grant for the HA Cluster Management web console add-on.
| Permission | Allowed administrative task |
|---|---|
| Read | Viewing cluster settings. |
| Write | Modifying all cluster settings except permissions and ACLs. Does not allow adding nodes and creating clusters. |
| Grant | Modifying ACLs and granting read, write, and grant permissions. |
| Full | Performing all cluster management except adding nodes or creating clusters. |
Prerequisites
- You have installed and enabled the HA Cluster Management add-on for the RHEL web console, as described in Installing and enabling the HA Cluster Management add-on for the RHEL web console.
- You have created a cluster for which you want to manage permissions and it has been added to the cluster list in the HA Cluster Management add-on.
Procedure
- Log in to the RHEL web console with an account that has sudo access to the system and ensure that you are in administrative access mode. For information about administrative access mode, see Administrative access in the web console in the "Managing systems in the RHEL web console" document. Alternatively, log in to the RHEL web console with an account that has grant permissions for the cluster you want to manage. The account must be a member of the haclient group to see the HA Cluster Management web console add-on in limited access mode.
- Select a cluster for which you want to manage permissions from the cluster list.
- In the cluster detail, click the Permissions tab at the top of the page.
- Add, remove, or edit the permissions for a user or group.
- By default, any user with an account that is a member of the haclient group has read, write, and grant permissions. From the Permissions page you can remove these permissions if you have administrative access in the web console or if you have grant permissions.
Chapter 3. Creating a Red Hat High-Availability cluster with Pacemaker
To create a Red Hat High Availability two-node cluster, use the pcs command-line interface.
3.1. Prerequisites
- Two RHEL nodes, which will be used to create the cluster. In this example, the nodes used are z1.example.com and z2.example.com.
- Network switches for the private network. We recommend but do not require a private network for communication among the cluster nodes and other cluster hardware such as network power switches and Fibre Channel switches.
- A fencing device for each node of the cluster. This example uses two ports of the APC power switch with a host name of zapc.example.com.
You must ensure that your configuration conforms to Red Hat’s support policies. For full information about Red Hat’s support policies, requirements, and limitations for RHEL High Availability clusters, see the Red Hat Knowledgebase article Support Policies for RHEL High Availability Clusters.
3.2. Installing cluster software
Install the cluster software and configure your system for cluster creation.
Procedure
- On each node in the cluster, enable the repository for high availability that corresponds to your system architecture. For example, to enable the high availability repository for an x86_64 system, you can enter the following subscription-manager command:
# subscription-manager repos --enable=rhel-10-for-x86_64-highavailability-rpms
- On each node in the cluster, install the Red Hat High Availability Add-On software packages along with all available fence agents from the High Availability channel.
# dnf install pcs pacemaker fence-agents-all
Alternatively, you can install the Red Hat High Availability Add-On software packages along with only the fence agent that you require with the following command.
# dnf install pcs pacemaker fence-agents-model
The following command displays a list of the available fence agents.
# rpm -q -a | grep fence
fence-agents-rhevm-4.0.2-3.el7.x86_64
fence-agents-ilo-mp-4.0.2-3.el7.x86_64
fence-agents-ipmilan-4.0.2-3.el7.x86_64
...
Warning: After you install the Red Hat High Availability Add-On packages, you should ensure that your software update preferences are set so that nothing is installed automatically. Installation on a running cluster can cause unexpected behaviors. For more information, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
- If you are running the firewalld daemon, execute the following commands to enable the ports that are required by the Red Hat High Availability Add-On.
Note: You can determine whether the firewalld daemon is installed on your system with the rpm -q firewalld command. If it is installed, you can determine whether it is running with the firewall-cmd --state command.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
Note: The ideal firewall configuration for cluster components depends on the local environment, where you may need to take into account such considerations as whether the nodes have multiple network interfaces or whether off-host firewalling is present. The example here, which opens the ports that are generally required by a Pacemaker cluster, should be modified to suit local conditions. Enabling ports for the High Availability Add-On shows the ports to enable for the Red Hat High Availability Add-On and provides an explanation for what each port is used for.
- In order to use pcs to configure the cluster and communicate among the nodes, you must set a password on each node for the user ID hacluster, which is the pcs administration account. It is recommended that the password for user hacluster be the same on each node.
# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
- Before the cluster can be configured, the pcsd daemon must be started and enabled to start up on boot on each node. This daemon works with the pcs command to manage configuration across the nodes in the cluster. On each node in the cluster, execute the following commands to start the pcsd service and to enable pcsd at system start.
# systemctl start pcsd.service
# systemctl enable pcsd.service
3.3. Installing the pcp-zeroconf package (recommended)
Install the pcp-zeroconf package to configure Performance Co-Pilot (PCP). PCP collects performance data essential for troubleshooting fencing, resource failures, and cluster disruptions.
Cluster deployments where PCP is enabled will need sufficient space available for PCP's captured data on the file system that contains /var/log/pcp/. Typical space usage by PCP varies across deployments, but 10 GB is usually sufficient when using the pcp-zeroconf default settings, and some environments may require less. Monitoring usage in this directory over a 14-day period of typical activity can provide a more accurate usage expectation.
Procedure
- To install the pcp-zeroconf package, run the following command:
# dnf install pcp-zeroconf
This package enables pmcd and sets up data capture at a 10-second interval.
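As a quick check, assuming the default paths, you can confirm that the pmcd service is running and track how much space the captured data consumes over time:
# systemctl status pmcd
# du -sh /var/log/pcp/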
3.4. Creating a high availability cluster
You can create a Red Hat High Availability Add-On cluster. This example uses nodes z1.example.com and z2.example.com.
To display the parameters of a pcs command and a description of those parameters, use the -h option of the pcs command.
Prerequisites
- You have created a Red Hat account.
Procedure
- Authenticate the pcs user hacluster for each node in the cluster on the node from which you will be running pcs. The following command authenticates user hacluster on z1.example.com for both of the nodes in a two-node cluster that will consist of z1.example.com and z2.example.com.
[root@z1 ~]# pcs host auth z1.example.com z2.example.com
Username: hacluster
Password:
z1.example.com: Authorized
z2.example.com: Authorized
- Execute the following command from z1.example.com to create the two-node cluster my_cluster that consists of nodes z1.example.com and z2.example.com. This will propagate the cluster configuration files to both nodes in the cluster. This command includes the --start option, which will start the cluster services on both nodes in the cluster.
[root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com
- Enable the cluster services to run on each node in the cluster when the node is booted.
Note: For your particular environment, you can skip this step and keep the cluster services disabled so that, when a node goes down, you can resolve any issues with your cluster or your resources before the node rejoins the cluster. If you keep the cluster services disabled, you need to manually start the services when you reboot a node by executing the pcs cluster start command on that node.
[root@z1 ~]# pcs cluster enable --all
- Display the status of the cluster you created with the pcs cluster status command. Because there could be a slight delay before the cluster is up and running when you start the cluster services with the --start option of the pcs cluster setup command, you should ensure that the cluster is up and running before performing any subsequent actions on the cluster and its configuration.
[root@z1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: z2.example.com (version 2.0.0-10.el8-b67d8d0de9) - partition with quorum
 Last updated: Thu Oct 11 16:11:18 2018
 Last change: Thu Oct 11 16:11:00 2018 by hacluster via crmd on z2.example.com
 2 Nodes configured
 0 Resources configured
...
3.5. Creating a high availability cluster with multiple links
You create a Red Hat High Availability cluster with multiple links by specifying all of the links for each node with the pcs cluster setup command.
The format for the basic command to create a two-node cluster with two links is as follows.
# pcs cluster setup cluster_name node1_name addr=node1_link0_address addr=node1_link1_address node2_name addr=node2_link0_address addr=node2_link1_address
For the full syntax of this command, see the pcs(8) man page on your system.
When creating a cluster with multiple links, take the following caveats into account.
- The order of the addr=address parameters is important. The first address specified after a node name is for link0, the second one for link1, and so forth.
- By default, if link_priority is not specified for a link, the link's priority is equal to the link number. The link priorities are then 0, 1, 2, 3, and so forth, according to the order specified, with 0 being the highest link priority.
- The default link mode is passive, meaning the active link with the lowest-numbered link priority is used.
- With the default values of link_mode and link_priority, the first link specified will be used as the highest priority link, and if that link fails the next link specified will be used.
- It is possible to specify up to eight links using the knet transport protocol, which is the default transport protocol.
- All nodes must have the same number of addr= parameters.
- It is possible to add, remove, and change links in an existing cluster using the pcs cluster link add, pcs cluster link remove, pcs cluster link delete, and pcs cluster link update commands, as sketched at the end of this section.
- As with single-link clusters, do not mix IPv4 and IPv6 addresses in one link, although you can have one link running IPv4 and the other running IPv6.
- As with single-link clusters, you can specify addresses as IP addresses or as names as long as the names resolve to IPv4 or IPv6 addresses for which IPv4 and IPv6 addresses are not mixed in one link.
The following example creates a two-node cluster named my_twolink_cluster with two nodes, rh80-node1 and rh80-node2. rh80-node1 has two interfaces, IP address 192.168.122.201 as link0 and 192.168.123.201 as link1. rh80-node2 has two interfaces, IP address 192.168.122.202 as link0 and 192.168.123.202 as link1.
# pcs cluster setup my_twolink_cluster rh80-node1 addr=192.168.122.201 addr=192.168.123.201 rh80-node2 addr=192.168.122.202 addr=192.168.123.202
To set a link priority to a different value than the default value, which is the link number, you can set the link priority with the link_priority option of the pcs cluster setup command. Each of the following two example commands creates a two-node cluster with two interfaces where the first link, link 0, has a link priority of 1 and the second link, link 1, has a link priority of 0. Link 1 will be used first and link 0 will serve as the failover link. Since link mode is not specified, it defaults to passive.
These two commands are equivalent. If you do not specify a link number following the link keyword, the pcs interface automatically adds a link number, starting with the lowest unused link number.
# pcs cluster setup my_twolink_cluster rh80-node1 addr=192.168.122.201 addr=192.168.123.201 rh80-node2 addr=192.168.122.202 addr=192.168.123.202 transport knet link link_priority=1 link link_priority=0
# pcs cluster setup my_twolink_cluster rh80-node1 addr=192.168.122.201 addr=192.168.123.201 rh80-node2 addr=192.168.122.202 addr=192.168.123.202 transport knet link linknumber=1 link_priority=0 link link_priority=1
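The link management commands mentioned in the caveats above use a similar node=address syntax. The following is a sketch, with hypothetical addresses on a third network, that adds a link to the running two-link cluster (pcs assigns it the lowest unused link number, 2) and then removes it by that number:
# pcs cluster link add rh80-node1=192.168.124.201 rh80-node2=192.168.124.202 options link_priority=5
# pcs cluster link remove 2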
3.6. Configuring fencing
You must configure a fencing device for each node in the cluster. For information about the fence configuration commands and options, see Configuring fencing in a Red Hat High Availability cluster.
For general information about fencing and its importance in a Red Hat High Availability cluster, see the Red Hat Knowledgebase solution Fencing in a Red Hat High Availability Cluster.
When configuring a fencing device, attention should be given to whether that device shares power with any nodes or devices in the cluster. If a node and its fence device do share power, then the cluster may be at risk of being unable to fence that node if the power to it and its fence device should be lost. Such a cluster should either have redundant power supplies for fence devices and nodes, or redundant fence devices that do not share power. Alternative methods of fencing such as SBD or storage fencing may also bring redundancy in the event of isolated power losses.
This example fence configuration procedure uses the APC power switch with a host name of zapc.example.com to fence the nodes, and it uses the fence_apc_snmp fencing agent. Because both nodes will be fenced by the same fencing agent, you can configure both fencing devices as a single resource, using the pcmk_host_map option.
You create a fencing device by configuring the device as a stonith resource with the pcs stonith create command. The following command configures a stonith resource named myapc that uses the fence_apc_snmp fencing agent for nodes z1.example.com and z2.example.com. The pcmk_host_map option maps z1.example.com to port 1, and z2.example.com to port 2. The login value and password for the APC device are both apc. By default, this device will use a monitor interval of sixty seconds for each node.
Procedure
- Create the fencing device. Note that you can use an IP address when specifying the host name for the nodes.
[root@z1 ~]# pcs stonith create myapc fence_apc_snmp ipaddr="zapc.example.com" pcmk_host_map="z1.example.com:1;z2.example.com:2" login="apc" passwd="apc"
- Display the parameters of the fence device you created.
[root@z1 ~]# pcs stonith config myapc
 Resource: myapc (class=stonith type=fence_apc_snmp)
  Attributes: ipaddr=zapc.example.com pcmk_host_map=z1.example.com:1;z2.example.com:2 login=apc passwd=apc
  Operations: monitor interval=60s (myapc-monitor-interval-60s)
- After configuring your fence device, test the device; a minimal smoke test appears after the notes below. For information about testing a fence device, see Testing a fence device.
Note: Do not test your fence device by disabling the network interface, as this will not properly test fencing.
Note: Once fencing is configured and the cluster has been started, a network restart will trigger fencing for the node that restarts the network, even when the fencing timeout is not exceeded. For this reason, do not restart the network service while the cluster service is running, because this triggers unintentional fencing on that node.
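A minimal smoke test, run from the node that should survive, is to manually fence the other node and confirm that it powers off or reboots and that the cluster reports it offline. Because this forcefully interrupts the fenced node, run it only on a test cluster or during a maintenance window:
[root@z1 ~]# pcs stonith fence z2.example.com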
3.7. Backing up and restoring a cluster configuration
You can back up a cluster configuration in a tar archive and restore the cluster configuration files on all nodes from the backup.
Procedure
- Use the following command to back up the cluster configuration in a tar archive. If you do not specify a file name, the standard output will be used.
# pcs config backup filename
Note: The pcs config backup command backs up only the cluster configuration itself as configured in the CIB; the configuration of resource daemons is out of the scope of this command. For example, if you have configured an Apache resource in the cluster, the resource settings (which are in the CIB) will be backed up, while the Apache daemon settings (as set in /etc/httpd) and the files it serves will not be backed up. Similarly, if there is a database resource configured in the cluster, the database itself will not be backed up, while the database resource configuration (CIB) will be.
- Use the following command to restore the cluster configuration files on all cluster nodes from the backup. Specifying the --local option restores the cluster configuration files only on the node from which you run this command. If you do not specify a file name, the standard input will be used.
# pcs config restore [--local] [filename]
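For example, assuming a backup named my_cluster_backup, a backup and restore cycle looks like the following; note that pcs appends a .tar.bz2 suffix to the file name you specify:
# pcs config backup my_cluster_backup
# pcs config restore my_cluster_backup.tar.bz2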
3.8. Enabling ports for the High Availability Add-On
The ideal firewall configuration for cluster components depends on the local environment, where you need to take into account such considerations as whether the nodes have multiple network interfaces or whether off-host firewalling is present.
Procedure
- If you are running the firewalld daemon, execute the following commands to enable the ports that are required by the Red Hat High Availability Add-On:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
You may need to modify which ports are open to suit local conditions.
Note: You can determine whether the firewalld daemon is installed on your system with the rpm -q firewalld command. If the firewalld daemon is installed, you can determine whether it is running with the firewall-cmd --state command.
The following table shows the ports to enable for the Red Hat High Availability Add-On and provides an explanation for what each port is used for.
Table 3.1. Ports to enable for the High Availability Add-On

| Port | When required |
|---|---|
| TCP 2224 | Default pcsd port required on all nodes for node-to-node communication. You can configure the pcsd port by means of the PCSD_PORT parameter in the /etc/sysconfig/pcsd file. It is crucial to open port 2224 in such a way that pcs from any node can talk to all nodes in the cluster, including itself. When using the Booth cluster ticket manager or a quorum device, you must open port 2224 on all related hosts, such as Booth arbitrators or the quorum device host. |
| TCP 3121 | Required on all nodes if the cluster has any Pacemaker Remote nodes. Pacemaker's pacemaker-based daemon on the full cluster nodes will contact the pacemaker_remoted daemon on Pacemaker Remote nodes at port 3121. If a separate interface is used for cluster communication, the port only needs to be open on that interface. At a minimum, the port should be open on Pacemaker Remote nodes to full cluster nodes. Because users may convert a host between a full node and a remote node, or run a remote node inside a container using the host's network, it can be useful to open the port to all nodes. It is not necessary to open the port to any hosts other than nodes. |
| TCP 5403 | Required on the quorum device host when using a quorum device with corosync-qnetd. The default value can be changed with the -p option of the corosync-qnetd command. |
| UDP 5404-5412 | Required on corosync nodes to facilitate communication between nodes. It is crucial to open ports 5404-5412 in such a way that corosync from any node can talk to all nodes in the cluster, including itself. |
| TCP 21064 | Required on all nodes if the cluster contains any resources requiring DLM (such as GFS2). |
| TCP 9929, UDP 9929 | Required to be open on all cluster nodes and Booth arbitrator nodes to connections from any of those same nodes when the Booth ticket manager is used to establish a multi-site cluster. |
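If the predefined high-availability firewalld service does not fit your environment, the ports in the table can be opened individually instead. The following sketch opens the core cluster ports; adjust it to the rows that apply to your deployment:
# firewall-cmd --permanent --add-port=2224/tcp
# firewall-cmd --permanent --add-port=3121/tcp
# firewall-cmd --permanent --add-port=5404-5412/udp
# firewall-cmd --reload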
3.9. Configuring a Red Hat High Availability cluster with IBM z/VM instances as cluster members
Red Hat provides several articles that may be useful when designing, configuring, and administering a Red Hat High Availability cluster running on z/VM virtual machines.
- Design Guidance for RHEL High Availability Clusters - IBM z/VM Instances as Cluster Members
- Administrative Procedures for RHEL High Availability Clusters - Configuring z/VM SMAPI Fencing with fence_zvmip for RHEL 7 or 8 IBM z Systems Cluster Members
- RHEL High Availability cluster nodes on IBM z Systems experience STONITH-device timeouts around midnight on a nightly basis (Red Hat Knowledgebase)
- Administrative Procedures for RHEL High Availability Clusters - Preparing a dasd Storage Device for Use by a Cluster of IBM z Systems Members
Chapter 4. Configuring an active/passive Apache HTTP server in a Red Hat High Availability cluster
In this use case, clients access the Apache HTTP server through a floating IP address. The web server runs on one of two nodes in the cluster. If the node on which the web server is running becomes inoperative, the web server starts up again on the second node of the cluster with minimal service interruption.
The following illustration shows a high-level overview of the cluster in which the cluster is a two-node Red Hat High Availability cluster which is configured with a network power switch and with shared storage. The cluster nodes are connected to a public network, for client access to the Apache HTTP server through a virtual IP. The Apache server runs on either Node 1 or Node 2, each of which has access to the storage on which the Apache data is kept. In this illustration, the web server is running on Node 1 while Node 2 is available to run the server if Node 1 becomes inoperative.
Figure 4.1. Apache in a Red Hat High Availability Two-Node Cluster
This use case requires that your system include the following components:
- A two-node Red Hat High Availability cluster with power fencing configured for each node. We recommend but do not require a private network. This procedure uses the cluster example provided in Creating a Red Hat High-Availability cluster with Pacemaker.
- A public virtual IP address, required for Apache.
- Shared storage for the nodes in the cluster, using iSCSI, Fibre Channel, or other shared network block device.
The cluster is configured with an Apache resource group, which contains the cluster components that the web server requires: an LVM resource, a file system resource, an IP address resource, and a web server resource. This resource group can fail over from one node of the cluster to the other, allowing either node to run the web server. Before creating the resource group for this cluster, you will be performing the following procedures:
- Configure an XFS file system on the logical volume my_lv.
- Configure a web server.
After performing these steps, you create the resource group and the resources it contains.
4.1. Configuring an LVM volume with an XFS file system in a Pacemaker cluster
You can create an LVM logical volume on storage that is shared between the nodes of the cluster.
LVM volumes and the corresponding partitions and devices used by cluster nodes must be connected to the cluster nodes only.
The following procedure creates an LVM logical volume and then creates an XFS file system on that volume for use in a Pacemaker cluster. In this example, the shared partition /dev/sdb1 is used to store the LVM physical volume from which the LVM logical volume will be created.
Procedure
- On both nodes of the cluster, perform the following steps to set the value for the LVM system ID to the value of the uname identifier for the system. The LVM system ID will be used to ensure that only the cluster is capable of activating the volume group.
- Set the system_id_source configuration option in the /etc/lvm/lvm.conf configuration file to uname.
# Configuration option global/system_id_source.
system_id_source = "uname"
- Verify that the LVM system ID on the node matches the uname for the node.
# lvm systemid
  system ID: z1.example.com
# uname -n
  z1.example.com
- Create the LVM volume and create an XFS file system on that volume. Since the /dev/sdb1 partition is storage that is shared, you perform this part of the procedure on one node only.
- Create an LVM physical volume on partition /dev/sdb1.
[root@z1 ~]# pvcreate /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
- Create the volume group my_vg that consists of the physical volume /dev/sdb1. Specify the --setautoactivation n flag to ensure that volume groups managed by Pacemaker in a cluster will not be automatically activated on startup. If you are using an existing volume group for the LVM volume you are creating, you can reset this flag with the vgchange --setautoactivation n command for the volume group.
[root@z1 ~]# vgcreate --setautoactivation n my_vg /dev/sdb1
  Volume group "my_vg" successfully created
Note: If your LVM volume group contains one or more physical volumes that reside on remote block storage, such as an iSCSI target, Red Hat recommends that you ensure that the service starts before Pacemaker starts. For information about configuring startup order for a remote physical volume used by a Pacemaker cluster, see Configuring startup order for resource dependencies not managed by Pacemaker.
- Verify that the new volume group has the system ID of the node on which you are running and from which you created the volume group.
[root@z1 ~]# vgs -o+systemid
  VG     #PV #LV #SN Attr   VSize  VFree  System ID
  my_vg    1   0   0 wz--n- <1.82t <1.82t z1.example.com
- Create a logical volume using the volume group my_vg.
[root@z1 ~]# lvcreate -L450 -n my_lv my_vg
  Rounding up size to full physical extent 452.00 MiB
  Logical volume "my_lv" created
You can use the lvs command to display the logical volume.
[root@z1 ~]# lvs
  LV     VG     Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  my_lv  my_vg  -wi-a---- 452.00m
  ...
- Create an XFS file system on the logical volume my_lv.
[root@z1 ~]# mkfs.xfs /dev/my_vg/my_lv
meta-data=/dev/my_vg/my_lv       isize=512    agcount=4, agsize=28928 blks
         =                       sectsz=512   attr=2, projid32bit=1
...
- If the use of a devices file is enabled with the use_devicesfile = 1 parameter in the lvm.conf file, add the shared device to the devices file on the second node in the cluster. This feature is enabled by default.
[root@z2 ~]# lvmdevices --adddev /dev/sdb1
4.2. Configuring an Apache HTTP Server
Install and configure the Apache HTTP Server to function as a cluster resource. You must prepare the web server on all nodes, configure server status monitoring for the resource agent, and place web content on the shared file system.
Procedure
- Ensure that the Apache HTTP Server is installed on each node in the cluster. You also need the wget tool installed on the cluster to be able to check the status of the Apache HTTP Server. On each node, execute the following command.
# dnf install -y httpd wget
- If you are running the firewalld daemon, on each node in the cluster enable the ports that are required by the Red Hat High Availability Add-On and enable the ports you will require for running httpd. This example enables the httpd ports for public access, but the specific ports to enable for httpd may vary for production use.
# firewall-cmd --permanent --add-service=http
# firewall-cmd --permanent --zone=public --add-service=http
# firewall-cmd --reload
- In order for the Apache resource agent to get the status of Apache, on each node in the cluster create the following addition to the existing configuration to enable the status server URL.
# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
END
- Create a web page for Apache to serve up. On one node in the cluster, ensure that the logical volume you created in Configuring an LVM volume with an XFS file system in a Pacemaker cluster is activated, mount the file system that you created on that logical volume, create the file index.html on that file system, and then unmount the file system.
# lvchange -ay my_vg/my_lv
# mount /dev/my_vg/my_lv /var/www/
# mkdir /var/www/html
# mkdir /var/www/cgi-bin
# mkdir /var/www/error
# restorecon -R /var/www
# cat <<-END >/var/www/html/index.html
<html>
<body>Hello</body>
</html>
END
# umount /var/www
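As an optional check of the status server configuration above, you can briefly start httpd manually on one node and query the URL that the resource agent will use, then stop httpd again, because the cluster, not systemd, must manage the service:
# systemctl start httpd
# wget -O - http://127.0.0.1/server-status
# systemctl stop httpd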
4.3. Creating the resources and resource groups
You can create the resources for your cluster with the following procedure. To ensure these resources all run on the same node, they are configured as part of the resource group apachegroup.
- An LVM-activate resource named my_lvm that uses the LVM volume group you created in Configuring an LVM volume with an XFS file system in a Pacemaker cluster.
- A Filesystem resource named my_fs that uses the file system device /dev/my_vg/my_lv you created in Configuring an LVM volume with an XFS file system in a Pacemaker cluster.
- An IPaddr2 resource, which is a floating IP address for the apachegroup resource group. The IP address must not be one already associated with a physical node. If the IPaddr2 resource's NIC device is not specified, the floating IP must reside on the same network as one of the node's statically assigned IP addresses; otherwise, the NIC device to assign the floating IP address cannot be properly detected.
- An apache resource named Website that uses the index.html file and the Apache configuration you defined in Configuring an Apache HTTP server.
The following procedure creates the resource group apachegroup and the resources that the group contains. The resources will start in the order in which you add them to the group, and they will stop in the reverse order in which they are added to the group. Run this procedure from one node of the cluster only.
Procedure
- The following command creates the LVM-activate resource my_lvm. Because the resource group apachegroup does not yet exist, this command creates the resource group.
Note: Do not configure more than one LVM-activate resource that uses the same LVM volume group in an active/passive HA configuration, as this could cause data corruption. Additionally, do not configure an LVM-activate resource as a clone resource in an active/passive HA configuration.
[root@z1 ~]# pcs resource create my_lvm ocf:heartbeat:LVM-activate vgname=my_vg vg_access_mode=system_id --group apachegroup
When you create a resource, the resource is started automatically. You can use the following command to confirm that the resource was created and has started.
# pcs resource status
 Resource Group: apachegroup
     my_lvm     (ocf::heartbeat:LVM-activate):  Started
You can manually stop and start an individual resource with the pcs resource disable and pcs resource enable commands.
- The following commands create the remaining resources for the configuration, adding them to the existing resource group apachegroup.
[root@z1 ~]# pcs resource create my_fs Filesystem device="/dev/my_vg/my_lv" directory="/var/www" fstype="xfs" --group apachegroup
[root@z1 ~]# pcs resource create VirtualIP IPaddr2 ip=198.51.100.3 cidr_netmask=24 --group apachegroup
[root@z1 ~]# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroup
- After creating the resources and the resource group that contains them, you can check the status of the cluster. Note that all four resources are running on the same node.
[root@z1 ~]# pcs status
Cluster name: my_cluster
Last updated: Wed Jul 31 16:38:51 2013
Last change: Wed Jul 31 16:42:14 2013 via crm_attribute on z1.example.com
Stack: corosync
Current DC: z2.example.com (2) - partition with quorum
Version: 1.1.10-5.el7-9abe687
2 Nodes configured
6 Resources configured

Online: [ z1.example.com z2.example.com ]

Full list of resources:
 myapc  (stonith:fence_apc_snmp):       Started z1.example.com
 Resource Group: apachegroup
     my_lvm     (ocf::heartbeat:LVM-activate):  Started z1.example.com
     my_fs      (ocf::heartbeat:Filesystem):    Started z1.example.com
     VirtualIP  (ocf::heartbeat:IPaddr2):       Started z1.example.com
     Website    (ocf::heartbeat:apache):        Started z1.example.com
Note that if you have not configured a fencing device for your cluster, by default the resources do not start.
- Once the cluster is up and running, you can point a browser to the IP address you defined as the IPaddr2 resource to view the sample display, consisting of the simple word "Hello".
Hello
If you find that the resources you configured are not running, you can run the pcs resource debug-start resource command to test the resource configuration.
- When you use the apache resource agent to manage Apache, it does not use systemd. Because of this, you must edit the logrotate script supplied with Apache so that it does not use systemctl to reload Apache. Remove the following line in the /etc/logrotate.d/httpd file on each node in the cluster.
/bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
Replace the line you removed with the following three lines, specifying /var/run/httpd-website.pid as the PID file path, where website is the name of the Apache resource. In this example, the Apache resource name is Website.
/usr/bin/test -f /var/run/httpd-Website.pid >/dev/null 2>/dev/null &&
/usr/bin/ps -q $(/usr/bin/cat /var/run/httpd-Website.pid) >/dev/null 2>/dev/null &&
/usr/sbin/httpd -f /etc/httpd/conf/httpd.conf -c "PidFile /var/run/httpd-Website.pid" -k graceful > /dev/null 2>/dev/null || true
4.4. Testing the resource configuration
Verify the high availability configuration by simulating a manual failover. By placing the active node into standby mode, you force the cluster to migrate resources to a backup node, confirming that services remain available during a node outage.
In the example configuration, all resources are currently active on z1.example.com. To test the failover logic, put z1 into standby mode. This prevents the node from hosting resources and triggers the cluster to relocate the resource group to z2.example.com.
Procedure
- The following command puts node z1.example.com in standby mode.
[root@z1 ~]# pcs node standby z1.example.com
- After putting node z1 in standby mode, check the cluster status. Note that the resources should now all be running on z2.
[root@z1 ~]# pcs status
Cluster name: my_cluster
Last updated: Wed Jul 31 17:16:17 2013
Last change: Wed Jul 31 17:18:34 2013 via crm_attribute on z1.example.com
Stack: corosync
Current DC: z2.example.com (2) - partition with quorum
Version: 1.1.10-5.el7-9abe687
2 Nodes configured
6 Resources configured

Node z1.example.com (1): standby
Online: [ z2.example.com ]

Full list of resources:
 myapc  (stonith:fence_apc_snmp):       Started z1.example.com
 Resource Group: apachegroup
     my_lvm     (ocf::heartbeat:LVM-activate):  Started z2.example.com
     my_fs      (ocf::heartbeat:Filesystem):    Started z2.example.com
     VirtualIP  (ocf::heartbeat:IPaddr2):       Started z2.example.com
     Website    (ocf::heartbeat:apache):        Started z2.example.com
The web site at the defined IP address should still display, without interruption.
- To remove z1 from standby mode, enter the following command.
[root@z1 ~]# pcs node unstandby z1.example.com
Note: Removing a node from standby mode does not in itself cause the resources to fail back over to that node. This will depend on the resource-stickiness value for the resources. For information about the resource-stickiness meta attribute, see Configuring a resource to prefer its current node.
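For example, the following command, a sketch using an arbitrary score of 100, sets a default resource-stickiness so that resources prefer to stay on their current node rather than failing back:
[root@z1 ~]# pcs resource defaults update resource-stickiness=100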
Chapter 5. Configuring an active/passive NFS server in a Red Hat High Availability cluster
The Red Hat High Availability Add-On supports active/passive NFS on RHEL clusters using shared storage. Clients connect via a floating IP. If the active node fails, the service fails over to the other node with minimal interruption.
This use case requires that your system include the following components:
- A two-node Red Hat High Availability cluster with power fencing configured for each node. We recommend but do not require a private network. This procedure uses the cluster example provided in Creating a Red Hat High-Availability cluster with Pacemaker.
- A public virtual IP address, required for the NFS server.
- Shared storage for the nodes in the cluster, using iSCSI, Fibre Channel, or other shared network block device.
Configuring a highly available active/passive NFS server on an existing two-node Red Hat Enterprise Linux High Availability cluster requires that you perform the following steps:
- Configure a file system on an LVM logical volume on the shared storage for the nodes in the cluster.
- Configure an NFS share on the shared storage on the LVM logical volume.
- Create the cluster resources.
- Test the NFS server you have configured.
5.1. Configuring an LVM volume with an XFS file system in a Pacemaker cluster
You can create an LVM logical volume on storage that is shared between the nodes of the cluster.
LVM volumes and the corresponding partitions and devices used by cluster nodes must be connected to the cluster nodes only.
The following procedure creates an LVM logical volume and then creates an XFS file system on that volume for use in a Pacemaker cluster. In this example, the shared partition /dev/sdb1 is used to store the LVM physical volume from which the LVM logical volume will be created.
Procedure
- On both nodes of the cluster, perform the following steps to set the value for the LVM system ID to the value of the uname identifier for the system. The LVM system ID will be used to ensure that only the cluster is capable of activating the volume group.
- Set the system_id_source configuration option in the /etc/lvm/lvm.conf configuration file to uname.
# Configuration option global/system_id_source.
system_id_source = "uname"
- Verify that the LVM system ID on the node matches the uname for the node.
# lvm systemid
  system ID: z1.example.com
# uname -n
  z1.example.com
- Create the LVM volume and create an XFS file system on that volume. Since the /dev/sdb1 partition is storage that is shared, you perform this part of the procedure on one node only.
- Create an LVM physical volume on partition /dev/sdb1.
[root@z1 ~]# pvcreate /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
- Create the volume group my_vg that consists of the physical volume /dev/sdb1. Specify the --setautoactivation n flag to ensure that volume groups managed by Pacemaker in a cluster will not be automatically activated on startup. If you are using an existing volume group for the LVM volume you are creating, you can reset this flag with the vgchange --setautoactivation n command for the volume group.
[root@z1 ~]# vgcreate --setautoactivation n my_vg /dev/sdb1
  Volume group "my_vg" successfully created
Note: If your LVM volume group contains one or more physical volumes that reside on remote block storage, such as an iSCSI target, Red Hat recommends that you ensure that the service starts before Pacemaker starts. For information about configuring startup order for a remote physical volume used by a Pacemaker cluster, see Configuring startup order for resource dependencies not managed by Pacemaker.
- Verify that the new volume group has the system ID of the node on which you are running and from which you created the volume group.
[root@z1 ~]# vgs -o+systemid
  VG     #PV #LV #SN Attr   VSize  VFree  System ID
  my_vg    1   0   0 wz--n- <1.82t <1.82t z1.example.com
- Create a logical volume using the volume group my_vg.
[root@z1 ~]# lvcreate -L450 -n my_lv my_vg
  Rounding up size to full physical extent 452.00 MiB
  Logical volume "my_lv" created
You can use the lvs command to display the logical volume.
[root@z1 ~]# lvs
  LV     VG     Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  my_lv  my_vg  -wi-a---- 452.00m
  ...
- Create an XFS file system on the logical volume my_lv.
[root@z1 ~]# mkfs.xfs /dev/my_vg/my_lv
meta-data=/dev/my_vg/my_lv       isize=512    agcount=4, agsize=28928 blks
         =                       sectsz=512   attr=2, projid32bit=1
...
- If the use of a devices file is enabled with the use_devicesfile = 1 parameter in the lvm.conf file, add the shared device to the devices file on the second node in the cluster. This feature is enabled by default.
[root@z2 ~]# lvmdevices --adddev /dev/sdb1
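5.2. Configuring an NFS share
Configure an NFS share on the shared storage for the NFS service failover. The following commands are a minimal sketch of preparing the share on one node, assuming the my_vg/my_lv volume and XFS file system from the previous procedure; they create the export directories and the clientdatafile1 and clientdatafile2 test files that the later procedures reference, and then unmount and deactivate the volume so that the cluster can manage it.
# lvchange -ay my_vg/my_lv
# mkdir /nfsshare
# mount /dev/my_vg/my_lv /nfsshare
# mkdir -p /nfsshare/exports/export1
# mkdir -p /nfsshare/exports/export2
# touch /nfsshare/exports/export1/clientdatafile1
# touch /nfsshare/exports/export2/clientdatafile2
# umount /dev/my_vg/my_lv
# lvchange -an my_vg/my_lv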
5.3. Configuring the resources and resource group for an NFS server in a cluster
You can configure the cluster resources for an NFS server in a cluster.
If you have not configured a fencing device for your cluster, by default the resources do not start.
If you find that the resources you configured are not running, you can run the pcs resource debug-start resource command to test the resource configuration. This starts the service outside of the cluster's control and knowledge. Once the configured resources are running again, run pcs resource cleanup resource to make the cluster aware of the updates.
The following procedure configures the system resources. To ensure these resources all run on the same node, they are configured as part of the resource group nfsgroup. The resources will start in the order in which you add them to the group, and they will stop in the reverse order in which they are added to the group. Run this procedure from one node of the cluster only.
Procedure
- Create the LVM-activate resource named my_lvm. Because the resource group nfsgroup does not yet exist, this command creates the resource group.
Warning: Do not configure more than one LVM-activate resource that uses the same LVM volume group in an active/passive HA configuration, as this risks data corruption. Additionally, do not configure an LVM-activate resource as a clone resource in an active/passive HA configuration.
[root@z1 ~]# pcs resource create my_lvm ocf:heartbeat:LVM-activate vgname=my_vg vg_access_mode=system_id --group nfsgroup
Check the status of the cluster to verify that the resource is running.
[root@z1 ~]# pcs status
Cluster name: my_cluster
Last updated: Thu Jan  8 11:13:17 2015
Last change: Thu Jan  8 11:13:08 2015
Stack: corosync
Current DC: z2.example.com (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
3 Resources configured

Online: [ z1.example.com z2.example.com ]

Full list of resources:
 myapc  (stonith:fence_apc_snmp):       Started z1.example.com
 Resource Group: nfsgroup
     my_lvm     (ocf::heartbeat:LVM-activate):  Started z1.example.com

PCSD Status:
  z1.example.com: Online
  z2.example.com: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
- Configure a Filesystem resource for the cluster. The following command configures an XFS Filesystem resource named nfsshare as part of the nfsgroup resource group. This file system uses the LVM volume group and XFS file system you created in Configuring an LVM volume with an XFS file system in a Pacemaker cluster and will be mounted on the /nfsshare directory you created in Configuring an NFS share.
[root@z1 ~]# pcs resource create nfsshare Filesystem device=/dev/my_vg/my_lv directory=/nfsshare fstype=xfs --group nfsgroup
You can specify mount options as part of the resource configuration for a Filesystem resource with the options=options parameter. Run the pcs resource describe Filesystem command for full configuration options.
- Verify that the my_lvm and nfsshare resources are running.
[root@z1 ~]# pcs status
...
Full list of resources:
 myapc  (stonith:fence_apc_snmp):       Started z1.example.com
 Resource Group: nfsgroup
     my_lvm     (ocf::heartbeat:LVM-activate):  Started z1.example.com
     nfsshare   (ocf::heartbeat:Filesystem):    Started z1.example.com
...
- Create the nfsserver resource named nfs-daemon as part of the resource group nfsgroup.
Note: The nfsserver resource allows you to specify an nfs_shared_infodir parameter, which is a directory that NFS servers use to store NFS-related stateful information. It is recommended that this attribute be set to a subdirectory of one of the Filesystem resources you created in this collection of exports. This ensures that the NFS servers are storing their stateful information on a device that will become available to another node if this resource group needs to relocate. In this example:
  - /nfsshare is the shared-storage directory managed by the Filesystem resource
  - /nfsshare/exports/export1 and /nfsshare/exports/export2 are the export directories
  - /nfsshare/nfsinfo is the shared-information directory for the nfsserver resource
[root@z1 ~]# pcs resource create nfs-daemon nfsserver nfs_shared_infodir=/nfsshare/nfsinfo nfs_no_notify=true --group nfsgroup
[root@z1 ~]# pcs status
...
- Add the exportfs resources to export the /nfsshare/exports directory. These resources are part of the resource group nfsgroup. This builds a virtual directory for NFSv4 clients. NFSv3 clients can access these exports as well.
Note: The fsid=0 option is required only if you want to create a virtual directory for NFSv4 clients. For more information, see the Red Hat Knowledgebase solution How do I configure the fsid option in an NFS server's /etc/exports file?.
[root@z1 ~]# pcs resource create nfs-root exportfs clientspec=192.168.122.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfsshare/exports fsid=0 --group nfsgroup
[root@z1 ~]# pcs resource create nfs-export1 exportfs clientspec=192.168.122.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfsshare/exports/export1 fsid=1 --group nfsgroup
[root@z1 ~]# pcs resource create nfs-export2 exportfs clientspec=192.168.122.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfsshare/exports/export2 fsid=2 --group nfsgroup
- Add the floating IP address resource that NFS clients will use to access the NFS share. This resource is part of the resource group nfsgroup. For this example deployment, we are using 192.168.122.200 as the floating IP address.
[root@z1 ~]# pcs resource create nfs_ip IPaddr2 ip=192.168.122.200 cidr_netmask=24 --group nfsgroup
- Add an nfsnotify resource for sending NFSv3 reboot notifications once the entire NFS deployment has initialized. This resource is part of the resource group nfsgroup.
Note: For the NFS notification to be processed correctly, the floating IP address must have a host name associated with it that is consistent on both the NFS servers and the NFS client.
[root@z1 ~]# pcs resource create nfs-notify nfsnotify source_host=192.168.122.200 --group nfsgroup
- After creating the resources and the resource constraints, you can check the status of the cluster. Note that all resources are running on the same node.
[root@z1 ~]# pcs status
...
Full list of resources:
 myapc  (stonith:fence_apc_snmp):       Started z1.example.com
 Resource Group: nfsgroup
     my_lvm      (ocf::heartbeat:LVM-activate):  Started z1.example.com
     nfsshare    (ocf::heartbeat:Filesystem):    Started z1.example.com
     nfs-daemon  (ocf::heartbeat:nfsserver):     Started z1.example.com
     nfs-root    (ocf::heartbeat:exportfs):      Started z1.example.com
     nfs-export1 (ocf::heartbeat:exportfs):      Started z1.example.com
     nfs-export2 (ocf::heartbeat:exportfs):      Started z1.example.com
     nfs_ip      (ocf::heartbeat:IPaddr2):       Started z1.example.com
     nfs-notify  (ocf::heartbeat:nfsnotify):     Started z1.example.com
...
5.4. Testing the NFS resource configuration
You can validate your NFS resource configuration in a high availability cluster. You should be able to mount the exported file system with either NFSv3 or NFSv4.
Testing the NFS export
If you are running the firewalld daemon on your cluster nodes, ensure that the ports that your system requires for NFS access are enabled on all nodes.
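For example, if your firewall is firewalld with its standard service definitions, a command along the following lines on each cluster node might enable the NFS-related services; the exact service list depends on your NFS configuration:

[root@z1 ~]# firewall-cmd --permanent --add-service={nfs,rpc-bind,mountd}
[root@z1 ~]# firewall-cmd --reload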
On a node outside of the cluster, residing in the same network as the deployment, verify that the NFS share can be seen by mounting the NFS share. For this example, we are using the 192.168.122.0/24 network.

# showmount -e 192.168.122.200
Export list for 192.168.122.200:
/nfsshare/exports/export1 192.168.122.0/255.255.255.0
/nfsshare/exports         192.168.122.0/255.255.255.0
/nfsshare/exports/export2 192.168.122.0/255.255.255.0

To verify that you can mount the NFS share with NFSv4, mount the NFS share to a directory on the client node. After mounting, verify that the contents of the export directories are visible. Unmount the share after testing.

# mkdir nfsshare
# mount -o "vers=4" 192.168.122.200:export1 nfsshare
# ls nfsshare
clientdatafile1
# umount nfsshare

Verify that you can mount the NFS share with NFSv3. After mounting, verify that the test file clientdatafile2 is visible. Unlike NFSv4, NFSv3 does not use the virtual file system, so you must mount a specific export. Unmount the share after testing.

# mkdir nfsshare
# mount -o "vers=3" 192.168.122.200:/nfsshare/exports/export2 nfsshare
# ls nfsshare
clientdatafile2
# umount nfsshare
Testing for failover
On a node outside of the cluster, mount the NFS share and verify access to the clientdatafile1 file you created in Configuring an NFS share.

# mkdir nfsshare
# mount -o "vers=4" 192.168.122.200:export1 nfsshare
# ls nfsshare
clientdatafile1

From a node within the cluster, determine which node in the cluster is running nfsgroup. In this example, nfsgroup is running on z1.example.com.

[root@z1 ~]# pcs status
...
Full list of resources:
 myapc  (stonith:fence_apc_snmp):  Started z1.example.com
 Resource Group: nfsgroup
     my_lvm  (ocf::heartbeat:LVM-activate):  Started z1.example.com
     nfsshare  (ocf::heartbeat:Filesystem):  Started z1.example.com
     nfs-daemon  (ocf::heartbeat:nfsserver):  Started z1.example.com
     nfs-root  (ocf::heartbeat:exportfs):  Started z1.example.com
     nfs-export1  (ocf::heartbeat:exportfs):  Started z1.example.com
     nfs-export2  (ocf::heartbeat:exportfs):  Started z1.example.com
     nfs_ip  (ocf::heartbeat:IPaddr2):  Started z1.example.com
     nfs-notify  (ocf::heartbeat:nfsnotify):  Started z1.example.com
...

From a node within the cluster, put the node that is running nfsgroup in standby mode.

[root@z1 ~]# pcs node standby z1.example.com

Verify that nfsgroup successfully starts on the other cluster node.

[root@z1 ~]# pcs status
...
Full list of resources:
 Resource Group: nfsgroup
     my_lvm  (ocf::heartbeat:LVM-activate):  Started z2.example.com
     nfsshare  (ocf::heartbeat:Filesystem):  Started z2.example.com
     nfs-daemon  (ocf::heartbeat:nfsserver):  Started z2.example.com
     nfs-root  (ocf::heartbeat:exportfs):  Started z2.example.com
     nfs-export1  (ocf::heartbeat:exportfs):  Started z2.example.com
     nfs-export2  (ocf::heartbeat:exportfs):  Started z2.example.com
     nfs_ip  (ocf::heartbeat:IPaddr2):  Started z2.example.com
     nfs-notify  (ocf::heartbeat:nfsnotify):  Started z2.example.com
...

From the node outside the cluster on which you have mounted the NFS share, verify that this outside node still continues to have access to the test file within the NFS mount.

# ls nfsshare
clientdatafile1

Service will be lost briefly for the client during the failover, but the client should recover it with no user intervention. By default, clients using NFSv4 may take up to 90 seconds to recover the mount; this 90 seconds represents the NFSv4 file lease grace period observed by the server on startup. NFSv3 clients should recover access to the mount in a matter of a few seconds.

From a node within the cluster, remove the node that was initially running nfsgroup from standby mode.

Note

Removing a node from standby mode does not in itself cause the resources to fail back over to that node. This will depend on the resource-stickiness value for the resources. For information about the resource-stickiness meta attribute, see Configuring a resource to prefer its current node.

[root@z1 ~]# pcs node unstandby z1.example.com
Chapter 6. Saving a configuration change to a working file
Save configuration changes to a file to stage updates without affecting the active CIB. You can then define multiple changes without immediately updating the running cluster.
Although you should not edit the cluster configuration file directly, you can view the raw cluster configuration with the pcs cluster cib command.
The following is the recommended procedure for pushing changes to the CIB file. This procedure creates a copy of the original saved CIB file and makes changes to that copy. When pushing those changes to the active CIB, this procedure specifies the diff-against option of the pcs cluster cib-push command so that only the changes between the original file and the updated file are pushed to the CIB. This allows users to make changes in parallel that do not overwrite each other, and it reduces the load on Pacemaker, which does not need to parse the entire configuration file.
Procedure
Save the active CIB to a file. This example saves the CIB to a file named original.xml.

# pcs cluster cib original.xml

Copy the saved file to the working file you will be using for the configuration updates.

# cp original.xml updated.xml

Update your configuration as needed. The following command creates a resource in the file updated.xml but does not add that resource to the currently running cluster configuration.

# pcs -f updated.xml resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 op monitor interval=30s

Push the updated file to the active CIB, specifying that you are pushing only the changes you have made to the original file.

# pcs cluster cib-push updated.xml diff-against=original.xml

Alternately, you can push the entire current content of a CIB file with the following command.

# pcs cluster cib-push filename

When pushing the entire CIB file, Pacemaker checks the version and does not allow you to push a CIB file which is older than the one already in the cluster. If you need to update the entire CIB file with a version that is older than the one currently in the cluster, you can use the --config option of the pcs cluster cib-push command.

# pcs cluster cib-push --config filename
Chapter 7. Displaying cluster status
There are a variety of commands you can use to display the status of a cluster and its components.
Displaying the full cluster configuration
Use the following command to display the full current cluster configuration:
# pcs config
Displaying status of cluster and cluster resources
You can display the status of the cluster and the cluster resources with the following command:
# pcs status
Displaying status of a cluster component
You can display the status of a particular cluster component with the commands parameter of the pcs status command, specifying resources, cluster, nodes, or pcsd.
# pcs status commands
For example, the following command displays the status of the cluster resources:
# pcs status resources
The following command displays the status of the cluster, but not the cluster resources:
# pcs cluster status
Delaying status display until actions are completed
If you run the pcs status command before Pacemaker has completed any actions required by changes to the CIB, the cluster state at that time might not match the desired status. You can ensure that Pacemaker does not need to take any further actions by running the pcs status wait command.
The pcs status wait command waits until the cluster has completed all current actions before returning a value. If any actions unrelated to your recent changes are in progress, the command waits until those are completed. The pcs status wait command returns a value of 0 as soon as Pacemaker completes pending actions.
You can specify a period of time to wait. If the current actions have not completed after that time period, the command prints an error and returns a value of 1.
The following command waits until Pacemaker has applied configuration changes:
# pcs status wait
Waiting for the cluster to apply configuration changes...
The following command waits up to one minute until Pacemaker has applied configuration changes:
# pcs status wait 1min
Waiting for the cluster to apply configuration changes (timeout: 60 seconds)...
Chapter 8. Modifying, displaying, and exporting the corosync.conf file
To set cluster parameters, corosync uses the corosync.conf file. Do not edit corosync.conf directly. Use the pcs interface instead.
8.1. Modifying the corosync.conf file with the pcs command
Modify corosync.conf parameters, such as token timeouts and logging levels, using pcs. pcs validates input and synchronizes changes across all nodes, ensuring consistency. While some updates apply immediately, modifying core transport settings requires a cluster restart.
Procedure
The following command modifies the parameters in the corosync.conf file:

# pcs cluster config update [transport transport_options] [compression compression_options] [crypto crypto_options] [totem totem_options] [--corosync_conf path]

The following example command updates the knet_pmtud_interval transport value and the token and join totem values:

# pcs cluster config update transport knet_pmtud_interval=35 totem token=10000 join=100
8.2. Displaying the corosync.conf file with the pcs command
Display the contents of the corosync.conf configuration file to verify cluster settings, network parameters, and quorum configuration.
Procedure
Display the contents of the corosync.conf file:

# pcs cluster corosync

You can print the contents of the corosync.conf file in a human-readable format with the pcs cluster config command, as in the following example. The output for this command includes the UUID for the cluster if the UUID was added manually as described in Identifying clusters by UUID.

[root@r8-node-01 ~]# pcs cluster config
Cluster Name: HACluster
Cluster UUID: ad4ae07dcafe4066b01f1cc9391f54f5
Transport: knet
Nodes:
  r8-node-01:
    Link 0 address: r8-node-01
    Link 1 address: 192.168.122.121
    nodeid: 1
  r8-node-02:
    Link 0 address: r8-node-02
    Link 1 address: 192.168.122.122
    nodeid: 2
Links:
  Link 1:
    linknumber: 1
    ping_interval: 1000
    ping_timeout: 2000
    pong_count: 5
Compression Options:
  level: 9
  model: zlib
  threshold: 150
Crypto Options:
  cipher: aes256
  hash: sha256
Totem Options:
  downcheck: 2000
  join: 50
  token: 10000
Quorum Device: net
  Options:
    sync_timeout: 2000
    timeout: 3000
  Model Options:
    algorithm: lms
    host: r8-node-03
  Heuristics:
    exec_ping: ping -c 1 127.0.0.1
8.3. Exporting the corosync.conf file
You can run the pcs cluster config show command with the --output-format=cmd option to display the pcs configuration commands that can be used to recreate the existing corosync.conf file on a different system.
Procedure
Export the corosync.conf file:

[root@r8-node-01 ~]# pcs cluster config show --output-format=cmd
pcs cluster setup HACluster \
  r8-node-01 addr=r8-node-01 addr=192.168.122.121 \
  r8-node-02 addr=r8-node-02 addr=192.168.122.122 \
  transport \
  knet \
    link \
      linknumber=1 \
      ping_interval=1000 \
      ping_timeout=2000 \
      pong_count=5 \
    compression \
      level=9 \
      model=zlib \
      threshold=150 \
    crypto \
      cipher=aes256 \
      hash=sha256 \
  totem \
    downcheck=2000 \
    join=50 \
    token=10000
Chapter 9. Configuring fencing in a Red Hat High Availability cluster
When a node becomes unresponsive, the cluster must isolate it to prevent data corruption. Since the node cannot be contacted directly, you must configure fencing. An external fence device cuts off the node’s access to shared resources or performs a hard reboot.
Without a fence device configured, you do not have a way to know that the resources previously used by the disconnected cluster node have been released, and this could prevent the services from running on any of the other cluster nodes. Conversely, the system may erroneously assume that the cluster node has released its resources, and this can lead to data corruption and data loss. Without a fence device configured, data integrity cannot be guaranteed and the cluster configuration will be unsupported.
While fencing is in progress, no other cluster operation is allowed to run. Normal operation of the cluster cannot resume until fencing has completed or the cluster node rejoins the cluster after the cluster node has been rebooted. For more information about fencing and its importance in a Red Hat High Availability cluster, see the Red Hat Knowledgebase solution Fencing in a Red Hat High Availability Cluster.
9.1. Displaying available fence agents and their options
You can view available fencing agents and the available options for specific fencing agents.
Your system’s hardware determines the type of fencing device to use for your cluster. For information about supported platforms and architectures and the different fencing devices, see the Cluster Platforms and Architectures section of the Red Hat Knowledgebase article Support Policies for RHEL High Availability Clusters.
Run the following command to list all available fencing agents. When you specify a filter, this command displays only the fencing agents that match the filter.
# pcs stonith list [filter]
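For example, if you know your hardware uses an APC power switch, you might narrow the listing with a filter; the filter here is a simple substring match:

# pcs stonith list apc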
Run the following command to display the options for the specified fencing agent.
# pcs stonith describe [stonith_agent]
For example, the following command displays the options for the fence agent for APC over telnet/SSH.
# pcs stonith describe fence_apc
Stonith options for: fence_apc
ipaddr (required): IP Address or Hostname
login (required): Login Name
passwd: Login password or passphrase
passwd_script: Script to retrieve password
cmd_prompt: Force command prompt
secure: SSH connection
port (required): Physical plug number or name of virtual machine
identity_file: Identity file for ssh
switch: Physical switch number on device
inet4_only: Forces agent to use IPv4 addresses only
inet6_only: Forces agent to use IPv6 addresses only
ipport: TCP port to use for connection with device
action (required): Fencing Action
verbose: Verbose mode
debug: Write debug information to given file
version: Display version information and exit
help: Display help and exit
separator: Separator for CSV created by operation list
power_timeout: Test X seconds for status change after ON/OFF
shell_timeout: Wait X seconds for cmd prompt after issuing command
login_timeout: Wait X seconds for cmd prompt after login
power_wait: Wait X seconds after issuing ON/OFF
delay: Wait X seconds before fencing is started
retry_on: Count of attempts to retry power on
For fence agents that provide a method option, a value of cycle is unsupported and should not be specified, as it may cause data corruption. The one exception is the fence_sbd agent; even for fence_sbd, however, you should not specify a method and should instead use the default value.
9.2. Creating a fence device
Create a fence device using the pcs stonith create command. To view all available creation options, use the pcs stonith -h command.
Procedure
Create a fence device:
# pcs stonith create stonith_id stonith_device_type [stonith_device_options] [op operation_action operation_options]

The following command creates a single fencing device for a single node:

# pcs stonith create MyStonith fence_virt pcmk_host_list=f1 op monitor interval=30s

Some fence devices can fence only a single node, while other devices can fence multiple nodes. The parameters you specify when you create a fencing device depend on what your fencing device supports and requires.
- Some fence devices can automatically determine what nodes they can fence.
- You can use the pcmk_host_list parameter when creating a fencing device to specify all of the machines that are controlled by that fencing device.
- Some fence devices require a mapping of host names to the specifications that the fence device understands. You can map host names with the pcmk_host_map parameter when creating a fencing device, as shown in the sketch after this list.
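For example, the following sketch creates a device whose host map ties node names to APC switch plug numbers; the device address, credentials, and plug numbers mirror the example used elsewhere in this guide and are placeholders for your own values:

# pcs stonith create myapc fence_apc_snmp ip="zapc.example.com" username="apc" password="apc" pcmk_host_map="z1.example.com:1;z2.example.com:2"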
9.3. General properties of fencing devices
Configure fencing behavior using device-specific options and cluster-wide properties. Device options define agent settings, such as IP addresses, and metadata like delays. Cluster properties manage global logic, including timeouts and the stonith-enabled parameter.
Any cluster node can fence any other cluster node with any fence device, regardless of whether the fence resource is started or stopped. Whether the resource is started controls only the recurring monitor for the device, not whether it can be used, with the following exceptions:
- You can disable a fencing device by running the pcs stonith disable stonith_id command. This will prevent any node from using that device.
- To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource with the pcs constraint location … avoids command.
- Configuring stonith-enabled=false will disable fencing altogether. Note, however, that Red Hat does not support clusters when fencing is disabled, as it is not suitable for a production environment.
The following table describes the general properties you can set for fencing devices.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_map | string | | A mapping of host names to port numbers for devices that do not support host names. For example: node1:1;node2:2,3 tells the cluster to use port 1 for node1 and ports 2 and 3 for node2. |
| pcmk_host_list | string | | A list of machines controlled by this device (Optional unless pcmk_host_check=static-list). |
| pcmk_host_check | string | static-list if either pcmk_host_list or pcmk_host_map is set; otherwise dynamic-list if the fence device supports the list action; otherwise status if the fence device supports the status action; otherwise none | How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute), none (assume every device can fence every machine). |
The following table summarizes additional properties you can set for fencing devices. Note that these properties are for advanced use only.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_argument | string | port | An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that should indicate the machine to be fenced. A value of none tells the cluster not to supply any additional parameters. |
| pcmk_reboot_action | string | reboot | An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the reboot action. |
| pcmk_reboot_timeout | time | 60s | Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for reboot actions. |
| pcmk_reboot_retries | integer | 2 | The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up. |
| pcmk_off_action | string | off | An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the off action. |
| pcmk_off_timeout | time | 60s | Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for off actions. |
| pcmk_off_retries | integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up. |
| pcmk_list_action | string | list | An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the list action. |
| pcmk_list_timeout | time | 60s | Specify an alternate timeout to use for list actions. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions. |
| pcmk_list_retries | integer | 2 | The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up. |
| pcmk_monitor_action | string | monitor | An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the monitor action. |
| pcmk_monitor_timeout | time | 60s | Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for monitor actions. |
| pcmk_monitor_retries | integer | 2 | The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up. |
| pcmk_status_action | string | status | An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the status action. |
| pcmk_status_timeout | time | 60s | Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for status actions. |
| pcmk_status_retries | integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up. |
| pcmk_delay_base | string | 0s | Enables a base delay for fencing actions and specifies a base delay value. You can specify different values for different nodes with the pcmk_delay_base parameter. For general information about fencing delay parameters and their interactions, see Fencing delays. |
| pcmk_delay_max | time | 0s | Enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10. For general information about fencing delay parameters and their interactions, see Fencing delays. |
| pcmk_action_limit | integer | 1 | The maximum number of actions that can be performed in parallel on this device. The cluster property concurrent-fencing=true needs to be configured first. A value of -1 is unlimited. |
| pcmk_on_action | string | on | For advanced use only: An alternate command to run instead of on. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the on action. |
| pcmk_on_timeout | time | 60s | For advanced use only: Specify an alternate timeout to use for on actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for on actions. |
| pcmk_on_retries | integer | 2 | For advanced use only: The maximum number of times to retry the on command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries on actions before giving up. |
In addition to the properties you can set for individual fence devices, there are also cluster properties you can set that determine fencing behavior, as described in the following table.
| Option | Default | Description |
|---|---|---|
| stonith-enabled | true | Indicates that failed nodes and nodes with resources that cannot be stopped should be fenced. Protecting your data requires that you set this true. If true, or unset, the cluster will refuse to start resources unless one or more STONITH resources have also been configured. Red Hat only supports clusters with this value set to true. |
| stonith-action | reboot | Action to send to fencing device. Allowed values: reboot, off. |
| stonith-timeout | 60s | How long to wait for a STONITH action to complete. |
| stonith-max-attempts | 10 | How many times fencing can fail for a target before the cluster will no longer immediately re-attempt it. |
| stonith-watchdog-timeout | | The maximum time to wait until a node can be assumed to have been killed by the hardware watchdog. It is recommended that this value be set to twice the value of the hardware watchdog timeout. This option is needed only if watchdog-only SBD configuration is used for fencing. |
| concurrent-fencing | true | Allow fencing operations to be performed in parallel. |
| fence-reaction | stop | Determines how a cluster node should react if notified of its own fencing. A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that does not cut cluster communication. Allowed values are stop to attempt to immediately stop Pacemaker and stay stopped, or panic to attempt to immediately reboot the local node, falling back to stop on failure. Although the default value for this property is stop, the safest choice for this value is panic, which attempts to immediately reboot the local node. If you prefer the stop behavior, as is most likely to be the case in conjunction with fabric fencing, it is recommended that you set this explicitly. |
| priority-fencing-delay | 0 (disabled) | Sets a fencing delay that allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced. For general information about fencing delay parameters and their interactions, see Fencing delays. |
For information about setting cluster properties, see Setting and removing cluster properties.
9.4. Fencing delays
In two-node clusters, simultaneous communication loss can cause nodes to fence each other, shutting down the entire cluster. Configure a fencing delay to prevent this race condition. Delays are unnecessary in larger clusters, where quorum determines fencing authority.
You can set different types of fencing delays, depending on your system requirements.
static fencing delays
A static fencing delay is a fixed, predetermined delay. Setting a static delay on one node makes that node more likely to be fenced because it increases the chances that the other node will initiate fencing first after detecting lost communication. In an active/passive cluster, setting a delay on a passive node makes it more likely that the passive node will be fenced when communication breaks down. You configure a static delay by using the pcmk_delay_base cluster property. You can set this property when a separate fence device is used for each node or when a single fence device is used for all nodes.

dynamic fencing delays

A dynamic fencing delay is random. It can vary and is determined at the time fencing is needed. You configure a random delay and specify a maximum value for the combined base delay and random delay with the pcmk_delay_max cluster property. When the fencing delay for each node is random, which node is fenced is also random. You may find this feature useful if your cluster is configured with a single fence device for all nodes in an active/active design.

priority fencing delays

A priority fencing delay is based on active resource priorities. If all resources have the same priority, the node with the fewest resources running is the node that gets fenced. In most cases, you use only one delay-related parameter, but it is possible to combine them. Combining delay-related parameters adds the priority values for the resources together to create a total delay. You configure a priority fencing delay with the priority-fencing-delay cluster property. You may find this feature useful in an active/active cluster design because it can make the node running the fewest resources more likely to be fenced when communication between the nodes is lost.
The pcmk_delay_base cluster property
Setting the pcmk_delay_base cluster property enables a base delay for fencing and specifies a base delay value.
When you set the pcmk_delay_max cluster property in addition to the pcmk_delay_base property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_base but do not set pcmk_delay_max, there is no random component to the delay and it will be the value of pcmk_delay_base.
You can specify different values for different nodes with the pcmk_delay_base parameter. This allows a single fence device to be used in a two-node cluster, with a different delay for each node. You do not need to configure two separate devices to use separate delays. To specify different values for different nodes, you map the host names to the delay value for that node using a similar syntax to pcmk_host_map. For example, node1:0;node2:10s would use no delay when fencing node1 and a 10-second delay when fencing node2.
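For example, to apply the node1:0;node2:10s mapping described above to an already configured fence device, you might run a command along these lines; the device name myapc is a placeholder for your own device:

# pcs stonith update myapc pcmk_delay_base="node1:0;node2:10s"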
The pcmk_delay_max cluster property

Setting the pcmk_delay_max cluster property enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10.

When you set the pcmk_delay_base cluster property in addition to the pcmk_delay_max property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_max but do not set pcmk_delay_base, there is no static component to the delay.
The priority-fencing-delay cluster property
Setting the priority-fencing-delay cluster property allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced.
The priority-fencing-delay property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the priority meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to remain operational. For example, if you set pcs resource defaults update priority=1 and pcs property set priority-fencing-delay=15s and no other priorities are set, then the node running the most resources will be more likely to remain operational because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.
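Expressed as commands, the example above looks like the following:

# pcs resource defaults update priority=1
# pcs property set priority-fencing-delay=15s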
The node running the promoted role of a promotable clone gets an extra 1 point if a priority has been configured for that clone.
Interaction of fencing delays
Setting more than one type of fencing delay yields the following results:
- Any delay set with the priority-fencing-delay property is added to any delay from the pcmk_delay_base and pcmk_delay_max fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss, as when on-fail=fencing is set for a resource monitor operation. When setting these delays in combination, set the priority-fencing-delay property to a value that is significantly greater than the maximum delay from pcmk_delay_base and pcmk_delay_max to be sure the prioritized node is preferred. Setting this property to twice this value is always safe.
- Only fencing scheduled by Pacemaker itself observes fencing delays. Fencing scheduled by external code such as dlm_controld and fencing implemented by the pcs stonith fence command do not provide the necessary information to the fence device.
- Some individual fence agents implement a delay parameter, with a name determined by the agent and independent of delays configured with a pcmk_delay_* property. If both of these delays are configured, they are added together, so they would generally not be used in conjunction.
9.5. Testing a fence device
Validate fence devices to ensure the cluster can successfully recover from node failures and prevent data corruption. A complete testing strategy involves verifying network connectivity, executing the fence agent script directly, triggering the fence action through the cluster manager, and simulating a physical node failure.
When a Pacemaker cluster node or Pacemaker remote node is fenced, a hard kill should occur, not a graceful shutdown of the operating system. If a graceful shutdown occurs when your system fences a node, disable ACPI soft-off in the /etc/systemd/logind.conf file so that your system ignores any power-button-pressed signal. For instructions on disabling ACPI soft-off in the logind.conf file, see Disabling ACPI soft-off in the logind.conf file.
Use the following procedure to test a fence device.
Procedure
Use SSH, Telnet, HTTP, or whatever remote protocol is used to connect to the device to manually log in and test the fence device or see what output is given. For example, if you will be configuring fencing for an IPMI-enabled device, then try to log in remotely with ipmitool. Take note of the options used when logging in manually because those options might be needed when using the fencing agent.

If you are unable to log in to the fence device, verify that the device is pingable, that nothing such as a firewall configuration is preventing access to the fence device, that remote access is enabled on the fencing device, and that the credentials are correct.
Run the fence agent manually, using the fence agent script. This does not require that the cluster services are running, so you can perform this step before the device is configured in the cluster. This can ensure that the fence device is responding properly before proceeding.
Note

These examples use the fence_ipmilan fence agent script for an iLO device. The actual fence agent you will use and the command that calls that agent will depend on your server hardware. You should consult the man page for the fence agent you are using to determine which options to specify. You will usually need to know the login and password for the fence device and other information related to the fence device.

The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o status parameter to check the status of the fence device interface on another node without actually fencing it. This allows you to test the device and get it working before attempting to reboot the node. When running this command, you specify the name and password of an iLO user that has power on and off permissions for the iLO device.

# fence_ipmilan -a ipaddress -l username -p password -o status

The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o reboot parameter. Running this command on one node reboots the node managed by this iLO device.

# fence_ipmilan -a ipaddress -l username -p password -o reboot

If the fence agent failed to properly do a status, off, on, or reboot action, you should check the hardware, the configuration of the fence device, and the syntax of your commands. In addition, you can run the fence agent script with debug output enabled. The debug output is useful for some fencing agents to see where in the sequence of events the fencing agent script is failing when logging in to the fence device.

# fence_ipmilan -a ipaddress -l username -p password -o status -D /tmp/$(hostname)-fence_agent.debug

When diagnosing a failure that has occurred, you should ensure that the options you specified when manually logging in to the fence device are identical to what you passed on to the fence agent with the fence agent script.

For fence agents that support an encrypted connection, you may see an error due to certificate validation failing, requiring that you trust the host or that you use the fence agent’s ssl-insecure parameter. Similarly, if SSL/TLS is disabled on the target device, you may need to account for this when setting the SSL parameters for the fence agent.

Note

If the fence agent that is being tested is fence_drac, fence_ilo, or some other fencing agent for a systems management device that continues to fail, then fall back to trying fence_ipmilan. Most systems management cards support IPMI remote login, and the only supported fencing agent is fence_ipmilan.

Once the fence device has been configured in the cluster with the same options that worked manually and the cluster has been started, test fencing with the pcs stonith fence command from any node (or even multiple times from different nodes), as in the following example. The pcs stonith fence command reads the cluster configuration from the CIB and calls the fence agent as configured to execute the fence action. This verifies that the cluster configuration is correct.

# pcs stonith fence node_name
If the pcs stonith fence command works properly, the fencing configuration for the cluster should work when a fence event occurs. If the command fails, cluster management cannot invoke the fence device through the configuration it has retrieved. Check for the following issues and update your cluster configuration as needed.

- Check your fence configuration. For example, if you have used a host map, you should ensure that the system can find the node using the host name you have provided.
- Check whether the password and user name for the device include any special characters that could be misinterpreted by the bash shell. Making sure that you enter passwords and user names surrounded by quotation marks could address this issue.
- Check whether you can connect to the device using the exact IP address or host name you specified in the pcs stonith command. For example, if you give the host name in the stonith command but test by using the IP address, that is not a valid test. If the protocol that your fence device uses is accessible to you, use that protocol to try to connect to the device. For example, many agents use ssh or telnet. You should try to connect to the device with the credentials you provided when configuring the device, to see if you get a valid prompt and can log in to the device.

If you determine that all your parameters are appropriate but you still have trouble connecting to your fence device, you can check the logging on the fence device itself, if the device provides that, which will show if the user has connected and what command the user issued. You can also search through the /var/log/messages file for instances of stonith and error, which could give some idea of what is transpiring, but some agents can provide additional information.
Once the fence device tests are working and the cluster is up and running, test an actual failure. To do this, take an action in the cluster that should initiate a token loss.
Take down a network. How you take down a network depends on your specific configuration. In many cases, you can physically pull the network or power cables out of the host. For information about simulating a network failure, see the Red Hat Knowledgebase solution What is the proper way to simulate a network failure on a RHEL Cluster?.
Note

Disabling the network interface on the local host rather than physically disconnecting the network or power cables is not recommended as a test of fencing because it does not accurately simulate a typical real-world failure.
Block corosync traffic both inbound and outbound using the local firewall.
The following example blocks corosync, assuming the default corosync port is used, firewalld is used as the local firewall, and the network interface used by corosync is in the default firewall zone:

# firewall-cmd --direct --add-rule ipv4 filter OUTPUT 2 -p udp --dport=5405 -j DROP
# firewall-cmd --add-rich-rule='rule family="ipv4" port port="5405" protocol="udp" drop'
sysrq-trigger. Note, however, that triggering a kernel panic can cause data loss; it is recommended that you disable your cluster resources first.# echo c > /proc/sysrq-trigger
9.6. Configuring fencing levels
Pacemaker supports fencing nodes with multiple devices through a feature called fencing topologies. To implement topologies, create the individual devices as you normally would and then define one or more fencing levels in the fencing topology section in the configuration.
Pacemaker processes fencing levels as follows:
- Each level is attempted in ascending numeric order, starting at 1.
- If a device fails, processing terminates for the current level. No further devices in that level are exercised and the next level is attempted instead.
- If all devices in a level succeed, then that level has succeeded and no other levels are tried.
- The operation is finished when a level has passed (success), or all levels have been attempted (failed).
Use the following command to add a fencing level to a node. The devices are given as a comma-separated list of stonith ids, which are attempted for the node at that level.
pcs stonith level add level node devices
The following example sets up fence levels so that if the device my_ilo fails and is unable to fence the node, then Pacemaker attempts to use the device my_apc.
Prerequisites
- You have configured an ilo fence device called my_ilo for node rh7-2.
- You have configured an apc fence device called my_apc for node rh7-2.
Procedure
Add a fencing level of 1 for fence device my_ilo on node rh7-2.

# pcs stonith level add 1 rh7-2 my_ilo

Add a fencing level of 2 for fence device my_apc on node rh7-2.

# pcs stonith level add 2 rh7-2 my_apc

List the currently configured fencing levels.

# pcs stonith level
 Node: rh7-2
  Level 1 - my_ilo
  Level 2 - my_apc
9.7. Removing a fence level
You can remove the fence level for the specified node and device. If no nodes or devices are specified, the fence level you specify is removed from all nodes.
Procedure
Remove the fence level for the specified node and device:
# pcs stonith level remove level [node_id] [stonith_id] ... [stonith_id]
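For example, using the names from the fencing level configuration example earlier in this chapter, the following command would remove the level 1 entry for device my_ilo on node rh7-2:

# pcs stonith level remove 1 rh7-2 my_ilo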
9.8. Clearing fence levels
You can clear the fence levels on the specified node or stonith id. If you do not specify a node or stonith id, all fence levels are cleared.
Procedure
Clear the fence levels on the specified node or stonith id:

# pcs stonith level clear [node|stonith_id(s)]

If you specify more than one stonith id, they must be separated by a comma and no spaces, as in the following example.
# pcs stonith level clear dev_a,dev_b
9.9. Verifying nodes and devices in fence levels
You can verify that all fence devices and nodes specified in fence levels exist.
Procedure
Use the following command to verify that all fence devices and nodes specified in fence levels exist:
# pcs stonith level verify
9.10. Specifying nodes in fencing topology
You can specify nodes in fencing topology by a regular expression applied on a node name and by a node attribute and its value.
Procedure
The following commands configure nodes node1, node2, and node3 to use fence devices apc1 and apc2, and nodes node4, node5, and node6 to use fence devices apc3 and apc4:

# pcs stonith level add 1 "regexp%node[1-3]" apc1,apc2
# pcs stonith level add 1 "regexp%node[4-6]" apc3,apc4

The following commands yield the same results by using node attribute matching:

# pcs node attribute node1 rack=1
# pcs node attribute node2 rack=1
# pcs node attribute node3 rack=1
# pcs node attribute node4 rack=2
# pcs node attribute node5 rack=2
# pcs node attribute node6 rack=2
# pcs stonith level add 1 attrib%rack=1 apc1,apc2
# pcs stonith level add 1 attrib%rack=2 apc3,apc4
9.11. Configuring fencing for redundant power supplies
When configuring fencing for redundant power supplies, the cluster must ensure that when attempting to reboot a host, both power supplies are turned off before either power supply is turned back on.
If the node never completely loses power, the node may not release its resources. This opens up the possibility of nodes accessing these resources simultaneously and corrupting them.
You need to define each device only once and specify that both are required to fence the node.
Procedure
Create the first fence device.
# pcs stonith create apc1 fence_apc_snmp ipaddr=apc1.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"

Create the second fence device.

# pcs stonith create apc2 fence_apc_snmp ipaddr=apc2.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"

Specify that both devices are required to fence the node.

# pcs stonith level add 1 node1.example.com apc1,apc2
# pcs stonith level add 1 node2.example.com apc1,apc2
9.12. Administering fence devices
The pcs command-line interface provides a variety of commands you can use to administer your fence devices after you have configured them.
9.12.1. Displaying configured fence devices
The following command shows all currently configured fence devices. If a stonith_id is specified, the command shows the options for that configured fencing device only. If the --full option is specified, all configured fencing options are displayed.
pcs stonith config [stonith_id] [--full]
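For example, to display only the configuration of a single device, you might specify its stonith id; the name myapc here is a placeholder:

# pcs stonith config myapc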
9.12.2. Exporting fence devices as pcs commands
You can display the pcs commands that can be used to re-create configured fence devices on a different system using the --output-format=cmd option of the pcs stonith config command.
The following commands create a fence_apc_snmp fence device and display the pcs command you can use to re-create the device.
# pcs stonith create myapc fence_apc_snmp ip="zapc.example.com" pcmk_host_map="z1.example.com:1;z2.example.com:2" username="apc" password="apc"
# pcs stonith config --output-format=cmd
Warning: Only 'text' output format is supported for stonith levels
pcs stonith create --no-default-ops --force -- myapc fence_apc_snmp \
ip=zapc.example.com password=apc 'pcmk_host_map=z1.example.com:1;z2.example.com:2' username=apc \
op \
monitor interval=60s id=myapc-monitor-interval-60s
9.12.3. Exporting fence level configuration
The pcs stonith config and the pcs stonith level config commands support the --output-format= option to export the fencing level configuration in JSON format and as pcs commands.
- Specifying --output-format=cmd displays the pcs commands created from the current cluster configuration that configure fencing levels. You can use these commands to re-create configured fencing levels on a different system.
- Specifying --output-format=json displays the fencing level configuration in JSON format, which is suitable for machine parsing.
9.12.4. Modifying and deleting fence devices
Modify or add options to a currently configured fencing device with the following command.
pcs stonith update stonith_id [stonith_device_options]
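For example, the following sketch changes the login name on a previously created device; the device name and new value are placeholders for your own configuration:

# pcs stonith update myapc username=apc2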
Updating a SCSI fencing device with the pcs stonith update command causes a restart of all resources running on the same node where the fencing resource was running. You can use either version of the following command to update SCSI devices without causing a restart of other cluster resources. SCSI fencing devices can be configured as multipath devices.
pcs stonith update-scsi-devices stonith_id set device-path1 device-path2
pcs stonith update-scsi-devices stonith_id add device-path1 remove device-path2
Use the following command to remove a fencing device from the current configuration.
pcs stonith delete stonith_id
9.12.5. Manually fencing a cluster node
You can fence a node manually with the following command. If you specify the --off option, this command uses the off API call to stonith, which turns the node off instead of rebooting it.
pcs stonith fence node [--off]
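For example, the following hypothetical command powers off node z1.example.com instead of rebooting it:

# pcs stonith fence z1.example.com --off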
In a situation where no fence device is able to fence a node even if it is no longer active, the cluster may not be able to recover the resources on the node. If this occurs, after manually ensuring that the node is powered down you can enter the following command to confirm to the cluster that the node is powered down and free its resources for recovery.
If the node you specify is not actually off, but running the cluster software or services normally controlled by the cluster, data corruption and cluster failure can occur.
pcs stonith confirm node
9.12.6. Disabling a fence device
To disable a fencing device, run the pcs stonith disable command.
The following command disables the fence device myapc.
# pcs stonith disable myapc
9.12.7. Preventing a node from using a fencing device
To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource.
The following example prevents fence device node1-ipmi from running on node1.
# pcs constraint location node1-ipmi avoids node1
Chapter 10. Configuring cluster resources
A cluster resource is an instance of a program, an application, or data to be managed by the cluster service. These resources are abstracted by agents that provide a standard interface for managing the resource in a cluster environment.
To ensure that resources remain healthy, you can add a monitoring operation to a resource’s definition. If you do not specify a monitoring operation for a resource, one is added by default. You can determine the behavior of a resource in a cluster by configuring constraints for that resource. You can configure the following categories of constraints:
- location constraints - A location constraint determines which nodes a resource can run on. For information about configuring location constraints, see Determining which nodes a resource can run on.
- order constraints - An ordering constraint determines the order in which the resources run. For information about configuring ordering constraints, see Determining the order in which cluster resources are run.
- colocation constraints - A colocation constraint determines where resources will be placed relative to other resources. For information about colocation constraints, see Colocating cluster resources.
As a shorthand for configuring a set of constraints that will locate a set of resources together and ensure that the resources start sequentially and stop in reverse order, Pacemaker supports the concept of resource groups. After you have created a resource group, you can configure constraints on the group itself just as you configure constraints for individual resources. For information about configuring resource groups, see Configuring resource groups.
The format for the command to create a cluster resource is as follows:
pcs resource create resource_id [standard:[provider:]]type [resource_options] [op operation_action operation_options [operation_action operation_options]...] [meta meta_options...] [clone [clone_id] [clone_options] | promotable [clone_id] [clone_options]] [--wait[=n]]
Key cluster resource creation options include the following:
- The --before and --after options specify the position of the added resource relative to a resource that already exists in a resource group.
- Specifying the --disabled option indicates that the resource is not started automatically. See the sketch after this list for examples of both options.
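For example, the following sketch uses hypothetical resource and group names: the first command inserts a new resource into an existing group immediately after one of its members, and the second creates a resource that stays stopped until you enable it:

# pcs resource create VirtualIP2 IPaddr2 ip=192.168.0.121 cidr_netmask=24 --group mygroup --after VirtualIP
# pcs resource create WebSite apache --disabled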
There is no limit to the number of resources you can create in a cluster.
10.1. Resource creation examples
The following command creates a resource with the name VirtualIP of standard ocf, provider heartbeat, and type IPaddr2. The floating address of this resource is 192.168.0.120, and the system will check whether the resource is running every 30 seconds. For information about resource standards and providers, see Resource agent identifiers.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s
Alternately, you can omit the standard and provider fields and use the following command. This will default to a standard of ocf and a provider of heartbeat.
# pcs resource create VirtualIP IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s
10.2. Deleting a configured resource
Delete a configured resource with the following command.
pcs resource delete resource_id
For example, the following command deletes an existing resource with a resource ID of VirtualIP.
# pcs resource delete VirtualIP
You can delete multiple resources with a single command. The following command deletes an existing resource with a resource ID of resource1 and an existing resource with a resource ID of resource2.
# pcs resource delete resource1 resource2
10.3. Resource agent identifiers
The identifiers that you define for a resource tell the cluster which agent to use for the resource, where to find that agent and what standards it conforms to.
The following table describes these properties of a resource agent.
| Field | Description |
|---|---|
| standard | The standard the agent conforms to. Allowed values and their meaning: ocf - the specified type is the name of an executable file conforming to the Open Cluster Framework Resource Agent API; lsb - the specified type is the name of an executable file conforming to Linux Standard Base Init Script Actions; systemd - the specified type is the name of an installed systemd unit; service - Pacemaker will search for the specified type, first as an lsb agent, then as a systemd agent; nagios - the specified type is the name of an executable file conforming to the Nagios Plugin API. |
| type | The name of the resource agent you wish to use, for example IPaddr2 or Filesystem. |
| provider | The OCF spec allows multiple vendors to supply the same resource agent. Most of the agents shipped by Red Hat use heartbeat as the provider. |
10.4. Displaying resources and resource parameters
Inspect and monitor cluster resources using pcs display commands. You can verify resource status, review configuration parameters, and list available resource agents, providers, and standards to ensure your cluster is configured correctly.
| pcs Display Command | Output |
|---|---|
| pcs resource describe [standard:[provider:]]type | Displays a description of an individual resource, the parameters you can set for that resource, and the default values for the resource. For example, pcs resource describe Filesystem displays the parameters you can set for the Filesystem resource. |
| pcs resource status | Displays a list of all configured resources. |
| pcs resource config resource_id | Displays the configured parameters for a resource. |
| pcs resource status resource_id | Displays the status of an individual resource. |
| pcs resource status node=node_id | Displays the status of the resources running on a specific node. You can use this command to display the status of resources on both cluster and remote nodes. |
| pcs resource list | Displays a list of all available resources. |
| pcs resource standards | Displays a list of available resource agent standards. |
| pcs resource providers | Displays a list of available resource agent providers. |
| pcs resource list string | Displays a list of available resources filtered by the specified string. You can use this command to display resources filtered by the name of a standard, a provider, or a type. |
10.5. Configuring resource meta options
In addition to the resource-specific parameters, you can configure additional resource options for any resource. These options are used by the cluster to decide how your resource should behave.
The following table describes the resource meta options.
| Field | Default | Description |
|---|---|---|
| priority | 0 | If not all resources can be active, the cluster will stop lower priority resources in order to keep higher priority ones active. |
| target-role | Started | Indicates what state the cluster should attempt to keep this resource in. Allowed values: Stopped - force the resource to be stopped; Started - allow the resource to be started, and in the case of promotable clones, promoted if appropriate; Master - allow the resource to be started and, if appropriate, promoted; Slave - allow the resource to be started, but only in the unpromoted role if the resource is promotable. |
| is-managed | true | Indicates whether the cluster is allowed to start and stop the resource. Allowed values: true, false. |
| resource-stickiness | 1 | Value to indicate how much the resource prefers to stay where it is. |
| requires | Calculated | Indicates under what conditions the resource can be started. Defaults to fencing except under the conditions noted below. Possible values: nothing - the cluster can always start the resource; quorum - the cluster can only start this resource if a majority of the configured nodes are active (this is the default value if stonith-enabled is false or the resource's standard is stonith); fencing - the cluster can only start this resource if a majority of the configured nodes are active and any failed or unknown nodes have been fenced; unfencing - the cluster can only start this resource if a majority of the configured nodes are active and any failed or unknown nodes have been fenced, and only on nodes that have been unfenced (this is the default value if the provides=unfencing stonith meta option has been set for a fence device). |
| migration-threshold | INFINITY | How many failures may occur for this resource on a node before this node is marked ineligible to host this resource. A value of 0 indicates that this feature is disabled (the node will never be marked ineligible); by contrast, the cluster treats INFINITY (the default) as a very large but finite number. |
| failure-timeout | 0 (disabled) | Used in conjunction with the migration-threshold option, indicates how many seconds to wait before acting as if the failure had not occurred, and potentially allowing the resource back to the node on which it failed. |
| multiple-active | stop_start | Indicates what the cluster should do if it ever finds the resource active on more than one node. Allowed values: block - mark the resource as unmanaged; stop_only - stop all active instances and leave them that way; stop_start - stop all active instances and start the resource in one location only; stop_unexpected - stop only unexpected instances of the resource, without requiring a full restart of the expected instance. |
| critical | true | Sets the default value for the influence option for all colocation constraints involving the resource as a dependent resource. |
| allow-unhealthy-nodes | false | When set to true, the resource is not forced off a node due to degraded node health. |
10.6. Setting meta options
You can set a resource option for a particular resource to a value other than the default when you create the resource. You can also set the value of a resource meta option for an existing resource, group, or cloned resource.
The following procedure provides example commands that set the value of resource meta options both on resource creation and on an existing resource.
Procedure
Create a resource with a resource-stickiness value of 50.

# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 meta resource-stickiness=50

For the existing resource named dummy_resource, set the failure-timeout meta option to 20 seconds, so that the resource can attempt to restart on the same node in 20 seconds.

# pcs resource meta dummy_resource failure-timeout=20s

Display the values for the resource to verify that failure-timeout=20s is set.

# pcs resource config dummy_resource
 Resource: dummy_resource (class=ocf provider=heartbeat type=Dummy)
  Meta Attrs: failure-timeout=20s
  ...
10.7. Changing the default value of a resource option
You can change the default value of a resource option for all resources with the pcs resource defaults update command.
Procedure
The following command resets the default value of resource-stickiness to 100:

# pcs resource defaults update resource-stickiness=100

The original pcs resource defaults name=value command, which set defaults for all resources in previous releases, remains supported unless there is more than one set of defaults configured. However, pcs resource defaults update is now the preferred version of the command.
10.8. Changing the default value of a resource option for sets of resources
You can create multiple sets of resource defaults with the pcs resource defaults set create command, which allows you to specify a rule that contains resource expressions. Only resource and date expressions, including and, or, and parentheses, are allowed in rules that you specify with this command.
With the pcs resource defaults set create command, you can configure a default resource value for all resources of a particular type. If, for example, you are running databases which take a long time to stop, you can increase the resource-stickiness default value for all resources of the database type to prevent those resources from moving to other nodes more often than you desire.
Procedure
The following command sets the default value of resource-stickiness to 100 for all resources of type pgsql:

# pcs resource defaults set create id=pgsql-stickiness meta resource-stickiness=100 rule resource ::pgsql

In this example, ::pgsql means a resource of any class, any provider, of type pgsql.

- Specifying ocf:heartbeat:pgsql would indicate class ocf, provider heartbeat, type pgsql.
- Specifying ocf:pacemaker: would indicate all resources of class ocf, provider pacemaker, of any type.
The
idoption, which names the set of resource defaults, is not mandatory. If you do not set this optionpcswill generate an ID automatically. Setting this value allows you to provide a more descriptive name.To change the default values in an existing set, use the
pcs resource defaults set updatecommand.
10.9. Displaying currently configured resource defaults Copy linkLink copied to clipboard!
The pcs resource defaults [config] command displays a list of currently configured default values for resource options, including any rules that you specified. You can display the output of this command in text, JSON, and command formats.
-
Specifying
--output-format=textdisplays the configured resource defaults in plain text format, which is the default value for this option. -
Specifying
--output-format=cmddisplays thepcs resource defaultscommands created from the current cluster defaults configuration. You can use these commands to re-create configured resource defaults on a different system. -
Specifying
--output-format=jsondisplays the configured resource defaults in JSON format, which is suitable for machine parsing.
The following example procedure shows the three different output formats of the pcs resource defaults config command after you reset the default values for a resource.
Procedure
Reset the default values for any ocf:pacemaker:pgsqlresource.# pcs resource defaults set create id=set-1 score=100 meta resource-stickiness=10 rule resource ocf:pacemaker:pgsqlDisplay the configured resource default values in plain text.
ocf:pacemaker:pgsqlresource.# pcs resource defaults set create id=set-1 score=100 meta resource-stickiness=10 rule resource ocf:pacemaker:pgsqlDisplay the configured resource default values in plain text.
# pcs resource defaults config Meta Attrs: build-resource-defaults resource-stickiness=1 Meta Attrs: set-1 score=100 resource-stickiness=10 Rule: boolean-op=and score=INFINITY Expression: resource ocf:pacemaker:pgsqlDisplay the
pcs resource defaultscommands created from the current cluster defaults configuration.# pcs resource defaults config --output-format=cmd pcs -- resource defaults set create id=build-resource-defaults \ meta resource-stickiness=1; pcs -- resource defaults set create id=set-1 score=100 \ meta resource-stickiness=10 \ rule 'resource ocf:pacemaker:pgsql'Display the configured resource default values in JSON format.
# pcs resource defaults config --output-format=json {"instance_attributes": [], "meta_attributes": [{"id": "build-resource-defaults", "options": {}, "rule": null, "nvpairs": [{"id": "build-resource-stickiness", "name": "resource-stickiness", "value": "1"}]}, {"id": "set-1", "options": {"score": "100"}, "rule": {"id": "set-1-rule", "type": "RULE", "in_effect": "UNKNOWN", "options": {"boolean-op": "and", "score": "INFINITY"}, "date_spec": null, "duration": null, "expressions": [{"id": "set-1-rule-rsc-ocf-pacemaker-pgsql", "type": "RSC_EXPRESSION", "in_effect": "UNKNOWN", "options": {"class": "ocf", "provider": "pacemaker", "type": "pgsql"}, "date_spec": null, "duration": null, "expressions": [], "as_string": "resource ocf:pacemaker:pgsql"}], "as_string": "resource ocf:pacemaker:pgsql"}, "nvpairs": [{"id": "set-1-resource-stickiness", "name": "resource-stickiness", "value": "10"}]}]}
10.10. Configuring resource groups Copy linkLink copied to clipboard!
One of the most common elements of a cluster is a set of resources that need to be located together, start sequentially, and stop in the reverse order. To simplify this configuration, Pacemaker supports the concept of resource groups.
Creating a resource group
You can create a resource group with the following command, specifying the resources to include in the group. If the group does not exist, this command creates the group. If the group exists, this command adds additional resources to the group. The resources will start in the order you specify them with this command, and will stop in the reverse order of their starting order.
# pcs resource group add group_name resource_id [resource_id] ... [resource_id] [--before resource_id | --after resource_id]
You can use the --before and --after options of this command to specify the position of the added resources relative to a resource that already exists in the group.
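For example, assuming an existing group named apachegroup whose first member is a resource named my_fs, and an existing resource named my_lvm that is not yet part of the group, a command along the following lines would add my_lvm to the group and place it before my_fs (both resource names here are illustrative):
# pcs resource group add apachegroup my_lvm --before my_fs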
You can also add a new resource to an existing group when you create the resource, using the following command. The resource you create is added to the group named group_name. If the group group_name does not exist, it will be created.
# pcs resource create resource_id [standard:[provider:]]type [resource_options] [op operation_action operation_options] --group group_name
There is no limit to the number of resources a group can contain. The fundamental properties of a group are as follows.
- Resources are colocated within a group.
- Resources are started in the order in which you specify them. If a resource in the group cannot run anywhere, then no resource specified after that resource is allowed to run.
- Resources are stopped in the reverse order in which you specify them.
Additional properties of a group are as follows:
-
You can set the following options for a resource group, and they maintain the same meaning as when they are set for a single resource:
priority,target-role,is-managed. For information about resource meta options, see Configuring resource meta options. -
Stickiness, the measure of how much a resource wants to stay where it is, is additive in groups. Every active resource of the group will contribute its stickiness value to the group’s total. So if the default
resource-stickinessis 100, and a group has seven members, five of which are active, then the group as a whole will prefer its current location with a score of 500.
The following example creates a resource group named shortcut that contains the existing resources IPaddr and Email.
# pcs resource group add shortcut IPaddr Email
In this example:
-
The
IPaddris started first, thenEmail. -
The
Emailresource is stopped first, thenIPaddr.
If
IPaddrcannot run anywhere, neither canEmail. -
If
Emailcannot run anywhere, however, this does not affectIPaddr.
Removing a resource group
You remove a resource from a group with the following command. If there are no remaining resources in the group, this command removes the group itself.
# pcs resource group remove group_name resource_id...
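For example, assuming the shortcut group shown earlier in this section, the following command would remove the Email resource from the group, leaving IPaddr as its only member:
# pcs resource group remove shortcut Email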
Displaying resource groups
The following command lists all currently configured resource groups.
# pcs resource group list
10.11. Displaying resource dependencies Copy linkLink copied to clipboard!
You can display the relations between cluster resources in a tree structure.
pcs resource relations resource [--full]
If the --full option is used, the command displays additional information, including the constraint IDs and the resource types.
In the following example, there are 3 configured resources: C, D, and E.
# pcs constraint order start C then start D
Adding C D (kind: Mandatory) (Options: first-action=start then-action=start)
# pcs constraint order start D then start E
Adding D E (kind: Mandatory) (Options: first-action=start then-action=start)
# pcs resource relations C
C
`- order
| start C then start D
`- D
`- order
| start D then start E
`- E
# pcs resource relations D
D
|- order
| | start C then start D
| `- C
`- order
| start D then start E
`- E
# pcs resource relations E
E
`- order
| start D then start E
`- D
`- order
| start C then start D
`- C
In the following example, there are 2 configured resources: A and B. Resources A and B are part of resource group G.
# pcs resource relations A
A
`- outer resource
`- G
`- inner resource(s)
| members: A B
`- B
# pcs resource relations B
B
`- outer resource
`- G
`- inner resource(s)
| members: A B
`- A
# pcs resource relations G
G
`- inner resource(s)
| members: A B
|- A
`- B
Chapter 11. Determining which nodes a resource can run on Copy linkLink copied to clipboard!
Location constraints determine which nodes a resource can run on. You can configure location constraints to determine whether a resource will prefer or avoid a specified node.
In addition to location constraints, the node on which a resource runs is influenced by the resource-stickiness value for that resource, which determines to what degree a resource prefers to remain on the node where it is currently running. For information about setting the resource-stickiness value, see Configuring a resource to prefer its current node.
11.1. Configuring location constraints Copy linkLink copied to clipboard!
You can configure a location constraint to control which nodes a cluster resource can run on in a cluster. You can use a location constraint to make a resource prefer a specific node or to prevent it from running on certain nodes.
11.1.1. Configuring a basic location constraint Copy linkLink copied to clipboard!
You can configure a basic location constraint to specify whether a resource prefers or avoids a node, with an optional score value to indicate the relative degree of preference for the constraint.
Procedure
The following command creates a location constraint for a resource to prefer the specified node or nodes. Note that it is possible to create constraints on a particular resource for more than one node with a single command:
# pcs constraint location rsc prefers node[=score] [node[=score]] ...The following command creates a location constraint for a resource to avoid the specified node or nodes:
# pcs constraint location rsc avoids node[=score] [node[=score]] ...The following table summarizes the meanings of the basic options for configuring location constraints:
Table 11.1. Location Constraint Options
| Field | Description |
|---|---|
| rsc | A resource name |
| node | A node's name |
| score | Positive integer value to indicate the degree of preference for whether the given resource should prefer or avoid the given node. INFINITY is the default score value for a resource location constraint.
A value of INFINITY for score in a pcs constraint location rsc prefers command indicates that the resource will prefer that node if the node is available, but does not prevent the resource from running on another node if the specified node is unavailable.
A value of INFINITY for score in a pcs constraint location rsc avoids command indicates that the resource will never run on that node, even if no other node is available. This is the equivalent of setting a pcs constraint location add command with a score of -INFINITY.
A numeric score (that is, not INFINITY) means the constraint is optional, and will be honored unless some other factor outweighs it. For example, if the resource is already placed on a different node, and its resource-stickiness score is higher than a prefers location constraint's score, then the resource will be left where it is. |
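For example, the following sketch, which assumes a hypothetical Webserver resource and a node named node1, expresses an optional preference with a score of 50 that a higher resource-stickiness value or another constraint could still outweigh:
# pcs constraint location Webserver prefers node1=50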
11.1.2. Configuring a location constraint with regular expressions Copy linkLink copied to clipboard!
pcs supports regular expressions in location constraints to match resource names. Use this feature to configure multiple location constraints with a single command.
Procedure
The following command creates a location constraint to specify that resources
dummy0todummy9prefernode1:# pcs constraint location 'regexp%dummy[0-9]' prefers node1
11.1.3. Configuring a location constraint with extended regular expressions Copy linkLink copied to clipboard!
Since Pacemaker uses POSIX extended regular expressions as documented in the "9.4 Extended Regular Expressions" section of The Open Group Base Specifications Issue 7, you can specify the same constraint with the following command.
Procedure
To configure a location constraint with extended regular expressions:
# pcs constraint location 'regexp%dummy[[:digit:]]' prefers node1
11.1.4. Displaying location constraints Copy linkLink copied to clipboard!
View location constraints to verify resource placement logic. You can organize the display by resource or node, filter for specific items, and include expired constraints or internal IDs for troubleshooting.
Procedure
To list all current location constraints:
# pcs constraint location [config [resources [resource...]] | [nodes [node...]]] [--full]-
If
resourcesis specified, the command displays constraints per resource (default). -
If
nodesis specified, the command displays constraints per node. - If you specify specific resources or nodes, the command displays only that information.
To display all current location, order, and colocation constraints, use the following command. To show the internal constraint IDs, specify the
--fulloption:# pcs constraint [config] [--full]By default, listing resource constraints does not display expired constraints. To include expired constraints in the listing, use the
--alloption of thepcs constraintcommand. This will list expired constraints, noting the constraints and their associated rules as(expired)in the display.To list the constraints that reference specific resources:
# pcs constraint ref resource ...
11.2. Limiting resource discovery to a subset of nodes Copy linkLink copied to clipboard!
Before Pacemaker starts a resource anywhere, it first runs a one-time monitor operation (often referred to as a "probe") on every node, to learn whether the resource is already running. This process of resource discovery can result in errors on nodes that are unable to execute the monitor.
When configuring a location constraint on a node, you can use the resource-discovery option of the pcs constraint location command to indicate a preference for whether Pacemaker should perform resource discovery on this node for the specified resource. Limiting resource discovery to a subset of nodes the resource is physically capable of running on can significantly boost performance when a large set of nodes is present. When pacemaker_remote is in use to expand the node count into the hundreds of nodes range, this option should be considered.
The following command shows the format for specifying the resource-discovery option of the pcs constraint location command. In this command, a positive value for score corresponds to a basic location constraint that configures a resource to prefer a node, while a negative value for score corresponds to a basic location constraint that configures a resource to avoid a node. As with basic location constraints, you can use regular expressions for resources with these constraints as well.
# pcs constraint location add id rsc node score [resource-discovery=option]
The following table summarizes the meanings of the basic parameters for configuring constraints for resource discovery.
| Field | Description |
|---|---|
| id | A user-chosen name for the constraint itself. |
| rsc | A resource name |
| node | A node's name |
| score | Integer value to indicate the degree of preference for whether the given resource should prefer or avoid the given node. A positive value for score corresponds to a basic location constraint that configures a resource to prefer a node, while a negative value for score corresponds to a basic location constraint that configures a resource to avoid a node.
A value of INFINITY for score indicates a mandatory preference, as described for basic location constraints.
A numeric score (that is, not INFINITY) means the constraint is optional, and will be honored unless some other factor outweighs it. |
| resource-discovery | Indicates Pacemaker's preference for whether resource discovery should be performed on this node for the specified resource. Allowed values:
* always - Always perform resource discovery for the specified resource on this node. This is the default value.
* never - Never perform resource discovery for the specified resource on this node.
* exclusive - Perform resource discovery for the specified resource only on this node and on other nodes similarly marked as exclusive. Multiple location constraints using exclusive discovery for the same resource across different nodes create a subset of nodes on which resource discovery is exclusively performed. If a resource is marked for exclusive discovery on one or more nodes, that resource can only be placed within that subset of nodes. |
Setting resource-discovery to never or exclusive removes Pacemaker’s ability to detect and stop unwanted instances of a service running where it is not supposed to be. It is up to the system administrator to make sure that the service can never be active on nodes without resource discovery (such as by leaving the relevant software uninstalled).
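For example, the following sketch, which assumes a hypothetical Website resource that should never run on a node named node3 and should never be probed there, combines a -INFINITY score with resource-discovery=never:
# pcs constraint location add Website-avoid-node3 Website node3 -INFINITY resource-discovery=never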
11.3. Configuring a location constraint strategy Copy linkLink copied to clipboard!
When using location constraints, you can configure a general strategy for specifying which nodes a resource can run on.
- Opt-in clusters - Configure a cluster in which, by default, no resource can run anywhere and then selectively enable allowed nodes for specific resources.
- Opt-out clusters - Configure a cluster in which, by default, all resources can run anywhere and then create location constraints for resources that are not allowed to run on specific nodes.
Whether you should choose to configure your cluster as an opt-in or opt-out cluster depends on both your personal preference and the make-up of your cluster. If most of your resources can run on most of the nodes, then an opt-out arrangement is likely to result in a simpler configuration. On the other hand, if most resources can only run on a small subset of nodes an opt-in configuration might be simpler.
Configuring an "Opt-In" cluster
To create an opt-in cluster, set the symmetric-cluster cluster property to false to prevent resources from running anywhere by default.
# pcs property set symmetric-cluster=false
Enable nodes for individual resources. The following commands configure location constraints so that the resource Webserver prefers node example-1, the resource Database prefers node example-2, and both resources can fail over to node example-3 if their preferred node fails. When configuring location constraints for an opt-in cluster, setting a score of zero allows a resource to run on a node without indicating any preference to prefer or avoid the node.
# pcs constraint location Webserver prefers example-1=200
# pcs constraint location Webserver prefers example-3=0
# pcs constraint location Database prefers example-2=200
# pcs constraint location Database prefers example-3=0
Configuring an "Opt-Out" cluster
To create an opt-out cluster, set the symmetric-cluster cluster property to true to allow resources to run everywhere by default. This is the default configuration if symmetric-cluster is not set explicitly.
# pcs property set symmetric-cluster=true
The following commands will then yield a configuration that is equivalent to the example in "Configuring an "Opt-In" cluster". Both resources can fail over to node example-3 if their preferred node fails, since every node has an implicit score of 0.
# pcs constraint location Webserver prefers example-1=200
# pcs constraint location Webserver avoids example-2=INFINITY
# pcs constraint location Database avoids example-1=INFINITY
# pcs constraint location Database prefers example-2=200
Note that it is not necessary to specify a score of INFINITY in these commands, since that is the default value for the score.
11.4. Configuring a resource to prefer its current node Copy linkLink copied to clipboard!
Configure resource-stickiness to define a resource’s preference for remaining on its current node. Pacemaker compares this value against other scores, such as location constraints, to prevent unnecessary migrations. For setup details, see Configuring resource meta options.
With a resource-stickiness value of 0, a cluster may move resources as needed to balance resources across nodes. This may result in resources moving when unrelated resources start or stop. With a positive stickiness, resources have a preference to stay where they are, and move only if other circumstances outweigh the stickiness. This may result in newly-added nodes not getting any resources assigned to them without administrator intervention.
Newly-created clusters set the default value for resource-stickiness to 1. This small value can easily be overridden by other constraints that you create, but it is enough to prevent Pacemaker from needlessly moving healthy resources around the cluster. If you prefer cluster behavior that results from a resource-stickiness value of 0, you can change the resource-stickiness default value to 0 with the following command:
Example 11.1. Example command
# pcs resource defaults update resource-stickiness=0
With a positive resource-stickiness value, no resources will move to a newly-added node. If resource balancing is desired at that point, you can temporarily set the resource-stickiness value to 0.
Note that if a location constraint score is higher than the resource-stickiness value, the cluster may still move a healthy resource to the node where the location constraint points.
Chapter 12. Determining the order in which cluster resources are run Copy linkLink copied to clipboard!
To determine the order in which the resources run, you must configure an ordering constraint.
The following shows the format for the command to configure an ordering constraint.
pcs constraint order [action] resource_id then [action] resource_id [options]
For example the following command configures an ordering constraint to ensure that the resource firstresource starts first, before the resource secondresource.
# pcs constraint order start firstresource then secondresource
The following table summarizes the properties and options for configuring ordering constraints.
| Field | Description |
|---|---|
| resource_id | The name of a resource on which an action is performed. |
| action | The action to be ordered on the resource. Possible values of the action property are as follows:
* start - Start the resource.
* stop - Stop the resource.
* promote - Promote the resource from an unpromoted resource to a promoted resource.
* demote - Demote the resource from a promoted resource to an unpromoted resource.
If no action is specified, the default action is start. |
| kind | How to enforce the constraint. The possible values of the kind option are as follows:
* Optional - Only applies if both resources are executing the specified actions. For information about optional ordering, see Configuring advisory ordering.
* Mandatory - Always enforce the constraint. This is the default value. If the first resource you specified is stopping or cannot be started, the second resource you specified must be stopped. For information about mandatory ordering, see Configuring mandatory ordering.
* Serialize - Ensure that no two stop or start actions occur concurrently for the resources you specify. The resources can start in either order, but one must complete starting before the other can be started. |
| symmetrical | If true, the reverse of the constraint applies for the opposite action (for example, if B starts after A starts, then B stops before A stops). Ordering constraints for which kind is Serialize cannot be symmetrical. The default value is true for Mandatory and Optional kinds, and false for Serialize. |
- Displaying ordering constraints
The following command lists all current ordering constraints.
pcs constraint order [config]You can display all current location, order, and colocation constraints with the following command. To show the internal constraint IDs, specify the
--fulloption.pcs constraint [config] [--full]By default, listing resource constraints does not display expired constraints. To include expired constraints in the listing, use the
--alloption of thepcs constraintcommand. This will list expired constraints, noting the constraints and their associated rules as(expired)in the display.The following command lists the constraints that reference specific resources.
pcs constraint ref resource ...- Removing resources from an ordering constraint
Use the following command to remove resources from any ordering constraint.
pcs constraint order remove resource1 [resourceN]...
12.1. Configuring mandatory ordering Copy linkLink copied to clipboard!
A mandatory ordering constraint ensures the second resource waits for the first to complete its action. Valid actions include start, stop, promote, and demote. For example, "start A then start B" prevents B from starting until A succeeds. To enable this, set the kind option to Mandatory or use the default.
If the symmetrical option is set to true or left to default, the opposite actions will be ordered in reverse. The start and stop actions are opposites, and demote and promote are opposites. For example, a symmetrical "promote A then start B" ordering implies "stop B then demote A", which means that A cannot be demoted until and unless B successfully stops. A symmetrical ordering means that changes in A’s state can cause actions to be scheduled for B. For example, given "A then B", if A restarts due to failure, B will be stopped first, then A will be stopped, then A will be started, then B will be started.
Note that the cluster reacts to each state change. If the first resource is restarted and is in a started state again before the second resource initiated a stop operation, the second resource will not need to be restarted.
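For example, assuming the firstresource and secondresource resources shown earlier in this chapter, a command along the following lines would make the mandatory ordering explicit and disable the reverse ordering of the stop actions:
# pcs constraint order start firstresource then start secondresource kind=Mandatory symmetrical=false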
12.2. Configuring advisory ordering Copy linkLink copied to clipboard!
When the kind=Optional option is specified for an ordering constraint, the constraint is considered optional and only applies if both resources are executing the specified actions. Any change in state by the first resource you specify will have no effect on the second resource you specify.
Procedure
Configure an advisory ordering constraint for the resources named
VirtualIPanddummy_resource:# pcs constraint order VirtualIP then dummy_resource kind=Optional
12.3. Configuring ordered resource sets Copy linkLink copied to clipboard!
A common situation is for an administrator to create a chain of ordered resources, where, for example, resource A starts before resource B which starts before resource C. If your configuration requires that you create a set of resources that is colocated and started in order, you can configure a resource group that contains those resources.
There are some situations, however, where configuring the resources that need to start in a specified order as a resource group is not appropriate:
- You may need to configure resources to start in order and the resources are not necessarily colocated.
- You may have a resource C that must start after either resource A or B has started but there is no relationship between A and B.
- You may have resources C and D that must start after both resources A and B have started, but there is no relationship between A and B or between C and D.
In these situations, you can create an ordering constraint on a set or sets of resources with the pcs constraint order set command.
You can set the following options for a set of resources with the pcs constraint order set command.
sequential, which can be set totrueorfalseto indicate whether the set of resources must be ordered relative to each other. The default value istrue.Setting
sequentialtofalseallows a set to be ordered relative to other sets in the ordering constraint, without its members being ordered relative to each other. Therefore, this option makes sense only if multiple sets are listed in the constraint; otherwise, the constraint has no effect.-
require-all, which can be set totrueorfalseto indicate whether all of the resources in the set must be active before continuing. Settingrequire-alltofalsemeans that only one resource in the set needs to be started before continuing on to the next set. Settingrequire-alltofalsehas no effect unless used in conjunction with unordered sets, which are sets for whichsequentialis set tofalse. The default value istrue. -
action, which can be set tostart,promote,demoteorstop, as described in the "Properties of an Order Constraint" table in Determining the order in which cluster resources are run. -
role, which can be set toStopped,Started,Promoted, orUnpromoted.
You can set the following constraint options for a set of resources following the setoptions parameter of the pcs constraint order set command.
-
id, to provide a name for the constraint you are defining. -
kind, which indicates how to enforce the constraint, as described in the "Properties of an Order Constraint" table in Determining the order in which cluster resources are run. -
symmetrical, to set whether the reverse of the constraint applies for the opposite action, as described in the "Properties of an Order Constraint" table in Determining the order in which cluster resources are run.
# pcs constraint order set resource1 resource2 [resourceN]... [options] [set resourceX resourceY ... [options]] [setoptions [constraint_options]]
If you have three resources named D1, D2, and D3, the following command configures them as an ordered resource set.
# pcs constraint order set D1 D2 D3
If you have six resources named A, B, C, D, E, and F, this example configures an ordering constraint for the set of resources that will start as follows:
-
AandBstart independently of each other -
Cstarts once eitherAorBhas started -
Dstarts onceChas started -
EandFstart independently of each other onceDhas started
Stopping the resources is not influenced by this constraint since symmetrical=false is set.
# pcs constraint order set A B sequential=false require-all=false set C D set E F sequential=false setoptions symmetrical=false
12.4. Configuring startup order for resource dependencies not managed by Pacemaker Copy linkLink copied to clipboard!
It is possible for a cluster to include resources with dependencies that are not themselves managed by the cluster. In this case, you must ensure that those dependencies are started before Pacemaker is started and stopped after Pacemaker is stopped.
You can configure your startup order to account for this situation by means of the systemd resource-agents-deps target. You can create a systemd drop-in unit for this target and Pacemaker will order itself appropriately relative to this target.
12.4.1. Configuring startup order for an external service Copy linkLink copied to clipboard!
If a cluster includes a resource that depends on the external service foo that is not managed by the cluster, you can configure a startup order for an external service.
Procedure
Create the drop-in unit
/etc/systemd/system/resource-agents-deps.target.d/foo.confthat contains the following:[Unit] Requires=foo.service After=foo.serviceRun the
systemctl daemon-reloadcommand:# systemctl daemon-reload
12.4.2. Configuring startup order for an external dependency Copy linkLink copied to clipboard!
Cluster dependencies extend beyond services. For example, you can configure a dependency on mounting a file system at /srv.
Procedure
-
Ensure that
/srvis listed in the/etc/fstabfile. This will be converted automatically to thesystemdfilesrv.mountat boot when the configuration of the system manager is reloaded. For more information, see thesystemd.mount(5) and thesystemd-fstab-generator(8) man pages on your system. To make sure that Pacemaker starts after the disk is mounted, create the drop-in unit
/etc/systemd/system/resource-agents-deps.target.d/srv.confthat contains the following:[Unit] Requires=srv.mount After=srv.mountRun the
systemctl daemon-reloadcommand:# systemctl daemon-reload
12.4.3. Configuring startup order for remote block storage Copy linkLink copied to clipboard!
If an LVM volume group used by a Pacemaker cluster contains one or more physical volumes that reside on remote block storage, such as an iSCSI target, you can configure a systemd resource-agents-deps target and a systemd drop-in unit for the target to ensure that the service starts before Pacemaker starts.
The following procedure configures blk-availability.service as a dependency. The blk-availability.service service is a wrapper that includes iscsi.service, among other services. If your deployment requires it, you could configure iscsi.service (for iSCSI only) or remote-fs.target as the dependency instead of blk-availability.
Procedure
Create the drop-in unit
/etc/systemd/system/resource-agents-deps.target.d/blk-availability.confthat contains the following:[Unit] Requires=blk-availability.service After=blk-availability.serviceRun the
systemctl daemon-reloadcommand:# systemctl daemon-reload
Chapter 13. Colocating cluster resources Copy linkLink copied to clipboard!
To specify that the location of one resource depends on the location of another resource, you configure a colocation constraint.
There is an important side effect of creating a colocation constraint between two resources: it affects the order in which resources are assigned to a node. This is because you cannot place resource A relative to resource B unless you know where resource B is. So when you are creating colocation constraints, it is important to consider whether you should colocate resource A with resource B or resource B with resource A.
Another thing to keep in mind when creating colocation constraints is that, assuming resource A is colocated with resource B, the cluster will also take into account resource A’s preferences when deciding which node to choose for resource B.
The following command creates a colocation constraint:
# pcs constraint colocation add [promoted|unpromoted] source_resource with [promoted|unpromoted] target_resource [score] [options]
The following table summarizes the properties and options for configuring colocation constraints:
- Parameters of a colocation constraint
| Parameter | Description |
|---|---|
| source_resource | The colocation source. If the constraint cannot be satisfied, the cluster may decide not to allow the resource to run at all. |
| target_resource | The colocation target. The cluster will decide where to put this resource first and then decide where to put the source resource. |
| score | Positive values indicate the resource should run on the same node. Negative values indicate the resources should not run on the same node. A value of +INFINITY, the default value, indicates that the source_resource must run on the same node as the target_resource. A value of -INFINITY indicates that the source_resource must not run on the same node as the target_resource. |
| influence | Determines whether the cluster will move both the primary resource (source_resource) and dependent resources (target_resource) to another node when the dependent resource reaches its migration threshold for failure, or whether the cluster will leave the dependent resource offline without causing a service switch.
The influence option can have a value of true or false. The default value for this option is determined by the value of the dependent resource's critical resource meta option, which has a default value of true.
When this option has a value of true, Pacemaker will attempt to keep both the primary and dependent resource active. If the dependent resource reaches its migration threshold for failures, both resources will move to another node, if possible.
When this option has a value of false, Pacemaker will avoid moving the primary resource as a result of the status of the dependent resource. In this case, if the dependent resource reaches its migration threshold for failures, it will stop if the primary resource is active and can remain on its current node. |
- Displaying colocation constraints
The following command lists all current colocation constraints.
pcs constraint colocation [config]You can display all current location, order, and colocation constraints with the following command. To show the internal constraint IDs, specify the
--fulloption.pcs constraint [config] [--full]By default, listing resource constraints does not display expired constraints. To include expired constraints in the listing, use the
--alloption of thepcs constraintcommand. This will list expired constraints, noting the constraints and their associated rules as(expired)in the display.The following command lists the constraints that reference specific resources:
# pcs constraint ref resource ...
13.1. Specifying mandatory placement of resources Copy linkLink copied to clipboard!
Mandatory placement occurs any time the constraint’s score is +INFINITY or -INFINITY. In such cases, if the constraint cannot be satisfied, then the source_resource is not permitted to run. For score=INFINITY, this includes cases where the target_resource is not active.
Procedure
If you need
myresource1to always run on the same machine asmyresource2, you would add the following constraint:# pcs constraint colocation add myresource1 with myresource2 score=INFINITYBecause
INFINITYwas used, ifmyresource2cannot run on any of the cluster nodes (for whatever reason) thenmyresource1will not be allowed to run.Alternatively, you may want to configure the opposite, a cluster in which
myresource1cannot run on the same machine asmyresource2. In this case usescore=-INFINITY:# pcs constraint colocation add myresource1 with myresource2 score=-INFINITYAgain, by specifying
-INFINITY, the constraint is binding. So if the only place left to run is wheremyresource2already is, thenmyresource1may not run anywhere.
13.2. Specifying advisory placement of resources Copy linkLink copied to clipboard!
Advisory placement of resources indicates the placement of resources is a preference, but is not mandatory. For constraints with scores greater than -INFINITY and less than INFINITY, the cluster will try to accommodate your wishes but it can ignore them if the alternative is to stop some of the cluster resources.
13.3. Colocating sets of resources Copy linkLink copied to clipboard!
If your configuration requires that you create a set of resources that are colocated and started in order, you can configure a resource group that contains those resources. There are some situations, however, where configuring the resources that need to be colocated as a resource group is not appropriate.
These situations are as follows:
- You may need to colocate a set of resources but the resources do not necessarily need to start in order.
- You may have a resource C that must be colocated with either resource A or B, but there is no relationship between A and B.
- You may have resources C and D that must be colocated with both resources A and B, but there is no relationship between A and B or between C and D.
In these situations, you can create a colocation constraint on a set or sets of resources with the pcs constraint colocation set command.
You can set the following options for a set of resources with the pcs constraint colocation set command.
sequential, which can be set totrueorfalseto indicate whether the members of the set must be colocated with each other.Setting
sequentialtofalseallows the members of this set to be colocated with another set listed later in the constraint, regardless of which members of this set are active. Therefore, this option makes sense only if another set is listed after this one in the constraint; otherwise, the constraint has no effect.-
role, which can be set toStopped,Started,Promoted, orUnpromoted.
You can set the following constraint option for a set of resources following the setoptions parameter of the pcs constraint colocation set command.
-
id, to provide a name for the constraint you are defining. -
score, to indicate the degree of preference for this constraint. For information about this option, see the "Location Constraint Options" table in Configuring Location Constraints
When listing members of a set, each member is colocated with the one before it. For example, "set A B" means "B is colocated with A". However, when listing multiple sets, each set is colocated with the one after it. For example, "set C D sequential=false set A B" means "set C D (where C and D have no relation between each other) is colocated with set A B (where B is colocated with A)".
The following command creates a colocation constraint on a set or sets of resources:
# pcs constraint colocation set resource1 resource2 [resourceN]... [options] [set resourceX resourceY ... [options]] [setoptions [constraint_options]]
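For example, the following sketch mirrors the description above, assuming four hypothetical resources named A, B, C, and D:
# pcs constraint colocation set C D sequential=false set A B setoptions score=INFINITY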
Use the following command to remove colocation constraints with source_resource:
# pcs constraint colocation remove source_resource target_resource
Chapter 14. Exporting resource constraints as pcs commands Copy linkLink copied to clipboard!
You can display the pcs commands that can be used to re-create configured resource constraints on a different system using the --output-format=cmd option of the pcs constraint command.
The following example procedure creates two resources, configures three resource constraints, and displays the pcs commands you can use to re-create the constraints on a different system.
Procedure
Create an
IPaddr2resource namedVirtualIP.# pcs resource create VirtualIP IPaddr2 ip=198.51.100.3 cidr_netmask=24 Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')Create an
apacheresource namedWebsite.# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" Assumed agent name 'ocf:heartbeat:apache' (deduced from 'apache')Configure a location constraint for the
Websiteresource.# pcs constraint location Website avoids node1Configure a colocation constraint for the
WebsiteandVirtualIPresources.# pcs constraint colocation add Website with VirtualIPConfigure an order constraint for the
WebsiteandVirtualIPresources.# pcs constraint order VirtualIP then Website Adding VirtualIP Website (kind: Mandatory) (Options: first-action=start then-action=start)Display the
pcscommands you can use to re-create the constraints on a different system.# pcs constraint --output-format=cmd pcs -- constraint location add location-Website-node1--INFINITY resource%Website node1 -INFINITY; pcs -- constraint colocation add Website with VirtualIP INFINITY \ id=colocation-Website-VirtualIP-INFINITY; pcs -- constraint order start VirtualIP then start Website \ id=order-VirtualIP-Website-mandatory
Chapter 15. Configuring node-specific values using node attributes Copy linkLink copied to clipboard!
Pacemaker supports the configuration of node-specific values, which you specify using node attributes. You can use node attributes to track information associated with a node. For example, you can define node attributes for how much RAM and disk space each node has, which OS each node uses, or which server room rack each node is in.
There are three primary uses for node attributes:
In Pacemaker rules for the cluster configuration
For example, you can set a node attribute named
departmenttoaccountingorITon each node, depending on which department that node is dedicated to. You can then configure a location rule to ensure that an accounting database runs only on servers wheredepartmentis set toaccounting.For information about node attribute expressions in Pacemaker rules, see Pacemaker rules.
In resource agents for the specific resource requirements
For example, a database resource agent can use a node attribute to track the latest replication position for use in a
promoteaction.In external scripts for use outside Pacemaker
For example, you can set
data-centerandrackattributes for each node, for use by an external inventory script.
Defining a node attribute
You define a node attribute with the pcs node attribute command. A node attribute has a name and a value, and can have a distinct value for each node.
When you define a node attribute with the pcs node attribute command, the node attribute is permanent. Permanent node attributes keep their values even when the cluster restarts on a node.
You can define a transient node attribute, which is kept in the CIB’s status section, and does not remain when the cluster stops on the node. For information about defining transient node attributes, see the crm_attribute(8) and attrd_updater(8) man pages on your system.
Run the following commands to define a node attribute with the name
rackfornode1andnode2, setting a value of 1 for therackattribute ofnode1and a value of 2 for therackattribute ofnode2.# pcs node attribute node1 rack=1 # pcs node attribute node2 rack=2
Displaying node attributes as pcs commands
You can export node attributes as a series of pcs commands by using the --output-format=cmd option. This is useful for scripting, automation, or replicating the same configuration on a different system.
You can display the configured node attributes in one of three formats:
-
text: Displays the output in plain text. This is the default format. -
json: Displays the output in a machine-readable JSON format, which is useful for scripting and automation. -
cmd: Displays the output as a series ofpcscommands, which you can use to recreate the same node attributes on a different system.
To display the configured node attributes as a series of
pcscommands:# pcs node attribute --output-format=cmdExample output
pcs node attribute node1 location=rack1 pcs node attribute node2 location=rack2
Chapter 16. Determining resource location with rules Copy linkLink copied to clipboard!
For more complicated location constraints, you can use Pacemaker rules to determine a resource’s location.
16.1. Pacemaker rules Copy linkLink copied to clipboard!
Pacemaker rules can be used to make your configuration more dynamic. One use of rules might be to assign machines to different processing groups, by using a node attribute, based on time, and then to use that attribute when creating location constraints.
Each rule can contain a number of expressions, date-expressions and even other rules. The results of the expressions are combined based on the rule’s boolean-op field to determine if the rule ultimately evaluates to true or false. What happens next depends on the context in which the rule is being used.
| Field | Description |
|---|---|
| role | Limits the rule to apply only when the resource is in that role. Allowed values: Started, Unpromoted, and Promoted. |
| score | The score to apply if the rule evaluates to true. Limited to use in rules that are part of location constraints. |
| score-attribute | The node attribute to look up and use as a score if the rule evaluates to true. Limited to use in rules that are part of location constraints. |
| boolean-op | How to combine the result of multiple expression objects. Allowed values: and, or. The default value is and. |
16.1.1. Node attribute expressions Copy linkLink copied to clipboard!
Node attribute expressions are used to control a resource based on the attributes defined by a node or nodes. For general information about node attributes, see Configuring node-specific values using node attributes.
| Field | Description |
|---|---|
| attribute | The node attribute to test |
| type | Determines how the value(s) should be tested. Allowed values: string, integer, number, version. |
| operation | The comparison to perform. Allowed values:
* lt - True if the value of the node attribute is less than value
* gt - True if the value of the node attribute is greater than value
* lte - True if the value of the node attribute is less than or equal to value
* gte - True if the value of the node attribute is greater than or equal to value
* eq - True if the value of the node attribute is equal to value
* ne - True if the value of the node attribute is not equal to value
* defined - True if the node has the named attribute
* not_defined - True if the node does not have the named attribute |
| value | User supplied value for comparison (required unless the operation is defined or not_defined) |
In addition to any attributes added by the administrator, the cluster defines special, built-in node attributes for each node that can also be used, as described in the following table.
| Name | Description |
|---|---|
| #uname | Node name |
| #id | Node ID |
| #kind | Node type. Possible values are cluster, remote, and container. |
| #is_dc | true if this node is a Designated Controller (DC), false otherwise |
| #cluster-name | The value of the cluster-name cluster property, if set |
| #site-name | The value of the site-name node attribute, if set, otherwise the same as #cluster-name |
| #role | The role the relevant promotable clone has on this node. Valid only within a rule for a location constraint for a promotable clone. |
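For example, the following sketch, which assumes a hypothetical accounting_db resource and the department node attribute described in Configuring node-specific values using node attributes, expresses a strong preference for nodes where department is set to accounting:
# pcs constraint location accounting_db rule score=INFINITY department eq accounting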
16.1.2. Time/date based expressions Copy linkLink copied to clipboard!
Date expressions are used to control a resource or cluster option based on the current date/time. They can contain an optional date specification.
| Field | Description |
|---|---|
| start | A date/time conforming to the ISO8601 specification. |
| end | A date/time conforming to the ISO8601 specification. |
| operation | Compares the current date/time with the start or the end date or both the start and end date, depending on the context. Allowed values:
* gt - True if the current date/time is after start
* lt - True if the current date/time is before end
* in_range - True if the current date/time is after start and before end
* date-spec - Performs a cron-like comparison to the current date/time |
16.1.3. Date specifications Copy linkLink copied to clipboard!
Date specifications are used to create cron-like expressions relating to time. Each field can contain a single number or a single range. Instead of defaulting to zero, any field not supplied is ignored.
For example, monthdays="1" matches the first day of every month and hours="09-17" matches the hours between 9 am and 5 pm (inclusive). However, you cannot specify weekdays="1,2" or weekdays="1-2,5-6" since they contain multiple ranges.
| Field | Description |
|---|---|
| id | A unique name for the date |
| seconds | Allowed values: 0-59 |
| minutes | Allowed values: 0-59 |
| hours | Allowed values: 0-23 |
| monthdays | Allowed values: 0-31 (depending on month and year) |
| weekdays | Allowed values: 1-7 (1=Monday, 7=Sunday) |
| yeardays | Allowed values: 1-366 (depending on the year) |
| months | Allowed values: 1-12 |
| weeks | Allowed values: 1-53 (depending on weekyear) |
| years | Year according to the Gregorian calendar |
| weekyears | Year in which the week started; may differ from Gregorian years. |
16.2. Configuring a Pacemaker location constraint using rules Copy linkLink copied to clipboard!
Configure location constraints with rules to determine resource placement based on complex conditions, such as time, date, or node attributes. You can combine multiple expressions to create sophisticated placement logic.
# pcs constraint location rsc rule [resource-discovery=option] [role=promoted|unpromoted] [score=score | score-attribute=attribute] expression
-
If
scoreis omitted, it defaults to INFINITY. -
When using rules to configure location constraints, the value of
scorecan be positive or negative, with a positive value indicating "prefers" and a negative value indicating "avoids". -
If
resource-discoveryis omitted, it defaults toalways. For information about theresource-discoveryoption, see Limiting resource discovery to a subset of nodes. - As with basic location constraints, you can use regular expressions for resources with these constraints as well.
The expression option in Pacemaker rules
The expression option can be one of the following where duration_options and date_spec_options are: hours, months, weeks, and years as described in the "Properties of a Date Specification" table in Pacemaker rules.
-
defined|not_defined attribute -
attribute lt|gt|lte|gte|eq|ne [string|integer|number|version] value -
date gt|lt date -
date in_range date to date -
date in_range date to duration duration_options … -
date-spec date_spec_options -
expression and|or expression -
(expression)
Durations in Pacemaker rules
Note that durations are an alternative way to specify an end for in_range operations by means of calculations. For example, you can specify a duration of 19 months. The supported values for duration options are seconds, minutes, hours, days, weeks, months, and years.
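For example, the following sketch, which assumes the Webserver resource and an illustrative start date, keeps the constraint in effect for 19 months from that date:
# pcs constraint location Webserver rule score=INFINITY date in_range 2023-01-01 to duration months=19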
The following location constraint configures an expression that is true if now is any time in the year 2018.
# pcs constraint location Webserver rule score=INFINITY date-spec years=2018
The following command configures an expression that is true from 9 am to 5 pm, Monday through Friday. Note that the hours value of 16 matches up to 16:59:59, as the numeric value (hour) still matches.
# pcs constraint location Webserver rule score=INFINITY date-spec hours="9-16" weekdays="1-5"
The following command configures an expression that is true on Friday the thirteenth.
# pcs constraint location Webserver rule date-spec weekdays=5 monthdays=13
16.3. Removing a Pacemaker rule Copy linkLink copied to clipboard!
You can remove a Pacemaker rule. If the rule that you are removing is the last rule in its constraint, the constraint will be removed.
Procedure
Remove a Pacemaker rule:
# pcs constraint rule remove rule_id
Chapter 17. Managing cluster resources Copy linkLink copied to clipboard!
There are a variety of commands you can use to display, modify, and administer cluster resources.
17.1. Exporting cluster resources as pcs commands Copy linkLink copied to clipboard!
You can display the pcs commands that can be used to re-create configured cluster resources on a different system using the --output-format=cmd option of the pcs resource config command.
The following example procedure creates four resources for an active/passive Apache HTTP server in a Red Hat high availability cluster and then displays the pcs commands you can use to recreate those resources.
Procedure
Create an
LVM-activateresource.# pcs resource create my_lvm ocf:heartbeat:LVM-activate vgname=my_vg vg_access_mode=system_id --group apachegroupCreate a
Filesystemresource.# pcs resource create my_fs Filesystem device="/dev/my_vg/my_lv" directory="/var/www" fstype="xfs" --group apachegroupCreate an
IPaddr2resource.# pcs resource create VirtualIP IPaddr2 ip=198.51.100.3 cidr_netmask=24 --group apachegroupCreate an
Apacheresource.# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroupDisplay the
pcscommands you can use to re-create the four resources you created on a different system.# pcs resource config --output-format=cmd pcs resource create --no-default-ops --force -- my_lvm ocf:heartbeat:LVM-activate \ vg_access_mode=system_id vgname=my_vg \ op \ monitor interval=30s id=my_lvm-monitor-interval-30s timeout=90s \ start interval=0s id=my_lvm-start-interval-0s timeout=90s \ stop interval=0s id=my_lvm-stop-interval-0s timeout=90s; pcs resource create --no-default-ops --force -- my_fs ocf:heartbeat:Filesystem \ device=/dev/my_vg/my_lv directory=/var/www fstype=xfs \ op \ monitor interval=20s id=my_fs-monitor-interval-20s timeout=40s \ start interval=0s id=my_fs-start-interval-0s timeout=60s \ stop interval=0s id=my_fs-stop-interval-0s timeout=60s; pcs resource create --no-default-ops --force -- VirtualIP ocf:heartbeat:IPaddr2 \ cidr_netmask=24 ip=198.51.100.3 \ op \ monitor interval=10s id=VirtualIP-monitor-interval-10s timeout=20s \ start interval=0s id=VirtualIP-start-interval-0s timeout=20s \ stop interval=0s id=VirtualIP-stop-interval-0s timeout=20s; pcs resource create --no-default-ops --force -- Website ocf:heartbeat:apache \ configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status \ op \ monitor interval=10s id=Website-monitor-interval-10s timeout=20s \ start interval=0s id=Website-start-interval-0s timeout=40s \ stop interval=0s id=Website-stop-interval-0s timeout=60s; pcs resource group add apachegroup \ my_lvm my_fs VirtualIP WebsiteDisplay the
pcscommand you can use to re-create only theIPaddr2resource. To display only one configured resource, specify the resource ID for that resource.# pcs resource config VirtualIP --output-format=cmd pcs resource create --no-default-ops --force -- VirtualIP ocf:heartbeat:IPaddr2 \ cidr_netmask=24 ip=198.51.100.3 \ op \ monitor interval=10s id=VirtualIP-monitor-interval-10s timeout=20s \ start interval=0s id=VirtualIP-start-interval-0s timeout=20s \ stop interval=0s id=VirtualIP-stop-interval-0s timeout=20s
17.2. Modifying resource parameters Copy linkLink copied to clipboard!
You can modify the parameters of a configured resource.
pcs resource update resource_id [resource_options]
When you update a resource’s operation with the pcs resource update command, any options you do not specifically call out are reset to their default values.
The following example procedure modifies the parameters of the resource VirtualIP.
Procedure
Display the initial values of the configured parameters for resource
VirtualIP.# pcs resource config VirtualIP Resource: VirtualIP (type=IPaddr2 class=ocf provider=heartbeat) Attributes: ip=192.168.0.120 cidr_netmask=24 Operations: monitor interval=30sChange the value of the
ipparameter for resourceVirtualIP.# pcs resource update VirtualIP ip=192.169.0.120Display the values of the parameters for resource
VirtualIPafter you have modified the value of theipparameter.# pcs resource config VirtualIP Resource: VirtualIP (type=IPaddr2 class=ocf provider=heartbeat) Attributes: ip=192.169.0.120 cidr_netmask=24 Operations: monitor interval=30s
17.3. Clearing failure status of cluster resources Copy linkLink copied to clipboard!
If a resource has failed, a failure message appears when you display the cluster status with the pcs status command. After attempting to resolve the cause of the failure, you can check the updated status of the resource by running the pcs status command again, and you can check the failure count for the cluster resources with the pcs resource failcount show --full command.
After you resolve the cause of a resource failure, you may want to remove the failure message from the status display by removing the failure operation history.
- Resetting the failure status and removing the failure operation history
You can clear that failure status of a resource with the
pcs resource cleanupcommand. Thepcs resource cleanupcommand resets the resource status andfailcountvalue for the resource. This command also removes the operation history for the resource and re-detects its current state. Thepcs resource cleanupcommand operates only on resources with failed actions as shown in the cluster status.The following command resets the resource status and
failcountvalue for the resource specified by resource_id.For example, you can reset the resouce status and
failcountvalue for the resource specified by resource_id by using thepcs resource cleanup resource_idcommand.If you do not specify resource_id, the
pcs resource cleanupcommand resets the resource status andfailcountvalue for all resources with a failure count.- Resetting the resource status and removing the full resource operation history
You can reset the resource status and clear the entire operation history of a resource with the
pcs resource refresh resource_idcommand. Run thepcs resource refreshcommand with no options specified to reset the resource status andfailcountvalue for all resources.The
pcs resource refreshcommand operates on resources regardless of their current state. This requires that Pacemaker reprobe the resources on all nodes, which increases the workload. To remove the operation history only of resources with failed actions, use thepcs resource cleanupcommand.
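For example, assuming the dummy_resource resource used earlier in this document, the following commands would clear its failure status or re-detect its state on all nodes, respectively:
# pcs resource cleanup dummy_resource
# pcs resource refresh dummy_resource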
17.4. Moving resources in a cluster Copy linkLink copied to clipboard!
Pacemaker provides a variety of mechanisms for configuring a resource to move from one node to another and to manually move a resource when needed.
You can manually move resources in a cluster with the pcs resource move and pcs resource relocate commands, as described in Manually moving cluster resources. In addition to these commands, you can also control the behavior of cluster resources by enabling, disabling, and banning resources, as described in Disabling, enabling, and banning cluster resources.
You can configure a resource so that it will move to a new node after a defined number of failures, and you can configure a cluster to move resources when external connectivity is lost.
Moving resources due to failure
When you create a resource, you can configure the resource so that it will move to a new node after a defined number of failures by setting the migration-threshold option for that resource. Once the threshold has been reached, this node will no longer be allowed to run the failed resource until:
-
The resource’s
failure-timeoutvalue is reached. -
The administrator manually resets the resource’s failure count by using the
pcs resource cleanupcommand.
The value of migration-threshold is set to INFINITY by default. INFINITY is defined internally as a very large but finite number. A value of 0 disables the migration-threshold feature.
Setting a migration-threshold for a resource is not the same as configuring a resource for migration, in which the resource moves to another location without loss of state.
The following example adds a migration threshold of 10 to the resource named dummy_resource, which indicates that the resource will move to a new node after 10 failures.
# pcs resource meta dummy_resource migration-threshold=10
You can add a migration threshold to the defaults for the whole cluster with the following command.
# pcs resource defaults update migration-threshold=10
To determine the resource’s current failure status and limits, use the pcs resource failcount show command.
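For example, to display the failure counts recorded for the dummy_resource resource used in the previous example, you can run the following command:
# pcs resource failcount show dummy_resource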
There are two exceptions to the migration threshold concept; they occur when a resource either fails to start or fails to stop. If the cluster property start-failure-is-fatal is set to true (which is the default), start failures cause the failcount to be set to INFINITY and always cause the resource to move immediately. For information about the start-failure-is-fatal cluster property, see Summary of cluster properties and options.
Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, then the cluster will fence the node to be able to start the resource elsewhere. If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout.
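If you want start failures to count against the migration threshold like other failures instead of being treated as fatal, you can change this cluster property. This is an optional sketch, not a required step:
# pcs property set start-failure-is-fatal=false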
Moving resources due to connectivity changes
Setting up the cluster to move resources when external connectivity is lost is a two step process.
- Add a ping resource to the cluster. The ping resource uses the system utility of the same name to test whether a list of machines (specified by DNS host name or IPv4/IPv6 address) are reachable, and uses the results to maintain a node attribute called pingd.
- Configure a location constraint for the resource that will move the resource to a different node when connectivity is lost.
The following table describes the properties you can set for a ping resource.
| Field | Description |
|---|---|
| dampen | The time to wait (dampening) for further changes to occur. This prevents a resource from bouncing around the cluster when cluster nodes notice the loss of connectivity at slightly different times. |
| multiplier | The number of connected ping nodes gets multiplied by this value to get a score. Useful when there are multiple ping nodes configured. |
| host_list | The machines to contact to determine the current connectivity status. Allowed values include resolvable DNS host names, IPv4 and IPv6 addresses. The entries in the host list are space separated. |
This example procedure creates a ping resource that verifies connectivity to gateway.example.com. In practice, you would verify connectivity to your network gateway/router.
Procedure
Configure a ping resource. You configure the resource as a clone so that the resource will run on all cluster nodes.
# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=gateway.example.com clone
Configure a location constraint rule for the existing resource named Webserver. This causes the Webserver resource to move to a host that is able to ping gateway.example.com if the host that it is currently running on cannot ping gateway.example.com.
# pcs constraint location Webserver rule score=-INFINITY pingd lt 1 or not_defined pingd
17.5. Configuring and managing cluster resource tags
You can use the pcs command to tag cluster resources. This allows you to enable, disable, manage, or unmanage a specified set of resources with a single command.
17.5.1. Tagging cluster resources for administration by category
You can tag two resources with a resource tag and disable the tagged resources. In this example, the existing resources to be tagged are named d-01 and d-02.
Procedure
Create a tag named special-resources for resources d-01 and d-02.
[root@node-01]# pcs tag create special-resources d-01 d-02
Display the resource tag configuration.
[root@node-01]# pcs tag config
special-resources
  d-01
  d-02
Disable all resources that are tagged with the special-resources tag.
[root@node-01]# pcs resource disable special-resources
Display the status of the resources to confirm that resources d-01 and d-02 are disabled.
[root@node-01]# pcs resource
  * d-01        (ocf::pacemaker:Dummy): Stopped (disabled)
  * d-02        (ocf::pacemaker:Dummy): Stopped (disabled)
In addition to the pcs resource disable command, the pcs resource enable, pcs resource manage, and pcs resource unmanage commands support the administration of tagged resources.
After you have created a resource tag:
- You can delete a resource tag with the pcs tag delete command.
- You can modify resource tag configuration for an existing resource tag with the pcs tag update command.
17.5.2. Deleting a tagged cluster resource
You cannot delete a tagged cluster resource using pcs. Instead, remove the tag before deleting the resource.
Procedure
Remove the resource tag.
The following command removes the resource tag special-resources from all resources with that tag.
[root@node-01]# pcs tag remove special-resources
[root@node-01]# pcs tag
 No tags defined
The following command removes the resource tag special-resources from the resource d-01 only.
[root@node-01]# pcs tag update special-resources remove d-01
Delete the resource.
[root@node-01]# pcs resource delete d-01
Attempting to stop: d-01... Stopped
17.5.3. Displaying and exporting cluster resource tags
The pcs tag [config] command supports the --output-format option.
- Specifying --output-format=text displays the configured tags in plain text format, which is the default value for this option.
- Specifying --output-format=cmd displays the commands created from the current cluster tags configuration. You can use these commands to re-create configured tags on a different system.
- Specifying --output-format=json displays the configured tags in JSON format, which is suitable for machine parsing.
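For example, the following command displays the commands that would re-create the special-resources tag from the earlier example. The output line shown here is illustrative of the general form of the result:
# pcs tag config --output-format=cmd
pcs tag create special-resources d-01 d-02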
Chapter 18. Creating cluster resources that are active on multiple nodes (cloned resources)
You can clone a cluster resource so that the resource can be active on multiple nodes. For example, you can use cloned resources to configure multiple instances of an IP resource to distribute throughout a cluster for node balancing. You can clone any resource provided the resource agent supports it. A clone consists of one resource or one resource group.
Only resources that can be active on multiple nodes at the same time are suitable for cloning. For example, a Filesystem resource mounting a non-clustered file system such as ext4 from a shared memory device should not be cloned. Since the ext4 partition is not cluster aware, this file system is not suitable for read/write operations occurring from multiple nodes at the same time.
18.1. Creating and removing a cloned resource
You can create a resource and a clone of that resource at the same time.
pcs resource create resource_id [standard:[provider:]]type [resource options] [meta resource meta options] clone [clone_id] [clone options]
You can create a clone of a previously-created resource or resource group with the following command.
pcs resource clone resource_id | group_id [clone_id][clone options]...
By default, the name of the clone will be resource_id-clone. You can set a custom name for the clone by specifying a value for the clone_id option.
You cannot create a resource group and a clone of that resource group in a single command.
When you create a resource or resource group clone that will be ordered after another clone, you should almost always set the interleave=true option. This ensures that copies of the dependent clone can stop or start when the clone it depends on has stopped or started on the same node. If you do not set this option, then when a cloned resource B depends on a cloned resource A and a node leaves and later rejoins the cluster, all of the copies of resource B on all of the nodes restart as soon as resource A starts on the returning node. This is because when a dependent cloned resource does not have the interleave option set, all instances of that resource depend on any running instance of the resource it depends on.
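For example, assuming an existing resource named resourceB (a hypothetical name) that will be ordered after another clone, the following command creates its clone with the interleave option set:
# pcs resource clone resourceB interleave=true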
Use the following command to remove a clone of a resource or a resource group. This does not remove the resource or resource group itself.
pcs resource unclone resource_id | clone_id | group_name
The following table describes the options you can specify for a cloned resource.
| Field | Description |
|---|---|
| priority, target-role, is-managed | Options inherited from the resource that is being cloned, as described in the "Resource Meta Options" table in Configuring resource meta options. |
| clone-max | How many copies of the resource to start. Defaults to the number of nodes in the cluster. |
| clone-node-max | How many copies of the resource can be started on a single node; the default value is 1. |
| notify | When stopping or starting a copy of the clone, tell all the other copies beforehand and when the action was successful. Allowed values: false, true. The default value is false. |
| globally-unique | Does each copy of the clone perform a different function? Allowed values: false, true. If the value of this option is false, these resources behave identically everywhere they are running and thus there can be only one copy of the clone active per machine. If the value of this option is true, a copy of the clone running on one machine is not equivalent to another instance, whether that instance is running on another node or on the same node. The default value is true if the value of clone-node-max is greater than one; otherwise the default value is false. |
| ordered | Should the copies be started in series (instead of in parallel). Allowed values: false, true. The default value is false. |
| interleave | Changes the behavior of ordering constraints (between clones) so that copies of the first clone can start or stop as soon as the copy on the same node of the second clone has started or stopped (rather than waiting until every instance of the second clone has started or stopped). Allowed values: false, true. The default value is false. |
| clone-min | If a value is specified, any clones which are ordered after this clone will not be able to start until the specified number of instances of the original clone are running, even if the interleave option is set to true. |
To achieve a stable allocation pattern, clones are slightly sticky by default, which indicates that they have a slight preference for staying on the node where they are running. If no value for resource-stickiness is provided, the clone will use a value of 1. Being a small value, it causes minimal disturbance to the score calculations of other resources but is enough to prevent Pacemaker from needlessly moving copies around the cluster. For information about setting the resource-stickiness resource meta-option, see Configuring resource meta options.
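For example, to give a clone such as webfarm-clone (created in the procedure that follows) a stronger preference for remaining where it is running, you could set its stickiness explicitly; this is an optional sketch, not a required step:
# pcs resource meta webfarm-clone resource-stickiness=5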
The following procedure creates and removes a resource clone.
Procedure
On one node of the cluster, create the resource clone.
When you create a clone of a resource, by default the clone takes on the name of the resource with -clone appended to the name. The following command creates a resource of type apache named webfarm and a clone of that resource named webfarm-clone.
# pcs resource create webfarm apache clone
Remove the clone of a resource or a resource group. This does not remove the resource or resource group itself.
# pcs resource unclone webfarm
18.2. Configuring clone resource constraints
You can determine the behavior of a clone resource in a cluster by configuring constraints for that resource. These constraints are written no differently than those for regular resources except that you must specify the clone’s ID. For information about resource constraints, see Configuring cluster resources.
Clone resource location constraints
In most cases, a clone will have a single copy on each active cluster node. You can, however, set clone-max for the resource clone to a value that is less than the total number of nodes in the cluster. If this is the case, you can indicate which nodes the cluster should preferentially assign copies to with resource location constraints. For general information about location constraints, see Determining which node a resource can run on.
The following command creates a location constraint for the cluster to preferentially assign resource clone webfarm-clone to node1.
# pcs constraint location webfarm-clone prefers node1
Clone resource ordering constraints
Ordering constraints behave slightly differently for clones. In the example below, because the interleave clone option is left at its default value of false, no instance of webfarm-stats will start until all instances of webfarm-clone that need to be started have done so. Only if no copies of webfarm-clone can be started will webfarm-stats be prevented from being active. Additionally, webfarm-clone will wait for webfarm-stats to be stopped before stopping itself.
# pcs constraint order start webfarm-clone then webfarm-stats
For general information about resource ordering constraints, see Determining the order in which cluster resources are run.
Clone resource colocation constraints
Colocation of a regular (or group) resource with a clone means that the resource can run on any machine with an active copy of the clone. The cluster will choose a copy based on where the clone is running and the resource’s own location preferences. For information about colocation constraints, see Colocating cluster resources.
Colocation between clones is also possible. In such cases, the set of allowed locations for the clone is limited to nodes on which the clone is (or will be) active. Allocation is then performed as normal.
The following command creates a colocation constraint to ensure that the resource webfarm-stats runs on the same node as an active copy of webfarm-clone.
# pcs constraint colocation add webfarm-stats with webfarm-clone
18.3. Promotable clone resources
Promotable clone resources are clone resources with the promotable meta attribute set to true. They allow the instances to be in one of two operating modes; these are called promoted and unpromoted. The names of the modes do not have specific meanings, except for the limitation that when an instance is started, it must come up in the Unpromoted state.
The Promoted and Unpromoted role names are the functional equivalent of the Master and Slave Pacemaker roles in previous RHEL releases.
18.3.1. Creating a promotable clone resource
Configure a promotable clone resource to manage services that run in different modes, such as active and passive. You can create a new promotable resource or convert an existing resource or group to support promoted and unpromoted roles.
Procedure
Create a resource as a promotable clone:
# pcs resource create resource_id [standard:[provider:]]type [resource options] promotable [clone_id] [clone options]
By default, the name of the promotable clone is resource_id-clone. You can set a custom name for the clone by specifying a value for the clone_id option.
Alternately, you can create a promotable resource from a previously-created resource or resource group with the following command:
# pcs resource promotable resource_id [clone_id] [clone options]
By default, the name of the promotable clone is resource_id-clone or group_name-clone. You can set a custom name for the clone by specifying a value for the clone_id option.
The following table describes the extra clone options you can specify for a promotable resource.
Table 18.2. Extra Clone Options Available for Promotable Clones
| Field | Description |
|---|---|
| promoted-max | How many copies of the resource can be promoted; default 1. |
| promoted-node-max | How many copies of the resource can be promoted on a single node; default 1. |
18.3.2. Configuring promotable resource constraints
You can determine the behavior of a promotable resource in a cluster by configuring constraints for that resource. For general information about resource constraints, see Configuring cluster resources.
Promotable resource location constraints
In most cases, a promotable resource will have a single copy on each active cluster node. If this is not the case, you can indicate which nodes the cluster should preferentially assign copies to with resource location constraints. These constraints are written no differently than those for regular resources. For information about location constraints, see Determining which node a resource can run on.
Promotable resource colocation constraints
You can create a colocation constraint which specifies whether the resources are operating in a promoted or unpromoted role. The following command creates a resource colocation constraint:
# pcs constraint colocation add [promoted|unpromoted] source_resource with [promoted|unpromoted] target_resource [score] [options]
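For example, assuming a promotable clone named database-clone and a resource named webfarm-stats (hypothetical names), the following command keeps webfarm-stats on the node where database-clone is running in the promoted role:
# pcs constraint colocation add webfarm-stats with promoted database-clone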
For information about colocation constraints, see Colocating cluster resources.
Promotable resource ordering constraints
When configuring an ordering constraint that includes promotable resources, one of the actions that you can specify for the resources is promote, indicating that the resource be promoted from the unpromoted role to the promoted role. Additionally, you can specify an action of demote, indicating that the resource be demoted from the promoted role to the unpromoted role.
The command for configuring an order constraint is as follows:
# pcs constraint order [action] resource_id then [action] resource_id [options]
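For example, using the same hypothetical resources as above, the following command ensures that database-clone is promoted before webfarm-stats starts:
# pcs constraint order promote database-clone then start webfarm-stats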
18.3.3. Demoting a promoted resource on failure
You can configure a promotable resource so that when a promote or monitor action fails for that resource, or the partition in which the resource is running loses quorum, the resource will be demoted but will not be fully stopped. This can prevent the need for manual intervention in situations where fully stopping the resource would require it.
Procedure
To configure a promotable resource to be demoted when a promote action fails, set the on-fail operation meta option to demote, as in the following example:
# pcs resource op add my-rsc promote on-fail="demote"
To configure a promotable resource to be demoted when a monitor action fails, set interval to a nonzero value, set the on-fail operation meta option to demote, and set role to Promoted, as in the following example.
# pcs resource op add my-rsc monitor interval="10s" on-fail="demote" role="Promoted"
To configure a cluster so that when a cluster partition loses quorum any promoted resources will be demoted but left running and all other resources will be stopped, set the no-quorum-policy cluster property to demote.
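For example, you can set this policy with the following command:
# pcs property set no-quorum-policy=demote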
Setting the on-fail meta-attribute to demote for an operation does not affect how promotion of a resource is determined. If the affected node still has the highest promotion score, it will be selected to be promoted again.
Chapter 19. Managing cluster nodes
To manage cluster nodes, use pcs commands to start and stop cluster services and to add or remove nodes.
19.1. Stopping cluster services
You can stop cluster services on the specified node or nodes. As with the pcs cluster start command, the --all option stops cluster services on all nodes, and if you do not specify any nodes, cluster services are stopped on the local node only.
Procedure
Stop cluster services on the specified node or nodes:
# pcs cluster stop [--all | node] [...]
You can force a stop of cluster services on the local node with the following command, which performs a kill -9 command.
# pcs cluster kill
19.2. Enabling and disabling cluster services
Enable the cluster services with the following command. This configures the cluster services to run on startup on the specified node or nodes.
Enabling allows nodes to automatically rejoin the cluster after they have been fenced, minimizing the time the cluster is at less than full strength. If the cluster services are not enabled, an administrator can investigate what went wrong before starting the cluster services manually, so that, for example, a node with hardware issues is not allowed back into the cluster when it is likely to fail again.
Procedure
Enable the cluster services:
- If you specify the --all option, the command enables cluster services on all nodes. If you do not specify any nodes, cluster services are enabled on the local node only.
# pcs cluster enable [--all | node] [...]
Use the following command to configure the cluster services not to run on startup on the specified node or nodes:
- If you specify the --all option, the command disables cluster services on all nodes. If you do not specify any nodes, cluster services are disabled on the local node only.
# pcs cluster disable [--all | node] [...]
19.3. Adding cluster nodes
You can add a new node to an existing cluster with the following procedure.
This procedure adds standard cluster nodes running corosync. For information about integrating non-corosync nodes into a cluster, see Integrating non-corosync nodes into a cluster: the pacemaker_remote service.
It is recommended that you add nodes to existing clusters only during a production maintenance window. This allows you to perform appropriate resource and deployment testing for the new node and its fencing configuration.
In this example, the existing cluster nodes are clusternode-01.example.com, clusternode-02.example.com, and clusternode-03.example.com. The new node is newnode.example.com.
Procedure
On the new node to add to the cluster, perform the following tasks:
Install the cluster packages. If the cluster uses SBD, the Booth ticket manager, or a quorum device, you must manually install the respective packages (sbd, booth-site, corosync-qdevice) on the new node as well.
[root@newnode ~]# dnf install -y pcs fence-agents-all
In addition to the cluster packages, you will also need to install and configure all of the services that you are running in the cluster, which you have installed on the existing cluster nodes. For example, if you are running an Apache HTTP server in a Red Hat high availability cluster, you will need to install the server on the node you are adding, as well as the wget tool that checks the status of the server.
If you are running the firewalld daemon, execute the following commands to enable the ports that are required by the Red Hat High Availability Add-On.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
Set a password for the user ID hacluster. It is recommended that you use the same password for each node in the cluster.
[root@newnode ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Execute the following commands to start the pcsd service and to enable pcsd at system start.
# systemctl start pcsd.service
# systemctl enable pcsd.service
On a node in the existing cluster, perform the following tasks.
Authenticate user hacluster on the new cluster node.
[root@clusternode-01 ~]# pcs host auth newnode.example.com
Username: hacluster
Password:
newnode.example.com: Authorized
Add the new node to the existing cluster. This command also syncs the cluster configuration file corosync.conf to all nodes in the cluster, including the new node you are adding.
[root@clusternode-01 ~]# pcs cluster node add newnode.example.com
On the new node to add to the cluster, perform the following tasks.
Start and enable cluster services on the new node.
[root@newnode ~]# pcs cluster start
Starting Cluster...
[root@newnode ~]# pcs cluster enable
- Ensure that you configure and test a fencing device for the new cluster node.
19.4. Removing cluster nodes
You can shut down the specified node and remove it from the cluster configuration file, corosync.conf, on all of the other nodes in the cluster.
Procedure
Remove the cluster node:
# pcs cluster node remove node
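For example, to remove the newnode.example.com node that was added in the previous section:
# pcs cluster node remove newnode.example.com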
19.5. Adding a node to a cluster with multiple links
When adding a node to a multi-link cluster, you must specify an address for every link. This example adds rh80-node3, using 192.168.122.203 for the first link and 192.168.123.203 for the second.
Procedure
Add a node to a cluster with multiple links:
# pcs cluster node add rh80-node3 addr=192.168.122.203 addr=192.168.123.203
19.6. Adding and modifying links in an existing cluster
In most cases, you can add or modify the links in an existing cluster without restarting the cluster.
19.6.1. Adding and removing links in an existing cluster
To add a new link to a running cluster, use the pcs cluster link add command.
- When adding a link, you must specify an address for each node.
- Adding and removing a link is only possible when you are using the knet transport protocol.
- At least one link in the cluster must be defined at any time.
- The maximum number of links in a cluster is 8, numbered 0-7. It does not matter which links are defined, so, for example, you can define only links 3, 6 and 7.
- When you add a link without specifying its link number, pcs uses the lowest link available.
- The link numbers of currently configured links are contained in the corosync.conf file. To display the corosync.conf file, run the pcs cluster corosync command or the pcs cluster config [show] command.
Procedure
The following command adds link number 5 to a three node cluster.
[root@node1 ~] # pcs cluster link add node1=10.0.5.11 node2=10.0.5.12 node3=10.0.5.31 options linknumber=5
To remove an existing link, use the pcs cluster link delete or pcs cluster link remove command. Either of the following commands will remove link number 5 from the cluster.
[root@node1 ~] # pcs cluster link delete 5
[root@node1 ~] # pcs cluster link remove 5
19.6.2. Modifying a link in a cluster with multiple links
Modify a specific link in a multi-link cluster configuration.
Procedure
Remove the link you want to change.
[root@node1 ~] # pcs cluster link remove 2
Add the link back to the cluster with the updated addresses and options.
[root@node1 ~] # pcs cluster link add node1=10.0.5.11 node2=10.0.5.12 node3=10.0.5.31 options linknumber=2
19.6.3. Modifying the link addresses in a cluster with a single link
If your cluster uses only one link and you want to modify that link to use different addresses, perform the following procedure. In this example, the original link is link 1.
Add a new link with the new addresses and options.
[root@node1 ~] # pcs cluster link add node1=10.0.5.11 node2=10.0.5.12 node3=10.0.5.31 options linknumber=2
Remove the original link.
[root@node1 ~] # pcs cluster link remove 1
Note that you cannot specify addresses that are currently in use when adding links to a cluster. This means, for example, that if you have a two-node cluster with one link and you want to change the address for one node only, you cannot use the above procedure to add a new link that specifies one new address and one existing address. Instead, you can add a temporary link before removing the existing link and adding it back with the updated address, as in the following example.
In this example:
- The link for the existing cluster is link 1, which uses the address 10.0.5.11 for node 1 and the address 10.0.5.12 for node 2.
- You would like to change the address for node 2 to 10.0.5.31.
To update only one of the addresses for a two-node cluster with a single link, use the following procedure.
Procedure
Add a new temporary link to the existing cluster, using addresses that are not currently in use.
[root@node1 ~] # pcs cluster link add node1=10.0.5.13 node2=10.0.5.14 options linknumber=2
Remove the original link.
[root@node1 ~] # pcs cluster link remove 1
Add the new, modified link.
[root@node1 ~] # pcs cluster link add node1=10.0.5.11 node2=10.0.5.31 options linknumber=1
Remove the temporary link you created.
[root@node1 ~] # pcs cluster link remove 2
19.6.4. Modifying the link options for a link in a cluster with a single link
If your cluster uses only one link and you want to modify the options for that link but you do not want to change the address to use, you can add a temporary link before removing and updating the link to modify.
In this example:
- The link for the existing cluster is link 1, which uses the address 10.0.5.11 for node 1 and the address 10.0.5.12 for node 2.
- You would like to change the link option link_priority to 11.
Procedure
Add a new temporary link to the existing cluster, using addresses that are not currently in use.
[root@node1 ~] # pcs cluster link add node1=10.0.5.13 node2=10.0.5.14 options linknumber=2
Remove the original link.
[root@node1 ~] # pcs cluster link remove 1
Add back the original link with the updated options.
[root@node1 ~] # pcs cluster link add node1=10.0.5.11 node2=10.0.5.12 options linknumber=1 link_priority=11
Remove the temporary link.
[root@node1 ~] # pcs cluster link remove 2
19.6.5. Modifying a link when adding a new link is not possible
If you cannot add a new link and must modify the single existing link, you must shut down the cluster.
The following example procedure updates link number 1 in the cluster and sets the link_priority option for the link to 11.
Procedure
Stop the cluster services for the cluster.
[root@node1 ~] # pcs cluster stop --all
Update the link addresses and options.
The pcs cluster link update command does not require that you specify all of the node addresses and options. Instead, you can specify only the addresses to change. This example modifies the addresses for node1 and node3 and the link_priority option only.
[root@node1 ~] # pcs cluster link update 1 node1=10.0.5.11 node3=10.0.5.31 options link_priority=11
To remove an option, you can set the option to a null value with the option= format.
Restart the cluster.
[root@node1 ~] # pcs cluster start --all
19.7. Configuring a node health strategy
A node might be functioning well enough to maintain its cluster membership and yet be unhealthy in some respect that makes it an undesirable location for resources. For example, a disk drive might be reporting SMART errors, or the CPU might be highly loaded.
You can use a node health strategy in Pacemaker to automatically move resources off unhealthy nodes.
You can monitor a node’s health with the following health node resource agents, which set node attributes based on CPU and disk status:
- ocf:pacemaker:HealthCPU, which monitors CPU idling
- ocf:pacemaker:HealthIOWait, which monitors the CPU I/O wait
- ocf:pacemaker:HealthSMART, which monitors the SMART status of a disk drive
- ocf:pacemaker:SysInfo, which sets a variety of node attributes with local system information and also functions as a health agent monitoring disk space usage
Additionally, any resource agent might provide node attributes that can be used to define a health node strategy.
The following procedure configures a health node strategy for a cluster that will move resources off of any node whose CPU I/O wait goes above 15%.
Procedure
Set the node-health-strategy cluster property to define how Pacemaker responds to changes in node health.
# pcs property set node-health-strategy=migrate-on-red
Create a cloned cluster resource that uses a health node resource agent, setting the allow-unhealthy-nodes resource meta option to define whether the cluster will detect if the node’s health recovers and move resources back to the node. Configure this resource with a recurring monitor action, to continually check the health of all nodes.
This example creates a HealthIOWait resource agent to monitor the CPU I/O wait, setting a red limit for moving resources off a node to 15%. This command sets the allow-unhealthy-nodes resource meta option to true and configures a recurring monitor interval of 10 seconds.
# pcs resource create io-monitor ocf:pacemaker:HealthIOWait red_limit=15 op monitor interval=10s meta allow-unhealthy-nodes=true clone
19.8. Configuring a large cluster with many resources
For clusters with a large number of nodes and resources, modify the default values of the following parameters.
- The cluster-ipc-limit cluster property
The cluster-ipc-limit cluster property is the maximum IPC message backlog before one cluster daemon will disconnect another. When a large number of resources are cleaned up or otherwise modified simultaneously in a large cluster, a large number of CIB updates arrive at once. This could cause slower clients to be evicted if the Pacemaker service does not have time to process all of the configuration updates before the CIB event queue threshold is reached.
The recommended value of cluster-ipc-limit for use in large clusters is the number of resources in the cluster multiplied by the number of nodes. This value can be raised if you see "Evicting client" messages for cluster daemon PIDs in the logs.
You can increase the value of cluster-ipc-limit from its default value of 500 with the pcs property set command. For example, for a ten-node cluster with 200 resources you can set the value of cluster-ipc-limit to 2000 with the following command.
# pcs property set cluster-ipc-limit=2000
- The PCMK_ipc_buffer Pacemaker parameter
On very large deployments, internal Pacemaker messages may exceed the size of the message buffer. When this occurs, you will see a message in the system logs of the following format:
Compressed message exceeds X% of configured IPC limit (X bytes); consider setting PCMK_ipc_buffer to X or higher
When you see this message, you can increase the value of PCMK_ipc_buffer in the /etc/sysconfig/pacemaker configuration file on each node. For example, to increase the value of PCMK_ipc_buffer from its default value to 13396332 bytes, change the uncommented PCMK_ipc_buffer field in the /etc/sysconfig/pacemaker file on each node in the cluster as follows.
PCMK_ipc_buffer=13396332
To apply this change, run the following command.
# systemctl restart pacemaker
Chapter 20. Setting user permissions for a Pacemaker cluster
You can grant permission for specific users other than user hacluster to manage a Pacemaker cluster.
There are two sets of permissions that you can grant to individual users:
- Permissions that allow individual users to manage the cluster through the Web UI and to run pcs commands that connect to nodes over a network. Commands that connect to nodes over a network include commands to set up a cluster, or to add or remove nodes from a cluster.
- Permissions for local users to allow read-only or read-write access to the cluster configuration. Commands that do not require connecting over a network include commands that edit the cluster configuration, such as those that create resources and configure constraints.
In situations where both sets of permissions have been assigned, the permissions for commands that connect over a network are applied first, and then permissions for editing the cluster configuration on the local node are applied. Most pcs commands do not require network access and in those cases the network permissions will not apply.
20.1. Setting permissions for node access over a network
To grant permission for specific users to run pcs commands that connect to nodes over a network, add those users to the group haclient. This must be done on every node in the cluster.
Procedure
On every node in the cluster, add the user to the haclient group:
# usermod -a -G haclient <username>
Replace <username> with the name of the user to whom you are granting permission.
20.2. Setting local permissions using ACLs
You can use the pcs acl command to set permissions for local users to allow read-only or read-write access to the cluster configuration by using access control lists (ACLs).
By default, ACLs are not enabled. When ACLs are not enabled, any user who is a member of the group haclient on all nodes has full local read/write access to the cluster configuration while users who are not members of haclient have no access. When ACLs are enabled, however, even users who are members of the haclient group have access only to what has been granted to that user by the ACLs. The root and hacluster user accounts always have full access to the cluster configuration, even when ACLs are enabled.
Setting permissions for local users is a two step process:
- Execute the pcs acl role create… command to create a role which defines the permissions for that role.
- Assign the role you created to a user with the pcs acl user create command. If you assign multiple roles to the same user, any deny permission takes precedence, then write, then read.
The following example procedure provides read-only access for a cluster configuration to a local user named rouser. Note that it is also possible to restrict access to certain portions of the configuration only.
It is important to perform this procedure as root or to save all of the configuration updates to a working file which you can then push to the active CIB when you are finished. Otherwise, you can lock yourself out of making any further changes. For information about saving configuration updates to a working file, see Saving a configuration change to a working file.
Procedure
This procedure requires that the user rouser exists on the local system and that the user rouser is a member of the group haclient.
# adduser rouser
# usermod -a -G haclient rouser
Enable Pacemaker ACLs with the pcs acl enable command.
# pcs acl enable
Create a role named read-only with read-only permissions for the cib.
# pcs acl role create read-only description="Read access to cluster" read xpath /cib
Create the user rouser in the pcs ACL system and assign that user the read-only role.
# pcs acl user create rouser read-only
View the current ACLs.
# pcs acl
User: rouser
  Roles: read-only
Role: read-only
  Description: Read access to cluster
  Permission: read xpath /cib (read-only-read)
On each node where rouser will run pcs commands, log in as rouser and authenticate to the local pcsd service. This is required in order to run certain pcs commands, such as pcs status, as the ACL user.
[rouser ~]$ pcs client local-auth
Chapter 21. Resource monitoring operations
Configure monitoring operations to track resource health. If you do not specify one, pcs creates a default. The interval is determined by the resource agent, or defaults to 60 seconds if the agent provides none.
The following table summarizes the properties of a resource monitoring operation.
| Field | Description |
|---|---|
| id | Unique name for the action. The system assigns this when you configure an operation. |
| name | The action to perform. Common values: monitor, start, stop. |
| interval | If set to a nonzero value, a recurring operation is created that repeats at this frequency, in seconds. A nonzero value makes sense only when the action name is set to monitor. If set to zero, which is the default value, this parameter allows you to provide values to be used for operations created by the cluster. For example, if the interval is set to zero, the name is set to start, and the timeout is set to 40, then Pacemaker will use a timeout of 40 seconds when starting this resource. |
| timeout | If the operation does not complete in the amount of time set by this parameter, abort the operation and consider it failed. The default value is the value of timeout if set with the pcs resource op defaults command, or 20 seconds if it is not set. The timeout value is not a delay of any kind, nor does the cluster wait the entire timeout period if the operation returns before the timeout period has completed. |
| on-fail | The action to take if this action ever fails. Allowed values:
* ignore - Pretend the resource did not fail.
* block - Do not perform any further operations on the resource.
* stop - Stop the resource and do not start it elsewhere.
* restart - Stop the resource and start it again (possibly on a different node).
* standby - Move all resources away from the node on which the resource failed.
* fence - Fence (STONITH) the node on which the resource failed.
* demote - Demote the resource without fully stopping it; allowed only for promote actions and for monitor actions with role set to Promoted.
The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to restart. |
| enabled | If false, the operation is treated as if it does not exist. Allowed values: true, false. |
21.1. Configuring resource monitoring operations
You can configure monitoring operations when you create a resource.
# pcs resource create resource_id standard:provider:type|type [resource_options] [op operation_action operation_options [operation_type operation_options]...]
For example, the following command creates an IPaddr2 resource with a monitoring operation. The new resource is called VirtualIP with an IP address of 192.168.0.99 and a netmask of 24 on eth2. A monitoring operation will be performed every 30 seconds.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2 op monitor interval=30s
Adding a monitoring operation to an existing resource
Add a monitoring operation to an existing resource with the following command.
# pcs resource op add resource_id operation_action [operation_properties]
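For example, the following command adds an additional monitoring operation with a 60-second interval to the VirtualIP resource created in the previous example; the interval value here is illustrative:
# pcs resource op add VirtualIP monitor interval=60s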
Deleting a configured resource operation
Use the following command to delete a configured resource operation.
# pcs resource op remove resource_id operation_name operation_properties
You must specify the exact operation properties to properly remove an existing operation.
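For example, to remove the 30-second monitoring operation that was configured for the VirtualIP resource in the earlier example, you would name its properties exactly:
# pcs resource op remove VirtualIP monitor interval=30s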
21.2. Modifying resource monitoring options
To change the values of a monitoring option, you can update the resource.
Procedure
Create an IPaddr2 resource named VirtualIP.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2
By default, this command creates these operations.
Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
            stop interval=0s timeout=20s (VirtualIP-stop-timeout-20s)
            monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
Change the timeout option for the stop operation.
# pcs resource update VirtualIP op stop interval=0s timeout=40s
Display the configured parameters for the resource.
# pcs resource config VirtualIP
 Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
  Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
              monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
              stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)
21.3. Configuring global resource operation defaults
You can change the default value of a resource operation for all resources with the pcs resource op defaults update command.
The following command sets a global default of a timeout value of 240 seconds for all monitoring operations.
# pcs resource op defaults update timeout=240s
The original pcs resource op defaults name=value command, which set resource operation defaults for all resources in previous releases, remains supported unless there is more than one set of defaults configured. However, pcs resource op defaults update is now the preferred version of the command.
Overriding resource-specific operation values
Note that a cluster resource will use the global default only when the option is not specified in the cluster resource definition. By default, resource agents define the timeout option for all operations. For the global operation timeout value to be honored, you must create the cluster resource without specifying the timeout option explicitly, or you must remove the timeout option by updating the cluster resource, as in the following command.
# pcs resource update VirtualIP op monitor interval=10s
For example, after setting a global default of a timeout value of 240 seconds for all monitoring operations and updating the cluster resource VirtualIP to remove the timeout value for the monitor operation, the resource VirtualIP will then have timeout values for start, stop, and monitor operations of 20s, 40s and 240s, respectively. The global default value for timeout operations is applied here only on the monitor operation, where the default timeout option was removed by the previous command.
# pcs resource config VirtualIP
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
monitor interval=10s (VirtualIP-monitor-interval-10s)
stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)
Changing the default value of a resource operation for sets of resources
You can create multiple sets of resource operation defaults with the pcs resource op defaults set create command, which allows you to specify a rule that contains resource and operation expressions. All of the rule expressions supported by Pacemaker are allowed.
With this command, you can configure a default resource operation value for all resources of a particular type. For example, it is now possible to configure implicit podman resources created by Pacemaker when bundles are in use.
The following command sets a default timeout value of 90s for all operations for all podman resources. In this example, ::podman means a resource of any class, any provider, of type podman.
The id option, which names the set of resource operation defaults, is not mandatory. If you do not set this option, pcs will generate an ID automatically. Setting this value allows you to provide a more descriptive name.
# pcs resource op defaults set create id=podman-timeout meta timeout=90s rule resource ::podman
The following command sets a default timeout value of 120s for the stop operation for all resources.
# pcs resource op defaults set create id=stop-timeout meta timeout=120s rule op stop
It is possible to set the default timeout value for a specific operation for all resources of a particular type. The following example sets a default timeout value of 120s for the stop operation for all podman resources.
# pcs resource op defaults set create id=podman-stop-timeout meta timeout=120s rule resource ::podman and op stop
Displaying currently configured resource operation default values
The pcs resource op defaults command displays a list of currently configured default values for resource operations, including any rules you specified.
The following command displays the default operation values for a cluster which has been configured with a default timeout value of 90s for all operations for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-timeout.
# pcs resource op defaults
Meta Attrs: podman-timeout
timeout=90s
Rule: boolean-op=and score=INFINITY
Expression: resource ::podman
The following command displays the default operation values for a cluster which has been configured with a default timeout value of 120s for the stop operation for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-stop-timeout.
# pcs resource op defaults
Meta Attrs: podman-stop-timeout
timeout=120s
Rule: boolean-op=and score=INFINITY
Expression: resource ::podman
Expression: op stop
21.4. Configuring multiple monitoring operations
You can configure a single resource with as many monitor operations as a resource agent supports. In this way you can do a superficial health check every minute and progressively more intense ones at longer intervals.
When configuring multiple monitor operations, you must ensure that no two operations are performed at the same interval.
Procedure
To configure additional monitoring operations for a resource that supports more in-depth checks at different levels, add an OCF_CHECK_LEVEL=n option.
For example, if you configure the following IPaddr2 resource, by default this creates a monitoring operation with an interval of 10 seconds and a timeout value of 20 seconds.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2
If the Virtual IP supports a different check with a depth of 10, the following command causes Pacemaker to perform the more advanced monitoring check every 60 seconds in addition to the normal Virtual IP check every 10 seconds. (As noted, you should not configure the additional monitoring operation with a 10-second interval as well.)
# pcs resource op add VirtualIP monitor interval=60s OCF_CHECK_LEVEL=10
21.5. Disabling a monitoring operation
The easiest way to stop a recurring monitor is to delete it. However, there can be times when you only want to disable it temporarily. In such cases, add enabled="false" to the operation’s definition. When you want to reinstate the monitoring operation, set enabled="true" in the operation’s definition.
When you update a resource’s operation with the pcs resource update command, any options you do not specifically call out are reset to their default values. For example, if you have configured a monitoring operation with a custom timeout value of 600, running the following commands will reset the timeout value to the default value of 20 (or whatever you have set the default value to with the pcs resource op defaults command).
Procedure
Temporarily disable and reinstate the monitoring operation:
# pcs resource update resourceXZY op monitor enabled=false
# pcs resource update resourceXZY op monitor enabled=true
In order to maintain the original value of 600 for this option, when you reinstate the monitoring operation you must specify that value, as in the following example:
# pcs resource update resourceXZY op monitor timeout=600 enabled=true
Chapter 22. Pacemaker cluster properties
Cluster properties control how the cluster behaves when confronted with situations that might occur during cluster operation.
22.1. Summary of cluster properties and options
This table summarizes the Pacemaker cluster properties, showing the default values of the properties and the possible values you can set for those properties.
There are additional cluster properties that determine fencing behavior. For information about these properties, see the table of cluster properties that determine fencing behavior in General properties of fencing devices.
In addition to the properties described in this table, there are additional cluster properties that are exposed by the cluster software. For these properties, it is recommended that you not change their values from their defaults.
| Option | Default | Description |
|---|---|---|
| batch-limit | 0 | The number of resource actions that the cluster is allowed to execute in parallel. The "correct" value will depend on the speed and load of your network and cluster nodes. The default value of 0 means that the cluster will dynamically impose a limit when any node has a high CPU load. |
| migration-limit | -1 (unlimited) | The number of migration jobs that the cluster is allowed to execute in parallel on a node. |
| no-quorum-policy | stop | What to do when the cluster does not have quorum. Allowed values: * ignore - continue all resource management * freeze - continue resource management, but do not recover resources from nodes not in the affected partition * stop - stop all resources in the affected cluster partition * suicide - fence all nodes in the affected cluster partition * demote - if a cluster partition loses quorum, demote any promoted resources and stop all other resources |
| symmetric-cluster | true | Indicates whether resources can run on any node by default. |
| cluster-delay | 60s | Round trip delay over the network (excluding action execution). The "correct" value will depend on the speed and load of your network and cluster nodes. |
| dc-deadtime | 20s | How long to wait for a response from other nodes during startup. The "correct" value will depend on the speed and load of your network and the type of switches used. |
| stop-orphan-resources | true | Indicates whether deleted resources should be stopped. |
| stop-orphan-actions | true | Indicates whether deleted actions should be canceled. |
| start-failure-is-fatal | true | Indicates whether a failure to start a resource on a particular node prevents further start attempts on that node. When set to false, the cluster decides whether to try starting the resource on the same node again based on the resource's current failure count and migration threshold. Setting this option to false incurs the risk that a faulty node that is unable to start a resource can hold up all dependent actions. |
| pe-error-series-max | -1 (all) | The number of scheduler inputs resulting in ERRORs to save. Used when reporting problems. |
| pe-warn-series-max | -1 (all) | The number of scheduler inputs resulting in WARNINGs to save. Used when reporting problems. |
| pe-input-series-max | -1 (all) | The number of "normal" scheduler inputs to save. Used when reporting problems. |
| cluster-infrastructure |  | The messaging stack on which Pacemaker is currently running. Used for informational and diagnostic purposes; not user-configurable. |
| dc-version |  | Version of Pacemaker on the cluster’s Designated Controller (DC). Used for diagnostic purposes; not user-configurable. |
| cluster-recheck-interval | 15 minutes | Pacemaker is primarily event-driven, and looks ahead to know when to recheck the cluster for failure timeouts and most time-based rules. Pacemaker will also recheck the cluster after the duration of inactivity specified by this property. This cluster recheck has two purposes: rules with date-spec are guaranteed to be checked at this interval, and it serves as a fail-safe for some kinds of scheduler bugs. |
| maintenance-mode | false | Maintenance Mode tells the cluster to go to a "hands off" mode, and not start or stop any services until told otherwise. When maintenance mode is completed, the cluster does a sanity check of the current state of any services, and then stops or starts any that need it. |
| shutdown-escalation | 20min | The time after which to give up trying to shut down gracefully and just exit. Advanced use only. |
| stop-all-resources | false | Should the cluster stop all resources. |
| enable-acl | false | Indicates whether the cluster can use access control lists, as set with the pcs acl command. |
| placement-strategy | default | Indicates whether and how the cluster will take utilization attributes into account when determining resource placement on cluster nodes. |
| node-health-strategy | none | When used in conjunction with a health resource agent, controls how Pacemaker responds to changes in node health. Allowed values: * none - Do not track node health. * migrate-on-red - Resources are moved off any node where a health attribute has the value red. * only-green - Resources are moved off any node where a health attribute has the value yellow or red. * progressive, custom - Advanced strategies that offer finer-grained control over how the cluster responds to health conditions, based on internally generated scores for the node health attributes. |
22.2. Setting and removing cluster properties
Modify cluster behavior by setting or removing global properties. These settings determine how the cluster manages resource placement, fencing, and node failures. When you remove a specific configuration, the property reverts to its default system value.
Procedure
Set the value of a cluster property:
# pcs property set property=value
For example, set the value of symmetric-cluster to false:
# pcs property set symmetric-cluster=false
Remove a cluster property from the configuration:
# pcs property unset property
Alternately, you can remove a cluster property from a configuration by leaving the value field of the pcs property set command blank. This restores that property to its default value. For example, if you have previously set the symmetric-cluster property to false, the following command removes the value you have set from the configuration and restores the value of symmetric-cluster to true, which is its default value:
# pcs property set symmetric-cluster=
22.3. Querying cluster property settings
Query cluster properties to audit global configuration settings, such as fencing defaults, migration thresholds, and resource constraints. You can view explicitly configured values, specific properties, or the complete list of defaults to understand how the cluster manages resources and nodes.
Procedure
Display the values of the property settings that have been set for the cluster:
# pcs property config
Display all of the values of the property settings for the cluster, including the default values of the property settings that have not been explicitly set:
# pcs property config --all
Display the current value of a specific cluster property:
# pcs property config property
For example, to display the current value of the cluster-infrastructure property, execute the following command:
# pcs property config cluster-infrastructure
Cluster Properties:
 cluster-infrastructure: cman
For informational purposes, you can display a list of all of the default values for the properties, whether they have been set to a value other than the default or not, by using the following command.
# pcs property [config] --defaults
22.4. Exporting cluster properties as pcs commands
You can display the pcs commands that can be used to re-create configured cluster properties on a different system using the --output-format=cmd option of the pcs property config command.
Procedure
The following command sets the
migration-limit cluster property to 10:
# pcs property set migration-limit=10
After you set the cluster property, the following command displays the pcs command you can use to set the cluster property on a different system.
# pcs property config --output-format=cmd
pcs property set --force -- \
 migration-limit=10 \
 placement-strategy=minimal
Chapter 23. Configuring resources to remain stopped on clean node shutdown
Pacemaker defaults to failing over resources during shutdown. However, you can configure resources to lock to a node during a clean shutdown. This prevents failover, allowing you to power down nodes for maintenance when service outages are acceptable.
23.1. Cluster properties to configure resources to remain stopped on clean node shutdown
The ability to prevent resources from failing over on a clean node shutdown is implemented by means of the following cluster properties.
shutdown-lock
When this cluster property is set to the default value of false, the cluster will recover resources that are active on nodes being cleanly shut down. When this property is set to true, resources that are active on the nodes being cleanly shut down are unable to start elsewhere until they start on the node again after it rejoins the cluster.
The shutdown-lock property will work for either cluster nodes or remote nodes, but not guest nodes.
If shutdown-lock is set to true, you can remove the lock on one cluster resource when a node is down so that the resource can start elsewhere by performing a manual refresh on the node with the following command.
pcs resource refresh resource node=nodename
Note that once the resources are unlocked, the cluster is free to move the resources elsewhere. You can control the likelihood of this occurring by using stickiness values or location preferences for the resource.
Note
A manual refresh will work with remote nodes only if you first run the following commands:
- Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
- Run the pcs resource disable remote-connection-resource command.
You can then perform a manual refresh on the remote node.
shutdown-lock-limit
When this cluster property is set to a time other than the default value of 0, resources will be available for recovery on other nodes if the node does not rejoin within the specified time since the shutdown was initiated.
Note
The shutdown-lock-limit property will work with remote nodes only if you first run the following commands:
- Run the systemctl stop pacemaker_remote command on the remote node to stop the node.
- Run the pcs resource disable remote-connection-resource command.
After you run these commands, the resources that had been running on the remote node will be available for recovery on other nodes when the amount of time specified as the shutdown-lock-limit has passed.
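The following is a minimal sketch of raising the stickiness of a single resource so that, once unlocked, it tends to remain where it starts; the resource name my-resource and the score of 100 are illustrative values, not part of the example cluster in this chapter.
# pcs resource meta my-resource resource-stickiness=100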
23.2. Setting the shutdown-lock cluster property
The following example procedure sets the shutdown-lock cluster property to true in an existing cluster and shows the effect this has when a node is shut down and started again. This example cluster consists of three nodes: z1.example.com, z2.example.com, and z3.example.com.
Procedure
Set the shutdown-lock property to true and verify its value. In this example the shutdown-lock-limit property maintains its default value of 0.
[root@z3 ~]# pcs property set shutdown-lock=true
[root@z3 ~]# pcs property list --all | grep shutdown-lock
shutdown-lock: true
shutdown-lock-limit: 0
Check the status of the cluster. In this example, resources third and fifth are running on z1.example.com.
[root@z3 ~]# pcs status
...
Full List of Resources:
...
  * first  (ocf::pacemaker:Dummy):  Started z3.example.com
  * second (ocf::pacemaker:Dummy):  Started z2.example.com
  * third  (ocf::pacemaker:Dummy):  Started z1.example.com
  * fourth (ocf::pacemaker:Dummy):  Started z2.example.com
  * fifth  (ocf::pacemaker:Dummy):  Started z1.example.com
...
Shut down z1.example.com, which will stop the resources that are running on that node.
[root@z3 ~]# pcs cluster stop z1.example.com
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
Running the pcs status command shows that node z1.example.com is offline and that the resources that had been running on z1.example.com are LOCKED while the node is down.
[root@z3 ~]# pcs status
...
Node List:
  * Online: [ z2.example.com z3.example.com ]
  * OFFLINE: [ z1.example.com ]
Full List of Resources:
...
  * first  (ocf::pacemaker:Dummy):  Started z3.example.com
  * second (ocf::pacemaker:Dummy):  Started z2.example.com
  * third  (ocf::pacemaker:Dummy):  Stopped z1.example.com (LOCKED)
  * fourth (ocf::pacemaker:Dummy):  Started z3.example.com
  * fifth  (ocf::pacemaker:Dummy):  Stopped z1.example.com (LOCKED)
...
Start cluster services again on z1.example.com so that it rejoins the cluster. Locked resources should get started on that node, although once they start they will not necessarily remain on the same node.
[root@z3 ~]# pcs cluster start z1.example.com
Starting Cluster...
In this example, resources third and fifth are recovered on node z1.example.com.
[root@z3 ~]# pcs status
...
Node List:
  * Online: [ z1.example.com z2.example.com z3.example.com ]
Full List of Resources:
...
  * first  (ocf::pacemaker:Dummy):  Started z3.example.com
  * second (ocf::pacemaker:Dummy):  Started z2.example.com
  * third  (ocf::pacemaker:Dummy):  Started z1.example.com
  * fourth (ocf::pacemaker:Dummy):  Started z3.example.com
  * fifth  (ocf::pacemaker:Dummy):  Started z1.example.com
...
Chapter 24. Configuring a node placement strategy
Pacemaker decides where to place a resource according to the resource allocation scores on every node. The resource will be allocated to the node where the resource has the highest score. This allocation score is derived from a combination of factors, including resource constraints, resource-stickiness settings, prior failure history of a resource on each node, and utilization of each node.
If the resource allocation scores on all the nodes are equal, then under the default placement strategy Pacemaker chooses the node with the least number of allocated resources in order to balance the load. If the number of resources on each node is equal, the first eligible node listed in the CIB will be chosen to run the resource.
Often, however, different resources use significantly different proportions of a node’s capacities (such as memory or I/O). You cannot always balance the load ideally by taking into account only the number of resources allocated to a node. In addition, if resources are placed such that their combined requirements exceed the provided capacity, they may fail to start completely or they may run with degraded performance. To take these factors into account, Pacemaker allows you to configure the following components:
- the capacity a particular node provides
- the capacity a particular resource requires
- an overall strategy for placement of resources
24.1. Utilization attributes and placement strategy
To configure the capacity that a node provides or a resource requires, you can use utilization attributes for nodes and resources. You do this by setting a utilization variable for a resource and assigning a value to that variable to indicate what the resource requires, and then setting that same utilization variable for a node and assigning a value to that variable to indicate what that node provides.
You can name utilization attributes according to your preferences and define as many name and value pairs as your configuration needs. The values of utilization attributes must be integers.
Configuring node and resource capacity
The following example procedure configures utilization attributes for two nodes and then specifies the same utilization attributes that three different resources require. A node is considered eligible for a resource if it has sufficient free capacity to satisfy the resource’s requirements, as defined by the utilization attributes.
Displaying node utilization as pcs commands
You can export node utilization as a series of pcs commands by using the --output-format=cmd option. This is useful for scripting, automation, or replicating the same configuration on a different system.
You can display the configured node utilization in one of three formats:
- text: Displays the output in plain text. This is the default format.
- json: Displays the output in a machine-readable JSON format, which is useful for scripting and automation.
- cmd: Displays the output as a series of pcs commands, which you can use to recreate the same node utilization on a different system.
To display the configured node utilization as a series of pcs commands:
# pcs node utilization --output-format=cmd
Example output
pcs node utilization node1 cpu=2
pcs node utilization node2 cpu=4
Configuring placement strategy
After you have configured the capacities your nodes provide and the capacities your resources require, you need to set the placement-strategy cluster property, otherwise the capacity configurations have no effect.
Four values are available for the placement-strategy cluster property:
- default - Utilization values are not taken into account at all. Resources are allocated according to allocation scores. If scores are equal, resources are evenly distributed across nodes.
- utilization - Utilization values are taken into account only when deciding whether a node is considered eligible (that is, whether it has sufficient free capacity to satisfy the resource's requirements). Load-balancing is still done based on the number of resources allocated to a node.
- balanced - Utilization values are taken into account when deciding whether a node is eligible to serve a resource and when load-balancing, so an attempt is made to spread the resources in a way that optimizes resource performance.
- minimal - Utilization values are taken into account only when deciding whether a node is eligible to serve a resource. For load-balancing, an attempt is made to concentrate the resources on as few nodes as possible, thereby enabling possible power savings on the remaining nodes.
The following example command sets the value of placement-strategy to balanced. After running this command, Pacemaker will ensure the load from your resources will be distributed evenly throughout the cluster, without the need for complicated sets of colocation constraints.
# pcs property set placement-strategy=balanced
Procedure
Configure utilization attributes for node1. This command configures a utilization attribute of CPU capacity, setting this attribute as the variable cpu and defining node1 as providing a CPU capacity of two. It also configures a utilization attribute of RAM capacity, setting this attribute as the variable memory and defining node1 as providing a RAM capacity of 2048.
# pcs node utilization node1 cpu=2 memory=2048
Configure utilization attributes for node2. This command defines node2 as providing a CPU capacity of four and a RAM capacity of 2048.
# pcs node utilization node2 cpu=4 memory=2048
Specify the utilization attributes that resource dummy-small requires. In this example, dummy-small requires a CPU capacity of 1 and a RAM capacity of 1024.
# pcs resource utilization dummy-small cpu=1 memory=1024
Specify the utilization attributes that resource dummy-medium requires. In this example, dummy-medium requires a CPU capacity of 2 and a RAM capacity of 2048.
# pcs resource utilization dummy-medium cpu=2 memory=2048
Specify the utilization attributes that resource dummy-large requires. In this example, dummy-large requires a CPU capacity of 3 and a RAM capacity of 3072.
# pcs resource utilization dummy-large cpu=3 memory=3072
24.2. Pacemaker resource allocation
Pacemaker allocates resources according to node preference, node capacity, and resource allocation preference.
24.2.1. Node preference
Pacemaker determines which node is preferred when allocating resources according to the following strategy.
- The node with the highest node weight gets consumed first. Node weight is a score maintained by the cluster to represent node health.
If multiple nodes have the same node weight:
- If the placement-strategy cluster property is default or utilization:
  - The node that has the least number of allocated resources gets consumed first.
  - If the numbers of allocated resources are equal, the first eligible node listed in the CIB gets consumed first.
- If the placement-strategy cluster property is balanced:
  - The node that has the most free capacity gets consumed first.
  - If the free capacities of the nodes are equal, the node that has the least number of allocated resources gets consumed first.
  - If the free capacities of the nodes are equal and the numbers of allocated resources are equal, the first eligible node listed in the CIB gets consumed first.
- If the placement-strategy cluster property is minimal, the first eligible node listed in the CIB gets consumed first.
24.2.2. Node capacity
Pacemaker determines which node has the most free capacity according to the following strategy.
- If only one type of utilization attribute has been defined, free capacity is a simple numeric comparison.
If multiple types of utilization attributes have been defined, then the node that is numerically highest in the most attribute types has the most free capacity. For example:
- If NodeA has more free CPUs, and NodeB has more free memory, then their free capacities are equal.
- If NodeA has more free CPUs, while NodeB has more free memory and storage, then NodeB has more free capacity.
24.2.3. Resource allocation preference
Pacemaker determines which resource is allocated first according to the following strategy.
- The resource that has the highest priority gets allocated first. You can set a resource’s priority when you create the resource.
- If the priorities of the resources are equal, the resource that has the highest score on the node where it is running gets allocated first, to prevent resource shuffling.
- If the resource scores on the nodes where the resources are running are equal or the resources are not running, the resource that has the highest score on the preferred node gets allocated first. If the resource scores on the preferred node are equal in this case, the first runnable resource listed in the CIB gets allocated first.
24.3. Resource placement strategy guidelines
Optimize Pacemaker’s resource placement strategy by ensuring sufficient physical capacity, configuring buffers for overcommitment, and defining clear resource priorities.
Make sure that you have sufficient physical capacity.
If the physical capacity of your nodes is being used to near maximum under normal conditions, then problems could occur during failover. Even without the utilization feature, you may start to experience timeouts and secondary failures.
Build some buffer into the capabilities you configure for the nodes.
Advertise slightly more node resources than you physically have, on the assumption that a Pacemaker resource will not use 100% of the configured amount of CPU, memory, and so forth all the time. This practice is sometimes called overcommit.
Specify resource priorities.
If the cluster is going to sacrifice services, it should be the ones you care about least. Ensure that resource priorities are properly set so that your most important resources are scheduled first.
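For example, the following is a minimal sketch of assigning a priority when creating a resource; the resource name important-ip, the address, and the priority value are illustrative:
# pcs resource create important-ip IPaddr2 ip=192.168.0.99 meta priority=10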
24.4. The NodeUtilization resource agent
The NodeUtilization resource agent can detect the system parameters of available CPU, host memory availability, and hypervisor memory availability and add these parameters into the CIB. You can run the agent as a clone resource to have it automatically populate these parameters on each node.
For information about the NodeUtilization resource agent and the resource options for this agent, run the pcs resource describe NodeUtilization command.
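For example, the following is a minimal sketch of running the agent on every node as a clone resource; the resource name node-util is illustrative:
# pcs resource create node-util NodeUtilization clone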
Chapter 25. Configuring a virtual domain as a resource
You can configure a virtual domain that is managed by the libvirt virtualization framework as a cluster resource with the pcs resource create command, specifying VirtualDomain as the resource type.
When configuring a virtual domain as a resource, take the following considerations into account:
- A virtual domain should be stopped before you configure it as a cluster resource.
- Once a virtual domain is a cluster resource, it should not be started, stopped, or migrated except through the cluster tools.
- Do not configure a virtual domain that you have configured as a cluster resource to start when its host boots.
- All nodes allowed to run a virtual domain must have access to the necessary configuration files and storage devices for that virtual domain.
If you want the cluster to manage services within the virtual domain itself, you can configure the virtual domain as a guest node.
25.1. Virtual domain resource options
Configure options for a VirtualDomain resource to control virtual machine behavior.
| Field | Default | Description |
|---|---|---|
| config | | (required) Absolute path to the libvirt configuration file for this virtual domain. |
| hypervisor | System dependent | Hypervisor URI to connect to. You can determine the system’s default URI by running the virsh --quiet uri command. |
| force_stop | 0 | Always forcefully shut down ("destroy") the domain on stop. The default behavior is to resort to a forceful shutdown only after a graceful shutdown attempt has failed. You should set this to true only if your virtual domain (or your virtualization back end) does not support graceful shutdown. |
| migration_transport | System dependent | Transport used to connect to the remote hypervisor while migrating. If this parameter is omitted, the resource will use libvirt's default transport to connect to the remote hypervisor. |
| migration_network_suffix | | Use a dedicated migration network. The migration URI is composed by adding this parameter’s value to the end of the node name. If the node name is a fully qualified domain name (FQDN), insert the suffix immediately prior to the first period (.) in the FQDN. Ensure that this composed host name is locally resolvable and the associated IP address is reachable through the favored network. |
| monitor_scripts | | To additionally monitor services within the virtual domain, add this parameter with a list of scripts to monitor. Note: When monitor scripts are used, the start and migrate_from operations will complete only when all monitor scripts have completed successfully. |
| autoset_utilization_cpu | true | If set to true, the agent will detect the number of vCPUs from virsh and put it into the CPU utilization of the resource when the monitor is executed. |
| autoset_utilization_hv_memory | true | If set to true, the agent will detect the amount of Max memory from virsh and put it into the hv_memory utilization of the resource when the monitor is executed. |
| migrateport | random highport | This port will be used in the qemu migrate URI. If unset, the port will be a random highport. |
| snapshot | | Path to the snapshot directory where the virtual machine image will be stored. When this parameter is set, the virtual machine’s RAM state will be saved to a file in the snapshot directory when stopped. If on start a state file is present for the domain, the domain will be restored to the same state it was in right before it stopped last. This option is incompatible with the force_stop option. |
In addition to the VirtualDomain resource options, you can configure the allow-migrate metadata option to allow live migration of the resource to another node. When this option is set to true, the resource can be migrated without loss of state. When this option is set to false, which is the default, the virtual domain will be shut down on the first node and then restarted on the second node when it is moved from one node to the other.
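For example, the following is a minimal sketch of enabling live migration on an existing VirtualDomain resource; the resource name VM matches the example in the next section:
# pcs resource update VM meta allow-migrate=true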
25.2. Creating the virtual domain resource
Create a VirtualDomain resource in a cluster for a virtual machine you have previously created.
Procedure
To create the VirtualDomain resource agent for the management of the virtual machine, Pacemaker requires the virtual machine's xml configuration file to be dumped to a file on disk. For example, if you created a virtual machine named guest1, dump the xml file to a file somewhere on one of the cluster nodes that will be allowed to run the guest. You can use a file name of your choosing; this example uses /etc/pacemaker/guest1.xml.
# virsh dumpxml guest1 > /etc/pacemaker/guest1.xml
- Copy the virtual machine's xml configuration file to all of the other cluster nodes that will be allowed to run the guest, in the same location on each node.
- Ensure that all of the nodes allowed to run the virtual domain have access to the necessary storage devices for that virtual domain.
- Separately test that the virtual domain can start and stop on each node that will run the virtual domain.
- If it is running, shut down the guest node. Pacemaker will start the node when it is configured in the cluster. The virtual machine should not be configured to start automatically when the host boots.
Configure the VirtualDomain resource with the pcs resource create command. For example, the following command configures a VirtualDomain resource named VM. Since the allow-migrate option is set to true, a pcs resource move VM nodeX command would be performed as a live migration.
In this example migration_transport is set to ssh. Note that for SSH migration to work properly, key-less login must work between nodes.
# pcs resource create VM VirtualDomain config=/etc/pacemaker/guest1.xml migration_transport=ssh meta allow-migrate=true
Chapter 26. Configuring cluster quorum
To maintain cluster integrity and availability, cluster systems use a concept known as quorum to prevent data corruption and loss. A cluster has quorum when more than half of the cluster nodes are online. To mitigate the chance of data corruption due to failure, Pacemaker by default stops all resources if the cluster does not have quorum.
Quorum is established using a voting system. When a cluster node does not function as it should or loses communication with the rest of the cluster, the majority of working nodes can vote to isolate and, if needed, fence the node for servicing.
For example, in a 6-node cluster, quorum is established when at least 4 cluster nodes are functioning. If the majority of nodes go offline or become unavailable, the cluster no longer has quorum and Pacemaker stops clustered services.
The quorum features in Pacemaker prevent what is also known as split-brain, a phenomenon where the cluster is separated from communication but each part continues working as separate clusters, potentially writing to the same data and possibly causing corruption or loss. For more information about what it means to be in a split-brain state, and on quorum concepts in general, see the Red Hat Knowledgebase article Exploring Concepts of RHEL High Availability Clusters - Quorum.
A Red Hat Enterprise Linux High Availability Add-On cluster uses the votequorum service, in conjunction with fencing, to avoid split brain situations. A number of votes is assigned to each system in the cluster, and cluster operations are allowed to proceed only when a majority of votes is present. The service must be loaded into all nodes or none; if it is loaded into a subset of cluster nodes, the results will be unpredictable. For information about the configuration and operation of the votequorum service, see the votequorum(5) man page on your system.
26.1. Configuring quorum options
There are some special features of quorum configuration that you can set when you create a cluster with the pcs cluster setup command.
| Option | Description |
|---|---|
| auto_tie_breaker | When enabled, the cluster can suffer up to 50% of the nodes failing at the same time, in a deterministic fashion. The cluster partition, or the set of nodes that are still in contact with the nodeid configured in auto_tie_breaker_node (or the lowest nodeid if not set), will remain quorate. The other nodes will be inquorate. The auto_tie_breaker option is principally used for clusters with an even number of nodes, as it allows the cluster to continue operation with an even split. The auto_tie_breaker option is incompatible with quorum devices. |
| wait_for_all | When enabled, the cluster will be quorate for the first time only after all nodes have been visible at least once at the same time. The wait_for_all option is primarily used for two-node clusters and for even-node clusters using the quorum device lms (last man standing) algorithm. The wait_for_all option is automatically enabled when a cluster has two nodes, does not use a quorum device, and auto_tie_breaker is disabled. You can override this by explicitly setting wait_for_all to 0. |
| last_man_standing | When enabled, the cluster can dynamically recalculate expected_votes and quorum under specific circumstances. You must enable wait_for_all when you enable this option. The last_man_standing option is incompatible with quorum devices. |
| last_man_standing_window | The time, in milliseconds, to wait before recalculating expected_votes and quorum after a cluster loses nodes. |
For further information about configuring and using these options, see the votequorum(5) man page on your system.
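For example, the following is a minimal sketch of setting one of these options at cluster creation time; the cluster name mycluster and the node names are illustrative, and this assumes the quorum keyword of the pcs cluster setup command available in current pcs versions:
# pcs cluster setup mycluster node1 node2 quorum auto_tie_breaker=1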
26.2. Modifying quorum options
You can modify general quorum options for your cluster with the pcs quorum update command. Executing this command requires that the cluster be stopped. For information on the quorum options, see the votequorum(5) man page on your system.
The format of the pcs quorum update command is as follows.
pcs quorum update [auto_tie_breaker=[0|1]] [last_man_standing=[0|1]] [last_man_standing_window=[time-in-ms]] [wait_for_all=[0|1]]
The following example procedure modifies the wait_for_all quorum option and displays the updated status of the option. Note that the system does not allow you to modify this option while the cluster is running.
Procedure
Attempt to modify the wait_for_all quorum option on a running system.
[root@node1:~]# pcs quorum update wait_for_all=1
Checking corosync is not running on nodes...
Error: node1: corosync is running
Error: node2: corosync is running
Stop the cluster processes to modify the wait_for_all quorum option.
[root@node1:~]# pcs cluster stop --all
node2: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
node2: Stopping Cluster (corosync)...
Modify the wait_for_all quorum option on a stopped system.
[root@node1:~]# pcs quorum update wait_for_all=1
Checking corosync is not running on nodes...
node2: corosync is not running
node1: corosync is not running
Sending updated corosync.conf to nodes...
node1: Succeeded
node2: Succeeded
Display the quorum configuration.
[root@node1:~]# pcs quorum config
Options:
  wait_for_all: 1
26.3. Displaying quorum configuration and status
Once a cluster is running, you can enter the following cluster quorum commands to display the quorum configuration and status.
Procedure
Display the quorum configuration:
# pcs quorum [config]
Display the quorum runtime status:
# pcs quorum status
26.4. Running inquorate clusters
If you take nodes out of a cluster for a long period of time and the loss of those nodes would cause quorum loss, you can change the value of the expected_votes parameter for the live cluster with the pcs quorum expected-votes command. This allows the cluster to continue operation when it does not have quorum.
Changing the expected votes in a live cluster should be done with extreme caution. If less than 50% of the cluster is running because you have manually changed the expected votes, then the other nodes in the cluster could be started separately and run cluster services, causing data corruption and other unexpected results. If you change this value, you should ensure that the wait_for_all parameter is enabled.
Procedure
The following command sets the expected votes in the live cluster to the specified value. This affects the live cluster only and does not change the configuration file; the value of expected_votes is reset to the value in the configuration file in the event of a reload.
# pcs quorum expected-votes votes
In a situation in which you know that the cluster is inquorate but you want the cluster to proceed with resource management, you can use the pcs quorum unblock command to prevent the cluster from waiting for all nodes when establishing quorum.
Note
This command should be used with extreme caution. Before issuing this command, it is imperative that you ensure that nodes that are not currently in the cluster are switched off and have no access to shared resources.
# pcs quorum unblock
Chapter 27. Configuring quorum devices
Configure a quorum device to act as a third-party arbitrator, allowing the cluster to sustain more node failures. This is recommended for clusters with an even number of nodes. In two-node clusters, the device determines which node survives a split-brain scenario.
You must take the following into account when configuring a quorum device.
- It is recommended that a quorum device be run on a different physical network at the same site as the cluster that uses the quorum device. Ideally, the quorum device host should be in a separate rack from the main cluster, or at least on a separate PSU and not on the same network segment as the corosync ring or rings.
- You cannot use more than one quorum device in a cluster at the same time.
- Although you cannot use more than one quorum device in a cluster at the same time, a single quorum device may be used by several clusters at the same time. Each cluster using that quorum device can use different algorithms and quorum options, as those are stored on the cluster nodes themselves. For example, a single quorum device can be used by one cluster with an ffsplit (fifty/fifty split) algorithm and by a second cluster with an lms (last man standing) algorithm.
- A quorum device should not be run on an existing cluster node.
27.1. Installing quorum device packages
Install the packages you require for configuring a quorum device for a cluster.
Procedure
Install corosync-qdevice on the nodes of an existing cluster.
[root@node1:~]# dnf install corosync-qdevice
[root@node2:~]# dnf install corosync-qdevice
Install pcs and corosync-qnetd on the quorum device host.
[root@qdevice:~]# dnf install pcs corosync-qnetd
Start the pcsd service and enable pcsd at system start on the quorum device host.
[root@qdevice:~]# systemctl start pcsd.service
[root@qdevice:~]# systemctl enable pcsd.service
27.2. Configuring a quorum device
You can configure a quorum device and add it to the cluster.
In this example:
- The node used for a quorum device is qdevice.
- The quorum device model is net, which is currently the only supported model. The net model supports the following algorithms:
  - ffsplit: fifty-fifty split. This provides exactly one vote to the partition with the highest number of active nodes.
  - lms: last-man-standing. If the node is the only one left in the cluster that can see the qnetd server, then it returns a vote.
    Warning
    The LMS algorithm allows the cluster to remain quorate even with only one remaining node, but it also means that the voting power of the quorum device is great, since it is the same as number_of_nodes - 1. Losing connection with the quorum device means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the quorum device); any other cluster becomes inquorate.
  For more detailed information about the implementation of these algorithms, see the corosync-qdevice(8) man page on your system.
- The cluster nodes are node1 and node2.
Procedure
On the node that you will use to host your quorum device, configure the quorum device with the following command. This command configures and starts the quorum device model net and configures the device to start on boot.
[root@qdevice:~]# pcs qdevice setup model net --enable --start
Quorum device 'net' initialized
quorum device enabled
Starting quorum device...
quorum device started
After configuring the quorum device, you can check its status. This should show that the corosync-qnetd daemon is running and, at this point, there are no clients connected to it. The --full command option provides detailed output.
[root@qdevice:~]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              0
Connected clusters:             0
Maximum send/receive size:      32768/32768 bytes
Enable the ports on the firewall needed by the pcsd daemon and the net quorum device by enabling the high-availability service on firewalld with the following commands.
[root@qdevice:~]# firewall-cmd --permanent --add-service=high-availability
[root@qdevice:~]# firewall-cmd --add-service=high-availability
From one of the nodes in the existing cluster, authenticate user hacluster on the node that is hosting the quorum device. This allows pcs on the cluster to connect to pcs on the qdevice host, but does not allow pcs on the qdevice host to connect to pcs on the cluster.
[root@node1:~]# pcs host auth qdevice
Username: hacluster
Password:
qdevice: Authorized
Add the quorum device to the cluster.
Before adding the quorum device, you can check the current configuration and status for the quorum device for later comparison. The output for these commands indicates that the cluster is not yet using a quorum device, and the Qdevice membership status for each node is NR (Not Registered).
[root@node1:~]# pcs quorum config
Options:
[root@node1:~]# pcs quorum status
Quorum information
------------------
Date:             Wed Jun 29 13:15:36 2016
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/8272
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1         NR node1 (local)
         2          1         NR node2
The following command adds the quorum device that you have previously created to the cluster. You cannot use more than one quorum device in a cluster at the same time. However, one quorum device can be used by several clusters at the same time. This example command configures the quorum device to use the ffsplit algorithm. For information about the configuration options for the quorum device, see the corosync-qdevice(8) man page on your system.
[root@node1:~]# pcs quorum device add model net host=qdevice algorithm=ffsplit
Setting up qdevice certificates on nodes...
node2: Succeeded
node1: Succeeded
Enabling corosync-qdevice...
node1: corosync-qdevice enabled
node2: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node1: Succeeded
node2: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node1: corosync-qdevice started
node2: corosync-qdevice started
From the cluster side, you can execute the following commands to see how the configuration has changed.
The pcs quorum config command shows the quorum device that has been configured.
[root@node1:~]# pcs quorum config
Options:
Device:
  Model: net
    algorithm: ffsplit
    host: qdevice
The pcs quorum status command shows the quorum runtime status, indicating that the quorum device is in use. The meanings of the Qdevice membership information status values for each cluster node are as follows:
- A/NA - The quorum device is alive or not alive, indicating whether there is a heartbeat between qdevice and corosync. This should always indicate that the quorum device is alive.
- V/NV - V is set when the quorum device has given a vote to a node. In this example, both nodes are set to V since they can communicate with each other. If the cluster were to split into two single-node clusters, one of the nodes would be set to V and the other node would be set to NV.
- MW/NMW - The internal quorum device flag is set (MW) or not set (NMW). By default the flag is not set and the value is NMW.
[root@node1:~]# pcs quorum status
Quorum information
------------------
Date:             Wed Jun 29 13:17:02 2016
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/8272
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW node1 (local)
         2          1    A,V,NMW node2
         0          1            Qdevice
The pcs quorum device status command shows the quorum device runtime status.
[root@node1:~]# pcs quorum device status
Qdevice information
-------------------
Model:                  Net
Node ID:                1
Configured node list:
    0   Node ID = 1
    1   Node ID = 2
Membership node list:   1, 2

Qdevice-net information
----------------------
Cluster name:           mycluster
QNetd host:             qdevice:5403
Algorithm:              ffsplit
Tie-breaker:            Node with lowest node ID
State:                  Connected
From the quorum device side, you can execute the following status command, which shows the status of the corosync-qnetd daemon.
[root@qdevice:~]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              2
Connected clusters:             1
Maximum send/receive size:      32768/32768 bytes
Cluster "mycluster":
    Algorithm:          ffsplit
    Tie-breaker:        Node with lowest node ID
    Node ID 2:
        Client address:         ::ffff:192.168.122.122:50028
        HB interval:            8000ms
        Configured node list:   1, 2
        Ring ID:                1.2050
        Membership node list:   1, 2
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 1:
        Client address:         ::ffff:192.168.122.121:48786
        HB interval:            8000ms
        Configured node list:   1, 2
        Ring ID:                1.2050
        Membership node list:   1, 2
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
27.3. Managing the quorum device service
PCS provides the ability to manage the quorum device service on the local host (corosync-qnetd) with the pcs qdevice command. Note that these commands affect only the corosync-qnetd service.
Start the quorum device service:
[root@qdevice:~]# pcs qdevice start net
Stop the quorum device service:
[root@qdevice:~]# pcs qdevice stop net
Enable the quorum device service to start at boot:
[root@qdevice:~]# pcs qdevice enable net
Disable the quorum device service from starting at boot:
[root@qdevice:~]# pcs qdevice disable net
Forcefully terminate the quorum device service:
[root@qdevice:~]# pcs qdevice kill net
27.4. Managing a quorum device in a cluster
There are a variety of pcs commands that you can use to change the quorum device settings in a cluster, disable a quorum device, and remove a quorum device.
Changing quorum device settings
You can change the setting of a quorum device with the pcs quorum device update command.
To change the host option of quorum device model net, use the pcs quorum device remove and the pcs quorum device add commands to set up the configuration properly, unless the old and the new host are the same machine.
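For example, the following is a minimal sketch of moving the quorum device to a new host; the host name newqdevice is illustrative, and the new host must already be set up and authenticated as described in the previous sections:
[root@node1:~]# pcs quorum device remove
[root@node1:~]# pcs quorum device add model net host=newqdevice algorithm=ffsplit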
The following command changes the quorum device algorithm to lms:
[root@node1:~]# pcs quorum device update model algorithm=lms
Sending updated corosync.conf to nodes...
node1: Succeeded
node2: Succeeded
Corosync configuration reloaded
Reloading qdevice configuration on nodes...
node1: corosync-qdevice stopped
node2: corosync-qdevice stopped
node1: corosync-qdevice started
node2: corosync-qdevice started
Removing a quorum device
The following command removes a quorum device configured on a cluster node:
[root@node1:~]# pcs quorum device remove
Sending updated corosync.conf to nodes...
node1: Succeeded
node2: Succeeded
Corosync configuration reloaded
Disabling corosync-qdevice...
node1: corosync-qdevice disabled
node2: corosync-qdevice disabled
Stopping corosync-qdevice...
node1: corosync-qdevice stopped
node2: corosync-qdevice stopped
Removing qdevice certificates from nodes...
node1: Succeeded
node2: Succeeded
After you have removed a quorum device, you should see the following error message when displaying the quorum device status.
[root@node1:~]# pcs quorum device status
Error: Unable to get quorum status: corosync-qdevice-tool: Can't connect to QDevice socket (is QDevice running?): No such file or directory
Destroying a quorum device
The following command disables and stops a quorum device on the quorum device host and deletes all of its configuration files.
[root@qdevice:~]# pcs qdevice destroy net
Stopping quorum device...
quorum device stopped
quorum device disabled
Quorum device 'net' configuration files removed
Chapter 28. Triggering scripts for cluster events
A Pacemaker cluster is an event-driven system, where an event might be a resource or node failure, a configuration change, or a resource starting or stopping. You can configure Pacemaker cluster alerts to take some external action when a cluster event occurs by means of alert agents, which are external programs that the cluster calls in the same manner as the cluster calls resource agents to handle resource configuration and operation.
The cluster passes information about the event to the agent by means of environment variables. Agents can do anything with this information, such as send an email message or log to a file or update a monitoring system.
- Pacemaker provides several sample alert agents, which are installed in /usr/share/pacemaker/alerts by default. These sample scripts may be copied and used as is, or they may be used as templates to be edited to suit your purposes. Refer to the source code of the sample agents for the full set of attributes they support.
- If the sample alert agents do not meet your needs, you can write your own alert agents for a Pacemaker alert to call.
28.1. Installing and configuring sample alert agents
When you use one of the sample alert agents, you should review the script to ensure that it suits your needs. These sample agents are provided as a starting point for custom scripts for specific cluster environments.
Note
While Red Hat supports the interfaces that the alert agent scripts use to communicate with Pacemaker, Red Hat does not provide support for the custom agents themselves.
28.1.1. Prerequisites
Install the agent on each node in the cluster.
# install --mode=0755 /usr/share/pacemaker/alerts/alert_file.sh.sample /var/lib/pacemaker/alert_file.sh
After you have installed the script, you can create an alert that uses the script.
28.1.2. Configuring an alert that uses the alert_file.sh alert agent
You can install the alert_file.sh alert agent and configure an alert to log events to a file. Alert agents run as the user hacluster, which has a minimal set of permissions.
Procedure
On each node in the cluster, install the alert_file.sh.sample script as alert_file.sh.
# install --mode=0755 /usr/share/pacemaker/alerts/alert_file.sh.sample /var/lib/pacemaker/alert_file.sh
On each node in the cluster, create the log file that will be used to record the events.
# touch /var/log/pcmk_alert_file.log
Change the ownership of the file pcmk_alert_file.log to user hacluster.
# chown hacluster:haclient /var/log/pcmk_alert_file.log
Change the permissions for the file pcmk_alert_file.log to read and write for the user hacluster.
# chmod 600 /var/log/pcmk_alert_file.log
On one node in the cluster, create the alert.
# pcs alert create id=alert_file description="Log events to a file." path=/var/lib/pacemaker/alert_file.sh
Add the path to the log file as the recipient for the alert.
# pcs alert recipient add alert_file id=my-alert_logfile value=/var/log/pcmk_alert_file.log
28.1.3. Configuring an alert that uses the alert_snmp.sh alert agent
You can install the alert_snmp.sh.sample script as alert_snmp.sh and configure an alert that uses the installed alert_snmp.sh alert agent to send cluster events as SNMP traps. By default, the script will send all events except successful monitor calls to the SNMP server.
Procedure
On each node in the cluster, install the alert_snmp.sh.sample script as alert_snmp.sh.
# install --mode=0755 /usr/share/pacemaker/alerts/alert_snmp.sh.sample /var/lib/pacemaker/alert_snmp.sh
On one node in the cluster, configure an alert that uses the alert_snmp.sh agent, configuring the timestamp format as a meta option.
# pcs alert create id=snmp_alert path=/var/lib/pacemaker/alert_snmp.sh meta timestamp-format="%Y-%m-%d,%H:%M:%S.%01N"
Configure a recipient for the alert.
# pcs alert recipient add snmp_alert value=192.168.1.2
Display the alert configuration.
# pcs alert
Alerts:
 Alert: snmp_alert (path=/var/lib/pacemaker/alert_snmp.sh)
  Meta options: timestamp-format=%Y-%m-%d,%H:%M:%S.%01N.
  Recipients:
   Recipient: snmp_alert-recipient (value=192.168.1.2)
28.1.4. Configuring an alert that uses the alert_smtp.sh alert agent
You can install the alert_smtp.sh agent and then configure an alert that uses the installed alert agent to send cluster events as email messages.
Procedure
On each node in the cluster, install the alert_smtp.sh.sample script as alert_smtp.sh.
# install --mode=0755 /usr/share/pacemaker/alerts/alert_smtp.sh.sample /var/lib/pacemaker/alert_smtp.sh
On one node in the cluster, configure an alert to send cluster events as email messages.
# pcs alert create id=smtp_alert path=/var/lib/pacemaker/alert_smtp.sh options email_sender=donotreply@example.com
Configure a recipient for the alert.
# pcs alert recipient add smtp_alert value=admin@example.com
Display the alert configuration.
# pcs alert
Alerts:
 Alert: smtp_alert (path=/var/lib/pacemaker/alert_smtp.sh)
  Options: email_sender=donotreply@example.com
  Recipients:
   Recipient: smtp_alert-recipient (value=admin@example.com)
28.2. Creating a cluster alert
You can create a cluster alert. The options that you configure are agent-specific configuration values that are passed to the alert agent script at the path you specify as additional environment variables. If you do not specify a value for id, one will be generated.
Procedure
Create a cluster alert:
# pcs alert create path=path [id=alert-id] [description=description] [options [option=value]...] [meta [meta-option=value]...]
Multiple alert agents may be configured; the cluster will call all of them for each event. Alert agents will be called only on cluster nodes. They will be called for events involving Pacemaker Remote nodes, but they will never be called on those nodes.
The following example creates a simple alert that will call myscript.sh for each event.
# pcs alert create id=my_alert path=/path/to/myscript.sh
28.3. Displaying, modifying, and removing cluster alerts
There are a variety of pcs commands you can use to display, modify, and remove cluster alerts.
Displaying configured cluster alerts
The following command shows all configured alerts along with the values of the configured options:
# pcs alert [config]
Modifying a configured cluster alert
The following command updates an existing alert with the specified alert-id value:
# pcs alert update alert-id [path=path] [description=description] [options [option=value]...] [meta [meta-option=value]...]
Removing a cluster alert
The following command removes an alert with the specified alert-id value:
# pcs alert remove alert-id
Alternately, you can run the pcs alert delete command, which is identical to the pcs alert remove command. Both the pcs alert delete and the pcs alert remove commands allow you to specify more than one alert to be deleted.
28.4. Configuring cluster alert recipients
Configure alerts with one or more recipients. The cluster calls the agent separately for each recipient. The recipient may be anything the alert agent can recognize: an IP address, an email address, a file name, or whatever the particular agent supports.
Adding a recipient to a configured cluster alert
The following command adds a new recipient to the specified alert.
# pcs alert recipient add alert-id value=recipient-value [id=recipient-id] [description=description] [options [option=value]...] [meta [meta-option=value]...]
The following example command adds the alert recipient my-alert-recipient with a recipient ID of my-recipient-id to the alert my-alert. This will configure the cluster to call the alert script that has been configured for my-alert for each event, passing the recipient some-address as an environment variable.
# pcs alert recipient add my-alert value=my-alert-recipient id=my-recipient-id options value=some-address
Updating an existing alert recipient
The following command updates an existing alert recipient.
# pcs alert recipient update recipient-id [value=recipient-value] [description=description] [options [option=value]...] [meta [meta-option=value]...]
Removing an alert recipient
The following command removes the specified alert recipient.
# pcs alert recipient remove recipient-id
Alternately, you can run the pcs alert recipient delete command, which is identical to the pcs alert recipient remove command. Both the pcs alert recipient remove and the pcs alert recipient delete commands allow you to remove more than one alert recipient.
28.5. Configuring cluster alert meta options
As with resource agents, meta options can be configured for alert agents to affect how Pacemaker calls them. The following table describes the alert meta options. Meta options can be configured per alert agent as well as per recipient.
| Meta-Attribute | Default | Description |
|---|---|---|
| enabled | true | If set to false for an alert, the alert will not be used. If set to true for an alert and false for a particular recipient of that alert, that recipient will not be used. |
| timestamp-format | %H:%M:%S.%06N | Format the cluster uses when sending the event’s timestamp to the agent. This is a string as used with the date(1) command. |
| timeout | 30s | If the alert agent does not complete within this amount of time, it will be terminated. |
The following example procedure configures cluster alert meta options for both an alert agent and for the alert recipients. The procedure configures an alert that calls the script myscript.sh and then adds two recipients to the alert. The script gets called twice for each event.
Procedure
Configure an alert that calls the script myscript.sh and that uses a 15-second timeout.
# pcs alert create id=my-alert path=/path/to/myscript.sh meta timeout=15s
Add an alert recipient that has an ID of my-alert-recipient1, passing the call to the recipient someuser@example.com with a timestamp in the format %D %H:%M.
# pcs alert recipient add my-alert value=someuser@example.com id=my-alert-recipient1 meta timestamp-format="%D %H:%M"
Add an alert recipient that has an ID of my-alert-recipient2, passing the call to the recipient otheruser@example.com with a timestamp in the format %c.
# pcs alert recipient add my-alert value=otheruser@example.com id=my-alert-recipient2 meta timestamp-format="%c"
28.6. Displaying configured cluster alerts as pcs commands
You can export cluster alerts as a series of pcs commands by using the --output-format=cmd option. This is useful for scripting, automation, or replicating the same configuration on a different system.
You can display the configured cluster alerts in one of three formats:
- text: Displays the output in plain text. This is the default format.
- json: Displays the output in a machine-readable JSON format, which is useful for scripting and automation.
- cmd: Displays the output as a series of pcs commands, which you can use to recreate the same alert or other cluster configuration part on a different system.
Procedure
To display the configured alerts as a series of pcs commands:
# pcs alert config --output-format=cmd
Example output, for a configuration like the smtp alert created earlier in this chapter:
pcs alert create path=/var/lib/pacemaker/alert_smtp.sh id=smtp_alert options email_sender=donotreply@example.com
pcs alert recipient add smtp_alert value=admin@example.com
28.7. Creating and modifying a cluster alert
You can create alerts, add recipients, modify an alert, remove an alert, and display the configured alerts at each step.
While you must install the alert agents themselves on each node in a cluster, you need to run the pcs commands only once.
Procedure
Create a simple alert. Since no alert ID value is specified, the system creates an alert ID value of alert.
# pcs alert create path=/my/path
Add a recipient of rec_value to the alert. Since this command does not specify a recipient ID, the value of alert-recipient is used as the recipient ID.
# pcs alert recipient add alert value=rec_value
Add a second recipient of rec_value2 to the alert. This command specifies a recipient ID of my-recipient for the recipient.
# pcs alert recipient add alert value=rec_value2 id=my-recipient
Display the alert configuration.
# pcs alert config
Alerts:
 Alert: alert (path=/my/path)
  Recipients:
   Recipient: alert-recipient (value=rec_value)
   Recipient: my-recipient (value=rec_value2)
Add a second alert, with an alert ID of my-alert.
# pcs alert create id=my-alert path=/path/to/script description=alert_description options option1=value1 opt=val meta timeout=50s timestamp-format="%H%B%S"
Add a recipient for the second alert with a recipient value of my-other-recipient. Since no recipient ID is specified, the system provides a recipient ID of my-alert-recipient.
# pcs alert recipient add my-alert value=my-other-recipient
Display the alert configuration.
# pcs alert
Alerts:
 Alert: alert (path=/my/path)
  Recipients:
   Recipient: alert-recipient (value=rec_value)
   Recipient: my-recipient (value=rec_value2)
 Alert: my-alert (path=/path/to/script)
  Description: alert_description
  Options: opt=val option1=value1
  Meta options: timestamp-format=%H%B%S timeout=50s
  Recipients:
   Recipient: my-alert-recipient (value=my-other-recipient)
Modify the alert values for the alert my-alert.
# pcs alert update my-alert options option1=newvalue1 meta timestamp-format="%H%M%S"
Modify the alert values for the recipient my-alert-recipient.
# pcs alert recipient update my-alert-recipient options option1=new meta timeout=60s
Display the alert configuration.
# pcs alert
Alerts:
 Alert: alert (path=/my/path)
  Recipients:
   Recipient: alert-recipient (value=rec_value)
   Recipient: my-recipient (value=rec_value2)
 Alert: my-alert (path=/path/to/script)
  Description: alert_description
  Options: opt=val option1=newvalue1
  Meta options: timestamp-format=%H%M%S timeout=50s
  Recipients:
   Recipient: my-alert-recipient (value=my-other-recipient)
    Options: option1=new
    Meta options: timeout=60s
Remove the recipient my-recipient from the alert alert.
# pcs alert recipient remove my-recipient
Display the alert configuration.
# pcs alert
Alerts:
 Alert: alert (path=/my/path)
  Recipients:
   Recipient: alert-recipient (value=rec_value)
 Alert: my-alert (path=/path/to/script)
  Description: alert_description
  Options: opt=val option1=newvalue1
  Meta options: timestamp-format=%H%M%S timeout=50s
  Recipients:
   Recipient: my-alert-recipient (value=my-other-recipient)
    Options: option1=new
    Meta options: timeout=60s
Remove the alert my-alert from the configuration.
# pcs alert remove my-alert
Display the alert configuration.
# pcs alert
Alerts:
 Alert: alert (path=/my/path)
  Recipients:
   Recipient: alert-recipient (value=rec_value)
28.8. Writing a cluster alert agent
There are three types of Pacemaker cluster alerts: node alerts, fencing alerts, and resource alerts. The environment variables that are passed to the alert agents can differ, depending on the type of alert.
Environment variables passed to alert agents
| Environment Variable | Description |
|---|---|
| CRM_alert_kind | The type of alert (node, fencing, or resource) |
| CRM_alert_version | The version of Pacemaker sending the alert |
| CRM_alert_recipient | The configured recipient |
| CRM_alert_node_sequence | A sequence number increased whenever an alert is being issued on the local node, which can be used to reference the order in which alerts have been issued by Pacemaker. An alert for an event that happened later in time reliably has a higher sequence number than alerts for earlier events. Be aware that this number has no cluster-wide meaning. |
| CRM_alert_timestamp | A timestamp created prior to executing the agent, in the format specified by the timestamp-format meta option. This allows the agent to have a reliable, high-precision time of when the event occurred, regardless of when the agent itself was invoked. |
| CRM_alert_node | Name of affected node |
| CRM_alert_desc | Detail about event. For node alerts, this is the node’s current state (member or lost). For fencing alerts, this is a summary of the requested fencing operation, including origin, target, and fencing operation error code, if any. For resource alerts, this is a readable string equivalent of CRM_alert_status. |
| CRM_alert_nodeid | ID of node whose status changed (provided with node alerts only) |
| CRM_alert_task | The requested fencing or resource operation (provided with fencing and resource alerts only) |
| CRM_alert_rc | The numerical return code of the fencing or resource operation (provided with fencing and resource alerts only) |
| CRM_alert_rsc | The name of the affected resource (resource alerts only) |
| CRM_alert_interval | The interval of the resource operation (resource alerts only) |
| CRM_alert_target_rc | The expected numerical return code of the operation (resource alerts only) |
| CRM_alert_status | A numerical code used by Pacemaker to represent the operation result (resource alerts only) |
When writing an alert agent, you must take the following concerns into account.
- Alert agents may be called with no recipient (if none is configured), so the agent must be able to handle this situation, even if it only exits in that case. Users may modify the configuration in stages, and add a recipient later.
- If more than one recipient is configured for an alert, the alert agent will be called once per recipient. If an agent is not able to run concurrently, it should be configured with only a single recipient. The agent is free, however, to interpret the recipient as a list.
- When a cluster event occurs, all alerts are fired off at the same time as separate processes. Depending on how many alerts and recipients are configured and on what is done within the alert agents, a significant load burst may occur. The agent could be written to take this into consideration, for example by queueing resource-intensive actions into some other instance, instead of directly executing them.
- Alert agents are run as the hacluster user, which has a minimal set of permissions. If an agent requires additional privileges, it is recommended to configure sudo to allow the agent to run the necessary commands as another user with the appropriate privileges.
- Take care to validate and sanitize user-configured parameters, such as CRM_alert_timestamp (whose content is specified by the user-configured timestamp-format), CRM_alert_recipient, and all alert options. This is necessary to protect against configuration errors. In addition, if some user can modify the CIB without having hacluster-level access to the cluster nodes, this is a potential security concern as well, and you should avoid the possibility of code injection.
- If a cluster contains resources with operations for which the on-fail parameter is set to fence, there will be multiple fence notifications on failure, one for each resource for which this parameter is set plus one additional notification. Both pacemaker-fenced and pacemaker-controld will send notifications. Pacemaker performs only one actual fence operation in this case, however, no matter how many notifications are sent.
The alerts interface is designed to be backward compatible with the external scripts interface used by the ocf:pacemaker:ClusterMon resource. To preserve this compatibility, the environment variables passed to alert agents are available prepended with CRM_notify_ as well as CRM_alert_. One break in compatibility is that the ClusterMon resource ran external scripts as the root user, while alert agents are run as the hacluster user.
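To illustrate these conventions, the following is a minimal sketch of an alert agent; it is not one of the shipped samples. It appends one line per event to the file named by the recipient value, handles the no-recipient case by falling back to an assumed default path, and exits successfully for alert types it does not handle.
#!/bin/sh
# Minimal example alert agent (a sketch; the fallback log path is an assumption).
# Pacemaker passes event details through CRM_alert_* environment variables.

# Handle the no-recipient case: fall back to a default log file.
log_file="${CRM_alert_recipient:-/var/log/pcmk_simple_alert.log}"

case "$CRM_alert_kind" in
node)
    # CRM_alert_desc is the node's current state (member or lost).
    echo "$CRM_alert_timestamp node $CRM_alert_node is now $CRM_alert_desc" >> "$log_file"
    ;;
fencing)
    # CRM_alert_desc summarizes the fencing operation; CRM_alert_rc is its return code.
    echo "$CRM_alert_timestamp fencing: $CRM_alert_desc (rc=$CRM_alert_rc)" >> "$log_file"
    ;;
resource)
    # CRM_alert_task and CRM_alert_rc describe the resource operation and its result.
    echo "$CRM_alert_timestamp resource $CRM_alert_rsc $CRM_alert_task on $CRM_alert_node: rc=$CRM_alert_rc" >> "$log_file"
    ;;
*)
    # Ignore alert types this agent does not handle.
    ;;
esac
exit 0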
Chapter 29. Multi-site Pacemaker clusters
Network issues between sites can cause split-brain scenarios, as nodes cannot verify remote status. Distance also complicates synchronous HA. To address this, Pacemaker supports multi-site clusters using the Booth cluster ticket manager.
29.1. Overview of Booth cluster ticket manager
Booth is a distributed ticket manager that operates on a network separate from the networks connecting the site clusters. It creates a loose overlay cluster, the Booth formation, above the site clusters. This layer facilitates consensus-based decisions for individual Booth tickets.
A Booth ticket is a singleton in the Booth formation and represents a time-sensitive, movable unit of authorization. Resources can be configured to require a certain ticket to run. This ensures that resources run only at the site that has been granted the corresponding ticket, and therefore at only one site at a time.
You can think of a Booth formation as an overlay cluster consisting of clusters running at different sites, where all the original clusters are independent of each other. It is the Booth service which communicates to the clusters whether they have been granted a ticket, and it is Pacemaker that determines whether to run resources in a cluster based on a Pacemaker ticket constraint. This means that when using the ticket manager, each of the clusters can run its own resources as well as shared resources. For example, there can be resources A, B, and C running only in one cluster, resources D, E, and F running only in the other cluster, and resources G and H running in either of the two clusters as determined by a ticket. It is also possible to have an additional resource J that could run in either of the two clusters as determined by a separate ticket.
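As a sketch, ticket constraints for the shared resources described above might look like the following; the ticket names ticketGH and ticketJ and the resource IDs are hypothetical, and the pcs constraint ticket add command is shown in full in the procedure that follows.

[cluster1-node1 ~]# pcs constraint ticket add ticketGH resourceG
[cluster1-node1 ~]# pcs constraint ticket add ticketGH resourceH
[cluster1-node1 ~]# pcs constraint ticket add ticketJ resourceJ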
29.2. Configuring multi-site clusters with Pacemaker
You can configure a multi-site configuration that uses the Booth ticket manager with the following procedure.
These example commands use the following arrangement:
- Cluster 1 consists of the nodes cluster1-node1 and cluster1-node2.
- Cluster 1 has a floating IP address assigned to it of 192.168.11.100.
- Cluster 2 consists of cluster2-node1 and cluster2-node2.
- Cluster 2 has a floating IP address assigned to it of 192.168.22.100.
- The arbitrator node is arbitrator-node, with an IP address of 192.168.99.100.
- The name of the Booth ticket that this configuration uses is apacheticket.
These example commands assume that the cluster resources for an Apache service have been configured as part of the resource group apachegroup for each cluster. It is not required that the resources and resource groups be the same on each cluster to configure a ticket constraint for those resources, since the Pacemaker instance for each cluster is independent, but that is a common failover scenario.
Note that at any time in the configuration procedure you can run the pcs booth config command to display the Booth configuration for the current node or cluster, or the pcs booth status command to display the current status of Booth on the local node.
Procedure
Install the booth-site Booth ticket manager package on each node of both clusters.
[root@cluster1-node1 ~]# dnf install -y booth-site
[root@cluster1-node2 ~]# dnf install -y booth-site
[root@cluster2-node1 ~]# dnf install -y booth-site
[root@cluster2-node2 ~]# dnf install -y booth-site
Install the pcs, booth-core, and booth-arbitrator packages on the arbitrator node.
[root@arbitrator-node ~]# dnf install -y pcs booth-core booth-arbitrator
If you are running the firewalld daemon, execute the following commands on all nodes in both clusters as well as on the arbitrator node to enable the ports that are required by the Red Hat High Availability Add-On.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
You may need to modify which ports are open to suit local conditions. For more information about the ports that are required by the Red Hat High Availability Add-On, see Enabling ports for the High Availability Add-On.
Create a Booth configuration on one node of one cluster. The addresses you specify for each cluster and for the arbitrator must be IP addresses. For each cluster, you specify a floating IP address.
[cluster1-node1 ~]# pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
This command creates the configuration files /etc/booth/booth.conf and /etc/booth/booth.key on the node from which it is run.
Create a ticket for the Booth configuration. This is the ticket that you will use to define the resource constraint that will allow resources to run only when this ticket has been granted to the cluster.
This basic failover configuration procedure uses only one ticket, but you can create additional tickets for more complicated scenarios where each ticket is associated with a different resource or resources.
[cluster1-node1 ~]# pcs booth ticket add apacheticket
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster1-node1 ~]# pcs booth sync
From the arbitrator node, pull the Booth configuration to the arbitrator. If you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[arbitrator-node ~]# pcs host auth cluster1-node1
[arbitrator-node ~]# pcs booth pull cluster1-node1
Pull the Booth configuration to the other cluster and synchronize to all the nodes of that cluster. As with the arbitrator node, if you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[cluster2-node1 ~]# pcs host auth cluster1-node1
[cluster2-node1 ~]# pcs booth pull cluster1-node1
[cluster2-node1 ~]# pcs booth sync
Start and enable Booth on the arbitrator.
Note: You must not manually start or enable Booth on any of the nodes of the clusters since Booth runs as a Pacemaker resource in those clusters.
[arbitrator-node ~]# pcs booth start
[arbitrator-node ~]# pcs booth enable
Configure Booth to run as a cluster resource on both cluster sites, using the floating IP addresses assigned to each cluster. This creates a resource group with booth-ip and booth-service as members of that group.
[cluster1-node1 ~]# pcs booth create ip 192.168.11.100
[cluster2-node1 ~]# pcs booth create ip 192.168.22.100
Add a ticket constraint to the resource group you have defined for each cluster.
[cluster1-node1 ~]# pcs constraint ticket add apacheticket apachegroup
[cluster2-node1 ~]# pcs constraint ticket add apacheticket apachegroup
You can enter the following command to display the currently configured ticket constraints.
pcs constraint ticket [config]
Grant the ticket you created for this setup to the first cluster.
Note that it is not necessary to have defined ticket constraints before granting a ticket. Once you have initially granted a ticket to a cluster, Booth takes over ticket management unless you override this manually with the pcs booth ticket revoke command. For information about the pcs booth administration commands, use the pcs booth --help command on your system.
[cluster1-node1 ~]# pcs booth ticket grant apacheticket
It is possible to add or remove tickets at any time, even after completing this procedure. After adding or removing a ticket, however, you must synchronize the configuration files to the other nodes and clusters as well as to the arbitrator, and grant the ticket again as shown in this procedure.
For a full procedure to remove a Booth ticket, see Removing a Booth ticket. For information about additional Booth administration commands, use the pcs booth --help command.
29.3. Removing a Booth ticket
After you remove a Booth cluster ticket by using the pcs booth ticket remove command, the state of the Booth ticket remains loaded in the Cluster Information Base (CIB). This is also the case after you remove a ticket from the Booth configuration on one site and pull the Booth configuration to another site by using the pcs booth pull command.
This might cause problems when you configure a ticket constraint, because a ticket constraint can be granted even after a ticket has been removed. As a consequence, the cluster might freeze or fence a node. To prevent this, you can remove a Booth ticket from the CIB with the pcs booth ticket cleanup command.
Prerequisites
- You have set up a multi-site configuration that uses the Booth ticket manager. For instructions, see Configuring multi-site clusters with Pacemaker.
The configured example uses the following arrangement:
- Cluster 1 consists of the nodes cluster1-node1 and cluster1-node2.
- Cluster 2 consists of the nodes cluster2-node1 and cluster2-node2.
- The arbitrator node is named arbitrator-node.
- The name of the Booth ticket that this configuration uses is apacheticket.
Procedure
From a cluster node in one cluster site of the Booth configuration:
Put the ticket to remove in standby mode. The ticket that this example uses is named apacheticket.
[cluster1-node1 ~]# pcs booth ticket standby apacheticket
Remove the ticket from the Booth configuration.
[cluster1-node1 ~]# pcs booth ticket remove apacheticket
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster1-node1 ~]# pcs booth sync
Restart the Booth resource in the current cluster.
[cluster1-node1 ~]# pcs booth restart
Remove the ticket from the CIB in the current cluster.
[cluster1-node1 ~]# pcs booth ticket cleanup
From a cluster node in each remaining cluster site of the Booth configuration:
Put the ticket to remove in standby mode.
[cluster2-node1 ~]# pcs booth ticket standby apacheticket
Download the Booth configuration file from a node with the updated configuration.
[cluster2-node1 ~]# pcs booth pull cluster1-node1
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster2-node1 ~]# pcs booth sync
Restart the Booth resource in the current cluster.
[cluster2-node1 ~]# pcs booth restart
Remove the ticket from the CIB in the current cluster.
[cluster2-node1 ~]# pcs booth ticket cleanup
From the arbitrator node, download the Booth configuration file from a node with the updated configuration:
[arbitrator-node ~]# pcs booth pull cluster1-node1
Verification
To check whether a Booth ticket was removed from the Booth configuration, run the pcs booth config command on each cluster node and the arbitrator node.
For example, after configuring a ticket named apacheticket using the procedure described in Configuring multi-site clusters with Pacemaker, the command displays the following output:
[cluster1-node1 ~]# pcs booth config
authfile = /etc/booth/booth.key
site = 192.168.11.100
site = 192.168.22.100
arbitrator = 192.168.99.100
ticket = "apacheticket"
After you remove the ticket from the Booth configuration, the command no longer displays ticket = "apacheticket":
[cluster1-node1 ~]# pcs booth config
authfile = /etc/booth/booth.key
site = 192.168.11.100
site = 192.168.22.100
arbitrator = 192.168.99.100
To check whether a Booth ticket was removed from the CIB on a cluster node, use the --query-xml option of the crm_ticket utility on any node in the cluster. For example, after you have configured a Booth ticket named apacheticket, the utility displays the following output:
[cluster1-node1 ~]# crm_ticket --query-xml
State XML:
<tickets>
  <ticket_state id="apacheticket" granted="true" booth-cfg-name="booth" owner="0" expires="1740986835" term="0" standby="false"/>
</tickets>
After you have removed the ticket from the CIB, the output no longer displays a ticket_state element with id="apacheticket":
[cluster1-node1 ~]# crm_ticket --query-xml
State XML:
<tickets/>
Chapter 30. Integrating non-corosync nodes into a cluster: the pacemaker_remote service
The pacemaker_remote service integrates nodes that do not run corosync into the cluster. Pacemaker manages resources on these nodes just as it does on any other cluster node.
Among the capabilities that the pacemaker_remote service provides are the following:
- The pacemaker_remote service allows you to scale beyond the Red Hat support limit of 32 nodes.
- The pacemaker_remote service allows you to manage a virtual environment as a cluster resource and also to manage individual services within the virtual environment as cluster resources.
The following terms are used to describe the pacemaker_remote service.
- cluster node - A node running the High Availability services (pacemaker and corosync).
- remote node - A node running pacemaker_remote to remotely integrate into the cluster without requiring corosync cluster membership. A remote node is configured as a cluster resource that uses the ocf:pacemaker:remote resource agent.
- guest node - A virtual guest node running the pacemaker_remote service. The virtual guest resource is managed by the cluster; it is both started by the cluster and integrated into the cluster as a remote node.
- pacemaker_remote - A service daemon capable of performing remote application management within remote nodes and KVM guest nodes in a Pacemaker cluster environment. This service is an enhanced version of Pacemaker’s local executor daemon (pacemaker-execd) that can manage resources remotely on a node not running corosync.
A Pacemaker cluster running the pacemaker_remote service has the following characteristics.
- Remote nodes and guest nodes run the pacemaker_remote service (with very little configuration required on the virtual machine side).
- The cluster stack (pacemaker and corosync), running on the cluster nodes, connects to the pacemaker_remote service on the remote nodes, allowing them to integrate into the cluster.
- The cluster stack (pacemaker and corosync), running on the cluster nodes, launches the guest nodes and immediately connects to the pacemaker_remote service on the guest nodes, allowing them to integrate into the cluster.
The key difference between the cluster nodes and the remote and guest nodes that the cluster nodes manage is that the remote and guest nodes are not running the cluster stack. This means the remote and guest nodes have the following limitations:
- they do not take part in quorum
- they do not execute fencing device actions
- they are not eligible to be the cluster’s Designated Controller (DC)
- they do not themselves run the full range of pcs commands
On the other hand, remote nodes and guest nodes are not bound to the scalability limits associated with the cluster stack.
Other than these noted limitations, the remote and guest nodes behave just like cluster nodes with respect to resource management, and the remote and guest nodes can themselves be fenced. The cluster is fully capable of managing and monitoring resources on each remote and guest node: You can build constraints against them, put them in standby, or perform any other action you perform on cluster nodes with the pcs commands. Remote and guest nodes appear in cluster status output just as cluster nodes do.
30.1. Host and guest authentication of pacemaker_remote nodes
Pacemaker supports two methods of securing the connection between Pacemaker nodes and pacemaker_remote nodes.
- Transport Layer Security (TLS) with pre-shared key (PSK) encryption and authentication over TCP
- TLS with SSL certificates. With this method, you can use existing certificates to secure the connection.
TLS with PSK encryption
The connection between cluster nodes and pacemaker_remote nodes is secured using Transport Layer Security (TLS) with pre-shared key (PSK) encryption and authentication over TCP using port 3121 by default in the following situations:
- When you configure a guest node with the pcs cluster node add-guest command
- When you configure a remote node with the pcs cluster node add-remote command
This means both the cluster node and the node running pacemaker_remote must share the same private key. By default this key must be placed at /etc/pacemaker/authkey on both cluster nodes and remote nodes.
The first time you run the pcs cluster node add-guest command or the pcs cluster node add-remote command, it creates the authkey and installs it on all existing nodes in the cluster. When you later create a new node of any type, the existing authkey is copied to the new node.
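If you prefer to create the key by hand before adding any nodes, one common approach (an assumption here, not a required step) is to generate 4096 bytes of random data into the expected location:

# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1

You would then copy the resulting file to /etc/pacemaker/authkey on every cluster node and remote node yourself, making sure it is readable by the account that runs Pacemaker (typically the hacluster user or haclient group).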
Configuring SSL/TLS certificates
You can encrypt Pacemaker Remote connections using X.509 (SSL/TLS) certificates. With this method, you can reuse existing host certificates for Pacemaker Remote connections rather than private shared keys.
To configure SSL/TLS certificates, create a remote connection with the pcs cluster node add-guest command or the pcs cluster node add-remote command. You can then convert the remote connection to use certificates.
Procedure
Create a remote connection with the pcs cluster node add-guest command or the pcs cluster node add-remote command. This sets up the authkey for guest nodes or remote nodes. The following example command creates a remote node and sets up the authkey for that node.
[root@clusternode1 ~]# pcs cluster node add-remote remote1
Convert the connection you have created to use SSL/TLS certificates by updating the following variables in the /etc/sysconfig/pacemaker file on all cluster nodes and Pacemaker remote nodes (a configuration sketch follows this procedure):
- PCMK_ca_file - The location of a file containing trusted Certificate Authorities, used to verify client or server certificates. This file must be in PEM format and must allow read permissions to either the hacluster user or the haclient group.
- PCMK_cert_file - The location of a file containing the signed certificate for the server side of the connection. This file must be in PEM format and must allow read permissions to either the hacluster user or the haclient group.
- PCMK_crl_file (optional) - The location of a Certificate Revocation List file, in PEM format.
- PCMK_key_file - The location of a file containing the private key for the matching PCMK_cert_file. This file must be in PEM format and must allow read permissions to either the hacluster user or the haclient group.
Optionally, remove any /etc/pacemaker/authkey files from the cluster and remote nodes. Pacemaker uses certificates if certificates are configured, but removing the authkey files ensures that Pacemaker does not use PSK encryption if you have neglected to configure the certificates on a node.
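The following /etc/sysconfig/pacemaker sketch shows what the certificate configuration from this procedure might look like; all of the certificate and key paths are hypothetical and must point at your own PEM files:

# Hypothetical paths; PCMK_crl_file is omitted because it is optional.
PCMK_ca_file=/etc/pki/tls/certs/cluster-ca.pem
PCMK_cert_file=/etc/pki/tls/certs/node1.pem
PCMK_key_file=/etc/pki/tls/private/node1.key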
30.2. Configuring KVM guest nodes
A Pacemaker guest node is a virtual guest node running the pacemaker_remote service. The virtual guest node is managed by the cluster.
Guest node resource options
When configuring a virtual machine to act as a guest node, you create a VirtualDomain resource, which manages the virtual machine. For descriptions of the options you can set for a VirtualDomain resource, see the "Resource Options for Virtual Domain Resources" table in Virtual domain resource options.
In addition to the VirtualDomain resource options, metadata options define the resource as a guest node and define the connection parameters. You set these resource options with the pcs cluster node add-guest command. The following table describes these metadata options.
| Field | Default | Description |
|---|---|---|
| remote-node | <none> | The name of the guest node this resource defines. This both enables the resource as a guest node and defines the unique name used to identify the guest node. WARNING: This value cannot overlap with any resource or node IDs. |
| remote-port | 3121 | Configures a custom port to use for the guest connection to pacemaker_remote. |
| remote-addr | The address provided in the pcs host auth command | The IP address or host name to connect to. |
| remote-connect-timeout | 60s | Amount of time before a pending guest connection will time out. |
Integrating a virtual machine as a guest node
The following procedure is a high-level summary of the steps to perform to have Pacemaker launch a virtual machine and to integrate that machine as a guest node, using libvirt and KVM virtual guests.
Procedure
- Configure the VirtualDomain resources.
- Enter the following commands on every virtual machine to install pacemaker_remote packages, start the pcsd service and enable it to run on startup, and allow TCP port 3121 through the firewall.
# dnf install pacemaker-remote resource-agents pcs
# systemctl start pcsd.service
# systemctl enable pcsd.service
# firewall-cmd --add-port 3121/tcp --permanent
# firewall-cmd --add-port 2224/tcp --permanent
# firewall-cmd --reload
- Give each virtual machine a static network address and unique host name, which should be known to all nodes.
- If you have not already done so, authenticate pcs to the node you will be integrating as a guest node.
# pcs host auth nodename
- Use the following command to convert an existing VirtualDomain resource into a guest node. This command must be run on a cluster node and not on the guest node which is being added. In addition to converting the resource, this command copies the /etc/pacemaker/authkey to the guest node and starts and enables the pacemaker_remote daemon on the guest node. The node name for the guest node, which you can define arbitrarily, can differ from the host name for the node.
# pcs cluster node add-guest nodename resource_id [options]
- After creating the VirtualDomain resource, you can treat the guest node just as you would treat any other node in the cluster. For example, you can create a resource and place a resource constraint on the resource to run on the guest node as in the following commands, which are run from a cluster node. You can include guest nodes in groups, which allows you to group a storage device, file system, and VM (see the sketch after this procedure).
# pcs resource create webserver apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
# pcs constraint location webserver prefers nodename
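As a sketch of such a grouping, assuming hypothetical resources vm-storage and vm-fs and a VirtualDomain resource named guest1-vm, you might place them in a single group so they start and stop together in order:

# pcs resource group add vmgroup vm-storage vm-fs guest1-vm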
30.3. Configuring Pacemaker remote nodes
A remote node is defined as a cluster resource with ocf:pacemaker:remote as the resource agent. You create this resource with the pcs cluster node add-remote command.
The following table describes the resource options you can configure for a remote resource.
| Field | Default | Description |
|---|---|---|
| reconnect_interval | 0 | Time in seconds to wait before attempting to reconnect to a remote node after an active connection to the remote node has been severed. This wait is recurring. If reconnect fails after the wait period, a new reconnect attempt will be made after observing the wait time. When this option is in use, Pacemaker will keep attempting to reach out and connect to the remote node indefinitely after each wait interval. |
| server | Address specified with the pcs host auth command | Server to connect to. This can be an IP address or host name. |
| port | 3121 | TCP port to connect to. |
30.4. Remote node configuration overview
Configure a Pacemaker Remote node and integrate it into the existing cluster.
Procedure
On the node that you will be configuring as a remote node, allow cluster-related services through the local firewall.
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
Note: If you are using iptables directly, or some other firewall solution besides firewalld, simply open the following ports: TCP ports 2224 and 3121.
Install the pacemaker_remote daemon on the remote node.
# dnf install -y pacemaker-remote resource-agents pcs
Start and enable pcsd on the remote node.
# systemctl start pcsd.service
# systemctl enable pcsd.service
If you have not already done so, authenticate pcs to the node you will be adding as a remote node.
# pcs host auth remote1
Add the remote node resource to the cluster with the following command. This command also syncs all relevant configuration files to the new node, starts the node, and configures it to start pacemaker_remote on boot. This command must be run on a cluster node and not on the remote node which is being added.
# pcs cluster node add-remote remote1
After adding the remote resource to the cluster, you can treat the remote node just as you would treat any other node in the cluster. For example, you can create a resource and place a resource constraint on the resource to run on the remote node as in the following commands, which are run from a cluster node.
# pcs resource create webserver apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
# pcs constraint location webserver prefers remote1
Warning: Never involve a remote node connection resource in a resource group, colocation constraint, or order constraint.
- Configure fencing resources for the remote node. Remote nodes are fenced the same way as cluster nodes. Configure fencing resources for use with remote nodes the same as you would with cluster nodes. Note, however, that remote nodes can never initiate a fencing action. Only cluster nodes are capable of actually executing a fencing operation against another node.
30.5. Changing the default port location
If you need to change the default port location for either Pacemaker or pacemaker_remote, you can set the PCMK_remote_port environment variable that affects both of these daemons.
Procedure
Enable the environment variable by placing it in the /etc/sysconfig/pacemaker file:
#==#==# Pacemaker Remote
...
# Specify a custom port for Pacemaker Remote connections
PCMK_remote_port=3121
When changing the default port used by a particular guest node or remote node, the PCMK_remote_port variable must be set in that node’s /etc/sysconfig/pacemaker file, and the cluster resource creating the guest node or remote node connection must also be configured with the same port number (using the remote-port metadata option for guest nodes, or the port option for remote nodes).
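For example, to run a guest node connection on port 3122, you would set PCMK_remote_port=3122 in that node's /etc/sysconfig/pacemaker file and pass the matching metadata option when creating the guest node. The node name guest1 and resource ID vm-guest1 in this sketch are hypothetical:

# pcs cluster node add-guest guest1 vm-guest1 remote-port=3122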
30.6. Upgrading systems with pacemaker_remote nodes
Stopping the pacemaker_remote service on an active node triggers a graceful resource migration, facilitating maintenance without removing the node from the cluster. After shutdown, the cluster attempts to reconnect immediately. If the service does not restart before the monitor timeout expires, the cluster marks the resource as failed.
If you wish to avoid monitor failures when the pacemaker_remote service is stopped on an active Pacemaker Remote node, you can use the following procedure to take the node out of the cluster before performing any system administration that might stop pacemaker_remote.
Procedure
Stop the node’s connection resource with the pcs resource disable resourcename command, which will move all services off the node. The connection resource would be the ocf:pacemaker:remote resource for a remote node or, commonly, the ocf:heartbeat:VirtualDomain resource for a guest node. For guest nodes, this command will also stop the VM, so the VM must be started outside the cluster (for example, using virsh) to perform any maintenance.
pcs resource disable resourcename
- Perform the required maintenance.
When ready to return the node to the cluster, re-enable the resource with the pcs resource enable command.
pcs resource enable resourcename
30.7. Verifying that Pacemaker remote nodes are no longer fenced unnecessarily
On RHEL 10, the fence-remote-without-quorum cluster property is set to false by default. This ensures that a remote node is not fenced unnecessarily when its managing partition loses quorum. You can verify this behavior by temporarily changing the configuration to allow fencing.
Prerequisites
- A running RHEL High Availability cluster with at least three cluster nodes and one remote node.
- A resource configured to run on the remote node.
- Fencing configured for all nodes.
Procedure
Configure the cluster to replicate the original behavior:
- Set the no-quorum-policy property to freeze.
- Set the fence-remote-without-quorum property to true.
Note: On RHEL 10, the default value is false. This step explicitly overrides the default to simulate the legacy behavior for verification purposes.
[root@node1 ~]# pcs property set no-quorum-policy=freeze
[root@node1 ~]# pcs property set fence-remote-without-quorum=true
Isolate one of the cluster nodes from the other nodes. In this example, hvirt-325 is the node that manages the remote node hvirt-292:
[root@hvirt-325 ~]# iptables -I INPUT -p udp --dport=5405 -j DROP
Observe the cluster’s behavior.
- Check the logs on the isolated node. You’ll see that the node initiates fencing for the remote node.
- Check the fencing history for the cluster. You’ll see successful fencing actions for both the isolated cluster node and the remote node.
Rejoin the isolated node to the cluster and ensure all nodes and resources are online:
[root@hvirt-325 ~]# iptables -D INPUT -p udp --dport=5405 -j DROP
Restore the default behavior. Set the fence-remote-without-quorum property to false:
[root@node1 ~]# pcs property set fence-remote-without-quorum=false
Repeat the network isolation test from step 2:
[root@hvirt-325 ~]# iptables -I INPUT -p udp --dport=5405 -j DROP
Verification
- Check the cluster status and logs after isolating the node with fence-remote-without-quorum=false. The logs on the isolated node now show that the remote node is not fenced.
Jul 14 11:55:30 hvirt-325 pacemaker-schedulerd[2934]: warning: Node hvirt-292 is unclean but cannot be fenced
Chapter 31. Performing cluster maintenance
To perform maintenance on cluster nodes, you might need to stop or move the resources and services running on the cluster. In some cases, you can stop the cluster software without affecting services. Pacemaker provides several methods to support system maintenance.
31.1. Putting a node into standby mode
When a cluster node is in standby mode, the node is no longer able to host resources. Any resources currently active on the node will be moved to another node.
The following command puts the specified node into standby mode. If you specify the --all option, this command puts all nodes into standby mode.
You can use this command when updating a resource’s packages. You can also use this command when testing a configuration, to simulate recovery without actually shutting down a node.
Procedure
Put the specified node into standby mode:
# pcs node standby node | --all
Remove the specified node from standby mode. After running this command, the specified node is then able to host resources. If you specify the --all option, this command removes all nodes from standby mode:
# pcs node unstandby node | --all
Note that when you execute the pcs node standby command, this prevents resources from running on the indicated node. When you execute the pcs node unstandby command, this allows resources to run on the indicated node. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially.
31.2. Manually moving cluster resources
You can override the cluster and force resources to move from their current location. There are two occasions when you would want to do this.
- When a node is under maintenance, and you need to move all resources running on that node to a different node
- When individually specified resources need to be moved
To move all resources running on a node to a different node, you put the node in standby mode. For information about putting a cluster node in standby mode, see Putting a node in standby mode.
You can move individually specified resources in either of the following ways.
- You can use the pcs resource move command to move a resource off a node on which it is currently running, as described in Moving a resource from its current node.
- You can use the pcs resource relocate run command to move a resource to its preferred node, as determined by current cluster status, constraints, location of resources, and other settings. For information about this command, see Moving a resource to its preferred node.
Moving a resource from its current node
To move a resource off the node on which it is currently running, use the following command, specifying the resource_id of the resource as defined. Specify the destination_node if you want to indicate on which node to run the resource that you are moving.
# pcs resource move resource_id [destination_node] [--promoted] [--strict] [--wait[=n]]
When you execute the pcs resource move command, this adds a constraint to the resource to prevent it from running on the node on which it is currently running. By default, the location constraint that the command creates is automatically removed once the resource has been moved. If removing the constraint would cause the resource to move back to the original node, as might happen if the resource-stickiness value for the resource is 0, the pcs resource move command fails. If you would like to move a resource and leave the resulting constraint in place, use the pcs resource move-with-constraint command.
- If you specify the --promoted parameter of the pcs resource move command, the constraint applies only to promoted instances of the resource.
- If you specify the --strict parameter of the pcs resource move command, the command will fail if resources other than the one specified in the command would be affected.
- You can optionally configure a --wait[=n] parameter for the pcs resource move command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, it defaults to a value of 60 minutes (see the example after this list).
For more information about location constraints, see Determining which nodes a resource can run on.
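For example, the following hypothetical invocation moves the webserver resource to the node cluster1-node2 and waits up to 60 seconds for it to start there; the resource and node names are illustrative only:

# pcs resource move webserver cluster1-node2 --wait=60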
Moving a resource to its preferred node
After a resource has moved, whether due to a failover or to an administrator manually moving the resource, it will not necessarily move back to its original node even after the circumstances that caused the failover have been corrected. To relocate resources to their preferred nodes, use the following command. A preferred node is determined by the current cluster status, constraints, resource location, and other settings, and may change over time.
# pcs resource relocate run [resource1] [resource2] ...
If you do not specify any resources, all resources are relocated to their preferred nodes.
This command calculates the preferred node for each resource while ignoring resource stickiness. After calculating the preferred node, it creates location constraints which will cause the resources to move to their preferred nodes. Once the resources have been moved, the constraints are deleted automatically. To remove all constraints created by the pcs resource relocate run command, you can enter the pcs resource relocate clear command. To display the current status of resources and their optimal node ignoring resource stickiness, enter the pcs resource relocate show command.
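A typical sequence using these commands might look like the following sketch, where webserver is a hypothetical resource:

# pcs resource relocate show
# pcs resource relocate run webserver
# pcs resource relocate clear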
31.3. Disabling, enabling, and banning cluster resources
In addition to the pcs resource move and pcs resource relocate commands, there are a variety of other commands you can use to control the behavior of cluster resources.
31.3.1. Disabling a cluster resource
Stop a resource and prevent the cluster from restarting it. Constraints or failures may keep the resource active. Use --wait=n to pause until the resource stops (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.
Simulating disabling a resource
Ensuring that disabling a resource would not have an effect on other resources can be impossible to do by hand when complex resource relations are set up. To determine what effect disabling a resource will have on other resources, use the pcs resource disable --simulate command to show the effects of disabling a resource while not changing the cluster configuration.
Safely disabling resources
You can specify that a resource be disabled only if disabling the resource would not have an effect on other resources.
- The pcs resource disable --safe command disables a resource only if no other resources would be affected in any way, such as being migrated from one node to another. The pcs resource safe-disable command is an alias for the pcs resource disable --safe command.
- The pcs resource disable --safe --no-strict command disables a resource only if no other resources would be stopped or demoted.
Determining the resource IDs of affected resources
The error report that the pcs resource disable --safe command generates if the safe disable operation fails contains the affected resource IDs. If you need to know only the resource IDs of resources that would be affected by disabling a resource, use the --brief option for the pcs resource disable --safe command, which does not provide the full simulation result and prints errors only.
Procedure
Stop a running resource and prevent the cluster from starting it again:
# pcs resource disable resource_id [--wait[=n]]
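For example, the following hypothetical commands first simulate disabling the webserver resource without changing the configuration, and then attempt a safe disable that prints only the IDs of affected resources if it fails; the resource name is illustrative only:

# pcs resource disable --simulate webserver
# pcs resource disable --safe --brief webserver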
31.3.2. Enabling a cluster resource
Enable a resource to allow the cluster to start it. Depending on the configuration, the resource might remain stopped. Use --wait=n to pause until the resource starts (returns 0) or the timeout expires (returns 1). The default timeout is 60 minutes.
Procedure
Use the following command to allow the cluster to start a resource:
# pcs resource enable resource_id [--wait[=n]]
31.3.3. Preventing a resource from running on a particular node
You can prevent a resource from running on a specified node, or on the current node if no node is specified.
Procedure
Prevent a resource from running on a specified node, or on the current node if no node is specified:
# pcs resource ban resource_id [node] [--promoted] [lifetime=lifetime] [--wait[=n]]
Note: When you execute the pcs resource ban command, this adds a -INFINITY location constraint to the resource to prevent it from running on the indicated node. You can execute the pcs resource clear or the pcs constraint delete command to remove the constraint. This does not necessarily move the resources back to the indicated node; where the resources can run at that point depends on how you have configured your resources initially. For information about resource constraints, see Determining which nodes a resource can run on.
- If you specify the --promoted parameter of the pcs resource ban command, the scope of the constraint is limited to the promoted role and you must specify promotable_id rather than resource_id.
- You can optionally configure a lifetime parameter for the pcs resource ban command to indicate a period of time the constraint should remain (see the example after this list).
- You can optionally configure a --wait[=n] parameter for the pcs resource ban command to indicate the number of seconds to wait for the resource to start on the destination node before returning 0 if the resource is started or 1 if the resource has not yet started. If you do not specify n, the default resource timeout is used.
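For example, the following hypothetical command bans the webserver resource from the node cluster1-node2 for one hour, assuming the lifetime parameter accepts an ISO 8601 duration such as PT1H; the resource and node names are illustrative only:

# pcs resource ban webserver cluster1-node2 lifetime=PT1H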
31.3.4. Forcing a resource to start on the current node
Use pcs resource debug-start to force a resource to start on the current node for debugging. This command prints the output and ignores cluster recommendations. Do not use this for normal operations; Pacemaker manages starting cluster resources.
Procedure
Use the debug-start command to force a specified resource to start on the current node:
# pcs resource debug-start resource_id
31.4. Setting a resource to unmanaged mode
When a resource is in unmanaged mode, the resource is still in the configuration but Pacemaker does not manage the resource.
Procedure
Set the indicated resources to unmanaged mode:
# pcs resource unmanage resource1 [resource2] ...
Set resources to managed mode, which is the default state:
# pcs resource manage resource1 [resource2] ...
You can specify the name of a resource group with the pcs resource manage or pcs resource unmanage command. The command will act on all of the resources in the group, so that you can set all of the resources in a group to managed or unmanaged mode with a single command and then manage the contained resources individually.
31.5. Putting a cluster in maintenance mode
When a cluster is in maintenance mode, the cluster does not start or stop any services until told otherwise. When you take the cluster out of maintenance mode, the cluster does a sanity check of the current state of any services, and then stops or starts any that need it.
To put a cluster in maintenance mode, use the following command to set the maintenance-mode cluster property to true.
# pcs property set maintenance-mode=true
To remove a cluster from maintenance mode, use the following command to set the maintenance-mode cluster property to false.
# pcs property set maintenance-mode=false
For general information on setting and removing cluster properties, see Setting and removing cluster properties.
31.6. Updating a RHEL high availability cluster
Updating packages that make up the RHEL High Availability Add-On, either individually or as a whole, can be done in one of two general ways:
- Rolling Updates: Remove one node at a time from service, update its software, then integrate it back into the cluster. This allows the cluster to continue providing service and managing resources while each node is updated.
- Entire Cluster Update: Stop the entire cluster, apply updates to all nodes, then start the cluster back up.
It is critical that when performing software update procedures for Red Hat Enterprise Linux High Availability clusters, you ensure that any node that will undergo updates is not an active member of the cluster before those updates are initiated.
For a full description of each of these methods and the procedures to follow for the updates, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
31.7. Upgrading remote nodes and guest nodes
Stopping the pacemaker_remote service on an active node triggers a graceful resource migration, enabling seamless maintenance. However, the cluster attempts to reconnect immediately. If the service does not restart within the monitor timeout, the cluster detects a failure.
To avoid monitor failures when the pacemaker_remote service is stopped on an active Pacemaker Remote node, use the following procedure to take the node out of the cluster before performing any system administration that might stop pacemaker_remote.
Procedure
Stop the node’s connection resource with the pcs resource disable resourcename command, which will move all services off the node. The connection resource would be the ocf:pacemaker:remote resource for a remote node or, commonly, the ocf:heartbeat:VirtualDomain resource for a guest node. For guest nodes, this command will also stop the VM, so the VM must be started outside the cluster (for example, using virsh) to perform any maintenance.
pcs resource disable resourcename
- Perform the required maintenance.
When ready to return the node to the cluster, re-enable the resource with the pcs resource enable command.
pcs resource enable resourcename
31.8. Migrating VMs in a RHEL cluster
Red Hat does not support live migration of active cluster nodes. To migrate a VM, stop the cluster services to remove the node from operation, migrate the VM, and then restart the services. For details, see Support Policies for RHEL High Availability Clusters - General Conditions with Virtualized Cluster Members.
The following steps outline the procedure for removing a VM from a cluster, migrating the VM, and restoring the VM to the cluster.
This procedure applies to VMs that are used as full cluster nodes, not to VMs managed as cluster resources (including VMs used as guest nodes), which can be live-migrated without special precautions. For general information about the full procedure required for updating packages that make up the RHEL High Availability and Resilient Storage Add-Ons, either individually or as a whole, see the Red Hat Knowledgebase article Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
Before performing this procedure, consider the effect on cluster quorum of removing a cluster node. For example, if you have a three-node cluster and you remove one node, your cluster cannot withstand any node failure. This is because if one node of a three-node cluster is already down, removing a second node will lose quorum.
Procedure
- If any preparations need to be made before stopping or moving the resources or software running on the VM to migrate, perform those steps.
Run the following command on the VM to stop the cluster software on the VM.
# pcs cluster stop- Perform the live migration of the VM.
Start cluster services on the VM.
# pcs cluster start
31.9. Identifying clusters by UUID
When you create a cluster, it has an associated UUID. Since a cluster name is not a unique cluster identifier, a third-party tool, such as a configuration management database that manages multiple clusters with the same name, can uniquely identify a cluster by means of its UUID. You can display the current cluster UUID with the pcs cluster config [show] command, which includes the cluster UUID in its output.
Procedure
Add a UUID to an existing cluster:
# pcs cluster config uuid generate
Regenerate a UUID for a cluster with an existing UUID:
# pcs cluster config uuid generate --force
31.10. Renaming a cluster
You can change the name of an existing cluster using the pcs cluster rename command.
Procedure
To rename your cluster, run the pcs cluster rename command from any cluster node. Replace <new-name> with the new name you want to assign to the cluster:
# pcs cluster rename <new-name>
Chapter 32. Removing the cluster configuration
Permanently remove all Pacemaker and Corosync cluster configuration from a system when a cluster is no longer needed or must be rebuilt from scratch.
Prerequisites
- You have root access or a non-root user account with sudo privileges.
- The pcs command-line tool is installed.
Removing the cluster configuration is irreversible. After completion, the node is no longer part of any cluster and cannot be recovered without reconfiguration.
Procedure
Optional: Stop the cluster cleanly on the node.
# pcs cluster stop
Stopping the cluster reduces the risk of incomplete cleanup.
Remove all cluster configuration and disable cluster services.
# pcs cluster destroy
Verification
Verify that cluster services are no longer running.
# pcs status
If the cluster is removed successfully, the command reports that no cluster is configured.
Optional: Confirm that cluster services are stopped.
# systemctl status pacemaker corosync
Both services should be inactive or not found.
Chapter 33. Configuring disaster recovery clusters
To provide disaster recovery for a high availability cluster, configure two clusters: a primary site cluster and a disaster recovery cluster.
In normal circumstances, the primary cluster is running resources in production mode. The disaster recovery cluster has all the resources configured as well and is either running them in demoted mode or not at all. For example, there may be a database running in the primary cluster in promoted mode and running in the disaster recovery cluster in demoted mode. The database in this setup would be configured so that data is synchronized from the primary to disaster recovery site. This is done through the database configuration itself rather than through the pcs command interface.
When the primary cluster goes down, users can use the pcs command interface to manually fail the resources over to the disaster recovery site. They can then log in to the disaster site and promote and start the resources there. Once the primary cluster has recovered, users can use the pcs command interface to manually move resources back to the primary site.
You can use the pcs command to display the status of both the primary and the disaster recovery site cluster from a single node on either site.
33.1. Considerations for disaster recovery clusters
When planning and configuring a disaster recovery site that you will manage and monitor with the pcs command interface, note the following considerations.
- The disaster recovery site must be a cluster. This makes it possible to configure it with the same tools and similar procedures as the primary site.
- The primary and disaster recovery clusters are created by independent pcs cluster setup commands.
- The clusters and their resources must be configured so that the data is synchronized and failover is possible.
- The cluster nodes in the recovery site cannot have the same names as the nodes in the primary site.
- The pcs user hacluster must be authenticated for each node in both clusters on the node from which you will be running pcs commands.
33.2. Displaying status of recovery clusters
You can configure a primary and a disaster recovery cluster so that you can display the status of both clusters.
Setting up a disaster recovery cluster does not automatically configure resources or replicate data. Those items must be configured manually by the user.
In this example:
- The primary cluster will be named PrimarySite and will consist of the nodes z1.example.com and z2.example.com.
- The disaster recovery site cluster will be named DRSite and will consist of the nodes z3.example.com and z4.example.com.
This example sets up a basic cluster with no resources or fencing configured.
Procedure
Authenticate all of the nodes that will be used for both clusters.
[root@z1 ~]# pcs host auth z1.example.com z2.example.com z3.example.com z4.example.com -u hacluster -p password
z1.example.com: Authorized
z2.example.com: Authorized
z3.example.com: Authorized
z4.example.com: Authorized
Create the cluster that will be used as the primary cluster and start cluster services for the cluster.
[root@z1 ~]# pcs cluster setup PrimarySite z1.example.com z2.example.com --start
{...}
Cluster has been successfully set up.
Starting cluster on hosts: 'z1.example.com', 'z2.example.com'...
Create the cluster that will be used as the disaster recovery cluster and start cluster services for the cluster.
[root@z1 ~]# pcs cluster setup DRSite z3.example.com z4.example.com --start
{...}
Cluster has been successfully set up.
Starting cluster on hosts: 'z3.example.com', 'z4.example.com'...
From a node in the primary cluster, set up the second cluster as the recovery site. The recovery site is defined by a name of one of its nodes.
[root@z1 ~]# pcs dr set-recovery-site z3.example.com
Sending 'disaster-recovery config' to 'z3.example.com', 'z4.example.com'
z3.example.com: successful distribution of the file 'disaster-recovery config'
z4.example.com: successful distribution of the file 'disaster-recovery config'
Sending 'disaster-recovery config' to 'z1.example.com', 'z2.example.com'
z1.example.com: successful distribution of the file 'disaster-recovery config'
z2.example.com: successful distribution of the file 'disaster-recovery config'
Check the disaster recovery configuration.
[root@z1 ~]# pcs dr config
Local site:
  Role: Primary
Remote site:
  Role: Recovery
  Nodes:
    z3.example.com
    z4.example.com
Check the status of the primary cluster and the disaster recovery cluster from a node in the primary cluster.
[root@z1 ~]# pcs dr status
--- Local cluster - Primary site ---
Cluster name: PrimarySite

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: z2.example.com (version 2.0.3-2.el8-2c9cea563e) - partition with quorum
  * Last updated: Mon Dec  9 04:10:31 2019
  * Last change:  Mon Dec  9 04:06:10 2019 by hacluster via crmd on z2.example.com
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ z1.example.com z2.example.com ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

--- Remote cluster - Recovery site ---
Cluster name: DRSite

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: z4.example.com (version 2.0.3-2.el8-2c9cea563e) - partition with quorum
  * Last updated: Mon Dec  9 04:10:34 2019
  * Last change:  Mon Dec  9 04:09:55 2019 by hacluster via crmd on z4.example.com
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ z3.example.com z4.example.com ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
For additional display options for a disaster recovery configuration, see the help screen for the pcs dr command.
Appendix A. Interpreting resource agent OCF return codes
Pacemaker resource agents conform to the Open Cluster Framework (OCF) Resource Agent API.
The first thing the cluster does when an agent returns a code is to check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed, and recovery action is initiated.
For any invocation, resource agents must exit with a defined return code that informs the caller of the outcome of the invoked action.
There are three types of failure recovery, as described in the following table.
| Type | Description | Action Taken by the Cluster |
|---|---|---|
| soft | A transient error occurred. | Restart the resource or move it to a new location. |
| hard | A non-transient error that may be specific to the current node occurred. | Move the resource elsewhere and prevent it from being retried on the current node. |
| fatal | A non-transient error that will be common to all cluster nodes occurred (for example, a bad configuration was specified). | Stop the resource and prevent it from being started on any cluster node. |
The following table provides the OCF return codes and the type of recovery the cluster will initiate when a failure code is received. Note that even actions that return 0 (OCF alias OCF_SUCCESS) can be considered to have failed, if 0 was not the expected return value.
| Return Code | OCF Label | Description |
|---|---|---|
| 0 | OCF_SUCCESS | * The action completed successfully. This is the expected return code for any successful start, stop, promote, and demote command. * Type if unexpected: soft |
| 1 | OCF_ERR_GENERIC | * The action returned a generic error. * Type: soft * The resource manager will attempt to recover the resource or move it to a new location. |
| 2 | OCF_ERR_ARGS | * The resource’s configuration is not valid on this machine. For example, it refers to a location not found on the node. * Type: hard * The resource manager will move the resource elsewhere and prevent it from being retried on the current node. |
| 3 | OCF_ERR_UNIMPLEMENTED | * The requested action is not implemented. * Type: hard |
| 4 | OCF_ERR_PERM | * The resource agent does not have sufficient privileges to complete the task. This may be due, for example, to the agent not being able to open a certain file, to listen on a specific socket, or to write to a directory. * Type: hard * Unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the permission problem may not exist). |
| 5 | OCF_ERR_INSTALLED | * A required component is missing on the node where the action was executed. This may be due to a required binary not being executable, or a vital configuration file being unreadable. * Type: hard * Unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the required files or binaries may be present). |
| 6 | OCF_ERR_CONFIGURED | * The resource’s configuration on the local node is invalid. * Type: fatal * When this code is returned, Pacemaker will prevent the resource from running on any node in the cluster, even if the service configuration is valid on some other node. |
| 7 | OCF_NOT_RUNNING | * The resource is safely stopped. This implies that the resource has either gracefully shut down, or has never been started. * Type if unexpected: soft * The cluster will not attempt to stop a resource that returns this for any action. |
| 8 | OCF_RUNNING_PROMOTED | * The resource is running in promoted role. * Type if unexpected: soft |
| 9 | OCF_FAILED_PROMOTED | * The resource is (or might be) in promoted role but has failed. * Type: soft * The resource will be demoted, stopped and then started (and possibly promoted) again. |
| 190 | OCF_DEGRADED | * The service is found to be properly active, but in such a condition that future failures are more likely. |
| 191 | OCF_DEGRADED_PROMOTED | * The resource agent supports roles and the service is found to be properly active in the promoted role, but in such a condition that future failures are more likely. |
| other | N/A | Custom error code. |
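The following sketch shows how a resource agent's monitor action might map onto these return codes. The service name mydaemon is hypothetical, and a real OCF agent must also implement start, stop, and meta-data actions:

#!/bin/sh
# Exit codes from the OCF return code table above.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

case "$1" in
monitor)
    # Running is the expected result for a started resource.
    if pgrep -x mydaemon >/dev/null 2>&1; then
        exit $OCF_SUCCESS
    else
        # Safely stopped; treated as a soft failure if unexpected.
        exit $OCF_NOT_RUNNING
    fi
    ;;
*)
    # Any action this sketch does not implement.
    exit $OCF_ERR_UNIMPLEMENTED
    ;;
esac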