Chapter 30. Multi-site Pacemaker clusters
When a cluster spans more than one site, issues with network connectivity between the sites can lead to split-brain situations. When connectivity drops, there is no way for a node on one site to determine whether a node on another site has failed or is still functioning with a failed site interlink. In addition, it can be problematic to provide high availability services across two sites that are too far apart to keep synchronous. To address these issues, Pacemaker fully supports configuring high availability clusters that span multiple sites through the use of a Booth cluster ticket manager.
30.1. Overview of Booth cluster ticket manager
The Booth ticket manager is a distributed service that is meant to be run on a different physical network than the networks that connect the cluster nodes at particular sites. It yields another, loose cluster, a Booth formation, that sits on top of the regular clusters at the sites. This aggregated communication layer facilitates consensus-based decision processes for individual Booth tickets.
A Booth ticket is a singleton in the Booth formation and represents a time-sensitive, movable unit of authorization. Resources can be configured to require a certain ticket to run. This can ensure that resources run at only one site at a time, namely the site to which the ticket or tickets have been granted.
You can think of a Booth formation as an overlay cluster consisting of clusters running at different sites, where all the original clusters are independent of each other. It is the Booth service which communicates to the clusters whether they have been granted a ticket, and it is Pacemaker that determines whether to run resources in a cluster based on a Pacemaker ticket constraint. This means that when using the ticket manager, each of the clusters can run its own resources as well as shared resources. For example there can be resources A, B and C running only in one cluster, resources D, E, and F running only in the other cluster, and resources G and H running in either of the two clusters as determined by a ticket. It is also possible to have an additional resource J that could run in either of the two clusters as determined by a separate ticket.
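To make this concrete, the following is a minimal sketch of the Booth configuration that the commands in the next section generate in /etc/booth/booth.conf for a two-site formation with one arbitrator and one ticket. The addresses and the ticket name are the ones used in the example configuration later in this chapter; you do not normally write this file by hand, because the pcs booth setup and pcs booth ticket add commands create and update it for you.
authfile = /etc/booth/booth.key
site = 192.168.11.100
site = 192.168.22.100
arbitrator = 192.168.99.100
ticket = "apacheticket"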
30.2. Configuring multi-site clusters with Pacemaker
You can configure a multi-site configuration that uses the Booth ticket manager with the following procedure.
These example commands use the following arrangement:
- Cluster 1 consists of the nodes cluster1-node1 and cluster1-node2.
- Cluster 1 has a floating IP address assigned to it of 192.168.11.100.
- Cluster 2 consists of the nodes cluster2-node1 and cluster2-node2.
- Cluster 2 has a floating IP address assigned to it of 192.168.22.100.
- The arbitrator node is arbitrator-node with an IP address of 192.168.99.100.
- The name of the Booth ticket that this configuration uses is apacheticket.
These example commands assume that the cluster resources for an Apache service have been configured as part of the resource group apachegroup for each cluster. It is not required that the resources and resource groups be the same on each cluster to configure a ticket constraint for those resources, since the Pacemaker instance for each cluster is independent, but that is a common failover scenario.
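If you have not already configured such a group, the following is a minimal sketch of how an Apache resource group could be created on Cluster 1. The resource names apache_vip and apache_server, the virtual IP address 192.168.11.150, and the httpd configuration path are hypothetical illustrations, not values required by this procedure; a comparable group would be configured separately on Cluster 2.
[cluster1-node1 ~]# pcs resource create apache_vip IPaddr2 ip=192.168.11.150 --group apachegroup
[cluster1-node1 ~]# pcs resource create apache_server apache configfile=/etc/httpd/conf/httpd.conf --group apachegroup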
Note that at any time in the configuration procedure you can enter the pcs booth config command to display the Booth configuration for the current node or cluster, or the pcs booth status command to display the current status of Booth on the local node.
Procedure
Install the booth-site Booth ticket manager package on each node of both clusters.
[root@cluster1-node1 ~]# dnf install -y booth-site
[root@cluster1-node2 ~]# dnf install -y booth-site
[root@cluster2-node1 ~]# dnf install -y booth-site
[root@cluster2-node2 ~]# dnf install -y booth-site
Install the pcs, booth-core, and booth-arbitrator packages on the arbitrator node.
[root@arbitrator-node ~]# dnf install -y pcs booth-core booth-arbitrator
If you are running the firewalld daemon, execute the following commands on all nodes in both clusters as well as on the arbitrator node to enable the ports that are required by the Red Hat High Availability Add-On.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
You may need to modify which ports are open to suit local conditions. For more information about the ports that are required by the Red Hat High-Availability Add-On, see Enabling ports for the High Availability Add-On.
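If you want to review which ports the high-availability firewalld service currently opens before adjusting them, you can inspect the service definition. This is an optional check; the exact port list depends on the firewalld package shipped with your release.
# firewall-cmd --info-service=high-availability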
Create a Booth configuration on one node of one cluster. The addresses you specify for each cluster and for the arbitrator must be IP addresses. For each cluster, you specify a floating IP address.
[cluster1-node1 ~] # pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
This command creates the configuration files /etc/booth/booth.conf and /etc/booth/booth.key on the node from which it is run.
Create a ticket for the Booth configuration. This is the ticket that you will use to define the resource constraint that will allow resources to run only when this ticket has been granted to the cluster.
This basic failover configuration procedure uses only one ticket, but you can create additional tickets for more complicated scenarios where each ticket is associated with a different resource or resources; a sketch of such a setup follows the command below.
[cluster1-node1 ~] # pcs booth ticket add apacheticket
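As an illustration of the multi-ticket scenario mentioned above, you could add a second ticket for a different set of resources and later constrain that resource group to it. The ticket name dbticket and the resource group dbgroup in this sketch are hypothetical; the constraint command corresponds to the ticket-constraint step later in this procedure.
[cluster1-node1 ~] # pcs booth ticket add dbticket
[cluster1-node1 ~] # pcs constraint ticket add dbticket dbgroup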
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster1-node1 ~] # pcs booth sync
From the arbitrator node, pull the Booth configuration to the arbitrator. If you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[arbitrator-node ~] # pcs host auth cluster1-node1
[arbitrator-node ~] # pcs booth pull cluster1-node1
Pull the Booth configuration to the other cluster and synchronize to all the nodes of that cluster. As with the arbitrator node, if you have not previously done so, you must first authenticate pcs to the node from which you are pulling the configuration.
[cluster2-node1 ~] # pcs host auth cluster1-node1
[cluster2-node1 ~] # pcs booth pull cluster1-node1
[cluster2-node1 ~] # pcs booth sync
Start and enable Booth on the arbitrator.
Note: You must not manually start or enable Booth on any of the nodes of the clusters since Booth runs as a Pacemaker resource in those clusters.
[arbitrator-node ~] # pcs booth start
[arbitrator-node ~] # pcs booth enable
Configure Booth to run as a cluster resource on both cluster sites, using the floating IP addresses assigned to each cluster. This creates a resource group with booth-ip and booth-service as members of that group.
[cluster1-node1 ~] # pcs booth create ip 192.168.11.100
[cluster2-node1 ~] # pcs booth create ip 192.168.22.100
Add a ticket constraint to the resource group you have defined for each cluster.
[cluster1-node1 ~] # pcs constraint ticket add apacheticket apachegroup
[cluster2-node1 ~] # pcs constraint ticket add apacheticket apachegroup
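If you want Pacemaker to take a specific action on the constrained resources when the ticket is later revoked or lost, the same command accepts a loss-policy option (for example fence or stop). This is an optional sketch of an alternative invocation, shown for the first cluster only; choose a policy that is appropriate for your environment.
[cluster1-node1 ~] # pcs constraint ticket add apacheticket apachegroup loss-policy=fence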
You can enter the following command to display the currently configured ticket constraints.
pcs constraint ticket [config]
Grant the ticket you created for this setup to the first cluster.
Note that it is not necessary to have defined ticket constraints before granting a ticket. Once you have initially granted a ticket to a cluster, then Booth takes over ticket management unless you override this manually with the pcs booth ticket revoke command. For information about the pcs booth administration commands, use the pcs booth --help command.
[cluster1-node1 ~] # pcs booth ticket grant apacheticket
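If you want to confirm that the grant has reached Pacemaker, you can query the ticket state in the CIB from any node of the first cluster. The exact attributes in the output vary by version, but a granted ticket appears with granted="true", similar to the output shown in the verification steps of Removing a Booth ticket.
[cluster1-node1 ~] # crm_ticket --query-xml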
It is possible to add or remove tickets at any time, even after completing this procedure. After adding or removing a ticket, however, you must synchronize the configuration files to the other nodes and clusters as well as to the arbitrator and grant the ticket as is shown in this procedure.
For a full procedure to remove a Booth ticket, see Removing a Booth ticket. For information about additional Booth administration commands, use the pcs booth --help command.
30.3. Removing a Booth ticket
After you remove a Booth cluster ticket by using the pcs booth ticket remove command, the state of the Booth ticket remains loaded in the Cluster Information Base (CIB). This is also the case after you remove a ticket from the Booth configuration on one site and pull the Booth configuration to another site by using the pcs booth pull command. This might cause problems when you configure a ticket constraint, because the removed ticket can still be treated as granted. As a consequence, the cluster might freeze or fence a node. In RHEL 9.6 and later, you can prevent this by removing a Booth ticket from the CIB with the pcs booth ticket cleanup command.
Prerequisites
- You have set up a multi-site configuration that uses the Booth ticket manager. For instructions, see Configuring multi-site clusters with Pacemaker.
The configured example uses the following arrangement:
- Cluster 1 consists of the nodes cluster1-node1 and cluster1-node2.
- Cluster 2 consists of the nodes cluster2-node1 and cluster2-node2.
- The arbitrator node is named arbitrator-node.
- The name of the Booth ticket that this configuration uses is apacheticket.
Procedure
To remove a Booth ticket, perform the following steps.
From a cluster node in one cluster site of the Booth configuration:
Put the ticket to remove in standby mode. The ticket that this example uses is named apacheticket.
[cluster1-node1 ~]# pcs booth ticket standby apacheticket
Remove the ticket from the Booth configuration.
[cluster1-node1 ~]# pcs booth ticket remove apacheticket
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster1-node1 ~]# pcs booth sync
Restart the Booth resource in the current cluster.
[cluster1-node1 ~]# pcs booth restart
Remove the ticket from the CIB in the current cluster.
[cluster1-node1 ~]# pcs booth ticket cleanup apacheticket
From a cluster node in each remaining cluster site of the Booth configuration:
Put the ticket to remove in standby mode.
[cluster2-node1 ~]# pcs booth ticket standby apacheticket
Download the Booth configuration file from a node with the updated configuration.
[cluster2-node1 ~]# pcs booth pull cluster1-node1
Synchronize the Booth configuration to all nodes in the current cluster.
[cluster2-node1 ~]# pcs booth sync
Restart the Booth resource in the current cluster.
[cluster2-node1 ~]# pcs booth restart
Remove the ticket from the CIB in the current cluster.
[cluster2-node1 ~]# pcs booth ticket cleanup apacheticket
From the arbitrator node, download the updated Booth configuration file from a node with the updated configuration.
[arbitrator-node ~]# pcs booth pull cluster1-node1
Verification
To check whether a Booth ticket was removed from the Booth configuration, run the pcs booth config command on each cluster node and the arbitrator node.
For example, after configuring a ticket named apacheticket using the procedure described in Configuring multi-site clusters with Pacemaker, the command displays the following output:
[cluster1-node1 ~]# pcs booth config
authfile = /etc/booth/booth.key
site = 192.168.11.100
site = 192.168.22.100
arbitrator = 192.168.99.100
ticket = "apacheticket"
After you remove the ticket from the Booth configuration, the command no longer displays ticket = "apacheticket":
[cluster1-node1 ~]# pcs booth config
authfile = /etc/booth/booth.key
site = 192.168.11.100
site = 192.168.22.100
arbitrator = 192.168.99.100
To check whether a Booth ticket was removed from the CIB on a cluster node, use the --query-xml option of the crm_ticket utility on any node in the cluster. For example, after you have configured a Booth ticket named apacheticket, the utility displays the following output:
[cluster1-node1 ~]# crm_ticket --query-xml
State XML:
<tickets>
  <ticket_state id="apacheticket" granted="true" booth-cfg-name="booth" owner="0" expires="1740986835" term="0" standby="false"/>
</tickets>
After you have removed the ticket from the CIB, the output no longer displays a ticket_state element with id="apacheticket":
[cluster1-node1 ~]# crm_ticket --query-xml
State XML:
<tickets/>